Python Performance Tips You Must Know
Daniel Hayes
Full-Stack Engineer · Leapcell
## Comprehensive Guide to Python Code Performance Optimization
Python, as a dynamically typed interpreted language, may indeed have a slower execution speed compared to statically typed compiled languages like C. However, through certain techniques and strategies, we can significantly enhance the performance of Python code.
This article explores how to optimize Python code so that it runs faster and more efficiently. We will use Python's `timeit` module to measure the execution time of the code accurately.
Note: By default, `timeit` executes the statement one million times, which keeps the measurement results accurate and stable.
    import timeit

    def print_hi(name):
        print(f'Hi, {name}')

    if __name__ == '__main__':
        # Time print_hi('leapcell'); timeit() runs the statement 1,000,000 times by default
        t = timeit.Timer(setup='from __main__ import print_hi', stmt='print_hi("leapcell")')
        print(t.timeit())
How to Calculate the Running Time of a Python Script
In the `time` module, `time.perf_counter()` provides a high-precision timer that is suitable for measuring short intervals. For example:
    import time

    # Record the start time of the program
    start_time = time.perf_counter()

    # Your code logic
    # ...

    # Record the end time of the program
    end_time = time.perf_counter()

    # Calculate the running time of the program
    run_time = end_time - start_time
    print(f"Program running time: {run_time} seconds")
I. I/O-Intensive Operations
I/O-intensive (input/output-intensive) operations are programs or tasks that spend most of their execution time waiting for input/output operations to complete. I/O operations include reading data from disk, writing data to disk, network communication, etc. These operations usually involve hardware devices, so their execution speed is limited by hardware performance and I/O bandwidth.
Their characteristics are as follows:
- Waiting Time: When a program executes an I/O operation, it often needs to wait for data to be transferred from an external device to memory or from memory to an external device, which may cause the program's execution to be blocked.
- CPU Utilization Efficiency: Due to the waiting time of I/O operations, the CPU may be idle during this period, resulting in low CPU utilization.
- Performance Bottleneck: The speed of I/O operations often becomes the bottleneck of program performance, especially when the data volume is large or the transmission speed is slow.
For example, calling the I/O-intensive `print` function one million times:
    import time
    import timeit

    def print_hi(name):
        print(f'Hi, {name}')
        return

    if __name__ == '__main__':
        start_time = time.perf_counter()
        # Execute the print_hi('leapcell') method
        t = timeit.Timer(setup='from __main__ import print_hi', stmt='print_hi("leapcell")')
        t.timeit()
        end_time = time.perf_counter()
        run_time = end_time - start_time
        print(f"Program running time: {run_time} seconds")
The run takes about 3 seconds.
When the same method is called without any I/O, that is, `print_hi('xxxx')` with the `print()` call commented out, the program is significantly faster:
    def print_hi(name):
        # print(f'Hi, {name}')
        return
Optimization Methods for I/O-Intensive Operations
When I/O is unavoidable, for example when reading and writing files, the following methods can improve efficiency:
- Asynchronous I/O: Use an asynchronous programming model such as `asyncio`, which allows the program to continue executing other tasks while waiting for I/O operations to complete, thereby improving CPU utilization.
- Buffering: Use a buffer to temporarily store data and reduce the frequency of I/O operations.
- Parallel Processing: Execute multiple I/O operations in parallel to improve the overall data processing speed.
- Optimize Data Structures: Select appropriate data structures to reduce the number of data reads and writes.
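As a rough illustration of the asynchronous I/O point in the list above, the sketch below uses `asyncio.sleep()` as a stand-in for a real network or disk wait. The names `fake_fetch`, `fetch_all`, and `run_demo`, and the delay values, are hypothetical and not part of the original article.

```python
import asyncio
import time


async def fake_fetch(name: str, delay: float) -> str:
    # asyncio.sleep() stands in for a real I/O wait (an assumption of this sketch).
    await asyncio.sleep(delay)
    return f"{name} done"


async def fetch_all(names: list[str], delay: float) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_fetch(n, delay) for n in names))


def run_demo(delay: float = 0.1) -> tuple[list[str], float]:
    # Returns the results and the total wall-clock time for five simulated requests.
    start = time.perf_counter()
    results = asyncio.run(fetch_all(["a", "b", "c", "d", "e"], delay))
    return results, time.perf_counter() - start
```

Because the five waits overlap, the elapsed time should be close to one delay rather than five. Real gains depend on the I/O library actually being asynchronous.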
II. Using Comprehensions to Build Lists and Dictionaries
Python supports list, dictionary, and set comprehensions (dictionary and set comprehensions have been available since Python 2.7), which make constructing these data structures more concise and efficient.
1. Traditional Method
    import time
    import timeit

    def fun1():
        result = []  # avoid shadowing the built-in name `list`
        for i in range(100):
            result.append(i)

    if __name__ == '__main__':
        start_time = time.perf_counter()
        t = timeit.Timer(setup='from __main__ import fun1', stmt='fun1()')
        t.timeit()
        end_time = time.perf_counter()
        run_time = end_time - start_time
        print(f"Program running time: {run_time} seconds")
    # Output result: Program running time: 3.363 seconds
2. Optimizing the Code with a List Comprehension
Note: To keep the following examples short, the `if __name__ == '__main__':` timing code is omitted from here on.
    def fun1():
        result = [i for i in range(100)]
    # Program running time: 2.094 seconds

As the comprehension version shows, it is not only more concise and easier to read but also faster. This makes a comprehension the preferred way to build a list from a loop.
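The same syntax extends to dictionaries and sets. The following is a minimal sketch with made-up sample data (`words` is illustrative only), alongside the equivalent explicit loop for comparison.

```python
words = ["hello", "this", "is", "leapcell"]

# Dictionary comprehension: map each word to its length.
lengths = {word: len(word) for word in words}

# Set comprehension: collect the distinct word lengths.
distinct_lengths = {len(word) for word in words}

# Equivalent explicit loop for the dictionary, shown for comparison.
lengths_loop = {}
for word in words:
    lengths_loop[word] = len(word)
```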
III. Avoid String Concatenation, Use join()
`join()` is a string method that concatenates the elements of a sequence into a single string, with the string it is called on acting as the delimiter. Its main advantages are:
- High Efficiency: `join()` is an efficient way to concatenate strings, especially when there are many of them. It is usually faster than the `+` operator or `%` formatting, and it usually uses less memory than concatenating the strings one by one.
- Conciseness: `join()` makes the code more concise and avoids repeated string concatenation operations.
- Flexibility: Any string can be specified as the delimiter, which provides great flexibility for string splicing.
- Wide Application: It accepts any iterable, such as a list or tuple, as long as the elements are strings; convert other types with `str()` first.
For example:
    def fun1():
        obj = ['hello', 'this', 'is', 'leapcell', '!']
        s = ""
        for i in obj:
            s += i
    # Program running time: 0.35186 seconds

Using `join()` to achieve string concatenation:

    def fun1():
        obj = ['hello', 'this', 'is', 'leapcell', '!']
        "".join(obj)
    # Program running time: 0.1822 seconds

Using `join()` reduces the execution time of the function from 0.35 seconds to 0.18 seconds.
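The delimiter and element-type points can be sketched as follows. The sample values are illustrative; the only firm requirement shown is that `join()` needs string elements, so other types are converted with `str()` first.

```python
parts = ["hello", "this", "is", "leapcell"]

# Any string can act as the delimiter.
spaced = " ".join(parts)
csv_line = ",".join(parts)

# join() raises TypeError on non-string elements, so convert them first.
numbers = [1, 2, 3]
joined_numbers = "-".join(map(str, numbers))
```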
IV. Use map() Instead of Loops
In many scenarios, a traditional `for` loop can be replaced by the `map()` function. `map()` is a built-in higher-order function that applies a specified function to each element of an iterable such as a list, tuple, or string. Its main advantage is that it offers a more concise way to process data without writing an explicit loop, and it is often faster.
Traditional Loop Method
    def fun1():
        arr = ["hello", "this", "is", "leapcell", "!"]
        new = []
        for i in arr:
            new.append(i)
    # Program running time: 0.3067 seconds
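Using the map() Function for the Same Task

    def fun2(x):
        return x

    def fun1():
        arr = ["hello", "this", "is", "leapcell", "!"]
        # In Python 3 this only creates a lazy iterator; wrap it in list() to run fun2 on every element
        map(fun2, arr)
    # Program running time: 0.1875 seconds

In this comparison, the `map()` version takes roughly 40% less time. Keep in mind that in Python 3 `map()` returns a lazy iterator, so this measurement covers only the creation of the iterator; the elements are processed when the result is consumed, for example with `list()`.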
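To make the behaviour of `map()` in Python 3 concrete, here is a small sketch. The helper `shout` is hypothetical; the point is that `map()` does no work until its result is consumed, and that `list(map(...))`, an explicit loop, and a list comprehension produce the same list.

```python
def shout(word: str) -> str:
    return word.upper()


arr = ["hello", "this", "is", "leapcell", "!"]

lazy = map(shout, arr)   # no calls to shout() have happened yet
via_map = list(lazy)     # consuming the iterator does the actual work

via_loop = []
for word in arr:
    via_loop.append(shout(word))

via_comprehension = [shout(word) for word in arr]
```

A fair benchmark of `map()` against a loop should therefore time `list(map(...))`, not the bare `map(...)` call.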
V. Choose the Right Data Structure
Choosing the appropriate data structure is crucial for improving the execution efficiency of Python code. Various data structures are optimized for specific operations. A reasonable choice can accelerate the retrieval, addition, and removal of data, thereby enhancing the overall operation efficiency of the program.
For example, when testing whether an element is in a container, a dictionary or set lookup is faster than a list lookup when the amount of data is large. With a small amount of data, the list can be faster.
Testing with a Small Amount of Data
    def fun1():
        arr = ["hello", "this", "is", "leapcell", "!"]
        'hello' in arr
        'my' in arr
    # Program running time: 0.1127 seconds

    def fun1():
        arr = {"hello", "this", "is", "leapcell", "!"}
        'hello' in arr
        'my' in arr
    # Program running time: 0.1702 seconds
Using numpy to Randomly Generate 100 Integers

    import numpy as np

    def fun1():
        # list version
        nums = [i for i in np.random.randint(100, size=100)]
        1 in nums
    # Program running time: 14.28 seconds

    def fun1():
        # set version
        nums = {i for i in np.random.randint(100, size=100)}
        1 in nums
    # Program running time: 13.53 seconds

It can be seen that with a small amount of data, `list` is more efficient than `set` (and `dict`, which uses the same hash-based lookup), but with a large amount of data, the hash-based containers are more efficient than `list`.
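The timings above include the cost of generating the random numbers, which can mask the lookup itself. As a hedged sketch, the snippet below builds the containers once and times only the membership test. The size `N`, the repeat count, and the helper `time_lookup` are assumptions made for illustration.

```python
import timeit

N = 10_000
as_list = list(range(N))
as_set = set(as_list)
missing = -1  # worst case for the list: every element is compared


def time_lookup(container, repeats: int = 200) -> float:
    # Only the membership test is timed, not the construction of the container.
    return timeit.timeit(lambda: missing in container, number=repeats)


list_seconds = time_lookup(as_list)
set_seconds = time_lookup(as_set)
```

At this size the set lookup should be far faster, because it hashes the value instead of scanning every element.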
If elements are added and removed frequently and in large numbers, especially at the front of the sequence, `list` is inefficient because inserting or removing at the head shifts every other element. In that case, consider `collections.deque`. `collections.deque` is a double-ended queue that has the characteristics of both a stack and a queue and can insert and delete at both ends with $O(1)$ complexity.
Usage of collections.deque

    from collections import deque

    def fun1():
        arr = deque()  # Create an empty deque
        for i in range(1000000):
            arr.append(i)
    # Program running time: 0.0558 seconds

    def fun1():
        arr = []
        for i in range(1000000):
            arr.append(i)
    # Program running time: 0.06077 seconds
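Appending at the right end is cheap for both types, which is why the two timings above are close. The difference shows at the left end. The sketch below (function names are hypothetical) removes elements from the front of each container and produces identical results, but `list.pop(0)` is $O(n)$ per call while `deque.popleft()` is $O(1)$.

```python
from collections import deque


def drain_list(n: int) -> list[int]:
    items = list(range(n))
    out = []
    while items:
        out.append(items.pop(0))     # O(n): shifts every remaining element
    return out


def drain_deque(n: int) -> list[int]:
    items = deque(range(n))
    out = []
    while items:
        out.append(items.popleft())  # O(1) at the left end
    return out
```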
Lookups in a `list` are also time-consuming. When you need to look up elements in a `list` frequently or access them in sorted order, the `bisect` module can keep the `list` sorted and perform binary search on it to improve lookup efficiency.
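A minimal sketch of this idea, using the standard-library `bisect` module. The helper `contains_sorted` and the sample `scores` are illustrative names, not from the original article.

```python
import bisect


def contains_sorted(sorted_items: list[int], target: int) -> bool:
    # bisect_left returns the leftmost index at which target could be inserted.
    i = bisect.bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


scores = [10, 20, 40, 50]
bisect.insort(scores, 30)  # inserts while keeping the list sorted
```

The search is $O(\log n)$, but note that `insort()` still pays $O(n)$ to shift elements during insertion, so this suits workloads with many lookups and comparatively few inserts.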
VI. Avoid Unnecessary Function Calls
In Python programming, optimizing the number of function calls is crucial for improving code efficiency. Excessive function calls not only increase overhead but may also consume additional memory, thus slowing down the running speed of the program. To improve performance, we should try to reduce unnecessary function calls and attempt to combine multiple operations into one, thereby reducing execution time and resource consumption. Such optimization strategies help us write more efficient and faster code.
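As one possible illustration (the function names are hypothetical), the two versions below compute the same sum of squares. The first makes a Python-level call to a helper on every iteration; the second expresses the arithmetic inline and combines the work into a single built-in `sum()` call.

```python
def square(x: int) -> int:
    return x * x


def sum_squares_with_calls(n: int) -> int:
    total = 0
    for i in range(n):
        total += square(i)  # one extra Python-level call per iteration
    return total


def sum_squares_inlined(n: int) -> int:
    # Same arithmetic, written inline and summed by one built-in call.
    return sum(i * i for i in range(n))
```

Whether the difference matters depends on how hot the loop is, so it is worth measuring with `timeit` before sacrificing a readable helper.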
VII. Avoid Unnecessary import
Although Python's `import` statement is relatively fast, each `import` involves finding the module, executing the module code (if it has not been executed yet), and binding the module object in the current namespace. These operations all cost time and memory, so importing modules you do not need adds avoidable overhead.
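One way to avoid paying for a module that is rarely needed is to defer the import to the function that uses it. This is a sketch under that assumption; `to_json` is a hypothetical helper, and the approach trades against the usual convention of keeping imports at the top of the file.

```python
import sys


def to_json(rows: list[dict]) -> str:
    # Imported only when this function is actually called.
    import json
    return json.dumps(rows)


first = to_json([{"id": 1}])

# A module is executed once; later imports reuse the cached entry in sys.modules.
json_cached = "json" in sys.modules
```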
VIII. Avoid Using Global Variables
    import math

    size = 10000

    def fun1():
        for i in range(size):
            for j in range(size):
                z = math.sqrt(i) + math.sqrt(j)
    # Program running time: 15.6336 seconds

Many programmers start out writing simple Python scripts and are used to defining variables directly at global scope, as in the code above. However, global and local variables are implemented differently: in CPython a global is looked up by name in a dictionary, while a local is read from a fixed slot. As a result, code that relies on globals runs more slowly than code that keeps its variables inside a function. Moving script statements into a function can typically yield a speedup of 15% to 30%.

    import math

    def fun1():
        size = 10000
        for i in range(size):
            for j in range(size):
                z = math.sqrt(i) + math.sqrt(j)
    # Program running time: 14.9319 seconds
IX. Avoid Module and Function Attribute Access
    import math

    # Not recommended
    def fun2(size: int):
        result = []
        for i in range(size):
            result.append(math.sqrt(i))
        return result

    def fun1():
        size = 10000
        for _ in range(size):
            result = fun2(size)
    # Program running time: 10.1597 seconds

Each use of `.` (the attribute access operator) triggers special methods such as `__getattribute__()` and `__getattr__()`. These methods perform dictionary lookups, which adds time overhead. Using a `from ... import ...` statement removes the attribute access.

    from math import sqrt

    # Recommended: import only the names you need
    def fun2(size: int):
        result = []
        for i in range(size):
            result.append(sqrt(i))
        return result

    def fun1():
        size = 10000
        for _ in range(size):
            result = fun2(size)
    # Program running time: 8.9682 seconds
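The same reasoning applies to method lookups such as `result.append`. The sketch below is an assumption-labelled extension of the example above, not a measurement from the article: it binds the method to a local name once, so the loop body performs no attribute access. The function names are hypothetical.

```python
from math import sqrt


def roots_attribute_lookup(size: int) -> list[float]:
    result = []
    for i in range(size):
        result.append(sqrt(i))  # looks up result.append on every iteration
    return result


def roots_local_alias(size: int) -> list[float]:
    result = []
    append = result.append      # bound method looked up once
    for i in range(size):
        append(sqrt(i))
    return result
```

The size of the gain varies with the interpreter version, since newer CPython releases optimize attribute access, so treat this as something to benchmark rather than a guaranteed win.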
X. Reduce Calculations in Inner for Loops

    import math

    def fun1():
        size = 10000
        sqrt = math.sqrt
        for x in range(size):
            for y in range(size):
                z = sqrt(x) + sqrt(y)
    # Program running time: 14.2634 seconds

In the code above, `sqrt(x)` sits in the inner `for` loop and is recalculated on every iteration even though `x` changes only in the outer loop, which adds unnecessary time overhead.

    import math

    def fun1():
        size = 10000
        sqrt = math.sqrt
        for x in range(size):
            sqrt_x = sqrt(x)  # computed once per outer iteration
            for y in range(size):
                z = sqrt_x + sqrt(y)
    # Program running time: 8.4077 seconds
Leapcell: The Best Serverless Platform for Python app Hosting
Finally, I would like to introduce the best platform for deploying Python applications: Leapcell
1. Multi-Language Support
- Develop with JavaScript, Python, Go, or Rust.
2. Deploy unlimited projects for free
- Pay only for usage — no requests, no charges.
3. Unbeatable Cost Efficiency
- Pay-as-you-go with no idle charges.
- Example: $25 supports 6.94M requests at a 60ms average response time.
4. Streamlined Developer Experience
- Intuitive UI for effortless setup.
- Fully automated CI/CD pipelines and GitOps integration.
- Real-time metrics and logging for actionable insights.
5. Effortless Scalability and High Performance
- Auto-scaling to handle high concurrency with ease.
- Zero operational overhead — just focus on building.