Python's `concurrent.futures` For Beginner-Friendly Concurrent Programming
Your Python project is running frustratingly slowly, leaving you searching for ways to improve its performance. You’ve come across concurrent programming as a possible solution, but as a newcomer to the concept it can seem daunting. Luckily, Python’s `concurrent.futures` package provides a straightforward, user-friendly way to speed up code without requiring in-depth knowledge of concurrent programming. In this article, we will explore how to use `concurrent.futures` to improve the efficiency of your code and help you decide whether it is the right solution for you.
Concurrent programming refers to techniques that allow multiple tasks to make progress at the same time using tools such as threads and processes. Through concurrent programming we can significantly increase the efficiency of a Python project and watch our speed issues disappear! Fortunately, since the introduction of Python’s `concurrent.futures` package in version 3.2, creating, managing, and synchronizing threads and processes has never been easier. The package provides a high-level abstraction on top of the standard-library `threading` and `multiprocessing` modules, enabling developers to write concurrent code with minimal effort. In this article we’ll walk through several examples of how to use `concurrent.futures` to speed up your project.
The first question I ask when deciding whether to use `concurrent.futures` is whether the project is slow in an area where the same task is run many times. For example, consider a project that relies on a number of third-party APIs. To verify that everything is working, your program might periodically send a request to an endpoint on each API to confirm it is reachable and in sync with your project. A very simple version of this in code could look like the following.
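A minimal sketch of this sequential pattern, using the standard library’s `urllib.request`; the endpoint URLs and the `"unreachable"` fallback are illustrative assumptions, not part of any real API:

```python
from urllib.request import urlopen

# Placeholder endpoints -- substitute the real APIs your project relies on.
URLS = [
    "https://api-one.example.com/status",
    "https://api-two.example.com/status",
    "https://api-three.example.com/status",
]

def fetch_one_url(url):
    """Send a request to the given endpoint and return its status."""
    try:
        with urlopen(url, timeout=5) as response:
            return url, response.status
    except OSError:
        # DNS failures, timeouts, and HTTP errors all surface as OSError here.
        return url, "unreachable"

def fetch_all_url(urls):
    """Check each endpoint one at a time; each request blocks the next."""
    return [fetch_one_url(url) for url in urls]
```

Because `fetch_all_url` waits for each request to finish before starting the next, total runtime grows roughly linearly with the number of endpoints.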
In this example, we have two functions. The first, `fetch_one_url`, takes a URL argument, sends a request to the given endpoint, and returns the status of that endpoint. In the second, `fetch_all_url`, we cycle through the list of endpoints, calling the first function each time to get the status of each API in our list. With only three example APIs listed here, this wouldn’t take long to execute in a real-world situation. Consider, however, that we have hundreds or even thousands of APIs to communicate with, and that retrieving the status of each takes a relatively long time. Here’s where we begin to see a decline in performance. Luckily for us, this is a perfect scenario for the `concurrent.futures` package! Let’s see how the same example looks using `concurrent.futures`.
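A sketch of the same idea rewritten with `ThreadPoolExecutor`; the placeholder URLs are again illustrative, and `max_workers=5` is an arbitrary choice:

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Placeholder endpoints -- substitute the real APIs your project relies on.
URLS = [
    "https://api-one.example.com/status",
    "https://api-two.example.com/status",
    "https://api-three.example.com/status",
]

def fetch_one_url(url):
    """Send a request to the given endpoint and return its status."""
    try:
        with urlopen(url, timeout=5) as response:
            return url, response.status
    except OSError:
        return url, "unreachable"

def fetch_all_url(urls):
    """Check all endpoints concurrently using a pool of threads."""
    with ThreadPoolExecutor(max_workers=5) as executor:
        # map runs fetch_one_url across the thread pool and yields
        # results in the same order as the input URLs.
        return list(executor.map(fetch_one_url, urls))
```

While one thread waits on a slow response, the others keep working, so total runtime approaches that of the slowest request rather than the sum of them all.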
This example looks very similar to the one above. The first function, `fetch_one_url`, remains the same: it makes a request to the endpoint it is given and returns the status. Looking closely at the second function this time around, we can see that we are now using the `ThreadPoolExecutor` from `concurrent.futures`. It opens multiple threads that run the `fetch_one_url` task concurrently, so our program is no longer blocked by each individual request and can instead make several requests at the same time.
Before we look at this example more closely, I would like to address the second question I ask myself when implementing `concurrent.futures`. Once we’ve determined that our program is, in fact, running the same task over and over and causing speed issues, we need to determine what kind of task it is. In this case, our task is making a request to a third-party API, a common example of an I/O-bound task. I/O-bound tasks include reading from or writing to a file, waiting for network communication, and interacting with a database. As the example above showed, `concurrent.futures` uses the `ThreadPoolExecutor` to speed up I/O-bound tasks.
The other type of task that `concurrent.futures` can assist with is the CPU-bound task: one that requires a significant amount of computational processing power and spends its time executing instructions rather than waiting for input/output operations to complete. A simple and common example of a CPU-bound task that can be sped up with `concurrent.futures` is checking whether each number in a list is prime. In this case, we swap the `ThreadPoolExecutor` for a `ProcessPoolExecutor`.
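A sketch of that prime-checking example with `ProcessPoolExecutor`; the trial-division `is_prime` helper is an illustrative implementation, not from the original article:

```python
from concurrent.futures import ProcessPoolExecutor

def is_prime(n):
    """Trial division: deliberately CPU-bound for large n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True

def check_primes(numbers):
    """Check every number concurrently across separate processes."""
    # Separate processes sidestep the GIL, so CPU-bound work can use
    # multiple cores -- something threads alone cannot do here.
    with ProcessPoolExecutor() as executor:
        return list(executor.map(is_prime, numbers))

if __name__ == "__main__":
    # The __main__ guard matters: worker processes may re-import this module.
    print(check_primes([2, 15, 17, 21, 23]))  # [True, False, True, False, True]
```

Note the structure is nearly identical to the thread example; only the executor class changes.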
As you can see, this example is very similar to the second one, in which we used a `ThreadPoolExecutor`. It takes a list of numbers and checks whether each one is prime concurrently, so as not to block our program more than necessary. While both examples behave in a similar manner, together they illustrate how `concurrent.futures` helps us write concurrent code easily. Without a deep understanding of how threads and processes work, we can simply ask what kind of task we are running over and over, I/O-bound or CPU-bound, and apply the correct type of executor.
The final aspect of `concurrent.futures` that I would like to touch on is the choice between the `map` and `submit` methods. So far in this article, I have been using the `map` method to iterate over URLs and numbers, applying a function to each. The `map` method takes an iterable of arguments and applies a function to each concurrently, returning an iterator that yields the results in the same order as the inputs. This is generally fine; however, if one of the tasks raises an exception, `map` re-raises it when that result is retrieved from the iterator, and there is no way to handle the exception and continue on to the remaining values. This is where the `submit` method comes in handy. `submit` takes a single function and its arguments and schedules the task to run in the future. It returns a `Future` object that can be used to check the task’s status or retrieve its result later.
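Here is a minimal sketch of that pattern using `submit` together with `as_completed`; the `check_status` helper and its URLs are hypothetical stand-ins, with one endpoint deliberately broken to demonstrate per-task exception handling:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def check_status(url):
    """Hypothetical stand-in for a real request; fails for one endpoint."""
    if "bad" in url:
        raise ConnectionError(f"could not reach {url}")
    return f"{url}: OK"

urls = ["https://api-one.example", "https://bad.example", "https://api-two.example"]

results, errors = [], []
with ThreadPoolExecutor(max_workers=3) as executor:
    # submit schedules one task per URL and returns a Future for each.
    futures = {executor.submit(check_status, url): url for url in urls}

    # as_completed yields each Future as soon as it finishes, so we can
    # handle successes and failures one at a time.
    for future in as_completed(futures):
        url = futures[future]
        try:
            results.append(future.result())
        except ConnectionError:
            errors.append(url)
```

Unlike `map`, a failed task here does not stop the loop: its exception is confined to the `future.result()` call, and the remaining futures are still processed.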
In this example, we can see that `submit` potentially gives us more control over our tasks. We can handle exceptions individually, run other code while our tasks execute, and process each result on its own. Finally, with `submit` combined with `as_completed`, we can handle results as soon as each task finishes, whereas `map` always yields results in input order, so a slow early task holds up every result behind it.
In summary, `concurrent.futures` can be a powerful tool for speeding up your Python projects by performing multiple tasks simultaneously. As a powerful tool, the package offers more customization than these examples have shown, but I believe it’s important to start with the basics. Get used to analyzing the type of task being run and applying the correct type of executor. Try both the `map` and `submit` methods and see which works best in different scenarios. Once this all feels comfortable and familiar, feel free to explore deeper topics such as how many workers to use or the differences between threads and processes. Hopefully this brief overview and these examples will get you up and running with concurrent programming and enhance your Python project!