Nikola Dakić - Blog About Software Engineering, DevOps and AI – Multiprocessing vs. Multithreading

Multiprocessing vs. Multithreading

Posted on Thu 31 August 2023 in SWE

Hey there, Pythonistas! 🐍

Python is like that charming friend who always makes you smile with its simplicity and readability.
But what happens when you need to deal with multiple tasks at once?
Well, you're in for a treat because Python has some concurrency tricks up its sleeve!
In this blog post, we're going to dive headfirst into the wild world of Python concurrency, while keeping an eye on the notorious GIL!

Let's define some terms first:

  • Thread - The smallest unit of execution that the operating system can schedule; a process can contain many threads. At the hardware level, each physical CPU core can run two hardware threads if simultaneous multithreading (hyper-threading) is enabled.
  • Process - An instance of a computer program that is being executed by one or many threads.
  • Multithreading - The ability of a CPU to provide multiple threads of execution concurrently, supported by the operating system.
  • Multiprocessing - The ability of a system to support more than one processor, or to allocate tasks between them.
  • Global Interpreter Lock (GIL) - A mutex used by CPython that allows only one thread to execute Python bytecode at a time within a process. No matter how many threads a Python program spawns, only one of them runs Python code at any given moment, which guarantees that only one thread can access a particular object at a time.
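To see the GIL in action, here is a minimal sketch (the function name count_down is my own, purely for illustration): running a CPU-bound countdown twice in a row takes roughly the same time as splitting the same work across two threads, because only one thread can execute bytecode at a time.

```python
import threading
import time

def count_down(n: int) -> None:
    # Pure-Python, CPU-bound busy work: nothing here releases the GIL.
    while n > 0:
        n -= 1

N = 5_000_000

start = time.perf_counter()
count_down(N)
count_down(N)
serial = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"serial:   {serial:.2f}s")
print(f"threaded: {threaded:.2f}s")  # roughly the same: the GIL serialises the bytecode
```

On my machine the two timings come out nearly identical; the threads add scheduling overhead without adding any parallelism.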

Choosing the Right Concurrency Model

Concurrency is a crucial aspect of modern software development, enabling programs to perform multiple tasks simultaneously and efficiently. Python provides two primary ways to achieve concurrency: multiprocessing and multithreading.
Python also has the asyncio framework, which provides a way to write asynchronous, non-blocking code, but to keep this article simple we will skip it for now. Asyncio will be covered in a separate article.
In this blog post, we'll explore the differences between multiprocessing and multithreading and help you make an informed decision on which one to use for your specific use case.

Understanding Concurrency

Before diving into multiprocessing and multithreading, it's essential to understand the concept of concurrency.
Concurrency allows multiple tasks to run in overlapping time periods, potentially making the program more responsive and efficient. It's particularly valuable in scenarios where you want to take advantage of multiple CPU cores or perform I/O-bound operations without blocking the main program's execution.
Concurrency should not be confused with parallelism.
Parallelism is about multiple CPU cores running different tasks at exactly the same time.
A real-life analogy: concurrency is having only one hand. You start doing something with it, stop, do another thing, stop, then go back to the first thing; one hand is shared among different activities. Parallelism is having both of your hands: you brush your teeth with the right hand while styling your hair with the left.

With that in mind, let's see how Python implements concurrency.

Multiprocessing

Multiprocessing is a concurrency model that involves running multiple processes, each with its own memory space. These processes can run independently on separate CPU cores, making it an excellent choice for CPU-bound tasks that can be parallelized.
Let's take a look at a simple example of multiprocessing in Python:

# CPU-bound function
def sum_square(custom_range: tuple) -> int:
    final = 0
    for i in range(custom_range[0], custom_range[1]):
        final += i * i
    return final

@timing
def sum_square_serial():
    return sum_square((0, 400_000_000))

First, we define a CPU-bound function sum_square that calculates the sum of squares of all numbers in a given range. The function takes a tuple as an argument, which represents the range of numbers to be summed. After that, we define a function sum_square_serial which calls sum_square with a range of numbers from 0 to 400 million. This function is decorated with the timing decorator, which measures its execution time. Don't worry about the timing decorator; it's just a simple decorator that prints how long the function took, and you will find the full code at the end of the article. As the name suggests, sum_square_serial executes sum_square serially, on a single CPU core.
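For reference, a timing decorator like the one used throughout this article can be sketched in a few lines (this is a minimal version of my own; the author's full code is linked at the end):

```python
import functools
import time

def timing(func):
    """Print how long the wrapped function takes to run."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)  # run the wrapped function unchanged
        print(f"Finished in {time.perf_counter() - start:.2f} seconds")
        return result
    return wrapper
```

functools.wraps keeps the wrapped function's name and docstring intact, which matters when you stack decorators or inspect functions at runtime.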
Let's see how long it takes to execute sum_square_serial function.

if __name__ == "__main__":
    result_serial = sum_square_serial()

Result of the execution on my machine: Finished in 16.03 seconds. Pretty slow, right?
Well, let's see how we can speed up the execution of this function by using multiprocessing.

from concurrent.futures import ProcessPoolExecutor

@timing
def sum_square_multiprocessing():
    with ProcessPoolExecutor() as pp_executor:
        mp_results = pp_executor.map(sum_square,
                                     [(0, 100_000_000),
                                      (100_000_000, 200_000_000),
                                      (200_000_000, 300_000_000),
                                      (300_000_000, 400_000_000)])
    return sum(mp_results)

First, we define a function sum_square_multiprocessing, also decorated with the timing decorator. It executes sum_square in parallel, spreading the work across multiple CPU cores. We use ProcessPoolExecutor from the concurrent.futures module to create a pool of processes, then call its map method to apply sum_square to a list of tuples, where each tuple represents a range of numbers to be summed. Finally, we sum the results of the map call and return the total.
Now let's see how long it takes to execute sum_square_multiprocessing function.

if __name__ == "__main__":
    result_multiprocessing = sum_square_multiprocessing()

Result of the execution on my machine: Finished in 4.01 seconds. Voilà, that's almost 4 times faster than the serial execution! Cool, right? Well, not so fast, there is a catch. Don't forget that each process has its own memory space, which means the data must be copied to each process. This can be a problem if you are dealing with large amounts of data. So be careful when using multiprocessing; it can be a double-edged sword. But don't be afraid to use it. Software engineering is all about trade-offs. You get some, you lose some. The better you understand the trade-offs, the better software engineer you are.
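One small refinement to the example above: instead of hard-coding four ranges, you can split the work by the number of available cores. A quick helper sketch (make_chunks is my own name, not part of the original code):

```python
import os

def make_chunks(total: int, n_chunks: int) -> list[tuple[int, int]]:
    """Split range(0, total) into n_chunks contiguous (start, stop) tuples."""
    step = total // n_chunks
    chunks = [(i * step, (i + 1) * step) for i in range(n_chunks)]
    chunks[-1] = (chunks[-1][0], total)  # absorb any remainder into the last chunk
    return chunks

# One chunk per logical core; fall back to 4 if the count is unavailable.
print(make_chunks(400_000_000, os.cpu_count() or 4))
```

Passing the result straight to pp_executor.map keeps the pool busy regardless of how many cores the machine actually has.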

Multithreading

Multithreading, on the other hand, involves running multiple threads within a single process. Each thread shares the same memory space and can run concurrently, making it suitable for I/O-bound tasks and scenarios where you need to maintain shared state between threads.

So let's take a look at a simple example of multithreading in Python:

from pathlib import Path

import requests  # third-party: pip install requests

def download_image(img_url: str, save_loc: Path, image_name: str) -> None:
    img_url = img_url.replace("\n", "")  # strip stray newlines from the URL
    img_bytes = requests.get(img_url).content
    save_loc.mkdir(parents=True, exist_ok=True)
    with open(save_loc / image_name, "wb") as img_file:
        img_file.write(img_bytes)

First, we define a function download_image which takes an image URL, save location, and image name as arguments. This function downloads the image from the given URL and saves it to the specified location. That's it, pretty simple.
Now let's see how long it takes to download 99 images in a serial manner.

@timing
def download_images_serial():
    for i, img_url in enumerate(IMAGE_URLS):
        download_image(img_url, IMAGE_FOLDER / "serial", f"image_{i}.jpg")

if __name__ == "__main__":
    download_images_serial()

We created the download_images_serial function so that we can decorate it with the timing decorator and measure the execution time. The function iterates over the list of image URLs and calls download_image for each one. Nothing fancy here. Result of the execution on my machine: Finished in 9.91 seconds. Not bad, but we can do better.
Let's see how we can speed up the execution of this function by using multithreading.

from concurrent.futures import ThreadPoolExecutor

@timing
def download_images_multithreading():
    with ThreadPoolExecutor() as executor:
        for i, url in enumerate(IMAGE_URLS):
            executor.submit(download_image, url, IMAGE_FOLDER / "mt", f"image_{i}.jpg")

In the example above, we created the download_images_multithreading function and used ThreadPoolExecutor from the concurrent.futures module to create a pool of threads. We again iterate over the list of image URLs and call download_image for each one, but this time we use the submit method to hand each call over to the executor. Pretty simple. Now let's see how long it takes to download 99 images using multithreading.
Result of the execution on my machine: Finished in 0.84 seconds. Wow, that's almost 12 times faster than the serial execution! We did it, we improved the performance of our program by using multithreading.
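One caveat worth knowing about submit: it returns a Future, and an exception raised inside the task stays hidden until you call result() on that future. If a download fails in the example above, you would never notice. A small sketch of surfacing errors with as_completed (flaky is a made-up function standing in for a task that can fail):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def flaky(n: int) -> int:
    if n == 3:
        raise ValueError("boom")
    return n * n

results, errors = {}, {}
with ThreadPoolExecutor() as executor:
    # Map each future back to its input so we know which task it belongs to.
    futures = {executor.submit(flaky, n): n for n in range(5)}
    for future in as_completed(futures):
        n = futures[future]
        try:
            results[n] = future.result()  # re-raises any exception from the task
        except ValueError as exc:
            errors[n] = str(exc)

print(results)  # squares for 0, 1, 2, 4
print(errors)   # {3: 'boom'}
```

Collecting futures this way also lets you process downloads as they finish rather than waiting for the whole batch.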

Key points to consider when using multithreading in Python

  1. Concurrent Execution: Multithreading enables concurrent execution of tasks within a single process, which can be beneficial for I/O-bound operations where threads may spend time waiting for external resources.
  2. Shared Memory: Threads share memory, which can lead to data races and race conditions if not synchronized properly.
  3. GIL Limitations: The GIL can restrict true parallelism in multithreading, especially for CPU-bound tasks. This means that Python threads may not fully utilize multiple CPU cores.
  4. Simplified Communication: Threads can communicate easily through shared data structures like lists, queues, and variables.
  5. Lower Overhead: Creating and managing threads typically has lower overhead compared to processes since they share the same Python interpreter.
  6. Best for I/O-bound Tasks: Multithreading is well-suited for tasks that spend a significant amount of time waiting for external resources, such as reading/writing files, network operations, or database queries.
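Point 4 above, communicating through shared data structures, is usually done with queue.Queue, which handles the locking for you. A minimal sketch (the worker function and the None sentinel convention are illustrative, not from the article's code):

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()
results: queue.Queue = queue.Queue()

def worker() -> None:
    while True:
        item = tasks.get()
        if item is None:  # sentinel value: time to stop
            break
        results.put(item * item)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()

for n in range(5):
    tasks.put(n)
for _ in threads:  # one sentinel per worker so every thread exits
    tasks.put(None)
for t in threads:
    t.join()

squares = sorted(results.get() for _ in range(5))
print(squares)  # [0, 1, 4, 9, 16]
```

queue.Queue is thread-safe out of the box, so producers and consumers never need an explicit lock of their own.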

Few more words about Python Concurrency

  • Python is NOT a single-threaded language.
  • Because of the GIL, only one thread in a Python process executes Python bytecode at any given moment.
  • Despite the GIL, libraries that perform computationally heavy tasks like numpy, scipy and pytorch utilise C-based implementations under the hood, allowing the use of multiple cores.
  • Python multiprocessing uses pickle to serialize/deserialize objects when passing them between processes, requiring each process to create its own copy of the data, which adds substantial memory usage.
  • If we want some functionality to be executed by multiple threads, and that functionality is not read-only (the threads share some global values), then we must introduce a mechanism that prevents those values from being changed at the same time: some kind of lock that stops other threads from touching the shared state until the current thread is done with it.
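The last point is exactly what threading.Lock is for. A minimal sketch of guarding a shared counter (increment and the thread counts are my own illustrative choices):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n: int) -> None:
    global counter
    for _ in range(n):
        # Without the lock, the read-modify-write of `counter += 1`
        # can interleave between threads and lose updates.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 every time, thanks to the lock
```

Drop the `with lock:` line and the final count can come up short, and differently on each run; that nondeterminism is what makes race conditions so painful to debug.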

In the full code example, you will find a few more examples of multithreading and multiprocessing in Python.
You will also find the results of the execution on my machine, so you can compare them with your results.

Full code and results of the execution can be found here.

I hope you find this article useful.
If you like it, you can sign up for my newsletter to not miss the future ones.

Happy coding, and may your code always run concurrently! 🚀🐍