Python ships with a
multiprocessing module that allows your code to run functions in parallel by offloading calls to available processors.
In this guide, we will explore the concept of Pools and what a
A Python snippet to play with
Let’s take the following code.
This function will take about 5*5seconds to complete (25seconds?)
We loop through 5 times and call a function that calculates something for us. We use
time.sleep to pretend like the function is doing more work than it is. This gives us a good reason to look into doing things in parallel.
Multiprocessing is pretty simple. Do all the above, but instead of doing all the operations on a single process, rather hand off each one to somewhere that can do it simultaneously.
Now they will all run in parallel, the whole thing will complete in around 5seconds.
But what if you had 1000 items in your loop? ..and only 4 processors on your machine?
This is where Pools shine.
Multiprocessing was easy, but Pools is even easier!
Let’s convert the above code to use pools:
So what’s actually happening here?
We create a
multiprocessing.Pool() and tell it to use 1 less CPU than we have. The reason for this is to not lock up the machine for other tasks.
So let’s say we have 8 CPUs in total, this means the pool will allocate 7 to be used and it will run the tasks with a max of 7 at a time. The first CPU to complete will take the next task from the queue, and so it will continue until all 1000 tasks have been completed.
Note that: if you only have 2 processors, then you might want to remove the
-1 from the
multiprocessing.cpu_count()-1. Otherwise, it will only do things on a single CPU!