When Your Python Code Is Much Faster With PyPy


Python is a very powerful language, with an enormous number of libraries available for it.

However, many developers complain about its speed in comparison to other languages such as C or C++.

This is largely because Python's reference implementation, CPython, is an interpreter rather than an ahead-of-time compiler: your source is compiled to bytecode, and that bytecode is then executed instruction by instruction at run time, which generally makes for slower overall execution than natively compiled code.
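If you're curious what that bytecode looks like, the standard library's dis module will show it to you; this is just a tiny illustrative snippet (any small function will do):

import dis

def add(a, b):
    return a + b

# Prints the bytecode instructions that CPython's interpreter
# loop executes when add() is called.
dis.dis(add)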

There are ways to make it faster, for example the PyPy project, which uses a Just-in-Time (JIT) compiler to run standard Python applications much faster than CPython alone. For the most part, PyPy is somewhat of a miracle drop-in replacement, but there are times when it’s not actually faster. In this writeup, I aim to introduce PyPy and show some areas where it excels, but also where it has very little benefit.

An introduction to PyPy

"PyPy is a fast, compliant alternative implementation of the Python language."

Pypy.org

As per the Pypy website:

It is sold as having several advantages and distinct features:

  • Speed: thanks to its Just-in-Time compiler, Python programs often run faster on PyPy.
  • Memory usage: memory-hungry Python programs (several hundreds of MBs or more) might end up taking less space than they do in CPython.
  • Compatibility: PyPy is highly compatible with existing Python code. It supports cffi and can run popular Python libraries like Twisted and Django.
  • Stackless: PyPy comes by default with support for stackless mode, providing micro-threads for massive concurrency.
  • As well as other features.

Over the years, I have heard many great things about this project, and have used it here and there. Even the creator of Python seems to praise it:

"If you want your code to run faster, you should probably just use PyPy.?

Guido van Rossum (creator of Python)

A sample Python benchmark script

To run some tests, let’s first get a standard Python script we can test with. To save ourselves a couple of minutes, I grabbed one from Stack Overflow.

def test():
    """Stupid test function"""
    lst = []
    for i in range(100): lst.append(i)

if __name__ == '__main__':
    import timeit
    print(timeit.timeit("test()", setup="from __main__ import test"))

What this does is time how long it takes to call test() one million times (timeit’s default iteration count); each call appends a hundred integers to a list. Simple enough.
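If you want tighter control over the measurement, timeit also lets you pick the iteration count and repeat the whole measurement several times. The snippet below is just an illustrative variation; the number and repeat values are arbitrary choices of mine, not what the benchmarks below use:

import timeit

def test():
    """Stupid test function"""
    lst = []
    for i in range(100):
        lst.append(i)

if __name__ == '__main__':
    # Collect 5 samples of 100,000 calls each and report the best
    # (least noisy) one.
    samples = timeit.repeat("test()", setup="from __main__ import test",
                            number=100_000, repeat=5)
    print(min(samples))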

So as not to mess with our wider Python environment, we will run all our tests in a newly created Python virtual environment.

Open a terminal and run the following, which creates a directory for our experiments and changes into it:

mkdir -p ~/src/tests/pypbenching
cd $_

Now we can create a python virtual environment and activate it.

virtualenv -p python3 venv
. venv/bin/activate
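
If you don’t have virtualenv installed, the standard library’s venv module (part of Python since 3.3) does the same job:

python3 -m venv venv
. venv/bin/activate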

At this stage, we place the Python benchmarking code from above into a file called test1.py. We can confirm it is there by cat-ing it:

$ cat test1.py

def test():
    """Stupid test function"""
    lst = []
    for i in range(100): lst.append(i)

if __name__ == '__main__':
    import timeit
    print(timeit.timeit("test()", setup="from __main__ import test"))

Now run it with standard Python3 to see how it performs.

python test1.py

On my machine, I got the following output:

$ python test1.py

6.288925628

Let’s run this 3 times to make sure we are getting a fair assessment:

for i in {1..3}; do python test1.py; done

Once again, on my machine this yielded the following output:

$ for i in {1..3}; do python test1.py; done

7.296439644
6.893949936
7.1336815289999995

So now we know what to beat!

As I’m on a Mac, let’s install pypy3 using Homebrew. We install pypy3 rather than pypy because we are running Python 3.

If we used pypy, it would only be compatible with Python 2, and we don’t want that.

brew install pypy3

You can also install PyPy on Windows, Linux and other systems; see the PyPy downloads page for details.
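
To confirm the install worked, check the version (your exact version string will differ from mine, so I won’t show output here):

pypy3 --version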

Running the benchmark on Python

Now that we are all set up, let’s run our Python benchmark again:

$ python test1.py

6.534598418

Now run it 3 times for consistency:

$ for i in {1..3}; do python test1.py; done

6.984767166
7.322036358
6.84931141

Running the benchmark on PyPy

Now that we know how Python performs, let’s give PyPy3 a try with the exact same tests:

pypy3 test1.py

0.36386730521917343

That’s incredibly fast! Let’s run it 3 times as we did with Python.

for i in {1..3}; do pypy3 test1.py; done

0.47344279661774635
0.5113503690809011
0.4751729490235448

Pretty amazing if you ask me!

Complicating matters a bit

So we have discovered that PyPy is pretty fast for this simple comparison, but what about something else, like a regular loop that updates a global counter?

Use the below code and place it in a file called test2.py:

i = 0

def run():
  global i
  i += 1
  print(f"hello {i}")

for _ in range(0, 1000):
  run()

This time around we will time it using the shell’s time command. Let’s try with PyPy first!

$ time pypy3 test2.py
hello 1
hello 2
hello 3
hello 4
hello 5
...
hello 996
hello 997
hello 998
hello 999
hello 1000
pypy3 test2.py  0.10s user 0.03s system 97% cpu 0.137 total
$ time python test2.py
hello 1
hello 2
hello 3
hello 4
hello 5
...
hello 996
hello 997
hello 998
hello 999
hello 1000
python test2.py  0.02s user 0.01s system 90% cpu 0.029 total
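
Interesting: plain Python wins this round. My suspicion (an assumption on my part, not something these numbers alone prove) is that a run this short is dominated by interpreter startup and the I/O from print(), so PyPy’s JIT never gets enough work to pay for itself. A quick way to eyeball the fixed startup cost on your own machine is to time an empty program under each interpreter:

$ time python -c "pass"
$ time pypy3 -c "pass"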

Let’s change things around a little and try again; put the following code in a file called test3.py.

i = 0

def run():
  global i
  i *= 1

for _ in range(0, 10000000):
  run()

Now time it with each interpreter:

$ time python test3.py

python test3.py  1.46s user 0.01s system 99% cpu 1.491 total
$ time pypy3 test3.py

pypy3 test3.py  0.10s user 0.03s system 99% cpu 0.128 total

Let’s run each one 10 times to see how consistent that is:

$ for i in {1..10}; do time python test3.py; done

python test3.py  1.45s user 0.01s system 99% cpu 1.474 total
python test3.py  1.44s user 0.01s system 99% cpu 1.459 total
python test3.py  1.42s user 0.01s system 99% cpu 1.447 total
python test3.py  1.41s user 0.01s system 99% cpu 1.435 total
python test3.py  1.36s user 0.01s system 99% cpu 1.377 total
python test3.py  1.47s user 0.01s system 99% cpu 1.497 total
python test3.py  1.48s user 0.01s system 99% cpu 1.495 total
python test3.py  1.56s user 0.01s system 99% cpu 1.581 total
python test3.py  1.42s user 0.01s system 99% cpu 1.436 total
python test3.py  1.43s user 0.01s system 99% cpu 1.450 total
$ for i in {1..10}; do time pypy3 test3.py; done

pypy3 test3.py  0.10s user 0.04s system 98% cpu 0.141 total
pypy3 test3.py  0.08s user 0.03s system 99% cpu 0.103 total
pypy3 test3.py  0.08s user 0.03s system 100% cpu 0.105 total
pypy3 test3.py  0.08s user 0.02s system 98% cpu 0.104 total
pypy3 test3.py  0.08s user 0.03s system 100% cpu 0.104 total
pypy3 test3.py  0.08s user 0.03s system 98% cpu 0.105 total
pypy3 test3.py  0.10s user 0.03s system 98% cpu 0.127 total
pypy3 test3.py  0.08s user 0.03s system 98% cpu 0.107 total
pypy3 test3.py  0.10s user 0.03s system 99% cpu 0.128 total
pypy3 test3.py  0.09s user 0.03s system 98% cpu 0.118 total

We can clearly see that PyPy3 knocked the socks off of Python3 once again, consistently.

Bonus tests with Multiprocessing

Let’s have a go with the following multiprocessing code; place it in a file called multi.py:

import multiprocessing

def runner(k):
  lst = []
  for i in range(0, 10000): lst.append(i)
  print(k)

processes = []
for i in range(10):
  p = multiprocessing.Process(target=runner, args=(i,))
  processes.append(p)
  p.start()

for j in range(len(processes)):
  processes[j].join()

Running regular good old Python:

$ time python multi.py

0
1
2
3
4
5
6
7
8
9
python multi.py  0.06s user 0.04s system 143% cpu 0.068 total

Now the same test with PyPy:

$ time pypy3 multi.py

0
1
2
3
4
5
6
7
8
9
pypy3 multi.py  0.15s user 0.09s system 152% cpu 0.154 total

It’s almost 3 times slower! Let’s comment out the print() call and run each version 10 times.

import multiprocessing

def runner(k):
  lst = []
  for i in range(0, 10000): lst.append(i)
  #print(k)

processes = []
for i in range(10):
  p = multiprocessing.Process(target=runner, args=(i,))
  processes.append(p)
  p.start()

for j in range(len(processes)):
  processes[j].join()

First we run Python:

$ for i in {1..10}; do time python multi.py; done

python multi.py  0.06s user 0.04s system 144% cpu 0.069 total
python multi.py  0.06s user 0.04s system 146% cpu 0.066 total
python multi.py  0.06s user 0.03s system 143% cpu 0.063 total
python multi.py  0.05s user 0.03s system 140% cpu 0.061 total
python multi.py  0.06s user 0.03s system 143% cpu 0.063 total
python multi.py  0.06s user 0.03s system 143% cpu 0.063 total
python multi.py  0.06s user 0.03s system 142% cpu 0.062 total
python multi.py  0.05s user 0.03s system 143% cpu 0.057 total
python multi.py  0.06s user 0.04s system 155% cpu 0.066 total
python multi.py  0.06s user 0.04s system 144% cpu 0.065 total

Then PyPy:

$ for i in {1..10}; do time pypy3 multi.py; done

pypy3 multi.py  0.14s user 0.09s system 148% cpu 0.155 total
pypy3 multi.py  0.14s user 0.08s system 149% cpu 0.146 total
pypy3 multi.py  0.14s user 0.08s system 149% cpu 0.151 total
pypy3 multi.py  0.14s user 0.08s system 146% cpu 0.153 total
pypy3 multi.py  0.14s user 0.08s system 151% cpu 0.145 total
pypy3 multi.py  0.15s user 0.09s system 151% cpu 0.162 total
pypy3 multi.py  0.15s user 0.10s system 159% cpu 0.157 total
pypy3 multi.py  0.14s user 0.09s system 151% cpu 0.151 total
pypy3 multi.py  0.15s user 0.10s system 153% cpu 0.163 total
pypy3 multi.py  0.15s user 0.08s system 145% cpu 0.157 total

I’m not sure whether to congratulate Python or complain about PyPy in this instance!
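
If I had to guess (and this is purely an assumption), each child process starts a fresh PyPy interpreter, so the fixed startup and JIT warm-up cost is paid ten times over while each worker does almost no work. One way to probe that, which I haven’t benchmarked here, would be to give each process a much heavier, CPU-bound workload so the fixed cost becomes a smaller share of the total; the 10,000,000 figure below is an arbitrary choice:

import multiprocessing

def runner(k):
  # k is the worker index, kept for parity with multi.py.
  # A heavier, CPU-bound workload so interpreter startup is a
  # smaller share of each child's total run time.
  total = 0
  for i in range(10_000_000):
    total += i

if __name__ == '__main__':
  # The __main__ guard matters if multiprocessing uses the 'spawn' start method.
  processes = []
  for i in range(10):
    p = multiprocessing.Process(target=runner, args=(i,))
    processes.append(p)
    p.start()
  for p in processes:
    p.join()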

In Summary

There were a few discrepancies. Initially, I thought they were down to the output rendering of the print() function, until the multiprocessing tests (with print() commented out) showed the same pattern.

Overall, PyPy3 was a lot faster than regular Python3 across our test cases, barring a few exceptions.

I really wanted to run some tests using asyncio but couldn’t, as the PyPy build I have supports Python 3.6, and the parts of asyncio I wanted to use (such as asyncio.run(), which only arrived in Python 3.7) aren’t available there. Hopefully in the next PyPy release I will be able to update this post with the findings.
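
For reference, the kind of thing I’d like to benchmark once PyPy catches up looks roughly like this (a trivial sketch relying on asyncio.run(), which needs Python 3.7+):

import asyncio

async def worker(n):
    # Yield control back to the event loop without a real delay.
    await asyncio.sleep(0)
    return n * n

async def main():
    # Run ten coroutines concurrently and collect their results.
    results = await asyncio.gather(*(worker(i) for i in range(10)))
    print(results)

asyncio.run(main())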

For now, I will continue to use Python3, but I will always test my applications in PyPy to see if there are speed improvements I can get for free.

Unfortunately, I’m left a bit dumbfounded as to exactly where the rule and the exception lie with all of this. Anyone care to educate me further?

Featured image: SUPERFAST Trailer (Fast and Furious Spoof Movie)