I have turned to pypy quite extensively for pure-Python text processing tasks, where I often get a 10-100x speedup just by changing the command line invocation. For example, I wrote a proof-of-concept approximate string matching algorithm based on Rabin-Karp hashing to perform plagiarism analysis while I was working at Udacity around 2017. The system never went into production, but pypy was super helpful in crunching all historical user submissions for analysis.
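For anyone curious what this kind of workload looks like, here is a minimal sketch of Rabin-Karp style fingerprinting. It is not the system described above: the names `kgram_hashes` and `similarity`, the window size, the base, and the modulus are all illustrative assumptions. It hashes every k-character window with a rolling hash and compares two texts by the overlap of their hash sets, which is exactly the sort of tight pure-Python loop where a JIT tends to help.

```python
def kgram_hashes(text, k=5, base=257, mod=(1 << 61) - 1):
    """Return the set of rolling hashes of every k-character window of text."""
    if len(text) < k:
        return set()
    high = pow(base, k - 1, mod)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = {h}
    for i in range(k, len(text)):
        h = (h - ord(text[i - k]) * high) % mod  # drop the oldest character
        h = (h * base + ord(text[i])) % mod      # append the newest character
        hashes.add(h)
    return hashes


def similarity(a, b, k=5):
    """Jaccard overlap of the two k-gram hash sets, between 0.0 and 1.0."""
    ha, hb = kgram_hashes(a, k), kgram_hashes(b, k)
    if not ha or not hb:
        return 0.0
    return len(ha & hb) / len(ha | hb)
```

A real plagiarism checker would normally also normalize whitespace and identifiers and keep only a sample of the hashes; this sketch skips all of that.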
I’ve also had great success using pypy to accelerate preprocessing steps (when they don’t rely on incompatible c libraries) for machine learning pipelines. It’s the most painless performance enhancement trick in my toolbox before I reach for concurrency (in which case I reach for joblib or Dask).
The one oddity I’ve noticed is that using tuples (including things like named tuples) often speeds up CPython by a lot, but even plain tuples can slow down pypy on the same code—in some cases pypy winds up slower than CPython.
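If you want to check the tuple effect on your own machine, a rough micro-benchmark along these lines can be run under both interpreters. This is only a sketch: the function names and the `Point` record are made up for illustration, and a loop this small may not reproduce the slowdown described above, since JIT behaviour depends heavily on the surrounding code.

```python
import sys
import timeit
from collections import namedtuple
from functools import partial

Point = namedtuple("Point", ["x", "y"])  # hypothetical record type


def sum_lists(n):
    total = 0
    for i in range(n):
        p = [i, i + 1]
        total += p[0] + p[1]
    return total


def sum_tuples(n):
    total = 0
    for i in range(n):
        p = (i, i + 1)
        total += p[0] + p[1]
    return total


def sum_namedtuples(n):
    total = 0
    for i in range(n):
        p = Point(i, i + 1)
        total += p.x + p.y
    return total


def bench(n=200_000, repeat=3):
    """Return the best wall-clock time in seconds for each variant."""
    results = {}
    for fn in (sum_lists, sum_tuples, sum_namedtuples):
        results[fn.__name__] = min(timeit.repeat(partial(fn, n), number=1, repeat=repeat))
    return results


if __name__ == "__main__":
    print(sys.version)
    for name, seconds in bench().items():
        print(f"{name}: {seconds:.4f}s")
```

Run the same file once with `python3` and once with `pypy3` and compare the numbers per variant rather than across variants.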
In any case, I’m low key in love with pypy, even though I can’t use it for _everything_.
For algorithmic code PyPy can provide substantial speedups over CPython. I've used PyPy in code fingerprinting large bioinformatics files and seen big speedups. I've also tried porting a webapp processing JSON from CPython and seen no perceptible speedup.
I recently discovered that pypy3 can run all my day-to-day Python code. Its behavior differs slightly from CPython's when using threads, but other than that I see a 4x speedup on most of my slowest pure-Python workloads (parsing large RDF files and reserializing them after computing a total order on all their nodes). Huge win for productivity.
I use pypy as a drop-in replacement for CPython for some small data crunching scripts of my hobby projects. Might not count as "real work", but getting "free" speed ups is very nice and I'm very grateful for the PyPy project for providing a performant alternative to CPython.
I was close to trying pypy on a production django deployment (which gets ~100k views a month), but given that the tiny AWS EC2 instance we're running it on is memory bound, the increased pypy memory usage made it impractical to do so.
I use it at work for a script that parses and analyzes some log files in an unusual format. Wrote a naive parser with a parsing combinator library. It was too slow to be usable with CPython. Tried PyPy and got a 50x speed increase (yes, 50 times faster). Very happy with the results, actually =)
Have you used pypy recently? I've found that memory usage in particular is much better as of around 1.9, compared to previous releases. Still worse than CPython, for sure, but some of my code is around 10x faster under pypy (all depends on what I'm doing, though, for sure; this stuff is numerically heavy).
Yes, PyPy is fantastic for long-running processes that aren't primarily wrappers around C code. In my experience, the speedups you see in its benchmarks translate to the real world very well.
PyPy is great -- while I still use CPython for our more complex webapp and associated tools that have heavy dependencies on C-extensions; I increasingly use PyPy for the more mundane cpu/data heavy lifting I do. It's typical to get 2X the performance (comparable to some compiled languages) and still use much of our utility code, configs, etc.
I hadn't heard of PyPy before, but I think you're doing great work.
I would be interested in seeing benchmarks where PyPy is compared with more recent versions of CPython. https://www.pypy.org/ currently shows a comparison with CPython 3.7, but recent releases of CPython (3.11+) put a lot of effort into performance which is important to take into account.
I run it in a production environment (side project).
I also use it locally when developing when necessary.
It really does speed up loops by 5x or so.
So when you're trying to, say, test 100 million+ iterations of something, pypy will run that in something like 2 minutes, whereas cpython can take me 15 minutes.
Honestly it's an amazing performance gain for 0 effort, and I have yet to run into a limitation with it.
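A quick way to try this kind of comparison yourself is a small script with one hot loop that you run under each interpreter. The sketch below is an assumption-laden stand-in, not the workload described above: `count_collatz_steps` and `timed` are hypothetical names, and the loop body is just arbitrary integer arithmetic.

```python
import platform
import time


def count_collatz_steps(limit):
    """Total number of Collatz steps needed to reach 1 for every n in [1, limit)."""
    total = 0
    for n in range(1, limit):
        while n != 1:
            n = n // 2 if n % 2 == 0 else 3 * n + 1
            total += 1
    return total


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    result, seconds = timed(count_collatz_steps, 300_000)
    print(f"{platform.python_implementation()}: {result} steps in {seconds:.2f}s")
```

Save it as, for example, `bench.py` and run `python3 bench.py` followed by `pypy3 bench.py`; the ratio you see will depend on the loop body and on how long the process runs.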
I found that PyPy sometimes has unexpected slowdowns. When we were porting some offline processing tools from CPython to PyPy, the craziest ones were building strings via += and flattening lists with sum(arrays,[]), both of which were much slower than under CPython.
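For reference, the usual way around both patterns is to avoid repeated concatenation entirely. The sketch below shows each slow form next to a linear-time alternative; the function names are made up for illustration, and how much faster the alternatives are on a given interpreter is something to measure rather than assume.

```python
from itertools import chain


def build_string_concat(parts):
    # Repeated += may copy the whole string on every step.
    out = ""
    for p in parts:
        out += p
    return out


def build_string_join(parts):
    # str.join builds the result in a single pass.
    return "".join(parts)


def flatten_sum(lists):
    # sum() creates a new, longer list for every element of `lists`.
    return sum(lists, [])


def flatten_chain(lists):
    # chain.from_iterable walks each inner list exactly once.
    return list(chain.from_iterable(lists))
```

The join and chain versions produce the same results as the originals, so they can be swapped in without touching callers.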