What would be the point of filing bugs against PyPy stating that it is slower or more resource-hungry on programs XYZ, when this is ultimately due to PyPy's core design decisions which are a point of pride and never going to be abandoned?
Expecting some positive outcome from that seems incredibly unrealistic to me. I believe it is a waste of time. It isn't my responsibility to provide reasons to PyPy why I'm not using it. Let PyPy - a project receiving no small amount of promotion - show that it is better for my purposes, before I invest big in switching over projects.
What you COULD say is that it doesn't make sense that in practice, people are treated like idiots and flamed if they publicly mention that they don't find in practice that PyPy is always or even generally better than CPython.
Or you could say that they should both work for most purposes, and that the choice is nuanced (measure the difference yourself), and in just a few respects PyPy is not as mature (not surprising given the lengths of the projects' histories). And PyPy is a work in progress and you expect it to get better if it isn't better than CPython now, for some specific purpose.
I don't expect you to say either of those things, because it seems important to the PyPy project to promote it over CPython and if that means selectively mentioning only the cases which are in PyPy's favor, or softly suppressing dissent, then so be it. That is how it seems to me, and I don't understand why it has to be that way.
I am sympathetic because PyPy is very good and is improving fast, but...
That doesn't change how the PyPy project tends to represent itself, which almost always comes across as something like "6x speedup for everything (excluding JIT warmup)". If you want everyone to adopt PyPy instead of CPython then it is part of your job to find the cases where PyPy is not actually faster rather than saying it is the user's fault. And it is not your job to select only benchmarks which tell the story you want.
If the difference between interpreters is nuanced then that nuance should be expressed so that people can make intelligent decisions rather than dismissing one or another interpreter as "slow".
I think your point about "should work for most purposes" expresses my view as well. The choice is tricky, and programs violently optimized for CPython in particular might find PyPy's characteristics strangely different. I don't see any particular design decisions that would prevent PyPy from outperforming CPython on everything in the long run, but this is certainly not the case right now.
Maybe our PR got too strong or something, but I think "measure yourself and report if it's too slow" was always our motto. In fact, we're definitely more interested in hearing when people find their programs slow rather than fast, because it gives us more optimization opportunities. However, a report is entirely pointless without a way for me to reproduce it, since otherwise I have no clue what is going on.
In short - I think we violently agree and if PyPy's PR is not up to the standard and fairness you would expect, I apologize.
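A "measure yourself and report" complaint is easier to act on when it comes with a script that runs unchanged on both interpreters. Here is a minimal sketch using only the standard library. The `workload` function and its size are hypothetical placeholders for whatever code is actually slow, and the report format is an assumption, not anything the PyPy project asks for.

```python
import platform
import sys
import time


def workload(n=50_000):
    # Hypothetical stand-in for the slow code: fill a dict, then read it back.
    d = {}
    for i in range(n):
        d[str(i)] = i
    return sum(d[str(i)] for i in range(n))


def measure(func, repeats=5):
    # Time each run separately so that warmup (slower early runs) stays visible.
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


def report(func, repeats=5):
    # Bundle the timings with the interpreter identity so the numbers
    # can be compared across CPython and PyPy runs of the same script.
    timings = measure(func, repeats)
    return {
        "implementation": platform.python_implementation(),
        "version": sys.version.split()[0],
        "timings_s": timings,
        "best_s": min(timings),
    }


if __name__ == "__main__":
    print(report(workload))
```

Running the same file under each interpreter and posting both outputs together with the script gives a maintainer something they can reproduce.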
PyPy to me looks like a great effort and a technological tour de force. But I wish they would talk _more_ on their site about why and when PyPy will be _slower_ than CPython. Here is why.
I tried to run some of my scripts on PyPy and performance was invariably worse (about 50% worse). And my first reaction was: PyPy is not delivering on its promises. Only later did I read on some forum that PyPy does not perform well on large dictionaries (and this is essentially what I do in my scripts). Had I known it in advance, my first impression of PyPy would have been much better.
IMHO, PyPy isn't a strong counter-argument for Python's slowness.
Yes, PyPy is probably faster than CPython. But what then? In most cases, CPython is THE PYTHON. PyPy is still in its early days and has a lot to demonstrate before it counts as a serious alternative to CPython.
On the other hand, a programmer could get an out-of-the-box performance boost from adopting Go without worrying about potential compatibility issues.
PyPy is a very interesting and promising project. I highly respect that. I just don't like the idea of merely using it to dodge the blame of performance slowness, when what people are really talking about is CPython.
>I would certainly prefer it if PyPy was 100% compatible with the CPython C API even if it was at 80% (maybe even 60%) of the CPython C API speed because then I don't even have to think. I'd be using PyPy because it's faster overall and I can do the analyses I want faster.
While part of me agrees with this, if PyPy starts sacrificing performance for CPython compatibility then pretty soon it'll degenerate into CPython.
> PyPy is still significantly faster than CPython while (afaik) allowing that sort of stuff to go on
First of all, that's only true when it has managed to JIT the code, and secondly, only until you try to do any of those slow things. For instance, the C ABI emulation they have both cannot support all of CPython and wrecks performance. The same is true if you try to do fancy things with sys._getframe, which a lot of code in the wild does (e.g., all of logging).
In addition PyPy has to do a lot of special casing for all the crazy things CPython does. I recommend looking into the amount of engineering that went into it.
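For anyone unfamiliar with the frame trick mentioned above, here is a minimal sketch of the pattern: a helper that finds out who called it through `sys._getframe`. The function names are hypothetical. The sketch only shows what the pattern looks like; it makes no claim about how much it costs on either interpreter.

```python
import sys


def caller_name():
    # Frame 0 is caller_name itself; frame 1 is whoever called it.
    frame = sys._getframe(1)
    return frame.f_code.co_name


def handle_request():
    # Hypothetical caller: asks the helper which function it is running in.
    return caller_name()


if __name__ == "__main__":
    print(handle_request())
```

Code like this reaches into the interpreter's live call stack, which is the kind of introspection the comment above describes as expensive once a JIT is involved.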
I think that's because when new codebases are written, pypy isn't on the latest version of the Python spec, so people choose CPython. If it got a little more support it could get to parity or n-1, and far more projects would start with it and stick with it.
I don't use it myself, I just see the relatively small difference in investment it would be between a great project that's slightly languishing in obscurity despite incredible talent and effort, and a genuine full alternative to CPython.
>I'm an actual developer (not a scientific programmer) and it took me a little while to understand what PyPy does.
I'm a scientist, not a scientific programmer or a developer and this is all I really care about: PyPy is currently--in its partially implemented state--much, much faster than CPython on the vast majority of things it can do. If I am able to use PyPy's NumPy and it's faster than traditional NumPy I will do so as long as the opportunity cost doesn't outweigh the speed increases (NumPy is pretty useless to me--maybe not to some--without SciPy and matplotlib).
I don't care that PyPy is written in RPython any more than I care that it has a JIT, or that CPython is written in C. I also don't care how that JIT works or how CPython compiles to byte code or how Jython does magic to make my Python code run on the JVM. I do care that it "works," as a scientist. I do care that they are "correct" implementations, as a scientist. As an individual, I am interested in the inner workings of PyPy and CPython, CoffeeScript, Go, and Brain Fuck, but when I'm working on research the only thing that actually is important as far as the language implementation is concerned is that it just works. The interpreter is just a brand of the particular tool that I'm using.
I would certainly prefer it if PyPy was 100% compatible with the CPython C API even if it was at 80% (maybe even 60%) of the CPython C API speed because then I don't even have to think. I'd be using PyPy because it's faster overall and I can do the analyses I want faster.
Anyway, I think if you're explaining all of what you mentioned to a NumPy user or a PyPy NumPy user you'd be doing it wrong. Or maybe the PyPy folks would be doing it wrong. Because this is how that conversation would go with my peers.
Sad Panda: "Ugh my code is running slowly I think I have to jump into C"
Me: "Have you tried PyPy's NumPy yet?"
Sad Panda: "What's that?"
Me: "It's faster Python and NumPy. Go here [link] and download it see if it runs your code faster"
Sad Panda: "Okay I'll do that"
..a while later..
Sad Panda: "It was a little faster, but I ended up getting one of the CS guys to help me run it on a tesla cluster with OpenCL. But I think I can use it on the spiny lumpsucker data I'm collecting."
I have to echo this sentiment here. Every time I see a post about PyPy being fast, I think, "Hmm, perhaps I should try out this package I'm working on and see if it performs better." After getting a PyPy environment working (sometimes by installing forks that are PyPy compatible), I almost always end up with real-world uses that are noticeably slower with PyPy as opposed to regular ol' CPython.
I may not be coding to PyPy's strengths, but I've gone through this process on several different packages that I've released and I tend to see similar results each time. I want to try and use PyPy to make my code faster, but it just doesn't seem to do it with real code I'm using.
The main reason I'm not even considering PyPy is that it lags behind CPython by a couple of versions. I'm getting ready to move to 3.7 while PyPy is still on 3.5.
Just because pypy is faster than cpython on a handful of test cases doesn't make it worthy of corporate sponsorship. I would sponsor development if it made a noticeable improvement.
Not to belittle the project, but I tried a vanilla order entry implementation on my test box in NASDAQ. Essentially it runs an epoll loop using a C extension to take advantage of the Myricom DBL calls. RT latency was roughly 5 usec faster using cpython. As for other parts of the system (e.g., the feed handler), the performance was comparable.
For other applications (e.g., compliance reporting) pypy is 2x cpython speed, but I couldn't care less about that timing (even if it ran 1000x slower than cpython, it wouldn't matter).
TL;DR: it needs to be useful. And I just don't see the usefulness for my Python applications.
I've seen the promotional materials on PyPy's performance (I find them particularly persuasive on odd synthetic benchmarks) but thank you for the link.
I think you are a nice guy doing very useful work. You deserve to be proud of your work. But I do think you should be aware that an aggressive social orthodoxy has formed around the performance of PyPy. (It isn't unusual that I was downvoted here for suggesting PyPy isn't always faster, for example; it's the same thing in other fora and offline).
I think PyPy's approach for generating Python interpreters is a great, clever idea. I think the project is more exciting than Shedskin was, etc. I'm impressed at the progress that has been made in expanding functionality and library support. I am looking forward (although with a little skepticism) to the day when it's really better at everything and is on my phone and everywhere. Sounds good.
But I am concerned about a cultural shift in Python, and unnecessary increases in complexity which come along with it. Many of the things which drew me to Python years ago come from its roots in the Unix/C world. I only recently began to hear dogmatic arguments that JIT is always faster than ASM, and memory is cheap so it makes no difference to use 2-4x as much, and being able to hook up to C isn't so important, and everyone should be writing incomprehensible, heavily-threaded programs using the world's biggest gc and synchronization mechanisms which require a lot of close supervision by people with pompous job titles. And you end up managing concerns of this type more than you spend writing domain code. And you do it all not because it is the cleanest and most direct way to get a good result, but because it's what is understood to be the right thing.
So I think a lot depends on whether PyPy drinks too much of its own kool-aid. It could second-system Python to death. It could succumb entirely to Enterprisitis. I don't need CPython, but I hope PyPy will smell like it. If the culture and the working environment continue to Java-ize, I will probably jump ship to Go or Ruby or whatever has a good library and most of the virtues I currently get from Python. I don't think that going more complicated is the best way to make software faster or better and it is important to my quality of life that I feel good about the code I am writing.
PyPy was never able to get fast enough to replace CPython in spite of its lack of a compatible C API. CPython is trying to move fast without breaking the C API, and a 2-9% improvement is in fact very encouraging for that and other reasons (see my other comment).
The author complains about CPU-bound performance and then mentions PyPy but claims its ecosystem is barren enough that the point stands. This claim is presented without evidence.
In practice it doesn't really show up: the cases where code is actually CPython-specific usually imply all the work is being done in highly optimized C bits. Hence, those programs, despite being run on CPython, aren't CPU-bound.
Finally: while this FUD is really pernicious because it used to be hard/impossible to get scientific code like numpy to run on PyPy, that hasn't been the case in a long time. Right now you can just pip install numpy and it'll just work.
Devs are already aware of PyPy. If it was as easy as being aware, everyone would switch tomorrow. The problem is that tons of code uses C extensions created with SWIG or Boost.Python, and those take a lot of time to port to PyPy. Another problem is that PyPy is sometimes slower or less memory efficient than CPython when the code was written to be optimized for CPython, that is, with heavy (ab)use of dictionaries and very limited use of OO.
That's a whole other issue. PyPy and CPython have conflicting aims, which is partly why it never replaced CPython as the de facto default Python runtime. Also, PyPy is much slower for stuff that's only going to run for a short time. Still, many of the lessons learned in PyPy can be used to improve CPython performance.
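To make the "dictionaries versus OO" point concrete, here is a small sketch that runs the same computation in both styles and times each with the standard `timeit` module. The record layout, the `Point` class, and the sizes are all hypothetical, and the sketch deliberately says nothing about which style wins on which interpreter; that is what running it is for.

```python
import timeit


def total_with_dicts(n=1000):
    # Style the comment describes as CPython-oriented: plain dicts as records.
    points = [{"x": i, "y": 2 * i} for i in range(n)]
    return sum(p["x"] + p["y"] for p in points)


class Point:
    # Hypothetical record type for the object-oriented variant.
    def __init__(self, x, y):
        self.x = x
        self.y = y


def total_with_objects(n=1000):
    # Same computation using small instances and attribute access.
    points = [Point(i, 2 * i) for i in range(n)]
    return sum(p.x + p.y for p in points)


if __name__ == "__main__":
    for func in (total_with_dicts, total_with_objects):
        seconds = timeit.timeit(func, number=200)
        print(func.__name__, round(seconds, 4))
```

Running the file under both CPython and PyPy shows whether the gap between the two styles changes from one interpreter to the other for this particular workload.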
a) that surprises me, because in my short experience with it, I found that PyPy is more about simpler idioms if you are aiming for performance, a bit C-ish in style
b) it's been something like... 3 years (I think) since I last used PyPy for some serious performance testing of code. And even then it was a horrible proof of concept that barely ran, but helped me point out that "hey, we optimized it to 3.5x faster with CPython and PyPy is getting to 4.5x over the original code, but it's throwing a thousand errors and warnings on screen and we can't be sure it'll work in production"[0].
Probably things have changed a bit on the PyPy side, though I don't think RPython has changed much in concept.
Also I see I never got to explain the pythonic note on my previous comment: basically, whatever "pythonic" means to the team. In general I try to go for the definition that is "it's easy to read and understand what's going on with the code, and provides easy-to-use interfaces for programmers". Not an easy thing to do, but at least trying to do it helps a lot.
[0] the job was that there was a large program that was bottlenecking a process in a lab, and somebody threw out the idea of "oh we should use pypy because it'll surely make things run faster". And then it was my responsibility to pick that skeleton up and take care of it.
Oh and the code was pretty inefficient to begin with, which is why we got to speed it up so much without even going into C. A quick test with Cython put it in the 7-10x range of speedups, but the client wasn't willing to handle that, so they just kept it in CPython.