I think it can't use the same recipe. Sam's approach for CPython uses biased reference counting, but PyPy internally uses a tracing garbage collector, not reference counting. I don't know how difficult it would be to make their GC thread-safe; you probably don't want to "stop the world" on every GC pass, so I'd guess the changes are non-trivial.
Sam's changes to CPython's container objects (dicts, lists) to make them thread-safe might also be hard to port directly to PyPy, since PyPy implements those objects differently.
PyPy uses a modified mark-and-sweep garbage collector, while CPython uses reference counting. C extensions such as NumPy (and therefore pandas, scikit-learn, etc.) are compiled expecting reference counting, so a translation layer is needed between PyPy's memory management and extensions like NumPy, and that layer introduces overhead (historically, a lot of overhead).
BTW, PyPy (unlike CPython) is actually great for multiprocessing.
CPython's refcounting defeats copy-on-write sharing even when no write operation is ever applied to the object: merely touching an object updates its refcount, which dirties the memory page and forces the OS to copy it. So if you have a huge read-only object in RAM, forget about sharing it cheaply across processes; its pages will effectively be cloned into each child.
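A quick way to see this on CPython (a minimal sketch; the exact counts are implementation details): even a "read-only" operation like naming the object mutates its refcount field in place, which is what dirties the page after a fork.

```python
import sys

big = list(range(1_000_000))   # imagine this shared with forked workers

before = sys.getrefcount(big)
alias = big                    # no write to the list itself, just a new reference
after = sys.getrefcount(big)

# The object's refcount field was updated in place, so the page holding
# its header would be copied by the OS in a forked child.
assert after == before + 1
```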
Most Python implementations other than CPython use a tracing GC rather than reference counting. PyPy does. IronPython did.[1]
There is a version of PyPy without a GIL[2], but it runs much slower on ordinary code and is still under development. The developers are looking for financial support.[3] The approach is to identify large blocks of code as transactions and run them in parallel. If two transactions try to access the same data, one fails and is backed out, like a database rollback.
But you have to write your code like this:
    from transaction import TransactionQueue

    tr = TransactionQueue()
    for key, value in bigdict.items():
        tr.add(func, key, value)
    tr.run()
> In CPython it's not a problem due to refcounting, but in pypy, the file might be closed at some later stage.
Technically the guarantee in CPython is "soft", so it can be a problem; it just rarely is: if the file gets referenced from an object that's part of a reference cycle, the file won't be released (and closed) until the cycle-breaking GC is triggered.
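For illustration, a minimal sketch on CPython (`Holder` is just a made-up class to build the cycle):

```python
import gc
import os
import weakref

class Holder:
    pass

h = Holder()
h.f = open(os.devnull)   # some file object
h.me = h                 # h now participates in a reference cycle
wr = weakref.ref(h.f)

del h
# Refcounting alone can't reclaim the cycle, so the file is still alive
# (and still open) here, even though nothing reachable refers to it.
assert wr() is not None

gc.collect()             # the cycle-breaking GC finally releases (and closes) it
assert wr() is None
```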
"Its ability to change what any object is, on the fly"
No, it doesn't. It changes the reference, not the object. If things are referenced in the right way, it is possible to statically compile them. PyPy relies on this info too, but via JIT compilation.
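i.e. assignment rebinds a name; it never transmutes the object the name used to point at. A small sketch:

```python
a = [1, 2, 3]
b = a                  # b refers to the same list object as a
a = "something else"   # only the *name* a is rebound; the list is untouched

assert b == [1, 2, 3]
assert isinstance(a, str)
```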
I was specifically thinking of `gc` at the time of writing, which is part of Python, not CPython (though it only has a reason to exist because of CPython).
but yeah, for... probably 90%+ of use-cases, it is the PyPy<=>CPython differences that are more notable
isn't it CPython-specific though? It may not be observed when running on PyPy.