
> Python's reference counting mechanism will cause everything to be copied anyway.

Isn't that CPython-specific, though? It may not be observed when running on PyPy.




CPython is mostly reference-counted.

Eh, PyPy does not use reference counting. Are you saying PyPy + fork() is what you want?

I think it can't use the same recipe. Sam's approach for CPython uses biased reference counting. Internally, PyPy uses a tracing garbage collector, not reference counting. I don't know how difficult it would be to make their GC thread-safe. You probably don't want to "stop the world" on every GC pass, so I guess the changes are non-trivial.

Sam's changes to CPython's container objects (dicts, lists) to make them thread-safe might also be hard to port directly to PyPy. PyPy implements those objects differently.


PyPy uses a modified mark-and-sweep garbage collector; CPython uses reference counting. C extensions such as NumPy (and so Pandas, sklearn, etc.) are compiled expecting reference counting. A translation layer is needed for memory management between PyPy and extensions like NumPy, and that introduces overhead (historically, a lot of overhead).

BTW, PyPy (unlike CPython) is indeed great for multiprocessing.

CPython's ref-counting forces copy-on-write in RAM even when no write operation has been applied at all. So if you have a huge read-only object in RAM, forget about multiprocessing: the object will be cloned for each process.
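
A minimal CPython-only sketch of why that happens: the reference count is stored in the object's own header, so merely taking another reference to a "read-only" object writes to its memory, and after fork() that write is what dirties the page. The names `refcount_delta_on_read`, `big`, and `holders` are made up for illustration, and `sys.getrefcount` is used only to make the count visible.

```python
import sys

def refcount_delta_on_read():
    """Return how much the refcount of an object changes when it is
    merely referenced again, without modifying its contents (CPython)."""
    big = list(range(1000))          # stand-in for a large read-only object
    holders = []                     # hypothetical container of references
    before = sys.getrefcount(big)
    holders.append(big)              # no write to the list's contents...
    after = sys.getrefcount(big)     # ...but its header was updated
    return after - before

if __name__ == "__main__":
    print("refcount delta:", refcount_delta_on_read())
```

In a forked child the same header update lands on a page shared with the parent, so the kernel has to copy that page even though the program never modified the data.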


Most of the Python implementations other than CPython use a tracing GC rather than reference counting. PyPy does. IronPython did.[1]

There is a version of PyPy without a GIL[2], but it runs much slower on ordinary code and is still under development. The developers are looking for financial support.[3] The approach is to identify large blocks of code as transactions, and run them in parallel. If they try to access the same data, one transaction fails and is backed out. It's like database rollback.

But you have to write your code like this:

    from transaction import TransactionQueue

    tr = TransactionQueue()
    for key, value in bigdict.items():
        tr.add(func, key, value)
    tr.run()

[1] http://doc.pypy.org/en/latest/cpython_differences.html

[2] http://doc.pypy.org/en/latest/stm.html

[3] http://pypy.org/tmdonate2.html

> In CPython it's not a problem due to refcounting, but in PyPy, the file might be closed at some later stage.

Technically the guarantee in CPython is "soft" so it can be a problem, just rarely is: if the file gets referenced from an object accessible from a cycle the file won't be released (and closed) until the cycle-breaking GC is triggered.
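
A rough sketch of that "soft" guarantee, assuming CPython. `Resource` is a hypothetical stand-in for a file object that records when it is finalized, and the automatic collector is switched off so the timing is visible.

```python
import gc

class Resource:
    """Hypothetical stand-in for a file; logs when it gets finalized."""
    def __init__(self, log):
        self.log = log

    def __del__(self):
        self.log.append("closed")

def demo():
    log = []
    was_enabled = gc.isenabled()
    gc.disable()                      # keep the cycle collector out of the way
    try:
        # No cycle: refcounting finalizes the object as soon as it is dropped.
        r = Resource(log)
        del r
        after_plain_del = list(log)

        # Reachable only from a reference cycle: refcounts never hit zero.
        holder = {}
        holder["self"] = holder       # the cycle
        holder["res"] = Resource(log)
        del holder
        after_cycle_del = list(log)

        gc.collect()                  # the cycle-breaking GC finally runs
        after_collect = list(log)
    finally:
        if was_enabled:
            gc.enable()
    return after_plain_del, after_cycle_del, after_collect

if __name__ == "__main__":
    print(demo())
```

A `with` block (or an explicit `close()`) avoids depending on either behaviour, which is also what makes the same code behave the same way on PyPy.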


Makes sense. The optimisation only applies when the reference count is one, and PyPy doesn’t use reference counting.

Because CPython is the reference implementation.

Except the reference implementation uses a JIT and is really fast.

Still waiting for PyPy to eventually replace CPython as the reference implementation. Even though they still have a bit of catching up to do.


That's CPython specific though, PyPy doesn't need it.

It could be even more, but CPython (being the reference implementation) has been kept simple.

PyPy manages far more.


That's 3.9.12 of PyPy, not CPython

Only when comparing to CPython

Will CPython always remain the reference implementation?

> Smarter implementations, notably PyPy, will do escape analysis, and never actually end up allocating those objects.

Objects in a collection (which is what we're talking about here) escape kind-of by default.

Until type-specialized collections are merged in PyPy (if they are not yet), it'll have the same issue as CPython.


"Its ability to change what any object is, on the fly"

No, it doesn't. It changes the reference. If things are referred to in the right way, it is possible to statically compile them. PyPy relies on this information too, but for JIT compilation.
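
To illustrate the distinction being drawn (a sketch of one reading of the comment, with made-up names): rebinding a name points it at a different object, while the object it used to refer to is left as it was.

```python
def rebind_vs_object():
    """Show that assignment rebinds a name rather than changing an object."""
    original = [1, 2, 3]
    alias = original            # second reference to the same list
    original = "now a string"   # rebinds the name; the list is untouched
    return alias, original

if __name__ == "__main__":
    print(rebind_vs_object())
```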


I was specifically thinking of `gc` at the time of writing, which is a part of Python, not just CPython (though it only has reason to exist as a part of CPython).

But yeah, for probably 90%+ of use cases, it is the PyPy<=>CPython differences that are more notable.


The other major Python implementation, PyPy, effectively implements CPython. (And insofar as it doesn't, it's heading that way.)