
> Those people/use-cases don't care about the GIL.

This is not true. The primary funding and motivation for the GIL removal work comes from the numerical computing community. The PEP (https://peps.python.org/pep-0703/) contains direct quotes from folks working on numpy, scipy, PyTorch, scikit-learn, etc. and also practitioners from places like Meta, DeepMind and so on, describing the practical constraints that the GIL places on many workloads.




It's exactly the high-performance native extensions that require the GIL removal to allow for a full utilization of the machine. Although already successful as glue for these libraries, Python is ultimately hamstrung by the GIL on many-core machines. Using the GIL removal to start writing algorithms in interpreted Python is not the intention.

I warmly recommend reading the motivation section of the PEP. It is incredibly well written, quoting issues from PyTorch, scikit and NumPy:

https://peps.python.org/pep-0703/


> But consider that in perhaps 99% of current python code high performance (of the pure python code) is not the primary concern.

Then why do we care about GIL removal at all?


> The purpose of this document and associated code is to convince you of a few things:

> That it is feasible to remove the GIL from Python.

> That the tradeoffs are manageable and the effort to remove the GIL is worthwhile.

...

> Removing the GIL will require making trade-offs and taking on substantial risk:

> Removing the GIL will be a large, multi-year undertaking with increased risk for introducing bugs and regressions.

> Removing the GIL means making trade-offs between single and multi-threaded performance. I believe we can make a future CPython without the GIL that is faster for both single and multi-threaded workloads than current CPython, but if we keep the GIL and focus solely on single-threaded workloads, we would achieve even better single-threaded performance. (See the “Performance” section for more details.)

I don't want to be too cynical (I am, after all, not critiquing the details), but this is basically how every remove-the-GIL project starts. Many intelligent people have tried. Many have failed. There are just so many trade-offs, compatibility issues, etc. I'm not personally against this project, but I'd really rather read a document describing the successful result than one outlining the goals.

That said, work on CPython that tries to clean up internal state has gone on for many years and is great. That sort of thing might make removing the GIL more reasonable in the future, but it also makes the code easier to understand and work with in the present.

Regardless of my negativity, I wish the author good luck!


> Nobody has ever chosen Python for its runtime performance.

No, they choose it for the ease of using its performant extensions. And those extensions are fundamentally limited in performance by the existence of the GIL. And the authors of those extensions (and their employers) are behind the work to get rid of it.

There are three groups here:

1. "Pure" Python users, whose code makes little/no use of extensions. GIL removal is an immediate win for these uses.

2. "Good" extension users/authors, whose code will support and benefit from no-GIL. GIL removal is an immediate win for these uses.

3. "Bad" extension users/authors, whose code doesn't/can't support no-GIL. GIL removal probably doesn't break existing code, but makes new uses potentially unsafe.

Maintaining the technical debt of the GIL indefinitely so that group 3 never needs to address its own technical debt is not a good tradeoff for groups 1 and 2 which actively want to move the language forwards.


> GIL in python is something that was talked about 5 years ago and will still be talked in 5 years. It limits the ability of python to reap the benefits of better hardware.

It's also a huge source of misattributed problems — I've seen more cases where a complaint about the GIL was really “my algorithm depended on a single locked data structure” or “I was calling a C library which isn't thread-safe” than where the GIL was actually the limiting factor. That's not to say that there aren't real problems for people who want to run pure-Python computational code (not e.g. libraries like Numpy or Pillow) but it also seems to be popular as the bogeyman to blame when someone doesn't want to run a profiler.
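
In my experience the misattributed case usually looks something like the sketch below (names hypothetical): all the useful work happens while holding one shared lock, so the threads serialize on that lock, and a profiler would show the same bottleneck whether or not the GIL exists.

    import threading
    import time

    shared_lock = threading.Lock()   # the single "hot" lock (hypothetical example)
    results = []

    def worker(n):
        # All useful work happens while holding one shared lock, so the
        # threads serialize on the lock itself; removing the GIL wouldn't help.
        with shared_lock:
            results.append(sum(i * i for i in range(n)))

    threads = [threading.Thread(target=worker, args=(200_000,)) for _ in range(8)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"8 workers, one lock: {time.perf_counter() - start:.2f}s")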

> This may be minor, but I really miss Ruby's rich set of collection methods: take_while, group_by, sample. Yet I can see a point in extracting that to an external library.

See https://docs.python.org/3/library/itertools.html and https://docs.python.org/3/library/random.html (for the sample function). I believe the difference is mostly that the Ruby methods live on Array, whereas the Python ones are treated as processors for arbitrary iterables, so they sit in a separate part of the standard library.
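
For illustration, a rough mapping of those three Ruby methods onto the standard library (note that itertools.groupby only groups consecutive elements, so the input usually needs to be sorted by the key first):

    from itertools import takewhile, groupby
    import random

    numbers = [1, 2, 3, 10, 4, 5]

    # take_while -> itertools.takewhile
    print(list(takewhile(lambda x: x < 5, numbers)))   # [1, 2, 3]

    # group_by -> itertools.groupby (input must be sorted by the same key)
    words = sorted(["apple", "avocado", "banana", "blueberry"], key=lambda w: w[0])
    print({k: list(g) for k, g in groupby(words, key=lambda w: w[0])})
    # {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry']}

    # sample -> random.sample
    print(random.sample(numbers, 2))   # e.g. [10, 3]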


> The idea is that it's an investment to be recouped for various workloads due to the improved ability to exploit concurrency over a GIL-ful interpreter.

This idea has been rejected by the core team, though: getting rid of the GIL at the cost of a major performance hit is not considered acceptable when most existing Python code is not significantly affected by the GIL.


> If it's such a huge game changer why don't some of the large enterprises which rely on Python fund this work?

Because lots of Python code relies on the GIL to act as a synchronization primitive for threaded code (either explicitly or implicitly), removing the GIL will break that code in subtle ways, and there's no appetite in those companies to fix that.
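
A minimal sketch of the implicit case (names hypothetical): code like this has no lock of its own and leans on the fact that CPython's built-in container operations, such as list.append, are effectively atomic under the GIL. Whether it stays safe without the GIL depends on the interpreter providing equivalent per-object locking for built-ins, and C extensions with their own internal state get no such guarantee.

    import threading

    events = []   # shared list, no explicit lock

    def record(worker_id):
        # Relies on list.append being atomic under the GIL.
        for i in range(10_000):
            events.append((worker_id, i))

    threads = [threading.Thread(target=record, args=(n,)) for n in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(events))   # 40000 under CPython with the GIL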


"It cannot be understated, removing the GIL will be a HUGE deal for Python! It's an ugly wart which has existed for about 30 years, and nobody else has produced and delivered a working solution to the community."

If it's such a huge game changer why don't some of the large enterprises which rely on Python fund this work?


> Their choice to not release the GIL during their operations is not necessarily a limitation of Python.

Um, what? They do not hold on to the GIL when doing compute-intensive operations.
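
As a rough sketch (assuming a typical BLAS-backed NumPy build, which releases the GIL inside its heavy kernels), several Python threads can overlap their matrix multiplications on multiple cores even under today's GIL-ful interpreter:

    import threading
    import numpy as np

    a = np.random.rand(2000, 2000)
    b = np.random.rand(2000, 2000)

    def multiply():
        # The GIL is released inside the C/BLAS call, so these threads can
        # run on separate cores (the BLAS library may add threads of its own).
        np.dot(a, b)

    threads = [threading.Thread(target=multiply) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()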


> It cannot be understated, removing the GIL will be a HUGE deal for Python! It's an ugly wart which has existed for about 30 years

I would argue the opposite: it's the secret to Python's success. It might even be my top example of how "worse is better" plays out in real life.

I agree, the GIL feels like an ugly hack. The software engineer in me wants to hate it so much. And, now that I'm on the far side of a successful transition to a data science role, one might think that I hate it even more, yeah? Because the work I'm doing depends so very heavily on compute performance and parallelism.

But it turns out, nah, I'm coming to like it. It's an ugly hack, but it's the best kind of ugly hack: one that gets the job done.

Because I'm pretty sure that the GIL is the secret sauce that makes Python C extensions work so well. Without it, it would be much more difficult to write safe, correct C extensions. Doing it without introducing race conditions that destroy memory safety would be a black art. So people would probably do it less. And that probably means no robust Python scientific computing or data science ecosystem, because that stuff is all C extensions.

We could instead use a C FFI, like it's done in other languages. Java, for example. But Java having to use an FFI and Python being able to use its C extension mechanism is exactly why Python has eaten all of Java's Wheaties in the data space. The marshaling costs of talking back and forth across that boundary are just too high. Copying goes up, locality of reference goes down, cache misses go up. You saturate the memory bus more quickly. Once you've done that, it doesn't matter how many threads you have running on how many cores. The bottleneck is happening outside the CPU. Top will happily tell you those cores are working hard, but it turns out that what they're working so hard at is sitting around and waiting.

This isn't just theoretical. Last year I replaced a heavily parallelized compute-heavy batch process written in Java with a Python implementation that got the work done in less time despite being single-threaded. Sure, the Python implementation was basically a little scripting on top of a library written in C++, and the Java one was pure Java. But that's kind of the whole point. I also know that, back when I wrote the original Java implementation, I tried the same trick of farming the calculation out to a C++ library, and it actually made the throughput even worse. JNI is a harsh master.

And besides, as others have said, numpy & friends give me most of the parallelism I actually need, for free.

Maybe it hurts other people more? Maybe Web developers? But there's a part of me that thinks, if you're trying to do that kind of work at scale, making Python go faster is perhaps barking up the wrong tree. There are plenty of other languages that are statically typed (reducing pointer chasing and branching can increase your throughput without giving Amazon more money in the process) and don't even need a global interpreter lock in the first place because they're not interpreted, either.


> I don't understand why people care about the CPython GIL

1. pypy has a GIL

2. not every python program is compatible with pypy.

3. not everyone can program in C.

4. it would be a free speed up for troves of multithreaded code.

That being said, the GIL is not going anywhere: there are enough workarounds, and the task is monumental.
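
The most common workaround being something along these lines (function names are just illustrative): sidestep the GIL with processes instead of threads, paying for it by pickling arguments and results across the process boundary.

    from concurrent.futures import ProcessPoolExecutor

    def cpu_bound(n):
        # Pure-Python work that the GIL would serialize across threads.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=4) as pool:
            print(list(pool.map(cpu_bound, [2_000_000] * 4)))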


Python is used by a lot of "semi-technical" people (data scientists, researchers, hobbyists, etc). Removing the GIL isn't going to make their life any easier.

> or is there a fundamental requirement, e.g. C programs unavoidably interact directly with the GIL?

C programs can both use the GIL for thread safety and make assumptions about the safety of interacting with a Python object.

Some of those assumptions are not real guarantees from the GIL, but in practice they are good enough; they would no longer be good enough in a no-GIL world.

> I know that the C-API is only stable between minor releases [0] compiled in the same manner [1], so it's not like the ecosystem is dependent upon it never changing.

There is a limited API tagged as abi3 [1], which is unchanging and doesn't require recompiling, and any attempt to remove the GIL so far would break that.

> so it's not like the ecosystem is dependent upon it never changing

But the wider C-API does not change much between major versions; it's not as though the way you interact with the garbage collector completely changes, forcing you to rethink how you write concurrency. This allows the many projects which use Python's C-API to update to new major versions of Python relatively quickly.

> I have found a no-GIL interpreter that works with numpy, scikit, etc. [2][3] so it doesn't seem to be a hard limit.

The version of nogil Python you are linking to is the product of years of work by an expert funded by Meta to work on this full time, drawing on knowledge from many previous attempts to remove the GIL, including the "gilectomy". Also, you are linking to the old version based on Python 3.9; there is a newer version based on Python 3.12 [2].

This strays from the points I was making, but if this specific attempt to remove the GIL is adopted, it is unlikely to be switched over in a "big bang", e.g. Python 3.13 followed by a Python 4.0 with no backwards compatibility for C extensions. The Python community does not want to repeat the mistakes of the Python 2 to 3 transition.

Far more likely is trying to find a way to have a bridge version that supports both styles of extensions. There is a lot of complexity in this, though, including how to mark these in packaging, how to resolve dependencies between packages which do or do not support nogil, etc.

And even this attempt to remove the GIL is likely to make some applications slower, both in terms of real-world performance (some benchmarks, such as MyPy, show a nearly 50% slowdown, and there may be even worse edge cases not yet discovered) and in terms of lost development, as the Faster CPython project will likely be unable to land a JIT in 3.13 or 3.14 as they currently plan.

[1]: https://docs.python.org/3/c-api/stable.html#c.Py_LIMITED_API

[2]: https://github.com/colesbury/nogil-3.12


PEP 703 contains a whole Motivation section, long enough to require a summary:

> Python’s global interpreter lock makes it difficult to use modern multi-core CPUs efficiently for many scientific and numeric computing applications. Heinrich Kuttler, Manuel Kroiss, and Pawel Jurgielewicz found that multi-threaded implementations in Python did not scale well for their tasks and that using multiple processes was not a suitable alternative.

> The scaling bottlenecks are not solely in core numeric tasks. Both Zachary DeVito and Pawel Jurgielewicz described challenges with coordination and communication in Python.

> Olivier Grisel, Ralf Gommers, and Zachary DeVito described how current workarounds for the GIL are “complex to maintain” and cause “lower developer productivity.” The GIL makes it more difficult to develop and maintain scientific and numeric computing libraries, as well as leading to library designs that are more difficult to use.


> Yeah, I know about that argument but it just doesn't make sense to me. Removing the GIL means that 1) you make your language runtime more complex and 2) you make your app more complex.

#2 need not be true; e.g., the approach proposed here is transparent to most Python code and even minimizes impact on C extensions, still exposing the same GIL hook functions which C code would use in the same circumstances, though they have a slightly different effect.


I agree that the GIL has become a problem for a variety of high-performance tasks, but, I’m curious, what kind of problems have you encountered with numerical computation? I contribute to both NumPy and TensorFlow, two libraries with different processing models, and I don’t see any obvious area where removing the GIL would provide substantial benefits. However, I’ll readily admit that I don’t think about this too often and it’s entirely possible I’m missing something obvious! Maybe Julia could provide some guidance around this.

I would also bet (but not too much) that we eventually see major progress in removing the GIL. I really don’t think it’ll be around forever!


> "And why has no one attempted something like this before?"

"However, if you've been around the Python community long enough, you might also know that the GIL was already removed once before--specifically, by Greg Stein who created a patch against Python 1.4 in 1996." (Also mentioned in the OP)

More info can be seen at http://dabeaz.blogspot.nl/2011/08/inside-look-at-gil-removal...


> "It's about time Python rids itself of that needless limitation."

I want to correct one thing that I see plastered all over this thread.

The GIL isn't a programming language construct. It's an implementation detail. The GIL isn't a "Python limitation" in any way, because it has nothing to do with Python.

CPython (not to be confused with Cython), probably the most popular and widely used VM for Python, was built around a GIL.

There are other VMs out there that don't have a GIL (e.g. Jython, IronPython) and can be used out of the box.


From the article:

Back in reality, though, complaining about the GIL as though it's a serious barrier to adoption amongst developers that know what they're doing often says more about the person doing the complaining than it does about CPython.

It was a solid, well-argued piece up to this point. You do yourself and the Python community a disservice by writing off your critics as ignorant. It sounds petulant and childish, and is wrong.

There are valid arguments on both sides of the GIL argument, but neither side's advocates are ignorant or bad programmers.

