
Cython - in fact, I think that in 2021, if you want to write a pure C or pure C++ program, Cython is the best way to go; just disable use of CPython.

The “need to rewrite” is actually a sort of advantage with Cython. You target only the small pieces of your program to be compiled to C or C++ for optimization, and the rest, where runtime is already fast enough or otherwise doesn’t matter, you seamlessly write in plain Python.
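
A minimal sketch of that split (the module and function names here are my own illustration, not anything from this thread): the hot loop lives in one small Cython file, and everything else stays plain Python.

    # hot_loop.pyx - the hot spot, compiled to C by Cython
    # cython: language_level=3
    def array_sum(double[:] xs):
        cdef Py_ssize_t i
        cdef double total = 0.0
        for i in range(xs.shape[0]):
            total += xs[i]   # typed memoryview indexing compiles to a plain C loop
        return total

The rest of the program stays ordinary Python and just imports array_sum from the compiled module.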

Using extension modules is just a time-tested, highly organized, modular, robust design pattern.

Julia and others do themselves a disservice by trying to make “the whole language automatically optimized,” which counter-intuitively is worse than making the language optimized overall for flexibility instead of speed, yet with an easy system for patching in optimization modules anywhere they are needed.




> Using extension modules is just a time-tested, highly organized, modular, robust design pattern.

I really don't get this. I'm fully on the side that says limitations may increase design quality. E.g. I accept the argument that Haskell's immutability often leads to good design, and I believe the same is true for Rust's ownership rules (they often force a design where components have a well-defined responsibility: this component only manages resource X, starting from { until }).

But having a performance boundary between components, why would that help?

E.g.: this algorithm will be fast with floats but slow with complex numbers. Or: you can provide X or Y as a callback function to our component and it will be blessed and fast, but if you provide your custom function Z, it will be slow.

So you should implement support for callback Z in a different layer but not for callbacks X and Y, and you should rewrite your algorithm in a lower level layer just to support complex numbers. Will this really lead to a better design?


> “But having a performance boundary between components, why would that help?”

It helps precisely so you don’t pay premature abstraction costs to over-generalize the performance patterns.

One of my biggest complaints with Julia is that zealots for the language insist these permeating abstractions are costless, but they totally aren’t. Sometimes I’m way better off if not everything up the entire language stack is differentiable and carries the baggage needed for that underlying architecture. But Julia doesn’t give me the choice between this little piece that does benefit from it and that little piece that, by virtue of being built on top of the same differentiability, is just bloat or premature optimization.

> “you should rewrite your algorithm in a lower level layer just to support complex numbers.”

Yes, precisely. This maximally avoids premature abstraction and premature extensibility. And if, as in Cython, the process of “rewriting” the algorithm is essentially instantaneous, easy, and pleasant to work with, then the cost is even lower.

This is why you have such a spectrum in Python.

1. Create restricted computation domains (eg numpy API, pandas API, tensorflow API)

2. Allow each to pursue optimization independently, with clear boundaries and API constraints if you want to hook in

3. When possible, automate large classes of transpilation from outside the separate restricted computation domains to inside them (eg JITs like numba; see the sketch after this list), but never seek a pan-everything JIT that destroys the clear boundaries

4. For everything else (eg cases where you deliberately don’t want a JIT auto-optimizing because you need to restrict the scope or you need finer control), use Cython and write your Python modules seamlessly, with some optimization-targeting patches in C/C++ and the rest in normal, easy-to-use Python.
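
To make item 3 concrete, here is a minimal sketch of that kind of transpilation with numba (the function and data are illustrative assumptions, not from the list above): a plain-Python loop gets JIT-compiled into native code without the author leaving Python syntax.

    import numpy as np
    from numba import njit

    @njit   # compiles this function to native machine code on first call
    def moving_sum(a, window):
        out = np.empty(a.shape[0] - window + 1)
        for i in range(out.shape[0]):
            s = 0.0
            for j in range(window):
                s += a[i + j]
            out[i] = s
        return out

    moving_sum(np.arange(10.0), 3)   # ordinary Python call site; numba handles the rest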


> One of my biggest complaints with Julia is that zealots for the language insist these permeating abstractions are costless, but they totally aren’t.

This sounds like it might be interesting, but your later comments about overhead and abstraction costs sound like you maybe don't understand what Julia's JIT is actually doing and how it leverages multiple dispatch and unboxing. Could you be a bit more concrete?


No, I think that’s what I’m saying. When I raise the issue that using multiple dispatch this way is premature abstraction with intrinsic costs, all I get is the religious pamphlet about multiple dispatch.

In practice the multiple dispatch overhead is elided by the compiler. If it can’t be, you’re doing something truly dynamic, which is generally unavoidably slower. It’s still a better place to be than everything being a generic Object type.

The nice thing about Cython is that you can have both - all the multiple dispatch you want with fused types, or the ability to escape that paradigm and do other things if you desire. It gives a lot of surgical control.
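
For concreteness, a sketch of the fused-types half of that claim (assuming a .pyx file; the names are my own illustration): one definition is specialized into a separate C version per listed type, with the specialization picked from the argument types.

    ctypedef fused number_t:
        float
        double
        long long

    def scale_inplace(number_t[:] xs, number_t factor):
        cdef Py_ssize_t i
        for i in range(xs.shape[0]):
            xs[i] = xs[i] * factor   # each specialization compiles to its own C loop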

I don’t think that is true. As far as I know, Cython lets you do function overloading and single dispatch via class inheritance. I think you also miss out on the type inference that lets you do things like pipe dual numbers through functions without any dispatch-related overhead.

Does compiling with Cython decrease the FFI overhead of the calls into native code? My problems with numpy have always been that I have to make a lot of calls on small bits of data, and the FFI overhead eats all my performance gains. If I put more logic on the native side and made fewer, bigger calls it would be faster, but that often doesn't make sense, or it becomes a slope where moving the logic into native code pulls over a data structure, then another related bit of logic, until I have just a tiny bit of Python left.

Probably. Cython compiles a C-style superset of Python into C, and then a C compiler compiles that into a Python-importable DLL/.so. So the overhead of calling a C function is no more than declaring its types (a cost in programmer time), and in the generated C, the native C-linkage function is called like any other. Now, even one C function calling another across translation units (i.e. object files or shared libs) can be "high" overhead (though nothing like Python FFI), but you may be able to eliminate that too with modern compilers' link-time optimization and some build-environment care.
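
A sketch of what that looks like in practice (file and function names are my own illustration): the sqrt call below becomes an ordinary C function call in the generated C, with no Python FFI machinery on the hot path.

    # norms.pyx
    from libc.math cimport sqrt   # C's sqrt, declared via Cython's bundled libc wrappers

    def sum_norms(double[:] xs, double[:] ys):
        cdef Py_ssize_t i
        cdef double acc = 0.0
        for i in range(xs.shape[0]):
            acc += sqrt(xs[i] * xs[i] + ys[i] * ys[i])   # plain C call in the generated code
        return acc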

I have been using pythran for the last year, and the nice thing is that you hardly have to rewrite anything, yet you get speeds which are often as fast as (or sometimes faster than) C modules.
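
For anyone who hasn't seen the workflow, a minimal sketch (the function and its signature are my own illustration): the file stays plain Python/NumPy, and one export comment tells pythran what to compile.

    #pythran export rolling_mean(float64[], int)
    import numpy as np

    def rolling_mean(a, window):
        out = np.empty(a.shape[0] - window + 1)
        for i in range(out.shape[0]):
            out[i] = a[i:i + window].mean()
        return out

Running "pythran" on the file builds a native extension module; without pythran, the same file still runs as ordinary Python.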

The problem with Cython is that to really get the performance benefits, your code ends up looking almost like C.

I agree with you on optimizing only the bits that matter; often the performance-critical parts are a very small fraction of the overall code base.

