Hacker Read

mehrdadn | karma 286 | avg karma 0.06 · 2021-03-04 11:00:29+00:00

I'm not trying to write the most horrible comparison at all. Perhaps the most important thing to keep in mind here is that this is a general computer science paper, and my comparison of C++ and Python is just intended to serve as a familiar (and vivid) illustrative example of the general phenomenon I'm trying to describe. The paper is emphatically not intended to be a "Python vs. C++" paper. Everything you see there that is "concrete" (the language, the running times, etc.) is intended to be a mere illustration of the overarching concept (design decisions & their consequences) being discussed, and it could manifest itself in any language.

The context to keep in mind when reading the paper is: When designing a programming language & its standard library (or any API), we need to define an interface we can use as a building block, and we're analyzing the consequences of our choice of building blocks. In particular, we first examine the case of comparison-based data structures, which requires defining ordering primitives. In C++, the primitive is the < operator. In Python 2, it's cmp(). (In Python 3, it's a mix of < and ==, whose implications I discuss as well.) We assume user-provided types implement that basic interface, and we implement everything else we need on top of that.

So the question I'm analyzing in that example is: What happens if my primitive comparison operation is a 2-way comparison (lt(), like in C++) and then I implement 3-way comparison in terms of that (such as when I need it for for a binary search tree)? Now, what if we do the opposite: what happens if instead my primitive comparison operation is a 3-way comparison (cmp(), like in Python 2) and I only need to perform a 2-way comparison later? What are the trade-offs?

To do this, I take both approaches, implementing each in terms of the other, and compare how they behave complexity-wise. The conclusion I come to is that the choice of the primitive (which is often made by the language designers) isn't merely a question of aesthetics, but rather, it actually affects the time complexity of common operations (like traversing search trees). Similarly, the decision to cache a hash code doesn't just result in a constant-factor improvement, but it can actually change the time complexity of a program. And so on.

I think if you re-read the paper with these in mind, it should hopefully make more sense. The rest of what you said doesn't enter the picture at all... these are already balanced binary trees, the decision to use less<> is fundamentally independent of what C++ stdlib implementation you use, and the time of the vector concatenation isn't even being measured. Those things are unrelated to the point of the paper entirely. I was just trying to minimize the extraneous lines of code so we can focus on the heart of the problem instead of getting distracted by boilerplate.

reply

attractivechaos | karma 696 | avg karma 3.5 · 2021-03-04 14:56:08+00:00

Thanks for the interesting case. I guess many readers have read too much into lt2 and lt3 but overlooked the code under 2.4.2: C++ could actually be that bad. In that code, you only showed the timing in comments. The result may be worth a table. Perhaps another way to structure the manuscript is to give the surprising result of 2.4.2 first and then to explain what makes that happen.

mehrdadn | karma 286 | avg karma 0.06 · 2021-03-04 21:11:30+00:00

Yeah—I think I laid out the paper more like a story, but I might indeed need to change that as it appears it leaves people confused before they get to the punchline. Thanks for the feedback! Glad you found it interesting.