Hacker Read

torus · 2015-05-18 08:10:59

While Haskell is undoubtedly better suited to writing concise implementations of sorting algorithms, the C++ versions given could have been written in a much shorter and clearer fashion.

vidarh | karma 41717 | avg karma 2.6 · | 2015-05-18 14:03:26+00:00

For C/C++ all that is usually needed is to implement a few helper functions.

E.g. the quicksort example is one of my favourite pet peeves, because the reason the C version sorts in place is that the archetypical C implementation after Hoare sorts in place.

Many of the FP versions of quicksorts like this end up requiring a lot of additional memory. Maybe it is becoming reasonable to assume they will optimise it into an in-place sort these days, but it didn't use to be.

Meanwhile, with a couple of small, generic helper functions, you can get down to nearly the same complexity in C/C++. For C++ the only helper you absolutely need to cut down on the verbosity has been a part of the standard library since C++98

I wrote a blog post about this back in 2005 [1] about one of the other typical Haskell quicksort examples, and referenced an article by Alexandrescu that gave an example implementation of quicksort in C++ with the (generic) partitioning, as well as the pivot selection split out that looks like this:

        template <class Iter>
        void Sort(Iter b, Iter e)
        {
          const pair<Iter, Iter> midRange = Partition(b, e, SelectPivot(b, e));
          Sort(b, midRange.first);
          Sort(midRange.second, e);
        }

You can do the same in C easily enough. So what Haskell had over C/C++ in this respect was mainly a more expressive standard library. C++ has long had std::partition(), so it can be done entirely with the standard library too if you're willing to do the same (bad) fixed/dumb pivot selection that the typical Haskell versions (as well as the archetypical C version) use.

Unfortunately the link to the CUJ article is dead, as it was a quite fascinating walk through optimising sorting (including better pivot selection).

[1] http://www.hokstad.com/archives/2005/03/about_haskell_a

reply

Nitramp | karma 1920 | avg karma 4.27 · | 2010-12-11 05:36:39

That's actually a nice nab for functional programmers. They run around all the time telling how nice their language of choice is, showing off quicksort in three lines of code.

But please don't have the audacity of pointing out that original quicksort was supposed to be an efficient in place sort, not a mental exercise in beauty of a language. Then you get a Haskell version that is in place, and supposedly fast. It's about 10 lines longer than the C version, uses Monads, ST, recursion, "unsafeRead/Write", "rscan", "lscan", "swap", "sloop", etc., all together about 20 concepts, built-in functions, etc.

Also, of course, if it really should be fast, you have to use several compiler hints and macros to make it type specialize and all. Obviously.

The C version is shorter, uses nothing but array accesses, do/while, if, int comparisons, and recursion. If you want to explain the C version to a novice programmer, it will maybe take you half a day. The inplace Haskell version? Uhhh, yeah.

I can't help it, but things like these leave me with the impression of Haskell being a theoretically really nice language whose utility in actually getting things done seems somewhat questionable.

reply

anon1385 | karma 10161 | avg karma 7.48 · | 2013-11-30 15:46:23

It doesn't seem that meaningful to compare a totally naive C implementation to an optimised Haskell one either. (Well, other than to show how fast C is even without any effort to optimise the code). Saying you have to keep the algorithm the same seems like a rather arbitrary limitation on the comparisons, especially since some algorithms are much less of a natural fit in some languages (see: quicksort in Haskell vs C [1]).

I would suggest it's best to show (at least) 4 versions of the code to compare two language: the naive implementation in both languages, and the really optimised version in both languages (plus intermediate levels of optimisation if you feel dedicated). That gives a picture of how fast the typical code one would write will be, as well as the potential for optimisation if you are experiencing bottlenecks, and how much effort it takes to optimise the code.

[1] http://augustss.blogspot.co.uk/2007/08/quicksort-in-haskell-...

reply

ijt | karma 35 | avg karma 3.18 · | 2012-03-19 06:55:56+00:00

The difference is not only due to identifier length. Haskell's sort is a function in the mathematical sense. By looking at its type, I can see the input and output at a glance and understand how to use it. It's not immediately obvious from the signature of the C++ sort procedure that the "first" and "last" arguments are being used for both input and output. When you abbreviate the identifiers for STL sort it's even less obvious what's going on.

zerr | karma 4218 | avg karma 1.52 · | 2017-07-16 19:03:10+00:00

There are cases for more performance as well, e.g. C's qsort vs C++'s std::sort - for the latter, the comparison function can be inlined (e.g. default less operator).

veyron | karma 2817 | avg karma 2.28 · | 2012-03-17 09:26:36

FTA:

     By inlining C++ is able to run just the raw algorithm. On the opposite the C code has a number of indirections

The typical C developer is aware of the deficiencies and would write a custom macro to do the sort (I have a whole macro library for many small functions)

I agree that the language provides convenience, but the statement that sorting as a concept is faster in C++ than in C seems to be a stretch.

reply

comex | karma 19146 | avg karma 4.18 · | 2012-09-18 15:19:42

Just a nitpick: sorting is a bad example because qsort could be made just as fast if it were defined in the header and marked inline. C++ provides more flexibility (you don't have to make everything inline, you can have a family of "real" functions indexed by type), but it isn't really necessary for sorting.

wting | karma 2907 | avg karma 7.67 · | 2014-05-07 18:35:18

A large reason why the naive, imperative quick sort performs well in practice is that it takes advantage of memory locality but the Haskell one doesn't.

agentultra | karma 10491 | avg karma 3.58 · | 2022-06-29 12:36:09

Here's a direct translation of a C++ implementation into Haskell with no optimization: https://koerbitz.me/posts/Efficient-Quicksort-in-Haskell.htm...

Personally I think the Haskell code is a bit nicer. Mutation is inherently unsafe and you need to be more careful when performing such computations. Haskell code makes this painful to read by planting red-flag words like, "unsafe" around.

In C++ mutation is the norm and so it's much easier skip past such code. No indication that these operations are unsafe. The experienced reader will scrutinize this carefully or gloss over it at their own peril. The code is terse on the subject as the syntax maximizes efficiency for this case.

Here's another: https://stackoverflow.com/a/5269180

If you absolutely need in-place mutation, it's available. It's just not the default.

reply

lobster_johnson | karma 10160 | avg karma 2.95 · | 2017-09-29 04:20:43+00:00

Haskell is beautiful, but it's worth mentioning out that Haskell's quicksort is not written that way, because in this case beautiful means incredibly slow (and also because that's not even quicksort, which is an algorithm defined by how it modifies data in-place). The real quicksort implementation uses mutable arrays to perform an in-place sort, and isn't beautiful at all.

merijnv | karma 1071 | avg karma 3.4 · | 2010-12-11 22:15:12+00:00

> That means there is a lot of overhead for doing something simple like an in-place qsort

Well, the problem not so much that there is a lot of overhead for doing an in-place qsort. The problem is that "in-place" anything is impossible in Haskell, it violates referential transparency.

Merge sort however needs no in-place update and can trivially (in Haskell world) be parallelized (merge sort is associative, you can use it as the reduce step of MapReduce). So you are correct in concluding that mergesort is used in Haskell.

You state "something simple like an in-place qsort", your statement about qsort being simple makes some assumptions about your programming languages world, which are not the same for all languages especially not Haskell. No whether all this is a pro and a con depends on what sort of coding you're doing of course.

I think that people call Haskell's code more elegant because they (like me) find C++ template and type annotations to be annoyingly verbose and messy for no good reason. For example for the two quicksort functions:

"template<typename RandomAccessIterator> void quicksort(RandomAccessIterator first, RandomAccessIterator last)"

vs

"quicksort :: (Ord a) => [a] -> [a]" (where "Ord a" means that a is an instance of the typeclass of ordered items).

reply

jstimpfle | karma 3971 | avg karma 1.31 · | 2017-12-27 14:00:55+00:00

I've grown to like the plain C version. It has a very modular interface (doesn't require the sorted items to implements a certain protocol, i.e. "operator<"), and doesn't compile new code for each new type. (Often code size is the most important).

I think the worst case I measured is a factor of 2 in std::sort vs qsort on bare integers. On larger types it will be less. That's acceptable to me. Sorting is seldom the bottleneck (it's only important to do sort in many situations).

reply

comex | karma 19146 | avg karma 4.18 · | 2014-12-23 07:26:13+00:00

Bjarne should know better than to directly compare the performance of std::sort and qsort. One is typically printed in full in a header file, while the other is typically compiled separately. If qsort were found in a header file, it could be inlined into identical code to the C++ version, regardless of the fact that there are void pointers lying around everywhere.

There are caveats: the compiler might not choose to inline unless you force it to, and if you do that then you'll end up with duplicate code in the case of multiple calls with the same comparison function, while C++ can automagically merge duplicates (although you probably want to write a wrapper function anyway, and C++ will still waste compile time generating the duplicates if the calls are in different source files). Also, if the sorting function calls a secondary function in multiple textual locations, and that function is significant enough that inlining it would produce wasteful code, the pure inlining-based approach will be insufficient (but I don't think most sorting algorithms do this).

In other words, C++ makes it easier to do this sort of thing. No surprise! It certainly makes it prettier. But when it comes to performance, in practice the above would likely not be a big deal for qsort, so the difference between the two functions is really more a matter of convention regarding the implementation location. Benchmarking the two and explaining only that type safety "makes for excellent inlining and good optimizations" is simply misleading.

reply

tombert | karma 15742 | avg karma 4.07 · | 2019-10-28 14:34:39+00:00

I'm genuinely curious; how often do you actually end up writing sorts in Haskell? I just use the built-in sort function, which is fast enough.

Maybe your use-case is different than mine, but I typically use Haskell (or another functional language) for network applications, which typically don't benefit from tight-loops since the network is almost always the bottleneck anyway. For these kinds of applications, having proper concurrency and IO handling matters a lot more than worrying about whether your loops result in a cache-miss.

reply

akho | karma 620 | avg karma 1.47 · | 2021-08-28 08:11:07

The C code implement quicksort, the Haskell code does not. It has different (worse) space characteristics. Proper quicksort in Haskell is not a two-liner.

SilasX | karma 17521 | avg karma 2.12 · | 2014-03-27 20:34:38+00:00

Your criticisms are all from the perspective of "it keeps me from telling it how to implement the functionality", which is true, but not what Haskell's design (or pure FP language design in general) optimizes for.

Sometimes, you really do want to make commands to the bare metal regarding exactly how you want the program implemented. For general use, however, you don't: compilers are (at least these days) a lot better at finding optimizations than the typical user.

It comes down to abstraction levels: pick the one that matches the problem you're working on. The FP philosophy is "speaking about specific memory cells is way too low" in most cases, like telling it you want a sorted list.

In fact, it would probably be even better (from that selfsame philosophy) to define a sort in terms of the conditions the final list would meet (as opposed to nested, recursing conditions like in the example). Something like, "return a list where every element is <= the one after it, and where every element is also in the original list, as many times as it occurs there".

If you want to see the examples of quicksort in Haskell to enforce in-place and other requirements, see this discussion here, with links:

http://www.haskell.org/haskellwiki/Introduction/Direct_Trans...

tl;dr: Yeah, Haskell is worse at letting you force the sort to be in-place, but that can be a very good thing.

reply

exDM69 | karma 10126 | avg karma 3.75 · | 2014-04-22 07:19:10

> Write a sorting function and use it for 5 types -> you get the same sort in assembler 5 times with differences that relate to the types they use, alongside with 5 massive symbol names (C++ template symbols tend to be really large).

To be fair, this is also why the performance of C++ std::sort beats C qsort hands down. Calling the comparator function with a function pointer is expensive compared to doing an integer comparison. Additionally, the boundary between translation units (ie. object files) inhibits a lot of optimization that could otherwise take place (and does in the C++ case, when the function calls are inlined).

There are also techniques to avoid C++ template bloat, such as type erasure. Although this makes the template code only a thin shim that provides type safety.

reply

millstone | karma 6895 | avg karma 3.52 · | 2014-05-10 06:57:56+00:00

All true. On the other hand, std::sort in C++ works on RandomAccessIterator, which means that it can sort vectors, C arrays, deques, strings, your custom container type, etc. The Haskell implementation can only sort lists. Furthermore, the Haskell type signature fails to express that the input and output have the same size. But std::sort necessarily preserves its size, because you pass it a range, and a range cannot access its underlying container to change its size (a limitation leading to bizarre APIs like std::unique).

ajax77 | karma 30 | avg karma 2.14 · | 2012-03-26 17:13:56+00:00

Whether or not you are a C++ fan, templatized algorithms ARE often faster than their C-based, generic counterparts. Any decent C++ compiler should be able to inline generic code while C often relies on function pointers. I'm not saying this is always the case, but often. std::sort beating qsort is a typical argument, though I'm not sure the two algorithms are equivalent excepting language-specific constructs (http://stackoverflow.com/questions/4708105/performance-of-qs...)

As for size, you're absolutely right, though for many scenarios that appears to be less and less a pressing concern. Perhaps the biggest drawback is compilation time.

reply