Hacker Read top | best | new | newcomments | leaders | about | bookmarklet login

All this article has shown is that std::sort is much more optimal than std::unordered_map on the C++ standard library used.

Every std::unordered_map I've used is surprisingly slow. Normally hash-tables are slow because it's so easy to write a slow hash-table, but std::unordered_map is slow for other reasons that I have had explained to me and then quickly forgot :(.

I also find it strange that unordered_map is used rather than unordered_set. Not sure if it would make a performance difference, but if we are going for naive implementations, that should be at the top of the list.



view as:

It'd be interesting to see how std::map (implemented with a red-black tree) would do on these same benchmarks. A high ratio of inserts to lookups could favor the tree-based map.

A tree map is O(n*log n) on inserts, though.

For a hash map to be faster, you'd need to trade memory for speed, i.e have significantly more buckets than the expected size. This may however collide with good caching performance which I suspect is what the author observes.


It’s hard to get slower than std::map and std::set. They combine logarithmic algorithms with nonlocal memory access.

One reason why unordered map is slow is that it has to be a chained hash map due to the spec not allowing iterator invalidation when resizing.

that's certainly annoying...

I believe unordered_map does invalidate integrators on resize, but does not invalidate references to keys/data, which means the data can’t be stored inline with the hash table structure.

This.

For the OP’s use case, replacing std::unordered_set<T> with CAtlMap<T,bool> improves the hashtable method performance by a factor of ~4, consistent over different input sizes.


Agreed on use of std::unordered_map, but I appreciated that the depth this went to was a simple comparison between the two obvious STL approaches. I think you could probably speed things up tremendously via micro-optimization, but it’s arguably more useful to know which of these two approaches to use when writing the code for the first time than how to optimize it to the teeth.

The post also includes some results for Java - check the analysis notebook linked at the end.

Java's HashMap doesn't have the oddities unique to C++'s unordered_map, but it has broadly similar performance.


Legal | privacy