Regarding speed of the C++ version: The fundamental mistake is to use std::unordered_map. The standard library map and set implementations (both ordered -> tree-based and unordered -> hash-based) are almost never the correct choice if you care about performance.
That's definitely a flaw in the language, but like with C you would just use a different implementation for this purpose.
std::unordered_map is a bit of a red-headed stepchild in the C++ world. I'm not surprised that it hasn't had nearly the amount of tuning that hashmaps have had in other languages.
When I asked some C++ programmers about it, they warned me off using it, for reasons I didn't fully understand, but I got the impression that there are structural reasons why it can never be really fast, so anyone who needs a really fast hashmap uses a non-standard one anyway.
std::unordered_map is forced to be slow and make lots of allocations by the standard's pointer/reference-stability requirements (references to elements must survive rehashing). It's usually a good idea to avoid it (and use a different hash table implementation) if you care about performance.
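A rough sketch of what that stability guarantee means in practice (illustrative only): references into the table must stay valid across rehashes, which is what forces the node-per-element layout.

    #include <cstdio>
    #include <string>
    #include <unordered_map>

    int main() {
        std::unordered_map<int, std::string> m;
        m[42] = "hello";
        std::string& ref = m[42];   // reference into the table

        // Force many rehashes; the standard guarantees `ref` stays valid,
        // which in practice means every element lives in its own heap node.
        for (int i = 0; i < 100000; ++i) m[i] = "x";

        std::printf("%s\n", ref.c_str());  // still prints "hello"
    }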
Looking at the code more carefully, I'd actually bet it's string that's slow, and not actually the map.
unordered_map is actually known to have relatively bad performance on GCC and some other open-source implementations. I don't know if there's a spec problem that favors a slower hash table design (rather than the modern open-addressing schemes, which are faster on modern hardware), but I can believe unordered_map has an issue on common standard libraries today.
Three, then two free()s as the previous two strings are deallocated.
> ret[string("id.")+to_string(i%500000)]
Four. The ret[] operator searches for the string. Since it doesn't exist, it now has to malloc the key and insert the value in there. std::move() might solve that issue and cut this one out easily?
Eight, to place the value in there. Again, probably solved with std::move. +1 more free.
So eight mallocs and six frees?
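One rough way to check these guesses (a sketch only; it assumes ret is an unordered_map of strings and that i comes from the surrounding loop, and small-string optimization will shift the exact counts): override the global allocation functions and count calls around the quoted expression.

    #include <cstdio>
    #include <cstdlib>
    #include <new>
    #include <string>
    #include <unordered_map>

    static long g_allocs = 0, g_frees = 0;

    void* operator new(std::size_t n) {
        ++g_allocs;
        if (void* p = std::malloc(n)) return p;
        throw std::bad_alloc{};
    }
    void operator delete(void* p) noexcept { ++g_frees; std::free(p); }
    void operator delete(void* p, std::size_t) noexcept { ++g_frees; std::free(p); }

    int main() {
        std::unordered_map<std::string, std::string> ret;  // assumed type
        int i = 1234567;                                    // stand-in loop index
        long a0 = g_allocs, f0 = g_frees;

        ret[std::string("id.") + std::to_string(i % 500000)];  // the quoted expression

        std::printf("allocs: %ld, frees: %ld\n", g_allocs - a0, g_frees - f0);
    }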
---------
Yeah, C++ strings let you shoot yourself in the foot performance-wise. They're easy to use and free of stupid malloc/free bugs, but that means they will malloc/free a LOT more than you might guess with the untrained eye.
I know that Perl / PHP / etc. were explicitly designed to be more favorable towards string manipulation. And I'd bet there's some old C++98 baggage that locks std::string into this older, less performant model.
EDIT: I see you managed to avoid the problem of excessive mallocs/frees in the Rust code with "format". The C++ equivalent might be snprintf, though that dips down into C-style strings.
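A sketch of that C-style route (the names and buffer size here are just illustrative): format the key into a stack buffer with snprintf, so the temporaries from string("id.") + to_string(...) never get built.

    #include <cstdio>
    #include <string>
    #include <unordered_map>

    int main() {
        std::unordered_map<std::string, std::string> ret;

        for (int i = 0; i < 1000000; ++i) {
            char key[32];
            // Build the key on the stack instead of via temporary std::strings.
            std::snprintf(key, sizeof key, "id.%d", i % 500000);
            ret[key] = "value";  // placeholder value; the map still copies into its node
        }
        std::printf("%zu entries\n", ret.size());
    }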
For instance, std::unordered_map in C++ is based on separate chaining: each bucket is a linked list of key/value nodes. This means iterating through the keys or values is very slow (lots of cache misses!), but insertion is fast.
Has this changed recently? Last time I used unordered_map, insertion was also slow because it had to allocate a linked list entry for every item in the hash table.
Well, std::unordered_map is faster than std::map, but it is still slow, since it uses linked lists for its buckets, with lots of dynamic allocation. See also:
I rarely need to use maps, but every time I had to, std::unordered_map was measurably slower than std::map. In particular in debug mode it was really slow, and if you choose an unfortunate hashing function (I think older MSVC versions had a really bad one), it can easily become a bottleneck. In particular with small data or data that has trivial comparison functions, std::map might often be better and should be the go-to choice.
Not even considering that unordered_map is not the fastest hash map implementation for C++. There are plenty of flat hash map implementations (Swiss tables) around that beat unordered_map by another factor...
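As a sketch of how close to drop-in those are (this assumes Abseil is installed and linked; its container interface deliberately mirrors std::unordered_map):

    #include <cstdio>
    #include <string>

    #include "absl/container/flat_hash_map.h"   // Swiss table: open addressing, flat storage

    int main() {
        // Elements live inline in a flat array, so probing and iteration are
        // cache-friendly. Note that it does NOT keep pointers stable across rehashes.
        absl::flat_hash_map<std::string, int> counts;

        for (const char* w : {"foo", "bar", "foo"}) ++counts[w];

        for (const auto& [word, n] : counts)
            std::printf("%s: %d\n", word.c_str(), n);
    }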
Ordered maps require other tradeoffs. For example, C++ has both: std::map and std::unordered_map. std::map is a tree map, and so has O(log n) access/insertion times. It also does a ton of small allocations and scatters your data across memory, leading to poor cache performance at scale.
But it has ordered output, and doesn't invalidate iterators on insertion (hashmaps might, because they sometimes need to rehash).
Generally, the suggestion I've heard is to avoid using std::map unless you really, really need what it specifically provides, because it's expensive, and it's hard to know whether you can safely relax those constraints later.
The thing is, it’s not actually effortless in many situations to switch from one map to the other. In C++, if you switch to an unordered map, you now have to contend with the annoying boilerplate of implementing a hash function for many key types. Using an ordered map may sometimes give you a slight hit in performance, but it offers significant benefits in developer convenience, and for small maps, the binary tree could actually be faster anyway.
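For example, the boilerplate being described might look roughly like this (the key type and hash combine are made up for illustration; with std::map you would only need operator<):

    #include <cstddef>
    #include <cstdio>
    #include <functional>
    #include <string>
    #include <unordered_map>

    // Hypothetical composite key: std::unordered_map wants a hash function
    // and operator== for it.
    struct CityKey {
        std::string country;
        std::string city;
        bool operator==(const CityKey& o) const {
            return country == o.country && city == o.city;
        }
    };

    struct CityKeyHash {
        std::size_t operator()(const CityKey& k) const {
            std::size_t h1 = std::hash<std::string>{}(k.country);
            std::size_t h2 = std::hash<std::string>{}(k.city);
            return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));  // ad-hoc combine
        }
    };

    int main() {
        std::unordered_map<CityKey, int, CityKeyHash> population;
        population[CityKey{"NZ", "Wellington"}] = 212000;
        std::printf("%d\n", population[CityKey{"NZ", "Wellington"}]);
    }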
Although std::unordered_map is definitely not nearly as fast as a hash map can be in C++, that's probably not the main problem here. I would bet on iostreams and/or tolower being bigger problems.
Note that tolower is called for every /character/. I don't know everything it does, but I know it's doing some semi-esoteric locale stuff, and in general it's doing a lot more than a naive ASCII implementation would. I seem to recall it's even doing a dynamic_cast in there somewhere. For each character of input.
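A sketch of the difference being described (assuming the locale-aware <locale> overload is the one in play; it goes through the ctype facet machinery on every call, while an ASCII-only version is a compare and an add):

    #include <cstdio>
    #include <locale>
    #include <string>

    // Naive ASCII-only lowercasing: a compare and an add per character.
    inline char ascii_tolower(char c) {
        return (c >= 'A' && c <= 'Z') ? char(c + ('a' - 'A')) : c;
    }

    int main() {
        std::string s = "Hello, World!";
        std::locale loc;  // the global locale

        for (char& c : s) {
            // The <locale> overload looks up the ctype<char> facet and goes
            // through its interface on every call: far heavier per character.
            c = std::tolower(c, loc);
            // versus: c = ascii_tolower(c);
        }
        std::printf("%s\n", s.c_str());
    }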
I was curious, so I ran callgrind on it. Here are the top 10 offenders (by Ir, callgrind's instruction-fetch count, i.e. instructions executed). For clarity I have elided many details. Total Ir is 5,647,099,795, so the stuff below accounts for more than 50% of all Ir.
This has its own problems. Pointer stability limits the internal implementation so much that unordered_map is embarrassingly slow given what we know about modern hashmap design. There is a good reason why the Swiss tables that Google built dropped this guarantee.
2. The C++ standard library's maps and sets are known to be rather slow. See, for example:
https://stackoverflow.com/q/42588264/1593077
when you have string values, it's even worse, as you describe. But it's not clear that an overly-clever implementation, which caches numeric ranks of strings etc., is a good idea to have.