Hence why I wrote "For an ultra simple hash table". I think that for C that doesn't natively have such a data structure, a couple of lines to introduce such a structure really isn't too bad.
Maybe the should be. I know C is all about staying minimal, but native hash tables have proven themselves to be extremely useful in every other language where they are present.
Op here. I have done a lot of work involving hash tables in main-memory database research and therefore really like applying those optimizations. You are right though that it might be overkill :-)
A marginally faster mutable hash table is not really useful for much. If that 30% speedup or whatever is going to make or break your application, a single hash table was the wrong choice anyway.
If this is the kind of thing that keeps you up at night, you should just be using C.
C noob here.
Why isn't a hash table implementation merged into the c standard library? Is it because the stdlib has to be as thin as possible for some performance reason or something?
There are many reasons why it's lame! Here's a short list:
- There is only one global hash table for the whole process! If you want individual hash tables you need to use implementation-specific extensions (hcreate_r().)
- There is no way to remove a key from a hash table. No implementation I know of provides an extension to do it. If you want to truly remove a key you must destroy and rebuild the table.
- There is no requirement for a means to grow the table. On some platforms it's possible to run out of space. If you want to truly grow you must destroy and rebuild the table.
- There is no standard way to free keys or data stored in the table. Destroying the table leaks all contents; you must keep track of keys and data externally and free them yourself. Only OpenBSD frees keys by default, and only NetBSD has extensions to call callbacks to free keys and data.
- Keys must be strings. You cannot provide custom types or void pointers as keys. There is no way to specify custom hash or comparison functions. The quality of the hash function is unspecified.
- When using ENTER, you may not want to allocate the key unless it is necessary to insert a new entry. Since the given key is inserted directly, it's not necessarily safe to use a stack-allocated buffer for lookups. It's awkward to replace the key with an allocation after insertion and it's unclear whether it's allowed.
This doesn't even get into incompatibilities between the various implementations. You will encounter all of the above flaws even if your only target is glibc.
No one should ever be encouraged to use POSIX hash tables. They should be deprecated and we should all just pretend they don't exist.
Ehh, hash tables were invented in the 50s and were and are used wherever they are useful. I’m pretty sure a running joke is that every decently sized C program contains a half dozen hash table implementations. They’re not recent.
This post focuses almost exclusively on speed of the resulting program. That's a mistake. Using hash tables is a good thing because they are usually simpler than an ad hoc multi-level structure made just of pointers and arrays.
Dunno, I've had use from knowing how to implement hash tables several times just in the last few months. Extremely useful stuff if you are working with memory mapped data that is larger than system memory.
I was surprised recently when looking at different hash tables that have been implemented in C to discover that the standard library includes its own hash table. They are even part of POSIX. There is a reason you have never heard of it, or if you have you have never used it. In true POSIX fashion they are close to useless. The implementation doesn't allow you to modify the table after it has been created, you have to pass in all the data when you create the table. There is no add or delete and you can't change any value you pull from the table. It also stores the data in a static area local to the function so you can only use a single table in a program at a time. It doggedly commits every C stdlib sin possible.
Hash tables in C are not part of the standard. They are part of POSIX, so it is mandated only on UNIX-ish environments. The standard committee has nothing to do with that.
> Using a hash table for at most 3 entries is astronomical overkill!
I disagree. Simplicity of the code is most vital to avoid bugs, and if a hash table is what the language typically uses to represent key value mappings, that is what should be used, for 3 or 3 million items.
Keep the code idiomatic, and only optimize where necessary.
A good implementation of a hash table has a special fast path for tiny runtime sizes.
A good compiler could bound the size of the hash table, and switch out the implementation if its size can be bounded at compile time.
Sure, in this case, it didn't work so well. But the principle of idiomatic code still stands.
And depending on your use case, you may not need to implement growing the hash table, or deletions, making it even simpler/less daunting to implement in C (since this avoids a lot of the memory management parts of the job).
reply