Hacker Read

sgerenser · 2021-03-26 18:11:48

Agreed this seems like major overkill, but then again so does writing your own hash table from scratch in C. :shrug:

zeroonetwothree | karma 4300 | avg karma 2.02 · | 2023-05-05 08:42:06

Even in C it’s not that hard to make a basic hash table. It’s probably like 20 lines of code or something.

bArray | karma 5584 | avg karma 3.1 · | 2022-08-09 17:21:20

Hence why I wrote "For an ultra simple hash table". I think that for C that doesn't natively have such a data structure, a couple of lines to introduce such a structure really isn't too bad.

jandrese | karma 30121 | avg karma 3.36 · | 2021-03-27 06:13:54+00:00

Maybe they should be. I know C is all about staying minimal, but native hash tables have proven themselves to be extremely useful in every other language where they are present.

henrikm85 | karma 153 | avg karma 3.4 · | 2014-01-30 22:04:52+00:00

Op here. I have done a lot of work involving hash tables in main-memory database research and therefore really like applying those optimizations. You are right though that it might be overkill :-)

centimeter | karma 502 | avg karma 0.56 · | 2020-05-31 10:29:23

A marginally faster mutable hash table is not really useful for much. If that 30% speedup or whatever is going to make or break your application, a single hash table was the wrong choice anyway.

If this is the kind of thing that keeps you up at night, you should just be using C.

fred256 | karma 1169 | avg karma 5.93 · | 2017-06-29 23:17:13+00:00

You could go even a step further and ditch the hash table altogether, and just use a 2^32 bit set. “Only” takes up half a gigabyte of memory.
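For reference, the bit set idea fits in a few lines of C. This is only a sketch: the names bitset_new, bitset_add and bitset_contains are made up for illustration, and the size is a parameter so the same code can be tried with a small set before committing half a gigabyte.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate a zeroed bit set with room for nbits bits.
   For every possible 32-bit key, nbits = 2^32, i.e. 2^29 bytes = 512 MiB.
   Returns NULL if the allocation fails. */
static uint8_t *bitset_new(uint64_t nbits) {
    return calloc((size_t)((nbits + 7) / 8), 1);
}

/* Mark key as present: byte index is key / 8, bit index is key % 8. */
static void bitset_add(uint8_t *bits, uint32_t key) {
    bits[key >> 3] |= (uint8_t)(1u << (key & 7));
}

/* Returns 1 if key was added earlier, 0 otherwise. */
static int bitset_contains(const uint8_t *bits, uint32_t key) {
    return (bits[key >> 3] >> (key & 7)) & 1;
}
```

Membership is a shift and a mask with no hashing and no collisions, which is the whole appeal; the cost is that the memory is paid up front regardless of how many keys are stored.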

jeremyawon | karma 104 | avg karma 3.06 · | 2009-04-18 23:01:30

you're right, hashing is overkill.

weatherlite | karma 1081 | avg karma 1.11 · | 2022-08-09 11:38:37

C noob here. Why isn't a hash table implementation merged into the c standard library? Is it because the stdlib has to be as thin as possible for some performance reason or something?

gruez | karma 34333 | avg karma 2.33 · | 2024-01-04 07:41:31

That requires way more hashes to be computed though.

jahewson | karma 7178 | avg karma 2.82 · | 2014-01-03 22:48:59+00:00

Could you use an enormously expensive hash function so that building a lookup table is infeasible?

ludocode | karma 961 | avg karma 5.31 · | 2021-03-26 15:47:13+00:00

There are many reasons why it's lame! Here's a short list:

- There is only one global hash table for the whole process! If you want individual hash tables you need to use implementation-specific extensions such as hcreate_r().

- There is no way to remove a key from a hash table. No implementation I know of provides an extension to do it. If you want to truly remove a key you must destroy and rebuild the table.

- There is no requirement for a means to grow the table. On some platforms it's possible to run out of space. If you want to truly grow you must destroy and rebuild the table.

- There is no standard way to free keys or data stored in the table. Destroying the table leaks all contents; you must keep track of keys and data externally and free them yourself. Only OpenBSD frees keys by default, and only NetBSD has extensions to call callbacks to free keys and data.

- Keys must be strings. You cannot provide custom types or void pointers as keys. There is no way to specify custom hash or comparison functions. The quality of the hash function is unspecified.

- When using ENTER, you may not want to allocate the key unless it is necessary to insert a new entry. Since the given key is inserted directly, it's not necessarily safe to use a stack-allocated buffer for lookups. It's awkward to replace the key with an allocation after insertion and it's unclear whether it's allowed.

This doesn't even get into incompatibilities between the various implementations. You will encounter all of the above flaws even if your only target is glibc.

No one should ever be encouraged to use POSIX hash tables. They should be deprecated and we should all just pretend they don't exist.
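To make the list above concrete, here is a rough sketch of what using the POSIX interface looks like. It uses only hcreate(), hsearch() and hdestroy() from <search.h>; demo_lookup is a hypothetical helper, the 2n+1 sizing is an arbitrary choice, and the keys are assumed to be string literals on a platform where hdestroy() does not free keys (so not OpenBSD, per the point above).

```c
#include <search.h>
#include <stddef.h>
#include <stdint.h>

/* Insert each key with its array index as the value, then look up query.
   Returns the stored index, or -1 if the key is absent or setup fails.
   demo_lookup is a hypothetical helper, not part of POSIX. */
static long demo_lookup(char *keys[], size_t n, char *query) {
    /* One process-wide table; the capacity is chosen up front. */
    if (hcreate(n * 2 + 1) == 0)
        return -1;

    int ok = 1;
    for (size_t i = 0; ok && i < n; i++) {
        /* ENTER stores the key pointer itself, so the string must
           outlive the table. */
        ENTRY item = { .key = keys[i], .data = (void *)(intptr_t)i };
        ok = hsearch(item, ENTER) != NULL;
    }

    long result = -1;
    if (ok) {
        ENTRY probe = { .key = query, .data = NULL };
        ENTRY *found = hsearch(probe, FIND);
        if (found != NULL)
            result = (long)(intptr_t)found->data;
    }

    /* No per-key removal exists; the only cleanup is destroying the table. */
    hdestroy();
    return result;
}
```

Note how the table never appears as a value anywhere in the code: there is nothing to pass around, which is exactly the single-global-table problem.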

boardwaalk | karma 1949 | avg karma 3.51 · | 2021-07-01 19:51:29+00:00

Ehh, hash tables were invented in the 50s and were and are used wherever they are useful. I’m pretty sure a running joke is that every decently sized C program contains a half dozen hash table implementations. They’re not recent.

lacker | karma 14091 | avg karma 4.94 · | 2008-11-20 17:40:58+00:00

This post focuses almost exclusively on speed of the resulting program. That's a mistake. Using hash tables is a good thing because they are usually simpler than an ad hoc multi-level structure made just of pointers and arrays.

eatonphil | karma 21581 | avg karma 5.52 · | 2022-05-04 07:09:25

Are you suggesting there aren't mature implementations of hash maps in C you can easily vendor?

marginalia_nu | karma 21123 | avg karma 4.08 · | 2021-10-16 14:04:53

Dunno, I've had use from knowing how to implement hash tables several times just in the last few months. Extremely useful stuff if you are working with memory mapped data that is larger than system memory.

jandrese | karma 30121 | avg karma 3.36 · | 2023-06-13 16:22:42

I was surprised recently, when looking at different hash tables that have been implemented in C, to discover that the standard library includes its own hash table. It is even part of POSIX. There is a reason you have never heard of it, or, if you have, you have never used it. In true POSIX fashion it is close to useless. The implementation doesn't allow you to modify the table after it has been created; you have to pass in all the data when you create the table. There is no add or delete and you can't change any value you pull from the table. It also stores the data in a static area local to the function, so you can only use a single table in a program at a time. It doggedly commits every C stdlib sin possible.

coliveira | karma 8490 | avg karma 2.38 · | 2021-03-27 01:06:24+00:00

Hash tables in C are not part of the standard. They are part of POSIX, so they are mandated only on UNIX-ish environments. The standard committee has nothing to do with that.

londons_explore | karma 35497 | avg karma 2.72 · | 2021-06-03 13:09:35+00:00

> Using a hash table for at most 3 entries is astronomical overkill!

I disagree. Simplicity of the code is most vital to avoid bugs, and if a hash table is what the language typically uses to represent key value mappings, that is what should be used, for 3 or 3 million items.

Keep the code idiomatic, and only optimize where necessary.

A good implementation of a hash table has a special fast path for tiny runtime sizes.

A good compiler could bound the size of the hash table, and switch out the implementation if its size can be bounded at compile time.

Sure, in this case, it didn't work so well. But the principle of idiomatic code still stands.
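As an illustration of the "fast path for tiny sizes" idea, one plausible shape is a small inline array that is scanned linearly, with hashing only kicking in past some threshold. The sketch below shows just the small path; small_map, small_get, small_put and the SMALL_MAX threshold of 8 are all hypothetical names and values, not taken from any particular library.

```c
#include <stddef.h>
#include <string.h>

#define SMALL_MAX 8  /* hypothetical threshold for the inline path */

struct small_map {
    const char *keys[SMALL_MAX];
    int values[SMALL_MAX];
    size_t count;
};

/* Linear scan; returns a pointer to the value for key, or NULL if absent. */
static int *small_get(struct small_map *m, const char *key) {
    for (size_t i = 0; i < m->count; i++)
        if (strcmp(m->keys[i], key) == 0)
            return &m->values[i];
    return NULL;
}

/* Insert or update. Returns 0 on success, -1 when the inline array is
   full, which is where a hybrid implementation would switch to hashing. */
static int small_put(struct small_map *m, const char *key, int value) {
    int *slot = small_get(m, key);
    if (slot != NULL) {
        *slot = value;
        return 0;
    }
    if (m->count == SMALL_MAX)
        return -1;
    m->keys[m->count] = key;
    m->values[m->count] = value;
    m->count++;
    return 0;
}
```

For a handful of entries the scan touches one or two cache lines and computes no hash at all, so the caller can keep writing ordinary map code without paying for it.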

selimnairb | karma 2055 | avg karma 2.79 · | 2021-03-26 14:15:43

And depending on your use case, you may not need to implement growing the hash table, or deletions, making it even simpler/less daunting to implement in C (since this avoids a lot of the memory management parts of the job).
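A minimal sketch of that simplified case, assuming string keys, int values, a fixed capacity and no deletion. Everything here (struct table, table_put, table_get, the capacity of 64, the choice of FNV-1a) is illustrative rather than anyone's actual implementation, and keys are stored by pointer, so they must outlive the table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_CAP 64  /* fixed power-of-two capacity; the table never grows */

struct slot { const char *key; int value; };
struct table { struct slot slots[TABLE_CAP]; size_t count; };

/* FNV-1a, a simple string hash picked here for brevity. */
static uint32_t hash_str(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Linear probing: returns the slot holding key, or the empty slot where
   it would go. Terminates because table_put keeps one slot empty. */
static struct slot *table_slot(struct table *t, const char *key) {
    size_t i = hash_str(key) & (TABLE_CAP - 1);
    while (t->slots[i].key != NULL && strcmp(t->slots[i].key, key) != 0)
        i = (i + 1) & (TABLE_CAP - 1);
    return &t->slots[i];
}

/* Insert or update. Returns 0 on success, -1 if the table is full. */
static int table_put(struct table *t, const char *key, int value) {
    struct slot *s = table_slot(t, key);
    if (s->key == NULL) {
        if (t->count == TABLE_CAP - 1)
            return -1;
        s->key = key;
        t->count++;
    }
    s->value = value;
    return 0;
}

/* Returns a pointer to the stored value, or NULL if key is absent. */
static int *table_get(struct table *t, const char *key) {
    struct slot *s = table_slot(t, key);
    return s->key != NULL ? &s->value : NULL;
}
```

With no deletion there are no tombstones to handle, and with no growth there is no rehashing or reallocation, so a zero-initialized struct is a valid empty table and nothing ever needs to be freed.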