Hacker Read top | best | new | newcomments | leaders | about | bookmarklet login

Even in C it’s not that hard to make a basic hash table. It’s probably like 20 lines of code or something.


sort by: page size:

Hence why I wrote "For an ultra simple hash table". I think that for C that doesn't natively have such a data structure, a couple of lines to introduce such a structure really isn't too bad.

Ehh, hash tables were invented in the 50s and were and are used wherever they are useful. I’m pretty sure a running joke is that every decently sized C program contains a half dozen hash table implementations. They’re not recent.

Maybe the should be. I know C is all about staying minimal, but native hash tables have proven themselves to be extremely useful in every other language where they are present.

Agreed this seems like major overkill, but then again so does writing your own hash table from scratch in C. :shrug:

I worked on a C project where we counted 14 different hash table implementations done for different key/value data types strewn through the code.

C noob here. Why isn't a hash table implementation merged into the c standard library? Is it because the stdlib has to be as thin as possible for some performance reason or something?

And depending on your use case, you may not need to implement growing the hash table, or deletions, making it even simpler/less daunting to implement in C (since this avoids a lot of the memory management parts of the job).

"Is that really the gatekeeper we want?" Yes, I think so? I mean implementing a hash table in C is really easy. What is there to "forget"? If you can't do it, either you don't know what a hash table is, or you don't know how to program in C.

    enum { table_size = 1024 };
    typedef struct item { int k, v, full; } item;
    static item table[table_size];

    void put(item kv)
    {
        size_t orig = kv.k % table_size;  // dumbest possible hash function
        size_t b = (orig + 1) % table_size;
        // use linear probing for collisions
        while (table[b].full && b != orig) b = (b + 1) % table_size;
        if (b == orig) abort();  // table full
        table[b] = kv;
        table[b].full = 1;
    }

    item *get(int k)
    {
        size_t orig = k % table_size;
        size_t b = (orig + 1) % table_size;
        while (table[b].full && table[b].k != k && b != orig) {
            b = (b + 1) % table_size;
        }
        return (table[b].full && table[b].k == k) ? &table[b] : NULL;
    }
I haven't tested that (just as I wouldn't in a whiteboard interview) so I don't know if it works. I did briefly skim the article we're talking about, but I concluded I didn't have anything to learn from it, so I didn't read it. I found a bunch of bugs in my implementation as I was writing it. And of course it's a pretty dumb hash table: linear probing is very suboptimal, it uses a statically-allocated non-resizable hash table, it's specialized for a certain type of keys, and so on.

But are you saying you can't even do that? Probably it would be bad to hire you for a C programming job, then, unless it was an entry-level position for you to learn C in.

?

Evidently the above comment took me 15 minutes to write. Oh, and now I see another bug or at least lacuna: if you try to update the value of an existing key, it doesn't fail, but it also doesn't update, instead adding a new entry that will never be read. And then I did write a minimal smoke test and run it, and unsurprisingly, it does work:

    int main()
    {
      put((item) { 4, 7 });
      put((item) { 1028, 9 });
      put((item) { 3, 25 });
      printf("%d %d %d %p %p\n", get(4)->v, get(1028)->v, get(3)->v, get(5), get(3));
      return 0;
    }
Writing the test and the above additional commentary, and running the test, took another 8 minutes.

Oh, and there's another bug where it will loop infinitely when the hash table is full and the key is negative. (Actually it invokes UB but typically the result will just be an infinite loop.)

If it takes you 60 or 100 minutes to write this, or to write something better, maybe you still know C and know what hash tables are. But if you throw up your hands and say "I don't remember, it's been a long time since I was in school"? Either you don't know C, or you don't know what a hash table is, which probably means you've never written anything in C but toy programs. (This is a toy program, of course, but many toy programs in C don't need hash tables.)

The compilable and fixed toy program in question is at http://canonical.org/~kragen/sw/dev3/dumbhash.c if you want to probe it for bugs.

It occurs to me that you might have been talking about something that isn't a C programming job. For example, we might be talking about a job programming an HTML database front-end in Python using Django. And of course it would be silly to reject someone for a job like that because they didn't know C, unless they claimed to know C on their résumé. And it's totally reasonable for a Python programmer, or a Python/Perl/JS/bash programmer, to not understand hash tables.


Hash tables in C are not part of the standard. They are part of POSIX, so it is mandated only on UNIX-ish environments. The standard committee has nothing to do with that.

"No, because it's possible to implement efficient hash tables with C's primitives -- in fact that's how hash tables in essentially ALL languages ARE implemented."

Because each architecture has a C compiler that's been highly optimized. Popularity plus money invested. That's it. If you were right, we'd see optimizations coded in C even when alternative, optimizing compilers were available. I got a one-word counter to that interestingly enough from "high-performance computing" field: FORTRAN. Free Pascal people are doing fine in performance and low-level code as well despite little to no investment in them.

Seems throwing money at a turd (eg FORTRAN, C) can get a lot of people's hands on it despite some of us shuddering and saying "Get that pile of crap away from me!"


Well, our disagreement is just a matter of miscommunication then.

I never said anything about hand coding a hash table.

I would like prospective candidates for programming jobs to know that there is a thing in their language of choice that lets them make fast lookup tables.


Are you suggesting there aren't mature implementations of hash maps in C you can easily vendor?

This post focuses almost exclusively on speed of the resulting program. That's a mistake. Using hash tables is a good thing because they are usually simpler than an ad hoc multi-level structure made just of pointers and arrays.

I'm not sure I agree with you. Writing a hash map implementation in C made me a much better programmer as far as performance goes because I understand the machine better.

I’ve spent days making hash tables in C.

The module operator is really slow. But a bit shifting operator is much faster. Having hash tables that grow in sizes of two, or more precisely where you can just shave off some bits of a random number to get a position will yield the best performance.

In JS, hash tables need to be growable Incase they get dense. In that instance you don’t want to compute object hash again since it’s expensive. You just want a cheap, here’s the new position.


I know you're exaggerating here a bit, but come on, a hash table not able to do lookups in micro-seconds is just garbage. I expect better of any language, not just C++.

Well, Hash tables are much more complicated. I think the closest analog is going to be a jump table, just with fancy math.

I feel like the hardest part of Assembly is all of the hex math required...


I was surprised recently when looking at different hash tables that have been implemented in C to discover that the standard library includes its own hash table. They are even part of POSIX. There is a reason you have never heard of it, or if you have you have never used it. In true POSIX fashion they are close to useless. The implementation doesn't allow you to modify the table after it has been created, you have to pass in all the data when you create the table. There is no add or delete and you can't change any value you pull from the table. It also stores the data in a static area local to the function so you can only use a single table in a program at a time. It doggedly commits every C stdlib sin possible.

That is picking terms. Many languages don't allow to specify custom hash functions, so you can't specify the hash functions until you do write a custom hash table. A fair benchmark would allow for this, if it is allowed in C.
next

Legal | privacy