
Yes, in the end I had to go to a custom hashmap/linked list combination to get the best performance for our specific use case (basically a flow table with LRU cache behavior, plus time-based expiration of flows).
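For the curious, the shape is roughly as follows. This is a minimal sketch, not the production code: std::unordered_map and std::list stand in for the custom hashmap and list, and all names are illustrative.

  #include <chrono>
  #include <list>
  #include <unordered_map>

  // LRU flow table with time-based expiration. The list keeps flows in
  // recency order (front = most recently used); the map gives O(1)
  // lookup into the list.
  template <typename Key, typename Flow>
  class FlowTable {
    using Clock = std::chrono::steady_clock;
    struct Entry { Key key; Flow flow; Clock::time_point lastSeen; };

    std::list<Entry> lru_;
    std::unordered_map<Key, typename std::list<Entry>::iterator> index_;
    size_t capacity_;
    std::chrono::seconds maxAge_;

   public:
    FlowTable(size_t capacity, std::chrono::seconds maxAge)
        : capacity_(capacity), maxAge_(maxAge) {}

    // Look up a flow and mark it as most recently used.
    Flow* touch(const Key& key) {
      auto it = index_.find(key);
      if (it == index_.end()) return nullptr;
      it->second->lastSeen = Clock::now();
      lru_.splice(lru_.begin(), lru_, it->second);  // move to front, O(1)
      return &it->second->flow;
    }

    void insert(const Key& key, Flow flow) {
      if (Flow* f = touch(key)) { *f = std::move(flow); return; }
      lru_.push_front(Entry{key, std::move(flow), Clock::now()});
      index_.emplace(key, lru_.begin());
      evict();
    }

   private:
    // Drop flows that are over capacity or past the time limit. Every
    // touch moves an entry to the front, so the back is always oldest.
    void evict() {
      auto now = Clock::now();
      while (!lru_.empty() && (lru_.size() > capacity_ ||
                               now - lru_.back().lastSeen > maxAge_)) {
        index_.erase(lru_.back().key);
        lru_.pop_back();
      }
    }
  };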



I wonder if it's worth creating a hybrid data structure: a linked list of arrays. Say you tune it so each array fits in the L1 cache. You don't need a huge chunk of contiguous memory, but you still get most of the performance benefits of arrays. Iteration gets a lot more complicated, of course.

I don't know of one, but it's such a natural idea that I'd guess it's been studied. There are standard implementations of LRU caches that use e.g. a hash map and a linked list to get both fast lookup and ordering, but for real performance I think you'd want to try and minimise the number of data structures to avoid having competing cache behaviours.

Yeah, that's fair. If it's the standard library hashmap, then it's a reimplementation of an Abseil Swiss Table, so it will have great performance if correctly sized. And if it's relatively sparse, I'd bet it's faster than the heap-allocated Vec idea, because so much of it will fit in your L1 cache and the Vec would not (unless you've got a complete beast of a CPU dedicated to this task).

I would add the size hint if your program has any reason to know what hint to give (blind guesses are worse than just calling new), and otherwise forget about optimizing it until it's too slow.
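In C++ terms the equivalent of the size hint is reserve(); a minimal sketch under that assumption (the function and names here are hypothetical):

  #include <cstddef>
  #include <string>
  #include <unordered_map>

  // If the program genuinely knows the expected size, pass the hint up
  // front so the table allocates once instead of rehashing as it grows.
  std::unordered_map<std::string, int> build_index(std::size_t expected_items) {
    std::unordered_map<std::string, int> index;
    index.reserve(expected_items);  // the size hint; omit it if you'd be guessing
    return index;
  }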


Author here, I was surprised not to find an LRU cache implementation in common libraries (STL, Boost), so I wanted to make my own.

This library is trying to make it easy to experiment with different backends (different maps, lists and so on). The fastest version is based on abseil's node_hash_set, inspired by Java's LinkedHashMap.


Yes, I agree that this is a peculiar case and it is worth reading and keeping in mind that such things can happen.

I am just saying that I think it would be more correct to consider this as a 9-fold performance increase gained from caching, as no one should rely on caching when dealing with linked lists.


Don't forget that good cache locality can also cause data to be pulled into cache that the prefetcher knew nothing about.

I can create a shitty linked list for you that fits perfectly in L3 but still has terrible cold-cache performance, because each individual cache line has to be pulled in one by one.


No, but I can give a few examples where I refactored code away from these antique pointer-based trees (inspired by classic CS books, half a century old now) towards something more cache-friendly, and improved performance by an order of magnitude.

It seemed like this person was talking about an intrusive linked list with an arena allocator of some sort, which isn't ideal but is still fairly cache-friendly.
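A minimal sketch of that pattern (names hypothetical; a real arena would handle growth and alignment): the nodes carry their own link, but all come from one contiguous slab, so list neighbours tend to be memory neighbours.

  #include <cstddef>
  #include <vector>

  struct Node {
    int value;
    Node* next = nullptr;  // intrusive link: lives inside the payload itself
  };

  class Arena {
    std::vector<Node> slab_;  // one contiguous allocation for all nodes
   public:
    explicit Arena(std::size_t capacity) { slab_.reserve(capacity); }
    Node* alloc(int value) {
      // Assumes capacity is never exceeded; a reallocation here would
      // invalidate previously handed-out pointers.
      slab_.push_back(Node{value});
      return &slab_.back();
    }
  };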

I wonder if their wide branching factor gives you some cache friendliness.

However, not every use case benefits from cache optimization, and other data structures can be a better fit. It's not super useful to make generalizations about performance that way.


Any game engine architect worth her salt would know not to speak so absolutely about cache coherency, and that if you're dealing with a use case where iteration is massively infrequent but random insertions and removals are likely, you could be better off with the linked list :)

You're misunderstanding the advice about contiguous memory. It's not that it's more likely to stay in cache; it's that if you're accessing data sequentially, the data you're going to access next is more likely to already be in cache.

Most (all I've read or looked at) benchmarks in Java show data structures backed by linked lists getting utterly smashed by equivalents implemented with arrays.

In the last year or two there was a good C++ talk where a game (or game engine) changed its storage to be more array-like and got roughly a 30% performance boost. Memory locality is usually king, which is why linked lists are rarely used.


If really needed you can have it both ways, you could roll a data structure that is a linked list of arrays. Then you have constant time growth and good caching/prefetching.

Toy example:

  template <typename T>
  struct ListSegment {
    T items[64];                     // tune so one segment fits in L1
    int nextItem = 0;                // next free slot within items
    ListSegment* nextSeg = nullptr;
    ListSegment* prevSeg = nullptr;
  };

I see. Still, your approach leaks memory as it maintains references to nodes which may otherwise be GC'able. Have you considered `WeakMap` for caching (where supported)?

I've always considered an LRU cache question to be pretty decent. The general idea is fairly obvious (linked list + hash table), it's straightforward to get something working, and it's a good indicator of how someone can synthesize data structures to fit the needs of the problem.

Plus there is a lot of open-endedness that makes it easy to drill down into someone's thought process.


I find it interesting that you consider an LRU cache an 'algorithm' rather than a 'data structure'. Could you explain that?

Great stuff though!


Good luck. Another benefit of this strategy is that you can optimize the data structure using techniques that aren't available in higher-level languages. For instance, small trees can be laid out with all of their nodes very close together, improving the odds of a cache hit. You can also switch from lots of small strings to integers that index a lookup table of strings used for display only.
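The string trick is essentially interning. A minimal sketch, assuming an append-only pool (names are illustrative): hot paths compare and store 32-bit ids, and the actual strings are only fetched for display.

  #include <cstdint>
  #include <string>
  #include <unordered_map>
  #include <vector>

  class StringPool {
    std::vector<std::string> strings_;                 // id -> string, display only
    std::unordered_map<std::string, std::uint32_t> ids_;  // string -> id
   public:
    std::uint32_t intern(const std::string& s) {
      auto [it, inserted] =
          ids_.emplace(s, static_cast<std::uint32_t>(strings_.size()));
      if (inserted) strings_.push_back(s);
      return it->second;
    }
    const std::string& lookup(std::uint32_t id) const { return strings_[id]; }
  };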

The amount of work to do this is insane. Expect 10x what it took to write it in a high-level language. But the performance gain can often be 10-100x as well. Which is a giant payoff.


You're thinking of something like MOVNTDQA? Those still have to reach into RAM - that's still double-digit nanoseconds away, and it still creates a stall in the processor at O(n) complexity!! No thanks. I can allocate a new linked-list node with memory that's guaranteed to already be entirely in cache (barring a TLB miss).

> be able to pretty much max out the bandwidth available to the CPU core

Why max out a slow link at O(n) complexity when you can just avoid doing that entirely and use a linked list?


> You could then use "CacheFriendlyHashMap" class as a drop in

It's impossible to have a generic cache-friendly data structure, because the optimal layout depends on access patterns. E.g. if I have a collection of struct{x:int, y:int, z:int, w:int}, and I know the two main use cases are indexing by x and iterating over the collection performing some op on (y,z,w), then the ideal layout is one where all the x are contiguous in memory ([x1, x2, x3, ...]) and, separately, the (y,z,w) triples are contiguous ([y1,z1,w1, y2,z2,w2, y3,z3,w3, ...]).
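A minimal sketch of that split in C++ (field names taken from the example above; widths are an assumption):

  #include <array>
  #include <cstdint>
  #include <vector>

  // Array-of-structs: what a generic container would give you.
  struct Item { std::int32_t x, y, z, w; };
  std::vector<Item> aos;

  // The split layout for the stated access patterns: all x contiguous
  // for the index lookups, and the (y,z,w) triples contiguous for the
  // iteration that operates on them together.
  struct Items {
    std::vector<std::int32_t> xs;                   // [x1, x2, x3, ...]
    std::vector<std::array<std::int32_t, 3>> yzw;   // [y1,z1,w1, y2,z2,w2, ...]
  };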

