
Yes, in the end I had to go to a custom hashmap/linked list combination to get the best performance for our specific use case (basically a flow table with LRU cache behavior, plus time-based expiration of flows).
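For the curious, the shape is roughly as follows. This is a minimal sketch, not the production code: std::unordered_map and std::list stand in for the custom hashmap and list, and all names are illustrative.

  #include <chrono>
  #include <list>
  #include <unordered_map>

  // LRU flow table with time-based expiration. The list keeps flows in
  // recency order (front = most recently used); the map gives O(1)
  // lookup into the list.
  template <typename Key, typename Flow>
  class FlowTable {
    using Clock = std::chrono::steady_clock;
    struct Entry { Key key; Flow flow; Clock::time_point lastSeen; };

    std::list<Entry> lru_;
    std::unordered_map<Key, typename std::list<Entry>::iterator> index_;
    size_t capacity_;
    std::chrono::seconds maxAge_;

   public:
    FlowTable(size_t capacity, std::chrono::seconds maxAge)
        : capacity_(capacity), maxAge_(maxAge) {}

    // Look up a flow and mark it as most recently used.
    Flow* touch(const Key& key) {
      auto it = index_.find(key);
      if (it == index_.end()) return nullptr;
      it->second->lastSeen = Clock::now();
      lru_.splice(lru_.begin(), lru_, it->second);  // move to front, O(1)
      return &it->second->flow;
    }

    void insert(const Key& key, Flow flow) {
      if (Flow* f = touch(key)) { *f = std::move(flow); return; }
      lru_.push_front(Entry{key, std::move(flow), Clock::now()});
      index_.emplace(key, lru_.begin());
      evict();
    }

   private:
    // Drop flows that are over capacity or past the time limit. Every
    // touch moves an entry to the front, so the back is always oldest.
    void evict() {
      auto now = Clock::now();
      while (!lru_.empty() && (lru_.size() > capacity_ ||
                               now - lru_.back().lastSeen > maxAge_)) {
        index_.erase(lru_.back().key);
        lru_.pop_back();
      }
    }
  };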



I wonder if it's worth creating a hybrid data structure: a linked list of arrays. Say you tune it so each array fits in the L1 cache. You don't need a huge chunk of contiguous memory, but you still get most of the performance benefits of arrays. Iteration gets a lot more complicated, of course.

I don't know of one, but it's such a natural idea that I'd guess it's been studied. There are standard implementations of LRU caches that use e.g. a hash map and a linked list to get both fast lookup and ordering, but for real performance I think you'd want to try and minimise the number of data structures to avoid having competing cache behaviours.

Yeah, that's fair. If it's the standard library hashmap, then it's a reimplementation of an Abseil Swiss Table, so it will have great performance if correctly sized. And if it's relatively sparse, I'd bet it's faster than the heap-allocated Vec idea, because so much of it will fit in your L1 cache and the Vec would not (unless you've got a complete beast of a CPU dedicated to this task).

I would add the size hint if your program has any reason to know what hint to give (blind guesses are worse than just calling new), and otherwise forget about optimizing it until it's too slow.
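In C++ terms the equivalent of the size hint is reserve(); a minimal sketch under that assumption (the function and names here are hypothetical):

  #include <cstddef>
  #include <string>
  #include <unordered_map>

  // If the program genuinely knows the expected size, pass the hint up
  // front so the table allocates once instead of rehashing as it grows.
  std::unordered_map<std::string, int> build_index(std::size_t expected_items) {
    std::unordered_map<std::string, int> index;
    index.reserve(expected_items);  // the size hint; omit it if you'd be guessing
    return index;
  }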


Author here, I was surprised not to find an LRU cache implementation in common libraries (STL, Boost), so I wanted to make my own.

This library is trying to make it easy to experiment with different backends (different maps, lists and so on). The fastest version is based on abseil's node_hash_set, inspired by Java's LinkedHashMap.


Yes, I agree that this is a peculiar case and it is worth reading and keeping in mind that such things can happen.

I am just saying that I think it would be more correct to consider this as a 9-fold performance increase gained from caching, as no one should rely on caching when dealing with linked lists.


Don't forget that good cache locality can also cause data to be pulled into cache that the prefetcher knew nothing about.

I can create a shitty linked list for you that fits perfectly in L3 but still has terrible cold-cache performance, because each individual cache line has to be pulled in one by one.


No, but I can give a few examples where I refactored code away from these antique pointer-based trees (inspired by classic CS books, half a century old now) towards something more cache-friendly, and improved performance by an order of magnitude.

It seemed like this person was talking about an intrusive linked list with an arena allocator of some sort, which isn't ideal but is still fairly cache-friendly.
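A minimal sketch of that pattern (names hypothetical; a real arena would handle growth and alignment): the nodes carry their own link, but all come from one contiguous slab, so list neighbours tend to be memory neighbours.

  #include <cstddef>
  #include <vector>

  struct Node {
    int value;
    Node* next = nullptr;  // intrusive link: lives inside the payload itself
  };

  class Arena {
    std::vector<Node> slab_;  // one contiguous allocation for all nodes
   public:
    explicit Arena(std::size_t capacity) { slab_.reserve(capacity); }
    Node* alloc(int value) {
      // Assumes capacity is never exceeded; a reallocation here would
      // invalidate previously handed-out pointers.
      slab_.push_back(Node{value});
      return &slab_.back();
    }
  };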

I wonder if their wide branching factor gives you some cache friendliness.

However, not every use case benefits from cache optimization, and other data structures can be a better fit. It's not super useful to make generalizations about performance that way.


Any game engine architect worth her salt would know not to speak so absolutely about cache coherency, and that if you're dealing with a use case where iteration is massively infrequent but random insertions and removals are likely, you could be better off with the linked list :)

You're misunderstanding the advice about contiguous memory. It's not that it's more likely to stay in cache; it's that if you're accessing data sequentially, the data you're going to access next is more likely to already be in cache.

Most (all I've read or looked at) benchmarks in Java show data structures backed by linked lists getting utterly smashed by equivalents implemented with arrays.

In the last year or two there was a good C++ talk where a game (or game engine) changed its storage to be more array-like and got roughly a 30% performance boost. Memory locality is usually king, which is why linked lists are rarely used.


If really needed you can have it both ways, you could roll a data structure that is a linked list of arrays. Then you have constant time growth and good caching/prefetching.

Toy example:

  template <typename T>
  struct ListSegment {
    T items[64];                     // tune so one segment fits in L1
    int nextItem = 0;                // next free slot within items
    ListSegment* nextSeg = nullptr;
    ListSegment* prevSeg = nullptr;
  };

I see. Still, your approach leaks memory as it maintains references to nodes which may otherwise be GC'able. Have you considered `WeakMap` for caching (where supported)?

I've always considered an LRU cache question to be pretty decent. The general idea is fairly obvious (linked list + hash table), it's straightforward to get something working, and it's a good indicator of how someone can synthesize data structures to fit the needs of the problem.

Plus there is a lot of open-endedness that makes it easy to drill down into someone's thought process.


I find it interesting that you consider an LRU cache an 'algorithm' rather than a 'data structure'. Could you explain that?

Great stuff though!


Good luck. Another benefit of this strategy is that you can optimize the data structure using techniques that aren't available in higher-level languages. For instance, small trees can be laid out with all of their nodes very close together, improving the odds of a cache hit. You can also switch from lots of small strings to integers that index a lookup table of strings used for display only.
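The string trick is essentially interning. A minimal sketch, assuming an append-only pool (names are illustrative): hot paths compare and store 32-bit ids, and the actual strings are only fetched for display.

  #include <cstdint>
  #include <string>
  #include <unordered_map>
  #include <vector>

  class StringPool {
    std::vector<std::string> strings_;                 // id -> string, display only
    std::unordered_map<std::string, std::uint32_t> ids_;  // string -> id
   public:
    std::uint32_t intern(const std::string& s) {
      auto [it, inserted] =
          ids_.emplace(s, static_cast<std::uint32_t>(strings_.size()));
      if (inserted) strings_.push_back(s);
      return it->second;
    }
    const std::string& lookup(std::uint32_t id) const { return strings_[id]; }
  };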

The amount of work to do this is insane. Expect 10x what it took to write it in a high-level language. But the performance gain can often be 10-100x as well. Which is a giant payoff.


You're thinking of something like MOVNTDQA? Those still have to reach into RAM - that's still double-digit nanoseconds away, and it still creates a stall in the processor at O(n) complexity!! No thanks. I can allocate a new linked-list node with memory that's guaranteed to already be entirely in cache (barring a TLB miss).

> be able to pretty much max out the bandwidth available to the CPU core

Why max out a slow link at O(n) complexity when you can just avoid doing that entirely and use a linked list?


> You could then use "CacheFriendlyHashMap" class as a drop in

It's impossible to have a generic cache-friendly data structure, because the optimal layout depends on access patterns. E.g. if I have a collection of struct{x:int, y:int, z:int, w:int}, and I know the two main use cases are indexing by x and iterating over the collection performing some op on (y,z,w), then the ideal layout is one where all the x are contiguous in memory ([x1, x2, x3, ...]) and, separately, the (y,z,w) triples are contiguous ([y1,z1,w1, y2,z2,w2, y3,z3,w3, ...]).
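A minimal sketch of that split in C++ (field names taken from the example above; widths are an assumption):

  #include <array>
  #include <cstdint>
  #include <vector>

  // Array-of-structs: what a generic container would give you.
  struct Item { std::int32_t x, y, z, w; };
  std::vector<Item> aos;

  // The split layout for the stated access patterns: all x contiguous
  // for the index lookups, and the (y,z,w) triples contiguous for the
  // iteration that operates on them together.
  struct Items {
    std::vector<std::int32_t> xs;                   // [x1, x2, x3, ...]
    std::vector<std::array<std::int32_t, 3>> yzw;   // [y1,z1,w1, y2,z2,w2, ...]
  };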

