
Even with perfect cache locality, a linked list will be significantly slower to traverse than an array, because each load depends on the previous one: that serial dependency defeats pipelining and memory-level parallelism. Partially unrolled lists can help, but at the cost of additional complexity.

Of course not all container accesses require traversal.
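
To make the pipelining point concrete, here is a minimal sketch (the Node layout and function names are invented for illustration): in the list loop, every load waits on the previous one, while in the array loop the next address is always known in advance.

    #include <cstddef>

    struct Node {
        int value;
        Node* next;
    };

    // Each iteration's load depends on the previous one: the address of
    // node->next is unknown until node itself arrives from memory, so the
    // loop runs at roughly one load latency per element.
    long sum_list(const Node* node) {
        long sum = 0;
        for (; node != nullptr; node = node->next)
            sum += node->value;
        return sum;
    }

    // The address of data[i + 1] is known without loading data[i], so the
    // CPU can keep many loads in flight and the compiler can unroll or
    // vectorize the loop.
    long sum_array(const int* data, std::size_t n) {
        long sum = 0;
        for (std::size_t i = 0; i < n; ++i)
            sum += data[i];
        return sum;
    }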




You're misunderstanding the advice about contiguous storage. It's not that it's more likely to stay in cache; it's that if you're accessing data sequentially, the data you're going to access next is more likely to be in cache already.

Most benchmarks in Java (all I've read or looked at, anyway) show data structures backed by linked lists utterly smashed by ones backed by an array.

There was a good C++ talk in the last year or two where a game or game engine changed its storage to be more array-like and got roughly a 30% boost in performance. Memory locality is usually king, which is why linked lists are rarely used.


Don't forget that good cache locality can also cause data to be pulled into cache that the prefetcher knew nothing about.

I can build you a shitty linked list that fits perfectly in L3 but still has terrible cold-cache performance, because each individual cache line has to be pulled in one by one.


Wait, why do linked lists have bad cache locality? It depends on how your heap is set up. For example, you could have an allocator that gives relatively good locality by having high granularity and only bailing out if, say, your list gets super long (so if your lists are usually short, they could have awesome locality).
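
A minimal sketch of that idea, with a hypothetical PoolList type: nodes are carved out of one contiguous vector instead of individual heap allocations, so a short list lands in a handful of adjacent cache lines. Using 32-bit indices instead of pointers also halves the per-node link overhead.

    #include <cstddef>
    #include <vector>

    struct PoolList {
        struct Node { int value; int next; };   // index instead of pointer

        std::vector<Node> pool;                 // contiguous backing storage
        int head = -1;

        void push_front(int value) {
            pool.push_back({value, head});      // bump-allocate from the pool
            head = static_cast<int>(pool.size()) - 1;
        }

        long sum() const {
            long total = 0;
            for (int i = head; i != -1; i = pool[i].next)
                total += pool[i].value;
            return total;
        }
    };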

Caching means inserts into arrays are often faster than into linked lists. Remember, RAM is really, really big and slow.

You are right, and this is well known, but not only for the reason you state. Cache locality is so much better on an array/vector than on a linked list, and that is really important.

I think it's also worth noting that this is specific to (common) CPU architectures where the cache is sufficiently big and orders of magnitude faster than main memory, and where main memory reads and writes have similar cost.

The tradeoff between moving memory at the end of the array and scanning to the point of linked-list insertion could be different on some existing and some hypothetical systems where these concepts are relevant. For example, storage write-cycle minimization could be a priority in rare cases.

That said, this is an excellent point about the majority of use cases. I hadn't considered the dominance of the time it takes to locate the element to modify. I've never had a reason to benchmark a linked list, since even a basic idea of the effect of cache/prefetch precludes its use in every use case I've actually come across.

It's probably useful for CS education to start comparing linked lists to bubble sort: there's always a better algorithm, but here's a simple one to learn so you understand how to compare data structure performance.


The question had 2 goals:

1) Do you think about cache at all or is it just something you heard mentioned as important that one time?

2) It's a good lead-in to discussing the effects of cache in algorithms. How that conversation goes helps me to understand how that person thinks and discusses complex problems.

A good answer would be "I'm not sure, but probably way, way slower, because linked lists can point all over memory but arrays cache really well."

An excellent, A+ answer would be "In the best case it might not be too much slower, if you have an intrusive linked list arranged sequentially in memory like an array, as onekorg explained. But in practice most will be 20-200x slower, because they are usually implemented as pointers to nodes containing pointers to data, and each piece is allocated piecemeal in an already-fragmented heap. Uncached memory reads can take 100+ cycles, and summing an int is not enough work to hide that even if the CPU speculatively prefetches."

I mainly expect a surprised reaction that they could be so slow, and I look forward to the follow-up discussion.
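
One way to demonstrate the gap that answer predicts is a toy benchmark along these lines (a sketch, not a rigorous measurement: the node count, fixed seed, and labels are arbitrary). It builds one list whose nodes sit in allocation order and one linked in shuffled order to mimic a fragmented heap, then times a traversal of each.

    #include <algorithm>
    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <vector>

    struct Node { int value; Node* next; };

    // Links the nodes of `storage` together in the order given by `order`
    // and returns the head. Shuffling `order` simulates a fragmented heap.
    Node* link_in_order(std::vector<Node>& storage, const std::vector<int>& order) {
        for (std::size_t i = 0; i + 1 < order.size(); ++i)
            storage[order[i]].next = &storage[order[i + 1]];
        storage[order.back()].next = nullptr;
        return &storage[order.front()];
    }

    long time_sum(const Node* node, const char* label) {
        auto start = std::chrono::steady_clock::now();
        long sum = 0;
        for (; node; node = node->next) sum += node->value;
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::printf("%s: %ld ns (sum %ld)\n", label, (long)ns, sum);
        return sum;
    }

    int main() {
        const int n = 1 << 22;                  // ~4M nodes, far bigger than cache
        std::vector<Node> storage(n);
        for (int i = 0; i < n; ++i) storage[i].value = 1;

        std::vector<int> order(n);
        std::iota(order.begin(), order.end(), 0);
        time_sum(link_in_order(storage, order), "sequential list");

        std::shuffle(order.begin(), order.end(), std::mt19937{42});
        time_sum(link_in_order(storage, order), "shuffled list  ");
    }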


Even this doesn't give you optimal locality. Objects in the same pool are closer together than if they were randomly fragmented by a global allocator, but if you're accessing them in list order, you're not accessing adjacent addresses, so you can't take advantage of memory prefetch. If the pool is large, it may not even fit in the cache.

When performance needs to be maximized, games switch to entity component systems and switch from arrays-of-structs to structs-of-arrays. This enables processing all objects as a vector, linearly from start to end, and often without needing to fetch any irrelevant bytes that aren't processed in a given pass. This sometimes also helps utilize SIMD for data spanning more than one entity, which you can't do when using linked lists.
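
A minimal sketch of the AoS-to-SoA switch (the entity fields are invented): a position-update pass over the SoA layout streams only the arrays it needs, and the stride-1 loop is exactly what auto-vectorizers and SIMD want.

    #include <cstddef>
    #include <vector>

    // Array-of-structs: updating positions drags velocity, health, etc.
    // through the cache even though this pass never touches them.
    struct EntityAoS { float x, y; float vx, vy; int health; };

    void update_aos(std::vector<EntityAoS>& entities, float dt) {
        for (auto& e : entities) {
            e.x += e.vx * dt;
            e.y += e.vy * dt;
        }
    }

    // Struct-of-arrays: each pass streams only the fields it needs,
    // contiguously, which is also the layout SIMD wants.
    struct EntitiesSoA {
        std::vector<float> x, y, vx, vy;
        std::vector<int> health;
    };

    void update_soa(EntitiesSoA& e, float dt) {
        for (std::size_t i = 0; i < e.x.size(); ++i) {
            e.x[i] += e.vx[i] * dt;
            e.y[i] += e.vy[i] * dt;
        }
    }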


You use them all the time in Haskell and OCaml. Cache locality isn't such an issue there. You're not mallocing linked-list nodes: you allocate with a pointer bump in the minor heap, and if the GC copies your list into the major heap, it's going to copy the list elements together.

You also use recursion all the time, and no, recursion is not generally straightforwardly optimised into iteration unless you're doing trivial tail recursion.


Data locality is a big reason. Sure, it's O(N) writes, but those writes are neatly lined up in such a way that the likelihood of a cache miss is low. As opposed to a linked list, where just traversing to the middle may entail multiple round trips to main memory.

Caches and prefetchers exploit two kinds of locality to decide what to keep close: temporal locality, meaning data judged to be frequently accessed stays cached, and spatial locality, meaning that when you access memory address X, the cache fills a whole line with the contents of consecutive locations, i.e. X, X+1, X+2, ..., X+s.

Hence, they have no way to understand a structure like a linked list and will not cache it intelligently. All you can rely on is that arrays, traversed in order, will use the cache efficiently. For example, with typical 64-byte lines and 4-byte ints, a sequential scan pays at most one miss per 16 elements (and the prefetcher hides most of those), while a scattered list can pay one miss per node.


You also need to load root. If you do not count that, then in the linked-list case there are no dereferences (root is literally the pointer to the allocated block).

The stack case will touch three cache lines (root, nodeptrs, and the final block); the linked list touches only two (there is no nodeptrs).

TBH it would be very hard to design a non-completely-artificial benchmark where one design makes a difference compared to the other. The only way is to test on an actual application and check what fits best.


I wonder if it is worth it to create a hybrid data structure: a linked list of arrays. Say you tune it so each array fits in the L1 cache. You don't need a huge chunk of contiguous memory, but you still gain most of the performance of arrays. Iteration gets a lot more complicated, of course.
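
That hybrid exists: it's usually called an unrolled linked list, and std::deque is typically implemented with a similar chunked layout. A minimal sketch (the chunk capacity of 64 is an arbitrary stand-in for whatever fits your cache budget):

    #include <cstddef>

    // Each node holds a small array, so traversal does stride-1 reads
    // within a chunk and only one pointer chase per chunk.
    struct Chunk {
        static constexpr std::size_t kCapacity = 64;  // arbitrary tuning knob
        int values[kCapacity];
        std::size_t count = 0;
        Chunk* next = nullptr;
    };

    long sum_chunks(const Chunk* chunk) {
        long sum = 0;
        for (; chunk != nullptr; chunk = chunk->next)
            for (std::size_t i = 0; i < chunk->count; ++i)  // contiguous inner loop
                sum += chunk->values[i];
        return sum;
    }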

That's true. You still have the heap overhead, though. If the heap allocates an int (for the block size) just before the block it returns, then if your linked list contains ints, the heap memory overhead is 50%. This means you get half as many entries in a cache line, even if they were allocated in order.

If the array fits in the cache, deleting from the middle of an array and copying all the subsequent elements one to the left is actually faster than a linked list.

And most arrays fit in the cache.
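
For illustration, that delete-from-the-middle is one contiguous shift (a minimal sketch): sequential reads and writes over hot cache lines, which is the access pattern memory systems handle best.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Removes v[i] by shifting every later element one slot left.
    // O(n) moves, but they are sequential accesses to cache-resident
    // lines, unlike chasing scattered list nodes just to reach position i.
    void erase_at(std::vector<int>& v, std::size_t i) {
        std::move(v.begin() + i + 1, v.end(), v.begin() + i);
        v.pop_back();
    }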


It has nothing to do with pointers, but memory access patterns can have a huge performance impact. Iterate over a large array, and do the same over a large linked list. The latter cannot make use of the CPU cache in any meaningful way, for example.

While this is correct about the theoretical effects of small N, an equally important factor in this day and age would be the cache. If that lovely little linked list causes O(N) cache misses through Java-style pointer chasing... you are suddenly looking at a very slow hashmap.

Cache effects tend to mean that open addressing works out faster and more predictable in real-world conditions.
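
A minimal linear-probing sketch (the flat table and -1 sentinel are simplifications; a real table would track load factor and resize): a collision probes the next slot, so a short probe run usually stays within one or two cache lines, whereas every hop along a chained bucket is a potential miss.

    #include <cstddef>
    #include <vector>

    // Simplified open-addressed set for non-negative keys: all slots live
    // in one flat array, -1 marks empty, and the table is assumed never
    // to fill (a real one would resize at a load-factor threshold).
    struct IntSet {
        std::vector<long long> slots;

        explicit IntSet(std::size_t capacity) : slots(capacity, -1) {}

        bool contains(long long key) const {
            std::size_t i = static_cast<std::size_t>(key) % slots.size();
            while (slots[i] != -1) {                   // probe adjacent slots
                if (slots[i] == key) return true;
                i = (i + 1) % slots.size();
            }
            return false;
        }

        void insert(long long key) {
            std::size_t i = static_cast<std::size_t>(key) % slots.size();
            while (slots[i] != -1 && slots[i] != key)  // first free or matching slot
                i = (i + 1) % slots.size();
            slots[i] = key;
        }
    };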


Op arrays have the advantage of being in the i-cache, but need a dispatcher with indirect calls.

Op linked lists are bad for the i-cache, but use all direct calls, which are much more cacheable. The best case is when you sort the list so the nodes are adjacent; then you have the best of both.


How does a deque improve the situation with cache locality versus a linked list?
