
If you have a long vector, say, 1000 elements, and you need to copy 500 of them to shift them over for an insert, you are going to be very slow compared to a linked list. Besides, linked lists aren't that bad if the nodes are close enough in memory (read: consecutive), or if they are hot enough to already be in cache. But as always, measure instead of guessing.



> If you have a long vector, say, 1000 elements, and you need to copy 500 of them to shift them over for an insert, you are going to be very slow compared to a linked list.

FWIW the original suggestion mentioned tombstoning, so you'd have a `Vec<Option<T>>` instead of a `Vec<T>`, and "removing" an item would just set the `Option` to `None`.

Of course this would increase iteration cost (as you'd have to skip the `None`), and you'd probably want to compact the vec once in a while.
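
A minimal sketch of what that tombstoning pattern could look like (my own illustration, not the original suggestion's code; the names are made up):

    // Tombstoned vector: "removing" marks the slot None instead of
    // shifting the tail; iteration skips the Nones; compaction drops
    // them for real once there are enough of them.
    struct Tombstoned<T> {
        slots: Vec<Option<T>>,
        dead: usize, // how many Nones we're carrying around
    }

    impl<T> Tombstoned<T> {
        fn new() -> Self {
            Tombstoned { slots: Vec::new(), dead: 0 }
        }

        fn push(&mut self, value: T) {
            self.slots.push(Some(value));
        }

        // O(1) "removal": leave a tombstone instead of shifting the tail.
        fn remove(&mut self, index: usize) -> Option<T> {
            let taken = self.slots[index].take();
            if taken.is_some() {
                self.dead += 1;
            }
            taken
        }

        // Iteration pays for the tombstones by skipping them.
        fn iter(&self) -> impl Iterator<Item = &T> {
            self.slots.iter().filter_map(|slot| slot.as_ref())
        }

        // Compact once the dead fraction gets big enough to matter.
        // Note this shifts the live elements, so stored indices go stale.
        fn maybe_compact(&mut self) {
            if self.dead * 2 > self.slots.len() {
                self.slots.retain(|slot| slot.is_some());
                self.dead = 0;
            }
        }
    }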


I probably shouldn't even be in this thread but isn't that creeping back toward garbage collection?

Sure, it's just a mark and sweep, but manually implemented (which is fine).

It's completely normal to have this sort of thing in low-level codebases. Both C++ and Rust have refcounting smart pointers in their standard libraries, and bespoke refcounting schemes are regular occurrences in C codebases (the Linux kernel has tons of refcounting, so much so that it has a utility file for reference counters: https://github.com/torvalds/linux/blob/master/include/linux/...)

Thanks for the response! I'm a big fan of doubly linked lists, so it was interesting to see that there was extra work to get them going. I'm certainly going to read up on this "ownership" thing. I get the impression that it's there for really good reasons, and I'm not going to argue with experts who are protecting users.

I have a question for you: would you say that the increased difficulty with doubly linked lists is incidental in nature, or is there something inherent to doubly linked lists that impacts security? Like I said, I like doubly linked lists for certain things, and the more I know about the impact of using them the better!

I certainly will be reading up on this!


Or a Vec<T> for storage and a Vec<usize> for tombstoning. Lots of ways to solve this, and my experience is that you can beat a linked list with the 'Vec plus Vec' approach for a lot of data sizes/operations.

Or a custom Vec that has a lower maximum bound, at which point you can start doing things like integer encoding/pointer swizzling.
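
One way to read the 'Vec plus Vec' idea above (my interpretation only; the names here are invented): the second vec acts as a list of dead slots that removal pushes onto and insertion reuses.

    // Values live in a plain Vec<T>; a second Vec<usize> records which
    // slots are dead. Removal never shifts elements, it just records the
    // index; insertion reuses a dead slot before growing the storage.
    struct SlotVec<T> {
        storage: Vec<T>,
        dead: Vec<usize>, // sorted indices of tombstoned slots
    }

    impl<T> SlotVec<T> {
        fn new() -> Self {
            SlotVec { storage: Vec::new(), dead: Vec::new() }
        }

        // Reuse a tombstoned slot if one exists, otherwise grow.
        fn insert(&mut self, value: T) -> usize {
            if let Some(index) = self.dead.pop() {
                self.storage[index] = value;
                index
            } else {
                self.storage.push(value);
                self.storage.len() - 1
            }
        }

        // "Removal" leaves the old value in place (it gets overwritten on
        // reuse) and just marks the slot dead, keeping the dead list sorted.
        fn remove(&mut self, index: usize) {
            if let Err(pos) = self.dead.binary_search(&index) {
                self.dead.insert(pos, index);
            }
        }

        // Iteration skips the dead slots.
        fn iter(&self) -> impl Iterator<Item = &T> {
            self.storage
                .iter()
                .enumerate()
                .filter(|(i, _)| self.dead.binary_search(i).is_err())
                .map(|(_, v)| v)
        }
    }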


> If you have a long vector, say, 1000 elements, and you need to copy 500 of them to shift them over for an insert, you are going to be very slow compared to a linked list.

It takes almost no time at all to do that, depending on the size of the elements. But you have a starting assumption there that the vector must always be stored sorted. Challenge that assumption. If insertion is perfectly balanced with traversal it may hold, but that's also rarely the case. It's fairly trivial to instead track whether the vector is already sorted and sort only when sorted access is actually needed.
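
A minimal sketch of that lazy-sort idea (my own illustration; the names are made up):

    // Pushes stay O(1); the vector only gets sorted when someone actually
    // asks for sorted access, and only if an insert broke the order.
    struct LazySorted<T: Ord> {
        items: Vec<T>,
        sorted: bool,
    }

    impl<T: Ord> LazySorted<T> {
        fn new() -> Self {
            LazySorted { items: Vec::new(), sorted: true }
        }

        fn insert(&mut self, value: T) {
            // Only lose "sorted" if the new element actually breaks order.
            if self.sorted {
                if let Some(last) = self.items.last() {
                    if value < *last {
                        self.sorted = false;
                    }
                }
            }
            self.items.push(value);
        }

        // Sorted access sorts on demand, once, and remembers that it did.
        fn sorted_slice(&mut self) -> &[T] {
            if !self.sorted {
                self.items.sort_unstable();
                self.sorted = true;
            }
            &self.items
        }
    }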

> Besides, linked lists aren't that bad if the nodes are close enough in memory (read: consecutive)

Which doesn't really happen outside of a controlled benchmark. If the linked list is stored in a single consecutive allocation, then resizing has observable side effects (e.g., pointers to elements become invalidated). You can do things like allocate chunks at a time so that some elements are consecutive, but you won't have all elements consecutive or close in memory.


>> you need to copy 500 of them

> It takes almost no time at all to do that,

Uh, 500 loads and dependent stores, at minimum, in the best case hitting L1 cache, which is 3 cycles. So you are talking about 1500 cycles to... avoid a single cache line miss due to chasing a linked list pointer? A miss to main memory is 100-200 cycles, maximum. More likely you are going to L2/L3, which is 12-50 cycles. So no, in no circumstances would I expect copying 500 elements to be faster than chasing a single pointer.

> Challenge that assumption.

You don't get to pick the application behavior. If it needs to do a lot of inserts and traversals, it just does.

>> are close enough in memory (read: consecutive)

> Which doesn't really happen outside of a controlled benchmark.

There is no supporting information for this statement at all. Of course a list created all at once using a bump-pointer allocator is going to be consecutive. And a (moving) garbage collector is going to dynamically reorganize the nodes of a list, generally in a breadth-first way, depending on the marking algorithm. That can result in the nodes of the list being laid out consecutively after compacting, no matter where they were originally organized.

Memory behavior of programs is complicated. I find it surprising that we're having a conversation about how we can't or shouldn't use linked lists now because vectors are universally better, or because we can lazy-sort or use some other crazy workaround. I'm not sure what motivates this whole line of argumentation. Sometimes lists are just the best damn thing.


> 500 loads and dependent stores, at minimum, in the best case hitting L1 cache, which is 3 cycles. So you are talking about 1500 cycles

This is absolutely not how modern CPUs work, and I think you misunderstood where the dependencies are. All the load/store pairs are independent of each other, which means they can be executed in parallel. Which means that this code is throughput limited. Modern CPUs tend to have at least 2 load/store ports, so we're talking a throughput of one copy per cycle, or 500 cycles for the entire operation (plus warm-up time).

Furthermore, this is a pure memmove in many cases, which means a real memmove implementation that has been optimised using vector instructions can be used. Now you're talking about moving 32 bytes per cycle, or 4 array entries if they're pointer-sized, which brings us down to 125 cycles plus warm-up. Which is on the order of a miss to memory...


Great, now we're talking! I realized in the background that yes, there aren't dependencies between the copies... if the regions of memory don't overlap. But they do overlap, especially if you are just moving them over by one element, so the simple analysis doesn't hold. But you're right that it's just a memmove, so there is an optimized vector implementation (which, incidentally, is probably going to have relatively large setup costs for 500 elements).

And oops, now your vectors are 1 million entries.

I appreciate the discussion. It's a bit of a rathole for something that you shouldn't be optimizing if you can completely avoid it by using the right data structure for your needs.


> And oops, now your vectors are 1 million entries.

As soon as you traverse that linked list of 1 million entries, oops.

In fact, here, I put together a quick benchmark. It inserts in the middle of the list and then traverses it (since, after all, ordering is irrelevant if there's no traversal).

https://paste.ofcode.org/7Buz2Mua9xna55TC3uFaQp

Test system is a Ryzen 3700x. I tried to keep the compiler from optimizing the work away and believe I succeeded, but by all means I'm happy to take a review of the above. Would love to see you do a similar test in whatever runtime you want, especially one with that juicy super-amazing compacting GC that makes linked lists so fast.
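
For the gist, a stripped-down Rust sketch of the same shape of test (insert in the middle, then sum-traverse); this is not the code behind the link, and exact numbers will vary by machine and compiler:

    use std::collections::LinkedList;
    use std::hint::black_box;
    use std::time::Instant;

    fn bench_vec(n: usize) {
        let mut v: Vec<u64> = (0..n as u64).collect();
        let start = Instant::now();
        v.insert(n / 2, u64::MAX); // O(n): shifts the back half
        let sum: u64 = v.iter().sum(); // linear traversal
        println!("vec  n={} sum={} took {:?}", n, black_box(sum), start.elapsed());
    }

    fn bench_list(n: usize) {
        let mut list: LinkedList<u64> = (0..n as u64).collect();
        let start = Instant::now();
        // Stable LinkedList has no cursor insert; split_off walks to the
        // middle (as advancing an iterator would) and append is O(1).
        let mut tail = list.split_off(n / 2);
        list.push_back(u64::MAX);
        list.append(&mut tail);
        let sum: u64 = list.iter().sum(); // pointer-chasing traversal
        println!("list n={} sum={} took {:?}", n, black_box(sum), start.elapsed());
    }

    fn main() {
        for &n in &[100usize, 1_000, 10_000, 100_000, 1_000_000] {
            bench_vec(n);
            bench_list(n);
        }
    }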

At 100 elements vector wins (40ns vs 160ns). At 1000 elements vector wins (~300ns vs. ~1000ns). At 10,000 elements vector wins (~2700ns vs. 12,000ns). At 100,000 elements guess what? Vector still wins (29us vs. 130us).

You are either vastly underestimating how slow linked-list traversal is or vastly overestimating how expensive memmove is.

But you did say 1 million entries so let's give that a shot:

    Vector size 1000000  took 286,123ns
(Vector at 1 million is only twice as slow as a linked list at 100,000, linked lists aren't off to a strong start here...)

    Linked-list size 1000000  took 2,670,542ns
Oh my god it's a bloodbath. Vector wins by 10x. Still. At 1 million entries the O(N) insert structure is still faster than the O(1) insert one. And the gap is getting bigger!

And this is basically the best case for linked lists, where inserting in the middle happens just as often as traversal, I'll point out. If traversal is rare, obviously a lazy sort would crush these vector results, and if traversal is common, the linked list results get that much worse.

Now really these results shouldn't actually be that surprising if you dig into it. After all, the vector version of this is storing (and therefore traversing) 4MB of data. That's all 1 million ints really is; it's not very big. Sure, you can store bigger things, but much bigger and you're probably storing a vector of pointers instead, which only bumps that to 8MB of data. The linked list version, on the other hand, is storing 2 pointers + an int for each node. That's 20MB of data total.

So resizing the vector involves a read of 2MB of data, a write of 2MB of data, and a traversal of 4MB of data - 8MB total. Simply traversing the linked list hits 20MB of data - over double the memory bandwidth utilized. That's a big difference.

In fact, on my system at 1 million entries that happened to be the difference between the vector version comfortably fitting in L3 with loads of room to spare (especially since the resize made part of it nice & hot) vs. the linked-list version not coming close to fitting (specs on this CPU claim 32MB of L3, but it's really 16MBx2, and hitting the far L3 isn't much faster than hitting main memory). So more data to hit, and it's pointer chasing? Modern CPUs just really hate that. Like a lot.

Since I don't want to cheat by happening to have enough L3 for the use case you laid out (combined with the benchmark naturally keeping this hot), I also tried adjusting it. I changed the containers to hold intptrs instead and hit it with 10 million entries. Neither one comes close to fitting in L3 now. It's still 80MB of data for the vector vs. 240MB for the linked list, but without that copy, and with both of them thrashing L3, the linked list might finally show something to redeem it.

    Vector size 10,000,000  took 3,680,111ns
    Linked-list size 10,000,000  took 26,460,333ns
That's a big fucking oof right there.

> I appreciate the discussion. It's a bit of a rathole for something that you shouldn't be optimizing if you can completely avoid it by using the right data structure for your needs.

And a linked list is rarely the right data structure, so yes you should avoid trying to optimize for that without measuring.


I think one important variable to tune would be the element size (since supposedly linked lists shine more with bigger data). Curious to see what the numbers are at 100b/1kb.

My guess is that the linked list impl remains almost the same while vec sees a linear-ish slowdown.


That's true. And there is seriously optimized software out there using linked lists (e.g. in the Linux kernel).

However, with bigger data in my experience a very good solution is often to have a vector of pointers to the actual data. This loses some locality of course, but during traversal the fact that you don't have dependencies between iterations can still be a win.


Oh, absolutely! Linked lists do have a quite useful strength in that they are agnostic to data size and, potentially more importantly, that pointers to the data are not invalidated when a linked list changes size, as they are in a vector. Both those properties combined make linked lists a good data structure for holding large data, used more as an allocator than as a list. Where it's not the O(1) inserts-anywhere that matters so much as the O(data-never-moves) that does.

But since you asked, I adjusted the benchmark to instead contain a 100-byte & 1kb payload (just an array of ints), and in the loop I just use the first number of the array in the summation. I assumed an iteration isn't accessing the entire payload but rather, say, an ID or whatever.

I also added a vector that is pointers to the data instead of the data itself (always a new allocation, none of the pointers are the same)

For 100 bytes:

    Vector; payload sizeof=100 of count 100 took 142ns
    Vector-of-pointers; payload sizeof=8 of count 100 took 150ns
    Linked-list; payload sizeof=100 of count 100 took 180ns
    
    Vector; payload sizeof=100 of count 10,000 took 18,796ns
    Vector-of-pointers; payload sizeof=8 of count 10,000 took 8,670ns
    Linked-list; payload sizeof=100 of count 10,000 took 32,853ns
Gap shrunk, but possibly not by as much as you would have expected.

For 1kb Results:

    Vector; payload sizeof=1000 of count 100 took 1137ns
    Vector-of-pointers; payload sizeof=8 of count 100 took 549ns
    Linked-list; payload sizeof=1000 of count 100 took 387ns
    
    Vector; payload sizeof=1000 of count 10,000 took 120,709ns
    Vector-of-pointers; payload sizeof=8 of count 10,000 took 10,681ns
    Linked-list; payload sizeof=1000 of count 10,000 took 105,927ns
Linked-list finally manages to pass up vector, but then vector of pointers runs away from linked-lists in the 1kb test.

Think about it like this: Traversal speeds of linked lists are bottlenecked by memory latency. Traversal speeds of vectors are bottlenecked by memory bandwidth. Only one of those metrics has been improving over the years.

Code: https://paste.ofcode.org/hYTf6PYPJegWLReSGQyUBg


That's interesting data. I guess I nerd-sniped you.

One use case where I think linked lists would be far superior would be merging two sorted linked lists. That can be done completely in-place with a linked list, consuming no intermediate memory. With vectors, you end up with an intermediate copy, either of one of the lists, or both. If you take the result of merging two lists, then merge it with a third list, then take that and merge it with a fourth list, etc., you can't really do this with a vector-of-lists representation that might come to mind.
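
For what it's worth, here's roughly what that in-place merge looks like, sketched with a hand-rolled singly linked list (std's LinkedList has no stable sorted-merge or cursor API, so the node type here is invented for illustration):

    // Hand-rolled singly linked list; the merge reuses the existing nodes,
    // so no values are copied and nothing new is allocated, only the
    // `next` links get rewritten.
    type Link = Option<Box<Node>>;

    struct Node {
        value: i32,
        next: Link,
    }

    fn from_sorted(values: &[i32]) -> Link {
        let mut head: Link = None;
        for &v in values.iter().rev() {
            head = Some(Box::new(Node { value: v, next: head }));
        }
        head
    }

    // Recursive for clarity; an iterative version would be needed to
    // avoid blowing the stack on very long lists.
    fn merge(a: Link, b: Link) -> Link {
        match (a, b) {
            (None, b) => b,
            (a, None) => a,
            (Some(mut a), Some(mut b)) => {
                if a.value <= b.value {
                    a.next = merge(a.next.take(), Some(b));
                    Some(a)
                } else {
                    b.next = merge(Some(a), b.next.take());
                    Some(b)
                }
            }
        }
    }

    fn main() {
        let merged = merge(from_sorted(&[1, 3, 5]), from_sorted(&[2, 4, 6]));
        let mut cur = &merged;
        while let Some(node) = cur {
            print!("{} ", node.value);
            cur = &node.next;
        }
        println!(); // 1 2 3 4 5 6
    }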


> One use case where I think linked lists would be far superior would be merging two sorted linked lists.

Benchmark it & prove it. I expect you're still vastly underestimating how slow it is to traverse a linked list.

To visit a node in a linked list you must first load the node before it. Every iteration is a dependent load. You get no pipelining here, so every node lookup is ~100ns. Meanwhile for vectors each index is independent, so they are nice & pipelined. Yeah you're copying, but you've got vastly more memory bandwidth than memory latency.

Also, zipper-merging 2 linked lists means updating 2 * N pointers. That's essentially a copy of the entire list right there, depending on the size of the contained object.


There are situations where you can have a linked list perform faster than a vector. It's good to consider linked lists if you have the time to think about your solution and profile multiple options... but the point is that memory latency is a huge hurdle to overcome.

For context: A cold memory fetch is ~200 cycles. That's around 10 incorrectly-predicted branches. That's around 100 correctly-predicted branches. You need to be skipping a lot of work to justify injecting 100s of cycles of lag inside a loop of some sort.

(And, of course, those situations would need to be the common, relevant case. Don't chase a 10x speed-up in a one-off situation at a time no one cares about to justify a 5x slowdown in a critical, hot loop.)

If you have big objects, then this might also start to become a concern. However, in those cases, a vector of pointers (or structs of pointers + metadata) would appear as a new challenger. In that case, you would only take the latency hit if you actually need to go to the object (and those objects could be held in contiguous storage, etc. etc. etc.).

Chances are that, unless you have data for your specific case showing otherwise, a vector will win. It should be the default unless you have a reason not to, and evidence to back it up.


Heh. I guess I have an interview question to update.

One of my quick warm-up questions asks what data structure to choose if inserting an element after an element you already have a reference to is a common operation that you want to optimize for.

Of course, I don't consider there to be a right or wrong answer, but just make sure that people can intelligently discuss it. Some people start off with vec or the like, and then we discuss how fast that would be compared to a linked list. I generally stop after we get to the O(n) vs. O(1) comparison.

But maybe if we have some extra time I should also talk about the tradeoff with actually traversing the list.

After all, I suppose this is why things like gap buffers are popular for text editors, rather than linked lists.
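
(A toy sketch of the gap-buffer idea, nothing like production editor code: the empty gap sits at the cursor, so inserts there are cheap, and moving the cursor only shifts the characters between the old and new positions.)

    // Toy gap buffer: the storage keeps an empty "gap" at the cursor, so
    // inserting at the cursor never shifts the whole buffer, and moving
    // the cursor only shifts the characters between the old and new spots.
    struct GapBuffer {
        buf: Vec<char>,
        gap_start: usize, // cursor position in the text
        gap_end: usize,   // exclusive; buf[gap_start..gap_end] is the gap
    }

    impl GapBuffer {
        fn new() -> Self {
            GapBuffer { buf: Vec::new(), gap_start: 0, gap_end: 0 }
        }

        fn len(&self) -> usize {
            self.buf.len() - (self.gap_end - self.gap_start)
        }

        fn insert(&mut self, c: char) {
            if self.gap_start == self.gap_end {
                self.grow();
            }
            self.buf[self.gap_start] = c;
            self.gap_start += 1;
        }

        // Shift only the characters between the old and new cursor spots.
        fn move_cursor(&mut self, pos: usize) {
            assert!(pos <= self.len());
            while self.gap_start > pos {
                self.gap_start -= 1;
                self.gap_end -= 1;
                self.buf[self.gap_end] = self.buf[self.gap_start];
            }
            while self.gap_start < pos {
                self.buf[self.gap_start] = self.buf[self.gap_end];
                self.gap_start += 1;
                self.gap_end += 1;
            }
        }

        // Called with an empty gap: double the storage and reopen the gap
        // at the cursor by moving the text after it to the new end.
        fn grow(&mut self) {
            let tail = self.buf.split_off(self.gap_start);
            let new_cap = (self.buf.len() + tail.len()).max(8) * 2;
            self.buf.resize(new_cap - tail.len(), '\0');
            self.gap_end = self.buf.len();
            self.buf.extend(tail);
        }

        fn text(&self) -> String {
            self.buf[..self.gap_start]
                .iter()
                .chain(&self.buf[self.gap_end..])
                .collect()
        }
    }

    fn main() {
        let mut gb = GapBuffer::new();
        for c in "helo".chars() {
            gb.insert(c);
        }
        gb.move_cursor(3); // between 'l' and 'o'
        gb.insert('l');    // cheap: lands in the gap, nothing else moves
        println!("{}", gb.text()); // "hello"
    }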


> So you are talking about 1500 cycles to... avoid a single cache line miss due to chasing a linked list pointer?

No I'm talking about 1500 cycles to avoid 1000 cache line misses. Ordering only matters if you're traversing, and traversing a linked list sure as shit isn't a single cache line miss as I'm sure you know.

> And a (moving) garbage collector is going to dynamically reorganize the nodes of a list, generally in a breadth-first way, depending on the marking algorithm.

We're talking about Rust & C++. There isn't a moving GC.

> You don't get to pick the application behavior. If it needs to do a lot of inserts and traversals, it just does.

If it's doing that linked lists perform terribly. Traversals of linked lists that suddenly become a million long are insanely slow. That's dependent loads and cache misses all the way through. And even if your hypothetical GC works flawlessly giving you maximum compactness, you're still blowing likely half your throughput & cache size on pointers to neighboring nodes. Even more if it's a doubly-linked list.

> Sometimes lists are just the best damn thing.

It's potentially possible, sure, as I even acknowledged (though you seem to have skipped that). No, the argument is that they are rarely the right thing, so the focus on them being difficult in Rust is irrelevant.

If you actually know doubly linked lists are what you need, then Rust's unsafe is almost certainly not a deterrent.

