Hacker Read

cheepin · 2016-10-29 03:53:49

If you are measuring response in nanoseconds, 100 microseconds is still a lot.

However, it may be good enough for games: 100 microseconds is well below 1% of the frame-time budget at 60 fps, assuming cache locality is good enough that you don't waste too much time fetching from main memory.
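
To put numbers on that, here is a minimal sketch of the frame-budget arithmetic. The function names are made up for illustration; the only inputs are the 60 fps and 100 microsecond figures from the comment. It comes out to roughly 0.6% of a frame.

```python
FPS = 60          # target frame rate from the comment
COST_US = 100.0   # the 100 microsecond cost under discussion


def frame_budget_us(fps: float) -> float:
    """Time available per frame, in microseconds."""
    return 1_000_000.0 / fps


def budget_fraction(cost_us: float, fps: float) -> float:
    """Fraction of one frame consumed by a cost of cost_us microseconds."""
    return cost_us / frame_budget_us(fps)


if __name__ == "__main__":
    print(f"frame budget: {frame_budget_us(FPS):.0f} us")
    print(f"share of frame: {budget_fraction(COST_US, FPS):.2%}")
```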

vardump | karma 7011 | avg karma 2.24 · | 2015-02-05 20:21:34+00:00

And how about latency? DDR3 has 100 ns latency, give or take.

Dylan16807 | karma 31639 | avg karma 1.39 · | 2017-04-20 21:24:18+00:00

> extremely low latency (which it has already delivered)

Are we looking at the same numbers? "probably under 10 microseconds" is pretty terrible compared to DRAM.

dvirsky | karma 2771 | avg karma 3.1 · | 2013-09-17 15:59:49+00:00

It's a function of the number of keys more than heap size directly. I've benchmarked the STW at 20ms pauses for ~2G if the values are ~10k each, and 200ms if the keys are 1K each at the same amount of RAM.

FrozenVoid | karma 883 | avg karma 1.58 · | 2017-06-04 15:39:01+00:00

50% of speed isn't the whole deal. Latency and memory use matter too. Audio, video, games, networking, etc. require minimal latency. Memory use is also critical: when your data structures explode into GBs of RAM, a footprint twice as large means you can run out of RAM and start swapping to disk, lowering performance substantially. Finally, 50% of speed in isolated benchmarks with minimal GC use doesn't mean 50% inside a complex application, where GC can pause the current thread or steal time continuously.

jleahy | karma 889 | avg karma 2.69 · | 2023-12-05 01:40:39

Even hitting RAM takes the best part of 100ns. They probably mean 5-10us given the ‘6x faster’ thing.

formerly_proven | karma 13110 | avg karma 3.44 · | 2020-11-30 21:00:43+00:00

Latencies like this are doable with a lot of tuning on Intel CPUs; out of the box you'll get to the 40s with fast memory. And those CPUs have three cache levels instead of two...

A good old-fashioned 2010-era gaming PC would already get down to around 50 ns levels.

It's definitely really good, but considering it's rather fast RAM (DDR4 4266 CL16) and doesn't have L3 it's not that surprising.
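
For context on the DDR4 4266 CL16 figure, a rough sketch of how a CAS latency in cycles converts to nanoseconds. The helper name is hypothetical. This covers only the CAS portion of an access; the measured load-to-use latencies quoted above also include row activation, the memory controller, and the cache-miss path, so they are much larger.

```python
def cas_latency_ns(cl_cycles: int, data_rate_mts: float) -> float:
    """Convert a CAS latency in clock cycles to nanoseconds.

    DDR transfers twice per clock, so the clock in MHz is data_rate / 2
    and one cycle lasts 2000 / data_rate nanoseconds.
    """
    return cl_cycles * 2000.0 / data_rate_mts


if __name__ == "__main__":
    # DDR4-4266 CL16, the kit mentioned in the comment
    print(f"CAS latency: {cas_latency_ns(16, 4266):.2f} ns")
```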

Panzer04 | karma 1554 | avg karma 3.1 · | 2022-06-06 18:46:39

"only" an order of magnitude.

Never mind comparing bleeding edge NVMe with (by now) decade-old DDR3, while the current bleeding edge DDR5 is now trickling out at 100GB/s+ pretty easily.

Never mind DDR memory latencies of ~50ns, vs NVMe at 50us+.

Just because the user doesn't notice GUI problems doesn't mean it's not going to be a catastrophic bottleneck in any memory-intensive application.
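
As a quick check on the latency gap, a small sketch using only the ~50 ns and 50 us figures quoted above (the function name is illustrative). On those numbers the gap is about three orders of magnitude, not one.

```python
import math

DRAM_LATENCY_NS = 50.0       # ~50 ns, from the comment
NVME_LATENCY_NS = 50_000.0   # 50 us expressed in nanoseconds


def orders_of_magnitude(slow: float, fast: float) -> float:
    """Base-10 logarithm of the ratio between two latencies."""
    return math.log10(slow / fast)


if __name__ == "__main__":
    ratio = NVME_LATENCY_NS / DRAM_LATENCY_NS
    gap = orders_of_magnitude(NVME_LATENCY_NS, DRAM_LATENCY_NS)
    print(f"NVMe is {ratio:.0f}x slower ({gap:.1f} orders of magnitude)")
```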

loser777 | karma 1076 | avg karma 4.66 · | 2018-04-27 23:33:37+00:00

0.25s for 200MB (in memory) works out to 800MB/s, which seems pretty slow for a modern CPU and is more than an order of magnitude below what you would expect from main memory.
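
The throughput figure follows directly from the two numbers in the comment. A minimal sketch, with an illustrative function name; the "order of magnitude" line just multiplies by ten and is not a measured memory bandwidth.

```python
def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average throughput for processing size_mb megabytes in the given time."""
    return size_mb / seconds


if __name__ == "__main__":
    measured = throughput_mb_per_s(200.0, 0.25)
    print(f"measured: {measured:.0f} MB/s")
    # "More than an order of magnitude below" main memory implies
    # a main-memory rate above ten times the measured one.
    print(f"implied main-memory rate: > {measured * 10:.0f} MB/s")
```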

There can be a lot of performance left on the table even in standard library implementations: http://0x80.pl/articles/simd-strfind.html

Arelius | karma 1728 | avg karma 1.62 · | 2017-01-13 01:55:44+00:00

I wonder if that would still be an acceptable trade-off with high-performance games on modern memory architectures.

Dylan16807 | karma 31639 | avg karma 1.39 · | 2022-12-07 17:03:57

It's a much better estimate than hand waving about memory isolation.

If we want to talk about how things work directly, my program can get things to the GPU in far less than a millisecond. The safety layers are not the problem.

GeekyBear | karma 9784 | avg karma 3.86 · | 2021-11-23 04:23:57

> they have 400mb/sec of memory bandwidth

Per chiplet.

seba_dos1 | karma 7517 | avg karma 2.24 · | 2021-12-27 14:07:51

No, this is about memory bandwidth.

foota | karma 6719 | avg karma 2.15 · | 2023-06-24 16:02:02

Wait till you see the memory bandwidth that 1 thousandth of a cpu gets you.

robocat | karma 11778 | avg karma 2.08 · | 2020-02-04 23:13:39

However the memory usage difference is astonishing for some of those benchmarks - using 1000x more memory is only acceptable for some situations.

aikah | karma 5617 | avg karma 2.52 · | 2014-12-27 14:44:15+00:00

When doing a speed benchmark, I consider the memory profile to be as relevant as the number of requests per second.

vvanders | karma 14177 | avg karma 4.33 · | 2017-04-12 15:11:50

> It is very rare to find a task which is memory speed bound. There's almost always substantial processing to be done with data.

One could argue that memory speed doesn't matter because memory latency has remained (relatively) constant since the advent of DDR. Can't process something while you're waiting for that cache miss to complete.
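
One way to see why latency caps useful bandwidth is Little's law: sustained throughput equals the data in flight divided by the latency. The sketch below is a back-of-the-envelope model, not a measurement. The 64-byte cache line, the round 100 ns miss latency, and the numbers of outstanding misses are all assumptions chosen for illustration.

```python
CACHE_LINE_BYTES = 64     # assumption: typical x86 cache line size
MISS_LATENCY_NS = 100.0   # assumption: round DRAM latency figure


def latency_bound_gb_per_s(outstanding_misses: int,
                           line_bytes: int = CACHE_LINE_BYTES,
                           latency_ns: float = MISS_LATENCY_NS) -> float:
    """Little's law: throughput = data in flight / latency.

    Bytes per nanosecond is numerically equal to gigabytes per second.
    """
    return outstanding_misses * line_bytes / latency_ns


if __name__ == "__main__":
    for n in (1, 4, 10):  # hypothetical levels of memory parallelism
        print(f"{n:2d} misses in flight: {latency_bound_gb_per_s(n):.2f} GB/s")
```

Under these assumptions, code that waits on one dependent miss at a time (pointer chasing, for example) tops out well below 1 GB/s no matter how much bandwidth the memory bus offers.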

vardump | karma 7011 | avg karma 2.24 · | 2020-02-05 08:07:42

But not faster than L3 cache bandwidth. Some cards can DMA to L3 cache. Granted, eventually it's flushed to main RAM, so might not help too much in the end.

lucian1900 | karma 4353 | avg karma 1.66 · | 2013-02-18 18:49:06+00:00

Game developers will almost always take more memory over faster memory, as long as the slower one's bandwidth is still sufficient.

Bandwidth of storage is abysmal by comparison.

thedance | karma 613 | avg karma 1.32 · | 2020-03-01 01:38:08+00:00

If you're not sensitive to main memory latency, just say that. Don't try to tell me that 25ns is not relevant. It's ~100 CPU cycles, and it's also about a 25% swing from fastest to slowest.
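
The cycle count depends on clock speed, which the comment does not state. As a rough sketch, assuming a 4 GHz core (an assumption, picked because it reproduces the ~100 cycle figure):

```python
ASSUMED_CLOCK_GHZ = 4.0   # assumption: not stated in the comment


def ns_to_cycles(latency_ns: float, clock_ghz: float = ASSUMED_CLOCK_GHZ) -> float:
    """Cycles elapsed during latency_ns at clock_ghz (1 GHz = 1 cycle per ns)."""
    return latency_ns * clock_ghz


if __name__ == "__main__":
    print(f"25 ns at {ASSUMED_CLOCK_GHZ} GHz: {ns_to_cycles(25.0):.0f} cycles")
```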