
If you are measuring response in nanoseconds, 100 microseconds is still a lot.

However, it may be good enough for games: it is well below 1% of the time budget for one frame at 60 fps, assuming cache locality is good enough that you don't waste too much time fetching from main memory.
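To put a number on the "well below 1%" claim, here is a minimal sketch of the arithmetic. The function name `frame_budget_fraction` is made up for illustration; the 100 microseconds and 60 fps come from the comment above.

```python
def frame_budget_fraction(cost_us: float, fps: float = 60.0) -> float:
    """Return the fraction of one frame's time budget used by a cost in microseconds."""
    frame_us = 1_000_000.0 / fps  # one frame at 60 fps lasts about 16,667 microseconds
    return cost_us / frame_us


# 100 microseconds against a 60 fps frame budget
print(f"{frame_budget_fraction(100.0):.2%}")  # prints 0.60%
```

So a single 100 microsecond cost is about 0.6% of a frame. The budget goes quickly if that cost is paid many times per frame.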




And how about latency? DDR3 has 100 ns latency, give or take.

> extremely low latency (which it has already delivered)

Are we looking at the same numbers? "probably under 10 microseconds" is pretty terrible compared to DRAM.


It's a function of the number of keys more than of heap size directly. I've benchmarked the STW pauses at 20 ms for a ~2 GB heap if the values are ~10 KB each, and 200 ms if they are ~1 KB each at the same amount of RAM.

50% of speed isn't the whole deal. Latency and memory use matter too. Audio, video, games, networking, etc. require minimal latency. Memory use is also critical: when your data structures explode into GBs of RAM, a footprint twice as large means you can run out of RAM and start swapping to disk, lowering performance substantially. Finally, 50% of speed in isolated benchmarks with minimal GC use doesn't mean 50% inside a complex application, where GC can pause the current thread or steal time continuously.

Even hitting RAM takes the best part of 100ns. They probably mean 5-10us given the ‘6x faster’ thing.

Latencies like this are doable with a lot of tuning on Intel CPUs; out of the box you'll get to the 40s with fast memory. And those CPUs have three cache levels instead of two...

A good old-fashioned 2010-era gaming PC would already get down to around 50 ns levels.

It's definitely really good, but considering it's rather fast RAM (DDR4 4266 CL16) and doesn't have L3 it's not that surprising.


"only" an order of magnitude.

Never mind comparing bleeding edge NVMe with (by now) decade-old DDR3, while the current bleeding edge DDR5 is now trickling out at 100GB/s+ pretty easily.

Never mind DDR memory latencies of ~50 ns, vs NVMe at 50 µs+.

Just because the user doesn't notice GUI problems doesn't mean it's not going to be a catastrophic bottleneck in any memory-intensive application.


0.25 s for 200 MB (in memory) seems pretty slow for a modern CPU: that's 800 MB/s, which is more than an order of magnitude below what you would expect from main memory.
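The 800 MB/s figure is just size divided by time. A minimal sketch of that calculation, with a hypothetical helper name `throughput_mb_per_s`:

```python
def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Return throughput in MB/s for processing size_mb megabytes in the given time."""
    return size_mb / seconds


# 200 MB processed in 0.25 s, the numbers from the comment above
print(throughput_mb_per_s(200.0, 0.25))  # prints 800.0
```

"More than an order of magnitude below main memory" then means the commenter expects main memory to sustain well over 8,000 MB/s.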

There can be a lot of performance left on the table even in standard library implementations: http://0x80.pl/articles/simd-strfind.html


I wonder if that would still be an acceptable trade-off with high-performance games on modern memory architectures.

It's a much better estimate than hand waving about memory isolation.

If we want to talk about how things work directly, my program can get things to the GPU in far less than a millisecond. The safety layers are not the problem.


> they have 400mb/sec of memory bandwidth

Per chiplet.


No, this is about memory bandwidth.

Wait till you see the memory bandwidth that 1 thousandth of a cpu gets you.

However the memory usage difference is astonishing for some of those benchmarks - using 1000x more memory is only acceptable for some situations.

When doing a speed benchmark, I consider the memory profile to be as relevant as the number of requests per second.

> It is very rare to find a task which is memory speed bound. There's almost always substantial processing to be done with data.

One could argue that memory speed doesn't matter because memory latency has remained (relatively) constant since the advent of DDR. Can't process something while you're waiting for that cache miss to complete.


But not faster than L3 cache bandwidth. Some cards can DMA to L3 cache. Granted, eventually it's flushed to main RAM, so might not help too much in the end.

Game developers will almost always take more memory over faster memory, as long as the slower one's bandwidth is still sufficient.

Bandwidth of storage is abysmal by comparison.


If you're not sensitive to main memory latency, just say that. Don't try to tell me that 25ns is not relevant. It's ~100 CPU cycles and it's also about 25% swing from fastest to slowest.
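For the "~100 CPU cycles" figure, a rough sketch of the conversion. The 4 GHz clock is an assumption (the comment does not state a clock speed), and `ns_to_cycles` is a made-up helper name.

```python
def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to CPU cycles at the given clock in GHz."""
    # 1 GHz means one cycle per nanosecond, so cycles = nanoseconds * GHz
    return latency_ns * clock_ghz


# 25 ns at an ASSUMED 4 GHz clock; not a figure taken from the thread
print(ns_to_cycles(25.0, 4.0))  # prints 100.0
```

At a lower clock the same 25 ns is fewer cycles, for example 75 cycles at 3 GHz.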