If you are measuring response in nanoseconds, 100 microseconds is still a lot.
However, it may be good enough for games: it is well below 1% of your time budget for a 60 fps game, assuming cache locality is good enough that you don't waste too much time fetching from main memory.
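To put a number on that, here is a rough sketch of the frame-budget arithmetic. It assumes the pause in question is the 100 microseconds mentioned above and that the whole 1/60 s frame is available as budget; the names are just for illustration.

```python
# Frame-budget arithmetic for a 100 microsecond pause at 60 fps.
FPS = 60
PAUSE_US = 100.0  # the 100 microsecond figure under discussion


def budget_fraction(pause_us: float, fps: int) -> float:
    """Return the pause as a fraction of one frame's time budget."""
    frame_us = 1_000_000 / fps  # one frame in microseconds
    return pause_us / frame_us


fraction = budget_fraction(PAUSE_US, FPS)
print(f"frame budget: {1_000_000 / FPS:.0f} us, pause share: {fraction:.2%}")
```

At 60 fps a frame is about 16.7 ms, so a single 100 microsecond pause is roughly 0.6% of it. Several pauses per frame would add up accordingly.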
It's a function of the number of keys more than heap size directly. I've benchmarked STW pauses at 20 ms for ~2 GB when the values are ~10 KB each, and at 200 ms when they are ~1 KB each at the same amount of RAM.
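A back-of-the-envelope check of why the entry count would explain those numbers. This is only a sketch: it assumes the ~2 GB is made up entirely of values (ignoring key and per-object overhead) and says nothing about how any particular collector actually behaves.

```python
# Estimate how many entries fit in the same heap at two value sizes.
HEAP_BYTES = 2 * 1024**3  # ~2 GB heap, as in the benchmark above


def entry_count(heap_bytes: int, value_bytes: int) -> int:
    """Number of values of a given size that fit in the heap."""
    return heap_bytes // value_bytes


large_values = entry_count(HEAP_BYTES, 10 * 1024)  # ~10 KB values
small_values = entry_count(HEAP_BYTES, 1 * 1024)   # ~1 KB values

print(f"~10 KB values: {large_values} entries")
print(f"~1 KB values:  {small_values} entries")
print(f"entry ratio:   {small_values / large_values:.1f}x")
```

Ten times as many entries in the same amount of RAM lines up with the measured pause going from 20 ms to 200 ms, which is what you would expect if the pause scales with the number of objects rather than the number of bytes.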
50% of speed isn't the whole deal. Latency and memory use matter too.
Audio, video, games, networking, etc. require minimal latency.
Memory use is also critical: when your data structures explode into GBs of RAM, a footprint twice as large means you can run out of RAM and start swapping to disk, lowering performance substantially. Finally, 50% of the speed in isolated benchmarks with minimal GC use doesn't mean 50% inside a complex application, where the GC can pause the current thread or steal time continuously.
Latencies like this are doable with a lot of tuning on Intel CPUs; out of the box you'll get into the 40 ns range with fast memory. And those CPUs have three cache levels instead of two...
A good old-fashioned 2010-era gaming PC would already get down to around 50 ns levels.
It's definitely really good, but considering it's rather fast RAM (DDR4 4266 CL16) and doesn't have L3 it's not that surprising.
Never mind comparing bleeding edge NVMe with (by now) decade-old DDR3, while the current bleeding edge DDR5 is now trickling out at 100GB/s+ pretty easily.
Never mind DDR memory latencies of ~50 ns, vs NVMe at 50 µs+.
Just because the user doesn't notice GUI problems doesn't mean it's not going to be a catastrophic bottleneck in any memory-intensive application.
0.25 s for 200 MB (in memory) seems pretty slow for a modern CPU: that is 800 MB/s, more than an order of magnitude below what you would expect from main memory.
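The arithmetic behind that, as a quick sketch. The 20 GB/s main-memory figure is an assumption picked for illustration, not a measurement; substitute whatever your own machine actually sustains.

```python
# Throughput implied by processing 200 MB in 0.25 s, compared with an
# ASSUMED main-memory bandwidth (illustrative only, not measured).
ASSUMED_DRAM_MB_S = 20_000  # assumption: ~20 GB/s sustained


def throughput_mb_s(megabytes: float, seconds: float) -> float:
    """Average throughput in MB/s."""
    return megabytes / seconds


observed = throughput_mb_s(200, 0.25)
shortfall = ASSUMED_DRAM_MB_S / observed

print(f"observed: {observed:.0f} MB/s")
print(f"shortfall vs assumed DRAM bandwidth: {shortfall:.0f}x")
```

With that assumption the gap is about 25x, so even a considerably more pessimistic bandwidth figure still leaves the observed rate an order of magnitude short.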
It's a much better estimate than hand waving about memory isolation.
If we want to talk about how things work directly, my program can get things to the GPU in far less than a millisecond. The safety layers are not the problem.
> It is very rare to find a task which is memory speed bound. There's almost always substantial processing to be done with data.
One could argue that memory speed doesn't matter because memory latency has remained (relatively) constant since the advent of DDR. Can't process something while you're waiting for that cache miss to complete.
But not faster than L3 cache bandwidth. Some cards can DMA to L3 cache. Granted, eventually it's flushed to main RAM, so might not help too much in the end.
If you're not sensitive to main memory latency, just say that. Don't try to tell me that 25 ns is not relevant. It's ~100 CPU cycles and it's also about a 25% swing from fastest to slowest.
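For reference, the conversion behind the ~100 cycles figure, assuming a 4 GHz core clock (the clock speed is an assumption here; it isn't stated above).

```python
# Convert a latency in nanoseconds to CPU cycles at a given clock speed.
ASSUMED_CLOCK_GHZ = 4.0  # assumption: 4 GHz core clock


def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """A clock of N GHz ticks N times per nanosecond."""
    return latency_ns * clock_ghz


cycles = ns_to_cycles(25, ASSUMED_CLOCK_GHZ)
print(f"25 ns at {ASSUMED_CLOCK_GHZ} GHz = {cycles:.0f} cycles")
```

At 3 GHz the same 25 ns would be 75 cycles, so the exact number moves with the clock, but it is still a lot of stalled cycles per miss.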