
> Why does #cores matter?

If you're benchmarking 4kB IOs, then the system call and interrupt handling overhead means you can't keep a high-end NVMe SSD 100% busy with only a single CPU core issuing requests one at a time. The time it takes to move 4kB across a PCIe link is absolutely trivial compared to post-Spectre/Meltdown context switch times. A program performing random IO to a mmapped file will never stress the SSD as much as a program that submits batches of several IO requests using a low-overhead asynchronous IO API.
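To make "batches of several IO requests using a low-overhead asynchronous IO API" concrete, here is a minimal sketch using Linux's io_uring through liburing. The device path, queue depth, and offset range are my own illustrative choices (any large file you can open works too), and error handling is mostly skipped:

    /* Sketch: keep a batch of random 4kB reads in flight with io_uring,
       instead of issuing one synchronous pread() at a time.
       Build: gcc -O2 qd.c -o qd -luring  (assumes liburing is installed) */
    #define _GNU_SOURCE
    #include <liburing.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define QD   32              /* requests submitted per batch (illustrative) */
    #define BLK  4096            /* 4kB IO size                                 */
    #define SPAN (1ULL << 30)    /* random offsets within the first 1 GiB       */

    int main(void)
    {
        /* Illustrative device; needs read permission on it. */
        int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        struct io_uring ring;
        io_uring_queue_init(QD, &ring, 0);

        void *bufs[QD];
        for (int i = 0; i < QD; i++)
            posix_memalign(&bufs[i], BLK, BLK);  /* O_DIRECT wants aligned buffers */

        /* Queue QD reads at random 4kB-aligned offsets, then hand them all
           to the kernel with a single io_uring_submit() call. */
        for (int i = 0; i < QD; i++) {
            struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
            __u64 off = ((__u64)rand() % (SPAN / BLK)) * BLK;
            io_uring_prep_read(sqe, fd, bufs[i], BLK, off);
        }
        io_uring_submit(&ring);

        /* Reap the QD completions; a real benchmark would refill the queue
           here so the drive never goes idle. */
        for (int i = 0; i < QD; i++) {
            struct io_uring_cqe *cqe;
            io_uring_wait_cqe(&ring, &cqe);
            if (cqe->res != BLK)
                fprintf(stderr, "read failed or was short: %d\n", cqe->res);
            io_uring_cqe_seen(&ring, cqe);
        }

        io_uring_queue_exit(&ring);
        close(fd);
        return 0;
    }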

> so why does a disk queue of say 20 actually make things any faster over a disk queue of just 2?

Because unlike almost all hard drives, SSDs can actually work on more than one outstanding request at a time. Consumer SSDs use controllers with four or eight independent NAND flash memory channels; enterprise SSD controllers have 8-32 channels. There's also some parallelism available between NAND flash dies attached to the same channel, and between planes on an individual die. For writes, SSDs will also commonly buffer commands so they can be combined and issued as fewer actual NAND program operations.
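To put rough numbers on that (the latency here is illustrative, not measured): by Little's law, sustained IOPS is roughly outstanding requests divided by average per-request latency. Assuming ~100 microseconds per 4kB random read:

  queue depth  2:  ~20,000 IOPS  (~0.08 GB/s)
  queue depth 20: ~200,000 IOPS  (~0.8 GB/s)

That scaling only holds while the drive has enough internal parallelism (channels x dies x planes) to keep all the outstanding requests in service; past that point, extra queue depth mostly just adds latency.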




Thanks, very informative!

One more thing: I'm surprised that 4KB blocks are relevant. I'd have thought that disk requests in benchmarking (edit: because manufacturers like to cheat), and a lot of reads in the real world, would use much larger request sizes than 4K.

Is it that IOs are broken down into 4K blocks at the disk controller level, or is that done deliberately in benchmarking to stress the IO subsystem?


SSDs are traditionally marketed with sequential IO performance expressed in MB/s or GB/s, and random IO performance expressed in 4kB IOs per second (IOPS). Using larger block sizes for random IO will increase throughput in terms of GB/s but will almost always yield a lower IOPS number. Block sizes smaller than 4kB usually don't give any improvement to IOPS because the SSD's Flash Translation Layer is usually built around managing data with 4kB granularity.
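To make the units concrete with made-up but plausible spec-sheet numbers:

  1,000,000 IOPS x 4 kB   ~ 4.1 GB/s
     53,000 IOPS x 128 kB ~ 7.0 GB/s  (bandwidth limit)

The 128kB case posts the bigger GB/s figure, but the IOPS number collapses.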

> expressed in 4kB IOs per second (IOPS). Using larger block sizes for random IO will increase throughput in terms of GB/s but will almost always yield a lower IOPS number

and higher IOPS numbers give the marketing department the warm and fuzzies. Got it.


It is a good metric because it is a measure of pure random access. Bigger requests give a mixed measure of sequential performance and random access (and you can basically infer the performance for any IO size from the large-request bandwidth and the smallest-reasonable-IO IOPS).
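Here is a sketch of that inference; the model and the two input numbers are my own illustrative approximation, not anything a vendor publishes. Treat each request as a fixed per-IO cost plus a transfer cost, anchor the fixed cost to the 4kB IOPS figure and the transfer cost to the large-request bandwidth, then estimate any intermediate size:

    /* Rough model: per-request time = fixed overhead + size / bandwidth.
       iops_4k and bw are illustrative spec-sheet-style numbers. */
    #include <stdio.h>

    int main(void)
    {
        double iops_4k = 1.0e6;   /* 1M random 4kB IOPS (illustrative)       */
        double bw      = 7.0e9;   /* 7 GB/s large-request bandwidth (ditto)  */

        /* Fixed per-IO overhead implied by the 4kB number. */
        double fixed = 1.0 / iops_4k - 4096.0 / bw;

        for (double size = 4096; size <= 1 << 20; size *= 2) {
            double t = fixed + size / bw;          /* seconds per request */
            printf("%8.0f B: %8.0f IOPS, %5.2f GB/s\n",
                   size, 1.0 / t, size / t / 1e9);
        }
        return 0;
    }

By construction this reproduces the 1M IOPS figure at 4kB, and by 1MB requests it has essentially converged on the 7 GB/s bandwidth figure, with everything in between falling out of the same two inputs.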
