Your version only describes what happens if you do the operations serially, though. For example, a c...

twotwotwo • last Friday at 9:21 PM • 2 replies

Your version only describes what happens if you do the operations serially, though. For example, a consumer SSD can do a million (or more) operations in a second, not 50K, and you can send a lot more than 7 total packets between CA and the Netherlands in a second, but to do either of those you need to take advantage of parallelism.
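To make the serial-versus-parallel point concrete, here is a rough sketch using Little's law (operations in flight = throughput × latency). The latencies are just the reciprocals of the serial figures quoted above (50K ops/sec and 7 round trips/sec); the function names are made up for illustration, and real devices do not hold latency constant as queue depth grows.

```python
def throughput(in_flight: float, latency_sec: float) -> float:
    """Ops/sec sustained with `in_flight` operations outstanding at once.

    Assumes per-operation latency stays constant under load, which is
    only an approximation for real SSDs and networks.
    """
    return in_flight / latency_sec


def required_in_flight(target_ops_per_sec: float, latency_sec: float) -> float:
    """Little's law: how many operations must overlap to hit a target rate."""
    return target_ops_per_sec * latency_sec


# Latencies implied by the serial ops/sec figures discussed in the thread.
ssd_read_latency = 1 / 50_000   # 20 µs per 4KB read
ca_nl_round_trip = 1 / 7        # roughly 143 ms per round trip

# One operation at a time reproduces the serial numbers.
serial_ssd_ops = throughput(1, ssd_read_latency)
serial_packets = throughput(1, ca_nl_round_trip)

# Overlap needed for a million reads/sec at the same per-read latency.
depth_for_million = required_in_flight(1_000_000, ssd_read_latency)

if __name__ == "__main__":
    print(f"serial SSD reads/sec: {serial_ssd_ops:,.0f}")
    print(f"serial round trips/sec: {serial_packets:.0f}")
    print(f"reads in flight for 1M/sec: {depth_for_million:.0f}")
```

Under these assumptions, a million reads per second needs only about 20 reads in flight at once, which is why the latency figure alone says little about achievable throughput.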

If the reciprocal numbers are more intuitive for you, you can still say an L1 cache reference takes 1/2,000,000,000 sec. It's "ops/sec" that makes it look like it's a throughput.

An interesting thing about the latency numbers is they mostly don't vary with scale, whereas something like the total throughput with your SSD or the Internet depends on the size of your storage or network setups, respectively. And aggregate CPU throughput varies with core count, for example.

I do think it's still interesting to think about throughputs (and other things like capacities) of a "reference deployment": that can affect architectural things like "can I do this in RAM?", "can I do this on one box?", "what optimizations do I need to fix potential bottlenecks in XYZ?", "is resource X or Y scarcer?" and so on. That was kind of done in "The Datacenter as a Computer" (https://pages.cs.wisc.edu/~shivaram/cs744-readings/dc-comput... and https://books.google.com/books?id=Td51DwAAQBAJ&pg=PA72#v=one... ) with a machine, rack, and cluster as the units. That diagram is about the storage hierarchy and doesn't mention compute, and a lot has improved since 2018, but an expanded table like that still seems like an interesting tool for engineering a system.
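As a minimal sketch of how such a table could answer "can I do this in RAM?" or "can I do this on one box?": the capacities below are hypothetical placeholders, not figures from "The Datacenter as a Computer", and the names `REFERENCE_UNITS` and `smallest_unit_that_fits` are invented for illustration.

```python
from typing import Optional, Sequence, Tuple

GIB = 2**30

# Hypothetical aggregate RAM per unit, smallest to largest.
# Placeholder numbers only: substitute your own reference deployment.
REFERENCE_UNITS: Sequence[Tuple[str, int]] = (
    ("machine", 64 * GIB),
    ("rack", 40 * 64 * GIB),          # assumes 40 machines per rack
    ("cluster", 30 * 40 * 64 * GIB),  # assumes 30 racks per cluster
)


def smallest_unit_that_fits(
    working_set_bytes: int,
    units: Sequence[Tuple[str, int]] = REFERENCE_UNITS,
) -> Optional[str]:
    """Return the smallest unit whose RAM holds the working set, or None."""
    for name, ram_bytes in units:
        if working_set_bytes <= ram_bytes:
            return name
    return None


if __name__ == "__main__":
    for size_gib in (10, 100, 100_000):
        unit = smallest_unit_that_fits(size_gib * GIB)
        print(f"{size_gib} GiB working set -> {unit}")
```

The same lookup could be repeated per resource (disk, network, compute) to see which one runs out first, which is roughly the "is resource X or Y scarcer?" question.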

Replies

zahlman • yesterday at 4:51 PM

> For example, a consumer SSD can do a million (or more) operations in a second, not 50K

The "Read 1MB from SSD" entry translates into a higher throughput (still not as high as you imply, but "SSD" is also a broad category, ranging from SATA-connected devices through, I think, five generations of NVMe now); I assume the "Read 4KB" timing really describes a single, isolated page read, which would be rather difficult to parallelize.

chrisweekly • yesterday at 4:58 AM

Great comment. I like your phrasing "capacities of a reference deployment"; this is what I tend to refer to as the performance ceiling. In practical terms, if you're doing synthetic performance measurements in the lab, it's a good idea to try to recreate optimal field conditions so your benchmarks have a proper frame of reference.
