Love the format, and super cool to see a benchmark that so clearly shows DRAM refresh stalls, especi...

foltik • today at 3:28 AM • 2 replies • view on HN

Love the format, and super cool to see a benchmark that so clearly shows DRAM refresh stalls, especially avoiding them via reverse engineering the channel layout! Ran it on my 9950X3D machine with dual-channel DDR5 and saw clear spikes from 70ns to 330ns every 15us or so.

The hedging technique is a cool demo too, but I’m not sure it’s practical.

At a high level it’s a bit contradictory; trying to reduce the tail latency of cold reads by doubling the cache footprint makes every other read even colder.

I understand the premise is “data larger than cache” given the clflush, but even then you’re spending 2x the memory bandwidth and cache pressure to shave ~250ns off spikes that only happen once every 15us. There’s just not a realistic scenario where that helps.

Especially HFT is significantly more complex than a huge lookup table in DRAM. In the time you spend doing a handful of 70ns DRAM reads, your competitor has done hundreds of reads from cache and a bunch of math. It’s just far better to work with what you can fit in cache. And to shrink what doesn’t as much as possible.

Replies

Lramseyer • today at 7:23 AM

Another point about HFT - They're mostly using FPGAs (some use custom silicon) which means that they have much tighter control over how DRAM is accessed and how the memory controller is configured. They could implement this in hardware if they really need to, but it wouldn't be at the OS level.

formerly_proven • today at 7:09 AM

On most RAM tREF can be increased a lot from the default, at least if kept somewhat cool.

alt Hacker News

Replies