smirutrandola 10/03/2024

I don't think the initial access time (latency) should be included when talking about peak bandwidth; otherwise it is not peak bandwidth.

Peak bandwidth should be considered with an ideally infinite (large enough) payload, so that latency becomes negligible. Once you have these two values, latency and peak bandwidth, you can estimate your (still theoretical, of course) performance for a given transfer size.
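To make that estimate concrete, here is a minimal sketch of the usual model, time = latency + size / peak bandwidth. The function name and the chosen transfer sizes are mine, and the 320 MB/s and 120 ns inputs are just the upper peak figure and the middle of the latency range quoted from the article, not measurements.

```python
def effective_bandwidth(size_bytes, latency_s, peak_bytes_per_s):
    """Effective throughput of one transfer: size / (latency + size / peak)."""
    return size_bytes / (latency_s + size_bytes / peak_bytes_per_s)

# Assumed inputs, taken from the figures quoted from the article.
PEAK = 320e6      # bytes/s, upper end of the quoted peak bandwidth
LATENCY = 120e-9  # seconds, middle of the quoted 110-130 ns range

# Small transfers are dominated by latency; large ones approach the peak.
for size in (8, 32, 4096, 1 << 20):
    mb_s = effective_bandwidth(size, LATENCY, PEAK) / 1e6
    print(f"{size:>8} bytes: {mb_s:6.1f} MB/s")
```

With a large enough payload the result converges on the peak figure, which is why the two numbers are best quoted separately.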

The article uses a 240-320 MB/s peak bandwidth and 110-130 ns latency for its comparison with the external flash used, which has latency in the µs range and a peak bandwidth of 17 MB/s (arguably assuming an infinite payload, since 136.5/8 is about 17, i.e. without accounting for the initial setup time).

Still, even if you compare the actual speeds of a 1996 Pentium with the theoretical external flash speeds cited in the article, the conclusion does not change: the external flash is much slower than what you could get even in 1996.


Replies

rasz 10/04/2024

>I don't think the initial access time (latency) shall be included

It's not about the RAS. Bandwidth is bandwidth. When someone says

> In fact, the bandwidth for sequential reads varied a lot but with a 40 MHz EDO 64-bit DRAM (already available on 1996) one could get a maximum throughput of 320 MB/s

it tells me they multiplied 40 MHz by 8 bytes and called it good. That's not how EDO works. EDO still needs a CAS cycle for every new access, even a linear one. It's BEDO (Burst EDO) that has a 5-1-1-1 pattern.

https://www.electronics-notes.com/articles/electronic_compon...

https://dosdays.co.uk/topics/chipsets.php#VP1

    BEDO DRAM Read Timings (66MHz) 5-1-1-1
    EDO DRAM Read Timings (66MHz) 5-2-2-2
    FPM DRAM Read Timings (66MHz) 5-3-3-3
    SDRAM Read Timings (66MHz) 5-1-1-1

Absolute maximum _purely theoretical_ EDO burst bandwidth at 66 MHz is <260 MB/s. That doesn't take into account the reality of 1996 hardware: processors (Intel still hasn't acknowledged that 'rep movsb' should be optimized), chipsets, and their cache subsystems (cache on the same bus as RAM, so no parallel accesses, and the lookup slows down reads). On real hardware 50-70 MB/s is all you get.
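
For anyone who wants to redo the burst arithmetic from the timing table, a rough sketch follows. It assumes a 64-bit (8-byte) bus, that each number in the pattern is the clock count for one 8-byte transfer, and that four-transfer bursts run back to back with nothing else in the way; the helper name is made up for illustration.

```python
def burst_bandwidth(pattern, clock_hz, bus_bytes=8):
    """Theoretical bytes/s for a burst timing pattern such as '5-2-2-2'."""
    cycles = [int(c) for c in pattern.split("-")]
    bytes_moved = len(cycles) * bus_bytes  # one bus-wide transfer per entry
    return bytes_moved * clock_hz / sum(cycles)

CLOCK = 66e6  # Hz, bus clock used in the timing table

# Patterns copied from the table; the 64-bit bus width is an assumption.
timings = [
    ("BEDO", "5-1-1-1"),
    ("EDO", "5-2-2-2"),
    ("FPM", "5-3-3-3"),
    ("SDRAM", "5-1-1-1"),
]
for name, pattern in timings:
    mb_s = burst_bandwidth(pattern, CLOCK) / 1e6
    print(f"{name:<5} {pattern}: {mb_s:6.1f} MB/s")
```

Under these assumptions the EDO pattern stays below the 260 MB/s bound, and that is before any of the processor, chipset, or cache overheads are counted.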

>The article uses the 240-320 MB/s

The article states "1996) one could get a maximum throughput of 320 MB/s", which is ~5x higher than reality. I'm not arguing that the achievement realized here is somehow lesser because of this mistake. I'm pointing out that the assumptions about vintage hardware were incorrectly inflated. In fact, those assumptions might have led to lower expectations and a worse outcome. Learning that something is possible with less is usually a strong catalyst to keep trying until you get there. A great example of this effect, while still staying on topic, is the Video7 FIFO story told by Abrash: https://www.bluesnews.com/abrash/chap64.shtml

Abrash: "push past the limits he had unconsciously set in coming up with his original design. And, in the end, I think that the single most important element of great design, whether it be hardware or software or any creative endeavor, is precisely what the Paradise news triggered in Tom: The ability to detect the limits you have built into the way you think about your design, and transcend those limits."
