>In a PC or in other Quake ports, all the data is available from RAM (if not even from the CPU data cache), which had a relatively high bandwidth and low latency even back in 1996. In fact, the bandwidth for sequential reads varied a lot, but with a 40 MHz EDO 64-bit DRAM (already available in 1996) one could get a maximum throughput of 320 MB/s.
The youth, so sweet and naive :) EDO RAM on an average Pentium motherboard does around 50-70 MB/s. 256-1024 KB of L2 cache bumps that to 70-120 MB/s depending on chipset and cache type (and obviously usage pattern; Quake wasn't optimized in that respect at all). The tiny 8 KB L1 stays below 200 MB/s.
I think that figure of course refers to the peak bandwidth in burst mode, where you access data sequentially (tacc = 25 ns), in the ideal case: 8 bytes every 25 ns is 320 MB/s. In that sense, the figures given for the Cortex-M33 are to be taken as absolute maxima as well. 50-70 MB/s sounds representative of random reads of 8-byte blocks, where you pay the full access time on every read. This will go even lower if you make only byte accesses.
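To make the arithmetic explicit, here is a minimal sketch in C. The function names are mine (hypothetical), 1 MB is taken as 10^6 bytes, and the model is deliberately naive: one fixed-size transfer per fixed time interval, nothing else.

```c
/* Peak throughput in MB/s (1 MB = 10^6 bytes) when bytes_per_access bytes
   are delivered every ns_per_cycle nanoseconds.
   bytes/ns * 1e9 ns/s / 1e6 bytes/MB = bytes * 1000 / ns. */
static double peak_mb_per_s(unsigned bytes_per_access, double ns_per_cycle)
{
    return (double)bytes_per_access * 1000.0 / ns_per_cycle;
}

/* Inverse view: time per access (in ns) implied by a measured throughput
   in MB/s, assuming every access moves bytes_per_access bytes. */
static double ns_per_access(unsigned bytes_per_access, double mb_per_s)
{
    return (double)bytes_per_access * 1000.0 / mb_per_s;
}
```

With the numbers from this thread, `peak_mb_per_s(8, 25.0)` comes out at 320 MB/s, while 50 to 70 MB/s on 8-byte reads works back to roughly 160 to 114 ns per access via `ns_per_access`. Under the same simple model, fetching a single byte per access at an unchanged access time would cut the throughput by a factor of 8.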
Quake was indeed optimized to work on such 1996 Pentium PCs. Look, for instance, at how the edge/surface/span arrays are allocated on the stack: they allocate extra space to make sure the data can be aligned to the cache line size.
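The idea can be sketched roughly as follows. This is not the Quake source: `item_t`, `NUM_ITEMS`, `CACHE_LINE`, `align_up` and the demo function are placeholder names, and the 32-byte line size is an assumption matching the Pentium L1. It only illustrates the over-allocate-then-round-up idiom.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32   /* assumed cache line size in bytes (power of two) */
#define NUM_ITEMS  100  /* hypothetical number of elements actually needed */

typedef struct { int u, v; float z; } item_t;  /* hypothetical payload */

/* Extra whole items whose total size covers the worst-case slack
   of CACHE_LINE - 1 bytes. */
#define PAD_ITEMS ((CACHE_LINE - 1) / sizeof(item_t) + 1)

/* Round an address up to the next multiple of CACHE_LINE. */
static uintptr_t align_up(uintptr_t addr)
{
    return (addr + CACHE_LINE - 1) & ~(uintptr_t)(CACHE_LINE - 1);
}

/* Allocates a padded array on the stack, uses it through a cache-line
   aligned pointer, and returns nonzero if the aligned view is aligned
   and still fits inside the padded buffer. */
static int demo_aligned_stack_array(void)
{
    item_t raw[NUM_ITEMS + PAD_ITEMS];                       /* padded storage */
    item_t *items = (item_t *)align_up((uintptr_t)&raw[0]);  /* aligned view */
    size_t i;

    for (i = 0; i < NUM_ITEMS; i++) {
        items[i].u = (int)i;
        items[i].v = 0;
        items[i].z = 0.0f;
    }

    return ((uintptr_t)items % CACHE_LINE == 0)
        && ((unsigned char *)(items + NUM_ITEMS)
            <= (unsigned char *)(raw + NUM_ITEMS + PAD_ITEMS))
        && (items[NUM_ITEMS - 1].u == NUM_ITEMS - 1);
}
```

The padding is counted in whole elements, so the buffer always has at least `CACHE_LINE - 1` spare bytes and the rounded-up pointer can never run past its end. In current C one would more likely reach for `_Alignas` (C11) or a compiler attribute; the manual rounding is the 1996-era way of getting the same effect.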