I think that figure refers to the peak bandwidth in burst mode, where you access data sequentially (tacc = 25 ns) in the ideal case. In that sense, the figures given for the Cortex-M33 should also be taken as absolute maximums. 50-70 MB/s sounds representative of random reads of 8-byte blocks, where you pay the full access time on every read. It drops even lower if you access single bytes.
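To make the relation between access time and bandwidth concrete, here is a minimal sketch. It assumes one 8-byte transfer per access and decimal megabytes (1 MB = 10^6 bytes); the function names are made up for illustration. Under those assumptions, 25 ns per 8 bytes works out to 320 MB/s, while 50-70 MB/s corresponds to roughly 160 down to 114 ns per 8-byte access.

```c
#include <assert.h>

/* Bandwidth in MB/s (1 MB = 10^6 bytes) when every access of
 * `bytes` bytes costs `ns` nanoseconds. Hypothetical helper. */
double access_mb_per_s(double bytes, double ns)
{
    assert(bytes > 0.0 && ns > 0.0);
    return bytes * 1000.0 / ns;   /* bytes/ns -> 10^6 bytes/s */
}

/* Inverse: time per access in ns implied by a measured bandwidth. */
double access_ns_for_bandwidth(double bytes, double mb_per_s)
{
    assert(bytes > 0.0 && mb_per_s > 0.0);
    return bytes * 1000.0 / mb_per_s;
}
```

For example, `access_mb_per_s(8, 25)` gives the sequential peak under these assumptions, and `access_ns_for_bandwidth(8, 50)` gives the per-access cost implied by the low end of the random-read range.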
Quake was indeed optimized for such 1996 Pentium PCs. Look, for instance, at how the edge/surface/span arrays are allocated on the stack: they are over-allocated so that the data can be aligned to the cache line size.
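The over-allocate-and-round-up trick looks roughly like the sketch below. This is an illustration of the technique, not Quake's actual code: the struct, the array size and the 32-byte line size are assumptions, and all names are hypothetical. Like the original idiom, it relies on common compiler behavior when converting between pointers and integers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 32u   /* assumption: 32-byte cache line */
#define DEMO_NUM_EDGES  64    /* assumption: arbitrary element count */

/* Hypothetical stand-in for a renderer record (16 bytes). */
typedef struct {
    int32_t u, u_step;
    int32_t next, prev;
} demo_edge_t;

/* Extra elements giving at least one cache line of slack. */
#define DEMO_PAD_EDGES ((DEMO_CACHE_LINE - 1) / sizeof(demo_edge_t) + 1)

/* Round addr up to the next multiple of align (a power of two). */
uintptr_t demo_align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(align - 1);
}

/* Allocates a padded array on the stack, takes a cache-line-aligned
 * view into it, and returns 1 if that view is aligned and still fits
 * inside the padded storage. */
int demo_aligned_stack_array(void)
{
    demo_edge_t raw[DEMO_NUM_EDGES + DEMO_PAD_EDGES];
    demo_edge_t *edges =
        (demo_edge_t *)demo_align_up((uintptr_t)raw, DEMO_CACHE_LINE);

    /* Touch every element through the aligned view. */
    for (int i = 0; i < DEMO_NUM_EDGES; i++)
        edges[i].u = i;

    int aligned = ((uintptr_t)edges % DEMO_CACHE_LINE) == 0;
    int fits = (uintptr_t)(edges + DEMO_NUM_EDGES) <=
               (uintptr_t)(raw + DEMO_NUM_EDGES + DEMO_PAD_EDGES);
    return aligned && fits && edges[DEMO_NUM_EDGES - 1].u == DEMO_NUM_EDGES - 1;
}
```

The padding costs a few dozen bytes of stack, and in exchange the first element starts on a cache line boundary wherever the compiler happens to place the array.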
50-70 MB/s is the best-case burst figure for a linear read into nothing on contemporary 1996 chipsets/CPUs. Moving data more or less halves that; writing is slightly faster than reading due to cache lookups.
320 MB/s is even faster than the theoretical maximum of EDO on a Pentium platform: 8 bytes x 66 MHz at 5-2-2-2 timings = <260 MB/s burst.
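The arithmetic can be sketched as follows, assuming "5-2-2-2" means the bus clocks spent on each of the four 8-byte transfers in a burst, that bursts run back to back, and that 1 MB = 10^6 bytes. The function name is made up for illustration. Under that reading, a 66 MHz bus moves 32 bytes every 11 clocks, which is about 192 MB/s: within the bound above and well short of 320 MB/s.

```c
#include <assert.h>

/* Sustained bandwidth in MB/s (1 MB = 10^6 bytes) for back-to-back
 * bursts. bus_mhz is the bus clock in MHz, bytes_per_beat the bus
 * width in bytes, clocks[i] the clocks spent on the i-th transfer. */
double burst_mb_per_s(double bus_mhz, int bytes_per_beat,
                      const int *clocks, int beats)
{
    assert(bus_mhz > 0.0 && bytes_per_beat > 0 && beats > 0);

    int total_clocks = 0;
    for (int i = 0; i < beats; i++) {
        assert(clocks[i] > 0);
        total_clocks += clocks[i];
    }

    /* MHz * bytes = MB/s at one transfer per clock; scale by the
     * fraction of clocks that actually carry data. */
    return bus_mhz * bytes_per_beat * beats / total_clocks;
}
```

Calling it with `{5, 2, 2, 2}`, a 66 MHz clock and 8 bytes per transfer reproduces the estimate; swapping in other timings shows how much of the budget the 5-clock lead-off eats.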
Quake was optimized for prefilling caches, but not for contemporary cache sizes: https://dependency-injection.com/2mb-cache-benchmarks/ Doom gains only a tiny amount when going from 256 KB to 512 KB, while Quake keeps gaining linearly all the way up to a mind-bogglingly absurd 2 MB of L2. It could really have benefited from data-oriented design, but there was no tooling for that at the time, not to mention the time crunch. Abrash did all he could under the circumstances.