Yes, this paper is insane. The actual quote about caching is:
> Once a region of tape has been read, the controller stores the result. Subsequent operations reference the cache rather than re-interrogating the physical medium. Re-reading a known bit is unnecessary; the controller already holds its state
However, earlier, the paper claims:
> The transformer architectures underpinning modern large language models are bandwidth-limited, not compute-limited [1–3]. The energy consumed moving data between DRAM, NAND flash, and processor cache already exceeds the energy consumed by arithmetic in datacenter AI accelerators [2]. This is not an optimization problem. It is a materials problem [emphasis mine].
as part of a longer rant about the AI "memory wall" in the very first section. If the paper opens with a long spiel about how memory is expensive in both material cost and energy cost, and presents this material as the solution, then what are we caching the reads in? On that note, what kind of computer engineer thinks about caching at the granularity of individual bits on a medium?
And, as you point out, 25 PB/s is a lot. Around 1000x the bandwidth of a typical on-die SRAM cache, I think.
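For a sense of scale, here is a quick arithmetic sketch. The 25 PB/s figure is the one quoted above; the 1000x ratio is only my rough guess, not a measured number, and the variable names are made up for illustration.

```python
# Back-of-the-envelope check on the 25 PB/s figure.
# ASSUMPTION: the ~1000x ratio is a rough guess, not a measured value.
PB = 10**15  # bytes in a petabyte (decimal)
TB = 10**12  # bytes in a terabyte (decimal)

claimed_bytes_per_s = 25 * PB
guessed_ratio = 1000

# SRAM cache bandwidth that the 1000x guess would imply.
implied_sram_bytes_per_s = claimed_bytes_per_s // guessed_ratio

# If every bit had to be read individually, this many reads per second.
bit_reads_per_s = claimed_bytes_per_s * 8

print(f"implied SRAM bandwidth: {implied_sram_bytes_per_s / TB} TB/s")
print(f"bit reads per second:   {bit_reads_per_s:.1e}")
```

So the guess amounts to comparing against a cache that moves about 25 TB/s, and the claim works out to roughly 2e17 single-bit reads every second.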
A while later, the author speaks of using atomic force microscopy to read the data back. The size of an AFM scan is, in practice, as I understand it, on the order of square micrometers.
I think this whole paper is an AI-driven, as you put it, 'fever dream', enabling an author to put forth 60 pages of sciencey claims and sciencey math without -- as far as I can tell -- any concrete and novel scientific result of any kind. AI-driven reality warps are not new; the difference is that nowadays AIs are good enough at sounding smart to get past the defenses of a typical smart person who might want to be fooled or to make a show of being open-minded.
Later on, the author proposes using "shaped femtosecond IR pulses" -- without further elaboration -- to address single atoms! IR wavelengths are on the order of a micrometer at minimum!
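A rough sketch of why the IR claim looks off, using the standard far-field (Abbe) diffraction limit, d = wavelength / (2 * NA). The 1 micrometer wavelength comes from the point above; the numerical aperture of 1.0 and the 0.3 nm atomic spacing are my own illustrative assumptions, not numbers from the paper, and near-field tricks that beat this limit are not considered because the paper does not describe any.

```python
import math

# ASSUMPTIONS (illustrative, not from the paper):
WAVELENGTH_M = 1e-6        # shortest IR wavelength, roughly 1 micrometer
NUMERICAL_APERTURE = 1.0   # generous value for a dry objective
ATOM_SPACING_M = 0.3e-9    # typical interatomic distance in a solid

def abbe_spot_diameter(wavelength_m: float, numerical_aperture: float) -> float:
    """Smallest resolvable far-field spot, d = wavelength / (2 * NA)."""
    return wavelength_m / (2.0 * numerical_aperture)

spot_m = abbe_spot_diameter(WAVELENGTH_M, NUMERICAL_APERTURE)

# How many atoms lie along the spot diameter, and inside the spot area
# (one atom per spacing^2 on a flat surface).
atoms_across = spot_m / ATOM_SPACING_M
atoms_in_spot = math.pi * (spot_m / 2.0) ** 2 / ATOM_SPACING_M ** 2

print(f"spot diameter: {spot_m * 1e9:.0f} nm")
print(f"atoms across the spot: {atoms_across:.0f}")
print(f"atoms inside the spot: {atoms_in_spot:.2e}")
```

Under these assumptions the focused spot is hundreds of nanometers wide and covers on the order of millions of surface atoms, which is why "addressing single atoms" with IR pulses needs far more explanation than the paper gives.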