Your question hits directly at the latency vs. throughput distinction. The answer depends on which one you mean by "fast."
Throughput-wise, the supercomputer is competitive because it has a lot of local RAM spread across many independent nodes. In aggregate, that bandwidth is comparable to a modern laptop's RAM throughput (and still much more than disk). The caveat is that you can only leverage the supercomputer's bandwidth if your workload is embarrassingly parallel and runs on all nodes[1]. Latency-wise, old RAM still beats NVMe by two or three orders of magnitude.
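To make the distinction concrete, here's a back-of-the-envelope sketch in Python. Every number in it is a made-up placeholder, not a measurement of any real machine, and the function names are just mine for illustration; plug in your own figures.

```python
import math

def aggregate_bandwidth_mb_s(per_node_mb_s, usable_nodes):
    """Throughput: bandwidth only adds up across the nodes the workload can keep busy."""
    return per_node_mb_s * usable_nodes

def latency_gap_orders(slow_latency_ns, fast_latency_ns):
    """Latency: how many orders of magnitude separate two access times."""
    return math.log10(slow_latency_ns / fast_latency_ns)

# Hypothetical placeholder values, chosen only to show the shape of the argument.
PER_NODE_MB_S = 100      # assumed memory bandwidth of one old node
NODES = 1000             # assumed node count
RAM_LATENCY_NS = 100     # assumed RAM access time
NVME_LATENCY_NS = 20000  # assumed NVMe access time

# Embarrassingly parallel workload: every node contributes its bandwidth.
print(aggregate_bandwidth_mb_s(PER_NODE_MB_S, NODES))  # 100000

# Serial workload: stuck with a single node's bandwidth.
print(aggregate_bandwidth_mb_s(PER_NODE_MB_S, 1))      # 100

# Latency does not aggregate at all; it is a per-access property.
print(f"{latency_gap_orders(NVME_LATENCY_NS, RAM_LATENCY_NS):.1f}")  # 2.3
```

The point is that the throughput number scales with how many nodes you can actually use, while the latency gap is the same no matter how many nodes you have.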
[1]: The supercomputer has another advantage: far more local SRAM cache in aggregate. If the workload is parallel and can benefit from cache locality, it blows away the modern microprocessor.