Same reason why Cerebras doesn't use DRAM. The whole point of putting memory close is to increase performance and bandwidth, and DRAM is fundamentally latent.
Also, process that is good at making logic isn't necessarily good for making DRAM. Yes, eDRAM exists, but most designs don't put DRAM on the same die as logic and instead stack it or put it off-chip.
Almost all these microcontrollers that are single-die have flash+SRAM. Almost all microprocessor cache designs are SRAM for these reasons (with some designs using off-die L3 DRAM)-- for these reasons.
CPU cache is understandably SRAM.
>The whole point of putting memory close is to increase performance and bandwidth, and DRAM is fundamentally latent.
When the access patterns are well established and understood, like in the case of transformers, you can mitigate latency by prefetch (we can even have very beefed up prefetch pipeline knowing that we target transformers), while putting memory on the same chip gives you huge number of data lines thus resulting in huge bandwidth.