I have also been thinking about this a lot, and share your belief that this is inevitable.
Taalas has a running demo here: https://chatjimmy.ai/
It's eye-opening: it generated an AVX-512-optimized Mersenne Twister in C in 0.076 s, at 13,706 tok/s. At that speed the run is too short for the tok/s figure to be terribly accurate.
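For a rough sense of scale, here is a back-of-the-envelope sketch in Python using only the two figures quoted above. It assumes tok/s is simply tokens divided by elapsed time, and that the displayed time is rounded to the nearest millisecond. Both are my assumptions, not anything the demo documents. It does not model other likely sources of error such as network latency or where the timer starts and stops.

```python
# Figures quoted in the comment above.
elapsed_s = 0.076            # reported generation time, in seconds
reported_tok_per_s = 13_706  # reported throughput

# Implied token count, assuming tok/s = tokens / elapsed time.
tokens = reported_tok_per_s * elapsed_s

# Assumption: the displayed time is rounded to the nearest millisecond,
# so the true elapsed time could be off by up to half a millisecond.
half_step_s = 0.0005
low_tok_per_s = tokens / (elapsed_s + half_step_s)
high_tok_per_s = tokens / (elapsed_s - half_step_s)

print(f"implied tokens: {tokens:.0f}")
print(f"tok/s range from rounding alone: {low_tok_per_s:.0f} to {high_tok_per_s:.0f}")
```

So the output was on the order of a thousand tokens, and display rounding alone would move the throughput figure by a bit under one percent in either direction. Any fixed overhead of a few milliseconds in how the time is measured would matter far more on a run this short.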