It will be interesting to see how much inference ASICs, diffusion LLMs, architectural changes like IBM Granite Small (when is that coming to OpenRouter?), and slight compromises for pre-generation can speed this up.
Also, I wonder if eventually you could go further, skip the LLM entirely, and just train a game-world-style frame generator on productivity software.