Obligatory taalas mention:
Despite the performative UI components they have a shipped (demo) product:
This is only 3.1 8B and a very small context window, but at 17k tokens per second it's likely enough to reliably call tools which would make a huge difference in agentic applications. Assuming they can bake in better models I'm just as bullish or even moreso on this, considering this opens up edge computing at the extremely low power requirement.
High tok/s is the future IMO.
My dream is claude or codex running at this speed.