It might be not that bad. “Good enough” open-weight models are almost there, the focus may shift to agentic workflows and effective prompting. The lifecycle of a model chip will be comparable to smartphones, getting longer and longer, with orchestration software being responsible for faster innovation cycles.
If you’re running at 17k tokens / s what is the point of multiple agents?