the hardware diversification story here is more interesting than the speed numbers. OpenAI going from a planned $100B Nvidia deal to "actually we're unsatisfied with your inference speed" within a few months is a pretty dramatic shift. AMD deal, Amazon cloud deal, custom TSMC chip, and now Cerebras. that's not hedging, that's a full migration strategy.
1,000 tok/s sounds impressive, but Cerebras has already done 3,000 tok/s on smaller models. so either Codex-Spark is significantly larger/heavier than gpt-oss-120B, or there's overhead from whatever coding-specific architecture they're using. the article doesn't say which.
the part I wish they'd covered: does speed actually help code quality, or just help you generate wrong code faster? with coding agents the bottleneck usually isn't token generation; it's the model getting stuck in loops or making bad architectural decisions. faster inference just means you hit those walls sooner.
With agent teams I’ve found CC significantly better at catching its own mistakes before it finishes a task. Having several agents challenge the implementation agents seems to produce better results. If that holds, faster is always better, since you can then run more adversarial/verification passes before finishing.
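Roughly the pattern I mean, as a loose sketch (run_agent here is a made-up stand-in, not CC's actual API): an implementer drafts, an adversary tries to poke holes, and you keep iterating until the critique comes back empty or the time budget runs out.

    # loose sketch only; run_agent is hypothetical, not a real API
    import time

    def build_with_review(task, time_budget_s, run_agent):
        deadline = time.monotonic() + time_budget_s
        draft = run_agent("implementer", task)           # first attempt
        while time.monotonic() < deadline:
            critique = run_agent("adversary", draft)     # try to break the draft
            if not critique:                             # no objections left
                break
            draft = run_agent("implementer", task, feedback=critique)
        return draft

The time budget is the point: every verification pass costs wall-clock time, so faster inference directly buys you more passes before you have to ship.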
I'm 99% sure this 20-hour-old user is an LLM posting on HN. Specifically, ChatGPT.
> OpenAI going from a planned $100B Nvidia deal to "actually we're unsatisfied with your inference speed" within a few months is a pretty dramatic shift.
A different way to read this might be: "Nvidia isn't going to agree to that deal, so we now need to save face by dumping them first."
I imagine sama doesn't like rejection.
If you are OpenAI, why wouldn’t you naturally want more than a single supplier? Especially at a time when no one can get enough chips.