For comparison, OpenRouter says Opus 4.8 is ~55 tokens/s and fast mode is ~102 tokens/s.
750 tokens/s for their largest model is going to be nuts
But there seems to be some queuing/load balancing on their side. When Opus is actually outputting at 55 t/s it feels fast, but apart from its internal reasoning I think there's sometimes just waiting.
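To put rough numbers on the waiting point, here is a minimal sketch of effective throughput when some wall-clock time is spent queued before streaming starts. The function name, the 1100-token response, and the 20 s wait are all made-up illustration values, not measurements from any provider.

```python
def effective_rate(tokens: int, stream_rate: float, wait_s: float) -> float:
    """Tokens per second of wall-clock time, counting time spent waiting.

    tokens      -- number of output tokens (hypothetical)
    stream_rate -- tokens/s while the model is actually streaming
    wait_s      -- seconds spent queued or idle before/within the response (assumed)
    """
    streaming_s = tokens / stream_rate
    return tokens / (wait_s + streaming_s)


if __name__ == "__main__":
    # 1100 tokens at 55 t/s takes 20 s of streaming; add an assumed 20 s of waiting.
    print(effective_rate(1100, 55, 0))   # no waiting: the raw streaming rate
    print(effective_rate(1100, 55, 20))  # with waiting: noticeably lower
```

With these assumed numbers, the wait alone halves the throughput you actually experience, even though the streaming rate itself never changed.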
Using gpt-5.4-mini in off-peak hours already feels like super-speed to me. That's probably no more than 100-150 tk/s. I can't imagine 750!
I've always eyed Cerebras but never had a use for it that would justify paying for the API directly. Although now that I think about it, trying out the API would probably cost less than a subscription for a month...
The more advanced models also use a lot more tokens, and a larger share of these extra tokens may go towards safeguards than in prior models.
Not to say a speed boost isn't there, but if they didn't increase tokens/s at all, you'd likely see things slow down a lot with the new model compared to the current one.
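A quick back-of-the-envelope sketch of that argument: response time is just tokens divided by tokens/s, so a model that emits more tokens per answer gets slower unless throughput rises with it. The rates are the ones quoted in this thread; the 2000 and 4000 token counts are hypothetical, picked only to show a doubling.

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream `tokens` at a constant rate (ignores queuing)."""
    return tokens / tokens_per_second


# Rates quoted in the thread; labels are just for printing.
RATES = {
    "standard (~55 t/s)": 55,
    "fast mode (~102 t/s)": 102,
    "claimed 750 t/s": 750,
}

# Hypothetical per-answer token counts: an older model vs. a more verbose newer one.
TOKEN_COUNTS = {"older model": 2000, "newer model": 4000}

if __name__ == "__main__":
    for rate_label, rate in RATES.items():
        for model_label, tokens in TOKEN_COUNTS.items():
            secs = generation_seconds(tokens, rate)
            print(f"{rate_label:22s} {model_label:12s} {secs:6.1f} s")
```

At a fixed rate, doubling the token count doubles the wait, so the higher rate has to at least cover the extra tokens before it feels like a speedup.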
What about 15k tokens per second? [0] I remember looking at this earlier in the year and it being so fast that it feels fake. And, yes, this model is old - but still awesome for what it is.
[0] https://chatjimmy.ai/