sberens • yesterday at 6:18 PM • 4 replies

For comparison, OpenRouter says Opus 4.8 is ~55 tokens/s and fast mode is ~102.

750 tokens/s for their largest model is going to be nuts
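
To put those rates in wall-clock terms, here is a rough back-of-the-envelope sketch. The 2,000-token response length is an assumption picked for illustration, and the rates are just the approximate figures quoted above; real latency would also include time to first token and any queuing.

```python
# Rough decode time for one response at the rates quoted in this comment.
# RESPONSE_TOKENS is a hypothetical response length, not a measured value.
RESPONSE_TOKENS = 2_000

rates_tokens_per_s = {
    "opus 4.8 (~55 t/s)": 55,
    "fast mode (~102 t/s)": 102,
    "750 t/s": 750,
}


def generation_seconds(tokens: int, tokens_per_s: float) -> float:
    """Pure decode time; ignores time to first token and queuing."""
    return tokens / tokens_per_s


for label, rate in rates_tokens_per_s.items():
    seconds = generation_seconds(RESPONSE_TOKENS, rate)
    print(f"{label}: {seconds:.1f} s")
```

Under these assumptions the same response drops from roughly half a minute to a few seconds, which is the gap being described.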

Replies

What about 15k tokens per second? [0] I remember looking at this earlier in the year and it being so fast that it feels fake. And, yes, this model is old - but still awesome for what it is.

[0] https://chatjimmy.ai/

comboy • yesterday at 8:05 PM

But it seems there is some queuing/load balancing on their side. When Opus is actually outputting at 55 t/s it feels fast, but apart from its internal reasoning, I think there's sometimes just waiting.

gandreani • yesterday at 6:29 PM

Using gpt-5.4-mini in off-peak hours already feels like super-speed to me. That's probably no more than 100-150 tk/s. I can't imagine 750!

I've always eyed Cerebras but never had a use for it that would justify paying for the API directly. Although now that I think about it, trying out the API would probably cost less than a subscription for a month...

order-matters • yesterday at 7:18 PM

The more advanced models also use a lot more tokens, and a larger share of these extra tokens may go towards safeguards than in prior models.

Not to say a speed boost isn't there, but if they didn't increase tokens/s at all, you'd likely see things slow down a lot with the new model compared to the current one.
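
A quick sketch of that arithmetic, with made-up numbers: the baseline token count and the 3x token inflation factor are hypothetical, and the rates are the ones quoted upthread. The point is only that wall-clock time per task stays flat when tokens/s grows by the same factor as tokens per task.

```python
def task_seconds(output_tokens: float, tokens_per_s: float) -> float:
    """Decode time for one task; ignores queuing and time to first token."""
    return output_tokens / tokens_per_s


OLD_TOKENS = 1_000       # hypothetical tokens per task on the current model
TOKEN_INFLATION = 3.0    # hypothetical: new model emits 3x the tokens per task
OLD_RATE = 55.0          # tokens/s, figure quoted upthread
NEW_RATE = 750.0         # tokens/s, figure quoted upthread

new_tokens = OLD_TOKENS * TOKEN_INFLATION

old_time = task_seconds(OLD_TOKENS, OLD_RATE)
same_rate_time = task_seconds(new_tokens, OLD_RATE)   # more tokens, no speedup
faster_time = task_seconds(new_tokens, NEW_RATE)      # more tokens, higher rate

# Rate needed just to match the old wall-clock time per task.
break_even_rate = OLD_RATE * TOKEN_INFLATION

print(f"current model:           {old_time:.1f} s")
print(f"new model, same rate:    {same_rate_time:.1f} s")
print(f"new model, 750 t/s:      {faster_time:.1f} s")
print(f"break-even rate:         {break_even_rate:.0f} t/s")
```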
