Hacker News

sberens yesterday at 6:18 PM (4 replies)

For comparison, OpenRouter says Opus 4.8 is ~55 tokens/s and fast mode is ~102.

750 tokens/s for their largest model is going to be nuts


Replies

windexh8er yesterday at 7:58 PM

What about 15k tokens per second? [0] I remember looking at this earlier in the year and it being so fast that it feels fake. And, yes, this model is old - but still awesome for what it is.

[0] https://chatjimmy.ai/

comboy yesterday at 8:05 PM

But there seems to be some queuing/load balancing on their side. When Opus is actually outputting at 55 t/s it feels fast, but apart from its internal reasoning I think there's sometimes just waiting.

gandreani yesterday at 6:29 PM

Using gpt-5.4-mini in off-peak hours already feels like super-speed to me. That's probably no more than 100-150 tk/s. I can't imagine 750!

I've always eyed Cerebras but never had a use for it that would justify paying for the API directly. Although now that I think about it, trying out the API would probably cost less than a subscription for a month...

order-matters yesterday at 7:18 PM

The more advanced models also use a lot more tokens, and a larger share of those extra tokens may go toward safeguards than in prior models.

Not to say a speed boost isn't there, but if they didn't increase tokens/s at all, you'd likely see things slow down a lot with the new model compared to the current one.
