logoalt Hacker News

dcreyesterday at 4:00 PM1 replyview on HN

Tokens per second are similar across Sonnet 4.5, Opus 4.5, and Opus 4.6. More importantly, normalizing for speed isn't enough anyway because smarter models can compensate for being slower by having to output fewer tokens to get the same result. The use of 99.9p duration is a considered choice on their part to get a holistic view across model, harness, task choice, user experience level, user trust, etc.


Replies

Havocyesterday at 11:41 PM

>Tokens per second are similar across Sonnet 4.5, Opus 4.5, and Opus 4.6.

This may come as a shock, but there are LLMs not authored by anthropic and when we do measurements we may want them to be comparable across providers