dcre • yesterday at 4:00 PM • 1 reply

Tokens per second are similar across Sonnet 4.5, Opus 4.5, and Opus 4.6. More importantly, normalizing for speed isn't enough anyway, because a smarter model can compensate for being slower by needing fewer output tokens to reach the same result. Using the 99.9th-percentile duration is a deliberate choice on their part to get a holistic view across model, harness, task choice, user experience level, user trust, etc.
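To make the "fewer tokens" point concrete, here is a small Python sketch with made-up numbers (not measurements of Sonnet 4.5, Opus 4.5, Opus 4.6, or any other model). It assumes generation time is just output tokens divided by throughput, ignoring time to first token, tool calls, and harness overhead; `task_duration_s` is a hypothetical helper.

```python
# Hypothetical numbers for illustration only; not measurements of any real model.

def task_duration_s(output_tokens: int, tokens_per_second: float) -> float:
    """Generation time for one task, assuming constant throughput.

    Ignores time to first token, tool calls, and any harness overhead.
    """
    return output_tokens / tokens_per_second


# A model with twice the throughput that needs 3x the output tokens...
fast_verbose = task_duration_s(output_tokens=6000, tokens_per_second=100.0)
# ...versus a slower model that reaches the same result more concisely.
slow_concise = task_duration_s(output_tokens=2000, tokens_per_second=50.0)

print(f"fast but verbose: {fast_verbose:.1f} s")  # fast but verbose: 60.0 s
print(f"slow but concise: {slow_concise:.1f} s")  # slow but concise: 40.0 s
```

Under these assumptions the model with half the tokens-per-second still finishes first, which is why comparing throughput alone says little about how long a task actually takes.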

Replies

Havoc • yesterday at 11:41 PM

>Tokens per second are similar across Sonnet 4.5, Opus 4.5, and Opus 4.6.

This may come as a shock, but there are LLMs not authored by Anthropic, and when we take measurements we may want them to be comparable across providers.
