For a 56.7 score on the Artificial Intelligence Index, GPT 5.5 used 22m output tokens. For a score of 57, Opus 4.7 used 111m output tokens.
The efficiency gap is enormous. Maybe it's the difference between a GB200 NVL72 and an Amazon Trainium chip?
Chips don't affect output quality at anything like this magnitude.
You need to compare total cost. Token count is irrelevant.
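Rough numbers, just to illustrate the point (the output-token counts are the ones quoted above; the per-million-token prices are placeholder assumptions, not real rates):

```python
# Back-of-envelope cost comparison for the two benchmark runs.
# Token counts are from the thread; prices are made-up placeholders.
runs = {
    "GPT 5.5":  {"output_tokens": 22_000_000,  "usd_per_m_output": 10.0},
    "Opus 4.7": {"output_tokens": 111_000_000, "usd_per_m_output": 15.0},
}

for model, r in runs.items():
    cost = r["output_tokens"] / 1_000_000 * r["usd_per_m_output"]
    print(f"{model}: {r['output_tokens'] / 1e6:.0f}M output tokens -> ~${cost:,.0f}")
```

Until you plug in the actual per-token prices, a 5x difference in token count tells you nothing about which run was cheaper.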
If it's a new pretrain, the token embeddings could be wider: you can pack more info into each token making its way through the system.
Like Chinese versus English: you need fewer Chinese characters to say something than you'd need English characters to write the same thing.
So this model internally could be thinking in much more expressive embeddings.
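As a loose illustration of the "more per token" idea (this compares tokenizer vocabulary sizes rather than embedding width, so it's only an analogy, not a claim about either model):

```python
# Encode the same sentence with two tokenizers of different vocabulary size.
# A larger vocabulary tends to cover the text in fewer tokens -- analogous
# to packing more information into each token the model processes.
import tiktoken

text = "Efficiency is about how much meaning each token carries."

for name in ("cl100k_base", "o200k_base"):  # ~100k vs ~200k vocab
    enc = tiktoken.get_encoding(name)
    print(f"{name}: {len(enc.encode(text))} tokens")
```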
Why would the chip affect token quantity? That's entirely a property of the models.