Interesting selection of models for the "instruction count vs. accuracy" plot. Curious when that was run and why they chose those models. How well do GPT-5/5.1 (and the codex/mini/nano variants), Gemini 3, Claude Haiku/Sonnet/Opus 4.5, the recent Grok models, Kimi K2 Thinking, etc. (this generation of models) do?
Guessing they included some smaller models just to show how their accuracy drops off at smaller context sizes.