The main frontier models are all up on https://arcprize.org/tasks
Barely any of them break 0% on any of the demo tasks: Claude Opus 4.6 comes out on top with a few sub-3% scores, Gemini 3.1 Pro gets two nonzero scores, and the others (GPT-5.4 and Grok 4.20) score 0% across the board.
Curiously, that doesn't match the graph on the Leaderboard page: https://arcprize.org/leaderboard
Pre-release, I would have expected Gemini 3.1 Pro to get ahead of Opus 4.6, with GPT-5.4 and Grok 4.20 trailing. Guess I shouldn't have bet against Anthropic.
Not that it's a big lead yet. I expect to see more movement within the next few months, as people tune their harnesses and better models roll in.
At its core this is far more of a vision-language-action ("VLA") task than it is an "LLM" task, but I guess ARC-AGI-3 is making an argument that human intelligence is VLA-shaped.
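To make "VLA-shaped" concrete, here's a minimal sketch of the observe-act loop such a task implies. Everything in it is a stand-in I made up for illustration (the `Environment` class, `ACTIONS` list, `choose_action` policy, and random termination are hypothetical, not the actual ARC-AGI-3 API): the point is just that the agent has to repeatedly map a visual grid state to a discrete action, rather than emit one text answer.

```python
# Hypothetical sketch of a VLA-style agent loop for an interactive
# grid benchmark. Environment and its methods are stand-ins, not the
# real ARC-AGI-3 API.
from dataclasses import dataclass, field
import random

Grid = list[list[int]]  # each cell holds a color index

ACTIONS = ["up", "down", "left", "right", "click", "reset"]

@dataclass
class Environment:
    """Toy stand-in for an interactive task environment."""
    grid: Grid = field(default_factory=lambda: [[0] * 8 for _ in range(8)])
    steps: int = 0
    solved: bool = False

    def observe(self) -> Grid:
        return self.grid

    def act(self, action: str) -> None:
        self.steps += 1
        # A real task would update the grid per its hidden rules;
        # here we just terminate randomly so the loop is runnable.
        self.solved = random.random() < 0.05

def choose_action(observation: Grid) -> str:
    """Policy stub: a real agent would call a model here, mapping
    the visual state (plus history) to one discrete action."""
    return random.choice(ACTIONS)

def run_episode(env: Environment, max_steps: int = 100) -> bool:
    for _ in range(max_steps):
        obs = env.observe()          # vision: read the grid state
        action = choose_action(obs)  # reasoning: pick a move
        env.act(action)              # action: change the world
        if env.solved:
            return True
    return False

if __name__ == "__main__":
    print("solved:", run_episode(Environment()))
```

That inner loop is the whole contrast with a static LLM benchmark: the model gets queried many times per task, and each output is an action that changes the world, not a final answer.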