Hacker News

minimaxir yesterday at 6:35 PM

Note that GPT 5.2 newly supports a "xhigh" reasoning level, which could explain the better benchmarks.

It'll be noteworthy to see the cost-per-task on ARC AGI v2.



granzymes yesterday at 6:59 PM

> It'll be noteworthy to see the cost-per-task on ARC AGI v2.

Already live. gpt-5.2-pro scores a new high of 54.2% with a cost/task of $15.72. The previous best was Gemini 3 Pro (54% with a cost/task of $30.57).

The best bang-for-your-buck is the new xhigh on gpt-5.2, which is 52.9% for $1.90, a big improvement on the previous best in this category which was Opus 4.5 (37.6% for $2.40).

https://arcprize.org/leaderboard
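As a rough check on the "bang-for-your-buck" comparison, here is a minimal sketch that ranks the four results quoted above by score per dollar. The "points per dollar" metric and the names `results` and `points_per_dollar` are assumptions made for illustration, not anything ARC Prize publishes; the figures are only the ones cited in this comment.

```python
# (score in %, cost per task in USD), exactly as quoted in the comment above.
results = {
    "gpt-5.2-pro": (54.2, 15.72),
    "Gemini 3 Pro": (54.0, 30.57),
    "gpt-5.2 xhigh": (52.9, 1.90),
    "Opus 4.5": (37.6, 2.40),
}


def points_per_dollar(score: float, cost: float) -> float:
    """Hypothetical efficiency metric: percentage points per dollar per task."""
    return score / cost


# Sort model names from most to least efficient under this metric.
ranked = sorted(
    results,
    key=lambda name: points_per_dollar(*results[name]),
    reverse=True,
)

for name in ranked:
    score, cost = results[name]
    print(f"{name}: {points_per_dollar(score, cost):.1f} points per dollar")
```

Under this simple ratio, gpt-5.2 at xhigh comes out ahead of Opus 4.5, which matches the comparison in the comment. The ratio ignores the fact that gains near the top of the leaderboard are harder to buy, so it is only one way to read the numbers.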

walletdrainer yesterday at 7:44 PM

5.1-codex supports that too, no? Pretty sure I’ve been using xhigh for at least a week now