Note that GPT 5.2 newly supports an "xhigh" reasoning level, which could explain the better benchmarks.
It'll be interesting to see the cost per task on ARC AGI v2.
5.1-codex supports that too, no? Pretty sure I’ve been using xhigh for at least a week now.
> It'll be interesting to see the cost per task on ARC AGI v2.
Already live. gpt-5.2-pro scores a new high of 54.2% with a cost/task of $15.72. The previous best was Gemini 3 Pro (54% with a cost/task of $30.57).
The best bang for your buck is the new xhigh on gpt-5.2, which scores 52.9% for $1.90 per task, a big improvement on the previous best in this category, Opus 4.5 (37.6% for $2.40).
https://arcprize.org/leaderboard
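For a rough sense of the cost-efficiency gap, here is a quick sketch that divides each score by its cost per task, using only the four figures quoted above. "Points per dollar" is an assumed, crude metric for illustration, not something the leaderboard reports, and the `RESULTS` list and `points_per_dollar` helper are hypothetical names.

```python
# Figures as quoted in this thread: (label, ARC AGI v2 score in %, cost per task in USD).
RESULTS = [
    ("gpt-5.2-pro", 54.2, 15.72),
    ("Gemini 3 Pro", 54.0, 30.57),
    ("gpt-5.2 (xhigh)", 52.9, 1.90),
    ("Opus 4.5", 37.6, 2.40),
]


def points_per_dollar(score_pct: float, cost_usd: float) -> float:
    """Crude efficiency metric (an assumption here): score points per dollar per task."""
    return score_pct / cost_usd


# Rank the entries from most to least efficient under this metric.
ranked = sorted(RESULTS, key=lambda r: points_per_dollar(r[1], r[2]), reverse=True)

for label, score, cost in ranked:
    ratio = points_per_dollar(score, cost)
    print(f"{label:16s} {score:5.1f}%  ${cost:6.2f}/task  {ratio:5.1f} pts/$")
```

This treats every percentage point as equally valuable, which probably understates how hard the last few points are, so read it as a ballpark comparison only.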