Hacker News

pants2, today at 6:37 PM

We're gonna need some new benchmarks...

ARC-AGI-3 might be the only remaining benchmark below 50%


Replies

Leynos, today at 8:08 PM

Opus 4.6 currently leads the Remote Labor Index at 4.17. GPT-5.4 isn't measured on that one, though: https://www.remotelabor.ai/

GPT-5.4 Pro leads FrontierMath Tier 4 at 35%: https://epoch.ai/benchmarks/frontiermath-tier-4/

randomtoast, today at 7:32 PM

[dead]