pants2 • today at 6:37 PM • 2 replies

We're gonna need some new benchmarks...

ARC-AGI-3 might be the only remaining benchmark below 50%

Opus 4.6 currently leads the Remote Labor Index at 4.17. GPT-5.4 isn't measured on that one, though: https://www.remotelabor.ai/

randomtoast • today at 7:32 PM

[dead]
