nl, today at 1:11 AM

> barely competitive

It's higher than all other models, except vs. Gemini 3.1 Pro on MMMLU.

MMMLU is generally thought to be maxed out, as in it might not be possible to score higher than the current top scores.

> Overall, they estimated that 6.5% of questions in MMLU contained an error, suggesting the maximum attainable score was significantly below 100%[1]
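
Back-of-the-envelope version of that ceiling, as a rough Python sketch. The function name and the `p_key_matches` parameter are made up for illustration, and it assumes every flawed question is effectively unanswerable, which is a simplification since some kinds of error still leave a question answerable.

```python
def score_ceiling(error_rate: float, p_key_matches: float = 0.0) -> float:
    """Upper bound on accuracy for a model that gets every sound question right.

    error_rate:     fraction of questions that contain an error (0.065 per the quote)
    p_key_matches:  ASSUMED chance that a correct answer still matches the flawed key
    """
    sound = 1.0 - error_rate                # questions a perfect model always gets
    lucky = error_rate * p_key_matches      # flawed questions it gets credit for anyway
    return sound + lucky


# Assumption 1: flawed questions are never credited.
print(f"{score_ceiling(0.065):.3f}")        # 0.935
# Assumption 2: flawed questions are credited at 4-way multiple-choice chance.
print(f"{score_ceiling(0.065, 0.25):.3f}")  # 0.951
```

Either way the ceiling lands in the mid-90s rather than at 100%, which is the point of the quote.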

Other models get close on GPQA Diamond, but it wouldn't be surprising to anyone if the max possible on that was around the 95% the top models are scoring.

[1] https://en.wikipedia.org/wiki/MMLU