quietbritishjim • today at 2:32 PM

> And, of course, 35 is the same score claimed by AI systems from Google, OpenAI, and others.

This is the part of the quote you're replying to.

You seemed to read "of course" as implying that the contestants used LLMs, and that this is why they got the same score as the LLMs.

I took it to mean: since 35 was the modal score, there seemed to be 35 points' worth of problems that were significantly easier (relatively speaking) than the remaining points, so it's not surprising that the LLMs got those same easier parts right. (Though I doubt all contestants earned their points on exactly the same answers.)

But it's certainly unclear what exactly the author meant.

Replies

daquisu • today at 8:24 PM

Later in the same blog post, the author says:

> We can also consider the IMO 2025 problems individually. In the Epoch AI newsletter, Greg Burnham combines a subjective analysis with Evan Chen’s MOHS ratings to argue that the first five problems at IMO 2025 were unusually easy and the sixth was unusually hard, so it’s not surprising that the first five problems were exactly the ones solved by these AIs. Though I’m not sure the MOHS scale is rigorous enough to make sense as the x-axis of a bar chart, it’s easy to corroborate the high-level story with the official IMO statistics. Based on average scores, this year’s Problem 6 was the fourth hardest and its Problem 3 was by far the easiest of all Problem 3s and 6s since 2000.

The linked MaxProof paper shows the same pattern in section "6.3.1. Per-Problem Analysis": 7/7 on each of the first five problems and 0/7 on the last one.