Hacker News

blenderob today at 3:52 PM

Can someone explain how this would work?

> the answers are known to the authors of the questions but will remain encrypted for a short time.

Ok. But humans may be able to solve the problems too. What prevents Anthropic or OpenAI from hiring mathematicians, having them write the proofs, and passing them off as LLM-written? I'm not saying that's what they'll do. But shouldn't the paper say something about how they're going to verify that this doesn't happen?

Honest question here. Not trying to start a flame war. I'm honestly confused about how this is going to test what it wants to test. Or maybe I'm just plain confused. Can someone help me understand this?


Replies

yorwba today at 4:12 PM

This is not a benchmark. They just want to give people the opportunity to try their hand at solving novel questions with AI and see what happens. If an AI company pulls a solution out of their hat that cannot be replicated with the products they make available to ordinary people, that's hardly worth bragging about and in any case it's not the point of the exercise.

data_maan today at 6:17 PM

Nothing prevents them, and they are already doing that. I work in this field and one can be sure that now, because of the notoriety this preprint got, the questions will be solved soon.

conformist today at 4:14 PM

It's possible but unlikely given the short timeline, the diverse questions that would require multiple mathematicians, and the low stakes. Also, they've already run preliminary tests.

iLoveOncall today at 6:10 PM

That was exactly my first thought as well. All these exercises are pointless, and people don't seem to understand that. It's baffling.

Even if it's not Anthropic or OpenAI paying for the solutions, maybe someone will solve them "for fun" because the paper got popular, and then post the solutions online.

It's a futile exercise.