fc417fc802 • today at 2:19 PM

I had the same thought, because even if the exact solution doesn't appear there, there's a notable difference between performing a literature search and solving something de novo. But perhaps this benchmark wasn't meant to exclude the former, and the point may have been to test whether the model can accurately interpret and synthesize relevant output for research-level mathematical problems at all.

Replies

christianstump • today at 3:08 PM

I think you are underestimating the complexity of such problems. A PhD in the exact field of research would need days to weeks to understand what the problem means and how to solve it. This is far beyond "throwing standard techniques" at a problem. (But, I keep emphasizing this, it is also far away from solving research mathematics.)

tossandthrow • today at 2:26 PM

I can recommend reading section 2 of the paper.

The goal was not to define unsolved problems.

But the problems are also not previously published ones.

This seems quite reasonable IMHO.