Hacker News

sinuhe69 · yesterday at 6:12 PM · 2 replies

I'm pretty certain that DeepMind (and all other labs) will try their frontier (and even private) models on First Proof [1].

And I wonder how Gemini Deep Think will fare. My guess is that it will get halfway on some problems. But we will have to treat the absence of a published result as a failure, because nobody wants to publish a negative result, even though negative results are so important for scientific research.

[1] https://1stproof.org/


Replies

zozbot234 · yesterday at 6:27 PM

The First Proof original solutions are due to be published in about 24h, AIUI.

octoberfranklin · yesterday at 10:44 PM

Really surprised that 1stproof.org was submitted three times and never made the front page of HN.

https://hn.algolia.com/?q=1stproof

This is exactly the kind of challenge I would want to judge AI systems by. It required ten mathematicians doing bleeding-edge research to each publish a problem they had solved while holding back the answer. I appreciate the huge amount of social capital and coordination that must have taken.

I'm really glad they did it.