bawolff • today at 6:11 PM

A paper that is not about benchmarking or ML research is bad from the perspective of benchmarking. Not exactly a shocker.

The authors themselves literally state: "Unlike other proposed math research benchmarks (see Section 3), our question list should not be considered a benchmark in its current form"

Replies

data_maan • today at 6:58 PM

On the website https://1stproof.org/#about they claim: "This project represents our preliminary efforts to develop an objective and realistic methodology for assessing the capabilities of AI systems to autonomously solve research-level math questions."

Sounds to me like a benchmark in all but name. And they failed pretty terribly at achieving what they set out to do.
