Hacker News

data_maan yesterday at 6:54 PM

> these are problems of some practical interest, not just performative/competitive maths.

FrontierMath did this a year ago. Where is the novelty here?

> a solution is known, but is guaranteed to not be in the training set for any AI.

Wrong, as the questions were posed to commercial AI models and they can solve them.

This paper violates basic benchmarking principles.


Replies

offnominal yesterday at 8:14 PM

> Wrong, as the questions were posed to commercial AI models and they can solve them.

Why does this matter? As far as I can tell, because the solution is not known, this only affects the time constant (i.e. the problems were known for longer than a week). That doesn't seem like something I should care about.
