
NitpickLawyer, today at 3:50 PM

> gpt-oss that games the benchmarks just for PR.

gpt-oss is dominating the ongoing AIMO3 competition on Kaggle. The organizers are using a hidden, new set of IMO-level problems, handcrafted to be "AI hardened", and gpt-oss submissions are at ~33/50 right now, two weeks into the competition. The benchmarks (at least for math) were not gamed at all. The models really are good at math.


Replies

lostmsu, today at 5:11 PM

Are they ahead of all other recent open models? Is there a leaderboard?
