NitpickLawyer • today at 3:50 PM • 1 reply

> gpt-oss that games the benchmarks just for PR.

gpt-oss is killing the ongoing AIME3 competition on Kaggle. They're using a hidden, new set of problems, IMO level, handcrafted to be "AI hardened". And gpt-oss submissions are at ~33/50 right now, two weeks into the competition. The benchmarks (at least for math) were not gamed at all. The models are really good at math.

Replies

lostmsu • today at 5:11 PM

Are they ahead of all other recent open models? Is there a leaderboard?
