Hacker News

greenchair, last Sunday at 12:08 PM, 1 reply

"this thing is clearly trained via RL to think and solve tasks for specific reasoning benchmarks. nothing else." Has the train already reached the end of the line?


Replies

red75prime, last Monday at 9:29 AM

If you think something like "They have to train their models on benchmarks to make it look like there's progress, while in reality it's a dead end," you are missing a few things.

It's an open model; anyone can bench it on anything, not only on specific benchmarks. The claim that it was trained on specific reasoning benchmarks is conjecture.