"this thing is clearly trained via RL to think and solve tasks for specific reasoning benchmark...

greenchair • last Sunday at 12:08 PM • 1 reply

"this thing is clearly trained via RL to think and solve tasks for specific reasoning benchmarks. nothing else." Has the train already reached the end of the line?

Replies

red75prime • last Monday at 9:29 AM

If you think something like "They have to train their models on benchmarks to make it look like there's progress, while in reality it's a dead end," you are missing a few things.

It's an open model: anyone can benchmark it on anything, not only on specific benchmarks. That it was trained on specific reasoning benchmarks is conjecture.
