"this thing is clearly trained via RL to think and solve tasks for specific reasoning benchmarks. nothing else." Has the train already reached the end of the line?
If you think something like "They have to train their models on benchmarks to make it look like there's progress, while in reality it's a dead end," you are missing a few things.
It's an open model, everyone can bench it on everything not only on specific benchmarks. Training on specific reasoning benchmarks is a conjecture.
If you think something like "They have to train their models on benchmarks to make it look like there's progress, while in reality it's a dead end," you are missing a few things.
It's an open model, everyone can bench it on everything not only on specific benchmarks. Training on specific reasoning benchmarks is a conjecture.