I don't think this argument is wrong, but it's debatable. At the end of the day, we are talking about the manifold of reality (as compressed by the LLM through language abstraction). It remains to be seen whether supervised fine-tuning on the best that humans can produce would nudge the model enough to generate surprising findings.
We know pre-trained models tend to revert to the mean, but I don't think that's enough to say SFT / RL models will do the same. Some might argue that RL only sharpens the existing distribution, but even on that point I'm skeptical of that paper.
It's been shown in other fields that training models on the output of other models produces subtly broken models, not a flattening toward the statistical mean. Why would science be any different?