Hasn't this been proven true, many times now? Just look at the difference between ChatGPT 3 and 3.5, for example (which used the same dataset). That, and all the top performing models have large gains from thinking, using the exact same weights.
And, all the new research around self learning architectures has nothing to do with the datasets.