Hacker News

nomel | yesterday at 11:59 PM

Hasn't this been proven true many times by now? Just look at the difference between GPT-3 and GPT-3.5, for example (which used the same dataset). On top of that, all the top-performing models get large gains from thinking, using the exact same weights.

And all the new research on self-learning architectures has nothing to do with the datasets.