There was a very interesting paper out of Stanford this past September about pretraining under the unlimited-compute, limited-data paradigm[0]. It's pretty much exactly the same setup, just with ~200M training tokens instead.
[0] https://www.alphaxiv.org/abs/2509.14786
Yeah, we do incorporate some of the findings from that paper in our repo, like aggressive regularization and ensembling.
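Roughly what those two ideas look like in practice (a minimal PyTorch sketch, not the repo's actual code; the weight-decay value and the logit-averaging ensemble here are just illustrative assumptions):

```python
import torch

# Aggressive regularization: push weight decay well above the usual ~0.1
# when data is the bottleneck (the exact value here is illustrative).
def make_optimizer(model: torch.nn.Module) -> torch.optim.AdamW:
    return torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1.0)

# Ensembling: train several copies on the same limited data with different
# seeds, then average their logits at inference time.
@torch.no_grad()
def ensemble_logits(models: list[torch.nn.Module], tokens: torch.Tensor) -> torch.Tensor:
    logits = torch.stack([m(tokens) for m in models])  # (N, batch, seq, vocab)
    return logits.mean(dim=0)
```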