
Herring (yesterday at 8:23 PM)

I'd say try the nanogpt speedrun. It's much easier to train, and gives you a better comparison vs optimized systems.

https://github.com/KellerJordan/modded-nanogpt


Replies

naasking (today at 2:13 AM)

The linked paper tested nanoGPT with this new transformer:

https://www.techrxiv.org/users/685780/articles/1375955-topol...

nickpsecurity (today at 1:27 AM)

Labs were also competing to train BERTs for $20 or less. People still use them a lot, too.

https://www.databricks.com/blog/mosaicbert

I'll add that they should do a number of small training runs with different architectures and data mixes. That would prove generalization.
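A minimal sketch of what such an ablation could look like: one small, cheap run per (architecture, data mix) pair. All names and the `train_small` stub here are hypothetical illustrations, not anything from the thread or the linked paper.

```python
from itertools import product

# Hypothetical ablation grid; the entries are placeholders.
ARCHITECTURES = ["baseline-transformer", "new-architecture"]
DATA_MIXES = ["web-only", "web+code", "web+books"]

def train_small(arch: str, mix: str) -> dict:
    """Stand-in for one short training run; a real version would
    train a small model on the given mix and report validation loss."""
    return {"arch": arch, "mix": mix, "val_loss": None}

def run_ablation() -> list[dict]:
    # An architecture that wins across every data mix, not just one,
    # is the evidence of generalization the comment is asking for.
    return [train_small(a, m) for a, m in product(ARCHITECTURES, DATA_MIXES)]

results = run_ablation()
```

The point of the full grid is that a single-mix comparison can flatter an architecture tuned to that mix; sweeping mixes separates real improvements from data-specific ones.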