
smaddox • 01/21/2025

You can train a model of that size on ~1 billion tokens in ~3 minutes on a rented 8xH100 80GB node (~$9/hr on Lambda Labs, RunPod.io, etc.) using the NanoGPT speedrun repo: https://github.com/KellerJordan/modded-nanogpt

For a run that short, though, you'll spend more time waiting for the node to come up, downloading the dataset, and compiling the model than actually training.
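A quick back-of-envelope check of those figures, as a sketch using only the approximate numbers from the comment (1B tokens, 3 minutes, ~$9/hr, 8 GPUs). The `overhead_min` value is a hypothetical placeholder for node startup, dataset download, and compilation, not a measurement.

```python
# Back-of-envelope numbers for the run described above.
# Inputs are the comment's approximate figures; overhead_min is hypothetical.
tokens = 1e9          # ~1 billion training tokens
train_min = 3.0       # ~3 minutes of training
rate_per_hr = 9.0     # ~$9/hr for an 8xH100 80GB node
gpus = 8


def run_cost(minutes: float, rate: float = rate_per_hr) -> float:
    """Dollar cost of renting the node for `minutes` at `rate` dollars/hour."""
    return minutes / 60.0 * rate


throughput = tokens / (train_min * 60)  # tokens per second across the whole node
per_gpu = throughput / gpus             # tokens per second per GPU
train_cost = run_cost(train_min)        # cost of the training time alone

# Hypothetical: minutes spent waiting for the node, downloading data, compiling.
overhead_min = 10.0
total_cost = run_cost(train_min + overhead_min)

print(f"throughput: {throughput:,.0f} tok/s total, {per_gpu:,.0f} tok/s per GPU")
print(f"training-only cost: ${train_cost:.2f}")
print(f"with {overhead_min:.0f} min overhead: ${total_cost:.2f}")
```

With any plausible overhead, the fixed setup time costs more than the training itself, which is the point of the comment.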
