Making Deep Learning Go Brrrr from First Principles

45 points • by tosh • today at 11:50 AM • 21 comments

Comments

> in the time that Python can perform a single FLOP, an A100 could have chewed through 9.75 million FLOPS

wild

Right now, all I know how to do is pull models from Hugging Face, but someday I want to build my own small LLM from scratch.

noosphr • today at 12:24 PM

> For example, getting good performance on a dataset with deep learning also involves a lot of guesswork. But, if your training loss is way lower than your test loss, you're in the "overfitting" regime, and you're wasting your time if you try to increase the capacity of your model.

https://arxiv.org/abs/1912.02292
