Hacker News

Making Deep Learning Go Brrrr from First Principles

45 points by tosh today at 11:50 AM | 21 comments

Comments

tosh today at 12:23 PM

> in the time that Python can perform a single FLOP, an A100 could have chewed through 9.75 million FLOPS

wild
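A quick back-of-the-envelope check of the quoted ratio. It assumes an A100 peak of 312 TFLOP/s (dense FP16/BF16 tensor-core throughput) and that pure Python manages roughly 32 million float additions per second. The Python rate is an assumed, machine-dependent figure rather than a constant, so the sketch also includes a rough local timing helper (`measure_python_adds_per_sec` is a hypothetical name introduced here):

```python
import timeit

# Assumption: A100 peak dense FP16/BF16 tensor-core throughput, in FLOP/s.
A100_FLOPS = 312e12

# Assumption: pure-Python float additions per second (varies by machine).
PYTHON_ADDS_PER_SEC = 32e6


def flops_per_python_op(gpu_flops: float, python_ops_per_sec: float) -> float:
    """How many GPU FLOPs fit in the time Python spends on one operation."""
    return gpu_flops / python_ops_per_sec


def measure_python_adds_per_sec(n: int = 1_000_000) -> float:
    """Rough local estimate: time n float additions in a plain Python loop.

    Loop overhead is included, which is part of why Python is slow here.
    """
    def loop() -> float:
        x = 0.0
        for _ in range(n):
            x += 1.0
        return x

    seconds = timeit.timeit(loop, number=1)
    return n / seconds


if __name__ == "__main__":
    # Using the assumed figures: 312e12 / 32e6
    print(f"{flops_per_python_op(A100_FLOPS, PYTHON_ADDS_PER_SEC):,.0f}")  # 9,750,000
    local = measure_python_adds_per_sec()
    print(f"local estimate: {flops_per_python_op(A100_FLOPS, local):,.0f}")
```

The local estimate will differ from 9.75 million depending on the CPU and Python version; the point is the order of magnitude.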

jdw64 today at 12:41 PM

Right now, all I know how to do is pull models from Hugging Face, but someday I want to build my own small LLM from scratch.

noosphr today at 12:24 PM

>For example, getting good performance on a dataset with deep learning also involves a lot of guesswork. But, if your training loss is way lower than your test loss, you're in the "overfitting" regime, and you're wasting your time if you try to increase the capacity of your model.

https://arxiv.org/abs/1912.02292
