Right now, all I know how to do is pull models from Hugging Face, but someday I want to build my own small LLM from scratch.
> For example, getting good performance on a dataset with deep learning also involves a lot of guesswork. But, if your training loss is way lower than your test loss, you're in the "overfitting" regime, and you're wasting your time if you try to increase the capacity of your model.
> in the time that Python can perform a single FLOP, an A100 could have chewed through 9.75 million FLOPS
wild
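
For anyone who wants to sanity-check that ratio, here's a rough back-of-the-envelope sketch. Both inputs are my assumptions, not something stated in the quote: roughly 312 TFLOPS for the A100 (NVIDIA's peak dense FP16/BF16 tensor-core figure) and roughly 32 million simple additions per second for plain Python. The function name is just made up for illustration.

```python
# Both figures below are assumptions, not taken from the quoted text.
A100_FLOPS_PER_SEC = 312e12   # assumed: ~312 TFLOPS peak dense FP16/BF16 on an A100
PYTHON_OPS_PER_SEC = 32e6     # assumed: ~32 million simple additions per second in Python


def flops_per_python_op(gpu_flops_per_sec: float, python_ops_per_sec: float) -> float:
    """How many GPU FLOPs fit in the time Python takes for one operation."""
    return gpu_flops_per_sec / python_ops_per_sec


ratio = flops_per_python_op(A100_FLOPS_PER_SEC, PYTHON_OPS_PER_SEC)
print(f"{ratio:,.0f} GPU FLOPs per Python operation")
```

With those two numbers the ratio comes out to 9.75 million, matching the quote. The Python figure will vary a lot by machine and interpreter version, so treat it as an order-of-magnitude estimate.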