Right now, all I know how to do is pull models from Hugging Face, but someday I want to build my own...

jdw64 • today at 12:41 PM • 3 replies

Right now, all I know how to do is pull models from Hugging Face, but someday I want to build my own small LLM from scratch.

Replies

max-amb • today at 1:31 PM

If you want a written resource, I have a blog post about the mathematics behind building a feed-forward network from scratch: https://max-amb.github.io/blog/the_maths_behind_the_mlp/. It kinda focuses on translating individual components into matrix operations.
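
To make the "individual components to matrix operations" idea concrete, here is a rough sketch (not taken from the blog post) of one dense layer written two ways: neuron by neuron, then as a matrix-vector product. The sigmoid activation, the function names, and the example weights are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(weights, bias, inputs):
    # One component: weighted sum of the inputs plus a bias, then the activation.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def layer_per_neuron(W, b, x):
    # Evaluate the layer one neuron at a time (row j of W belongs to neuron j).
    return [neuron(W[j], b[j], x) for j in range(len(W))]

def matvec(W, x):
    # Plain matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def layer_matrix(W, b, x):
    # The same layer as a single matrix operation: sigmoid(W @ x + b).
    z = [zj + bj for zj, bj in zip(matvec(W, x), b)]
    return [sigmoid(zj) for zj in z]

# Hypothetical 3-neuron layer with 2 inputs.
W = [[0.5, -1.0], [2.0, 0.0], [0.0, 0.0]]
b = [0.0, -1.0, 0.0]
x = [1.0, 0.5]

print(layer_per_neuron(W, b, x))
print(layer_matrix(W, b, x))
```

Both calls should print the same activations; the matrix form is the one that scales to stacked layers and batches.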

kflansburg • today at 1:20 PM

If you aren't already aware, Karpathy has several videos that could get you there in a few hours: https://www.youtube.com/@AndrejKarpathy

glouwbug • today at 1:16 PM

It’s just linear algebra. Work your way from feed-forward networks to CNNs, RNNs, LSTMs, and attention, then maybe a small inference engine. Karpathy’s llama2.c is only ~300 lines for the latter, and it pragma simds, so you don’t need fancy GPUs.
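
As a rough illustration of the "just linear algebra" point, here is a minimal sketch of single-head scaled dot-product attention in plain Python. It is not code from llama2.c; the function names and the list-of-lists layout are assumptions for the example, and it leaves out masking, multiple heads, and learned projections.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    # Q, K, V are lists of row vectors; Q and K rows share dimension d.
    d = len(K[0])
    out = []
    for q in Q:
        # Dot product of the query with every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Output row is the attention-weighted average of the value rows.
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out

# Hypothetical toy inputs: one query, two keys, two values.
Q = [[0.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))
```

With an all-zero query every key scores the same, so the output is simply the average of the value rows.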
