Microgpt

733 points • by tambourine_man • today at 1:39 AM • 120 comments • view on HN

Comments

What I find most valuable about this kind of project is how it forces you to understand the entire pipeline end-to-end. When you use PyTorch or JAX, there are dozens of abstractions hiding the actual mechanics. But when you strip it down to ~200 lines, every matrix multiplication and gradient computation has to be intentional.

I tried something similar last year with a much simpler model (not GPT-scale) and the biggest "aha" moment was understanding how the attention mechanism is really just a soft dictionary lookup. The math makes so much more sense when you implement it yourself vs reading papers.

Karpathy has a unique talent for making complex topics feel approachable without dumbing them down. Between this, nanoGPT, and the Zero to Hero series, he has probably done more for ML education than most university programs.

➕ show 2 replies

subset • today at 4:25 AM

I had good fun transliterating it to Rust as a learning experience (https://github.com/stochastical/microgpt-rs). The trickiest part was working out how to represent the autograd graph data structure with Rust types. I'm finalising some small tweaks to make it run in the browser via WebAssmebly and then compile it up for my blog :) Andrej's code is really quite poetic, I love how much it packs into such a concise program

0xbadcafebee • today at 4:43 AM

Since this post is about art, I'll embed here my favorite LLM art: the IOCCC 2024 prize winner in bot talk, from Adrian Cable (https://www.ioccc.org/2024/cable1/index.html), minus the stdlib headers:

  #define a(_)typedef _##t
  #define _(_)_##printf
  #define x f(i,
  #define N f(k,
  #define u _Pragma("omp parallel for")f(h,
  #define f(u,n)for(I u=0;u<(n);u++)
  #define g(u,s)x s%11%5)N s/6&33)k[u[i]]=(t){(C*)A,A+s*D/4},A+=1088*s;
  
  a(int8_)C;a(in)I;a(floa)F;a(struc){C*c;F*f;}t;enum{Z=32,W=64,E=2*W,D=Z*E,H=86*E,V='}\0'};C*P[V],X[H],Y[D],y[H];a(F
  _)[V];I*_=U" 炾ોİ䃃璱ᝓ၎瓓甧染ɐఛ瓁",U,s,p,f,R,z,$,B[D],open();F*A,*G[2],*T,w,b,c;a()Q[D];_t r,L,J,O[Z],l,a,K,v,k;Q
  m,e[4],d[3],n;I j(I e,F*o,I p,F*v,t*X){w=1e-5;x c=e^V?D:0)w+=r[i]*r[i]/D;x c)o[i]=r[i]/sqrt(w)*i[A+e*D];N $){x
  W)l[k]=w=fmax(fabs(o[i])/~-E,i?w:0);x W)y[i+k*W]=*o++/w;}u p)x $){I _=0,t=h*$+i;N W)_+=X->c[t*W+k]*y[i*W+k];v[h]=
  _*X->f[t]*l[i]+!!i*v[h];}x D-c)i[r]+=v[i];}I main(){A=mmap(0,8e9,1,2,f=open(M,f),0);x 2)~f?i[G]=malloc(3e9):exit(
  puts(M" not found"));x V)i[P]=(C*)A+4,A+=(I)*A;g(&m,V)g(&n,V)g(e,D)g(d,H)for(C*o;;s>=D?$=s=0:p<U||_()("%s",$[P]))if(!
  (*_?$=*++_:0)){if($<3&&p>=U)for(_()("\n\n> "),0<scanf("%[^\n]%*c",Y)?U=*B=1:exit(0),p=_(s)(o=X,"[INST] %s%s [/INST]",s?
  "":"<<SYS>>\n"S"\n<</SYS>>\n\n",Y);z=p-=z;U++[o+=z,B]=f)for(f=0;!f;z-=!f)for(f=V;--f&&f[P][z]|memcmp(f[P],o,z););p<U?
  $=B[p++]:fflush(0);x D)R=$*D+i,r[i]=m->c[R]*m->f[R/W];R=s++;N Z){f=k*D*D,$=W;x 3)j(k,L,D,i?G[~-i]+f+R*D:v,e[i]+k);N
  2)x D)b=sin(w=R/exp(i%E/14.)),c=1[w=cos(w),T=i+++(k?v:*G+f+R*D)],T[1]=b**T+c*w,*T=w**T-c*b;u Z){F*T=O[h],w=0;I A=h*E;x
  s){N E)i[k[L+A]=0,T]+=k[v+A]*k[i*D+*G+A+f]/11;w+=T[i]=exp(T[i]);}x s)N E)k[L+A]+=(T[i]/=k?1:w)*k[i*D+G[1]+A+f];}j(V,L
  ,D,J,e[3]+k);x 2)j(k+Z,L,H,i?K:a,d[i]+k);x H)a[i]*=K[i]/(exp(-a[i])+1);j(V,a,D,L,d[$=H/$,2]+k);}w=j($=W,r,V,k,n);x
  V)w=k[i]>w?k[$=i]:w;}}

➕ show 1 reply

teleforce • today at 6:21 AM

Someone has modified microgpt to build a tiny GPT that generates Korean first names, and created a web page that visualizes the entire process [1].

Users can interactively explore the microgpt pipeline end to end, from tokenization until inference.

[1] English GPT lab:

https://ko-microgpt.vercel.app/

znnajdla • today at 6:22 AM

Super useful exercise. My gut tells me that someone will soon figure out how to build micro-LLMs for specialized tasks that have real-world value, and then training LLMs won’t just be for billion dollar companies. Imagine, for example, a hyper-focused model for a specific programming framework (e.g. Laravel, Django, NextJS) trained only on open-source repositories and documentation and carefully optimized with a specialized harness for one task only: writing code for that framework (perhaps in tandem with a commodity frontier model). Could a single programmer or a small team on a household budget afford to train a model that works better/faster than OpenAI/Anthropic/DeepSeek for specialized tasks? My gut tells me this is possible; and I have a feeling that this will become mainstream, and then custom model training becomes the new “software development”.

➕ show 6 replies

growingswe • today at 6:46 AM

Great stuff! I wrote an interactive blogpost that walks through the code and visualizes it: https://growingswe.com/blog/microgpt

red_hare • today at 4:25 AM

This is beautiful and highly readable but, still, I yearn for a detailed line-by-line explainer like the backbone.js source: https://backbonejs.org/docs/backbone.html

➕ show 3 replies

ruszki • today at 8:12 AM

> [p for mat in state_dict.values() for row in mat for p in row]

I'm so happy without seeing Python list comprehensions nowadays.

I don't know why they couldn't go with something like this:

[state_dict.values() for mat for row for p]

or in more difficult cases

[state_dict.values() for mat to mat*2 for row for p to p/2]

I know, I know, different times, but still.

➕ show 1 reply

verma7 • today at 6:00 AM

I wrote a C++ translation of it: https://github.com/verma7/microgpt/blob/main/microgpt.cc

2x the number of lines of code (~400L), 10x the speed

The hard part was figuring out how to represent the Value class in C++ (ended up using shared_ptrs).

kuberwastaken • today at 7:18 AM

I'm half shocked this wasn't on HN before? Haha I built PicoGPT as a minified fork with <35 lines of JS and another in python

And it's small enough to run from a QR code :) https://kuber.studio/picogpt/

You can quite literally train a micro LLM from your phone's browser

➕ show 2 replies

freakynit • today at 5:45 AM

Is there something similar for diffusion models? By the way, this is incredibly useful for learning in depth the core of LLM's.

fulafel • today at 3:00 AM

This could make an interesting language shootout benchmark.

➕ show 1 reply

jimbokun • today at 4:16 AM

It’s pretty staggering that a core algorithm simple enough to be expressed in 200 lines of Python can apparently be scaled up to achieve AGI.

Yes with some extra tricks and tweaks. But the core ideas are all here.

➕ show 2 replies

colonCapitalDee • today at 2:25 AM

Beautiful work

MattyRad • today at 6:10 AM

Hoenikker had been experimenting with melting and re-freezing ice-nine in the kitchen of his Cape Cod home.

Beautiful, perhaps like ice-nine is beautiful.

shevy-java • today at 8:20 AM

Microslop is alive!

with • today at 7:09 AM

"everything else is just efficiency" is a nice line but the efficiency is the hard part. the core of a search engine is also trivial, rank documents by relevance. google's moat was making it work at scale. same applies here.

➕ show 1 reply

ThrowawayTestr • today at 3:06 AM

This is like those websites that implement an entire retro console in the browser.

dhruv3006 • today at 3:42 AM

Karapthy with another gem !

➕ show 1 reply

coolThingsFirst • today at 4:44 AM

Incredibly fascinating. One thing is that it seems still very conceptual. What id be curious about how good of a micro llm we can train say with 12 hours of training on macbook.

rramadass • today at 4:06 AM

C++ version - https://github.com/Charbel199/microgpt.cpp?tab=readme-ov-fil...

Rust version - https://github.com/mplekh/rust-microgpt

ViktorRay • today at 2:36 AM

Which license is being used for this?

➕ show 1 reply

genie3io • today at 8:30 AM

[dead]

kelvinjps10 • today at 4:26 AM

Why there is multiple comments talking about 1000 c lines, bots?

➕ show 1 reply

abhitriloki • today at 7:17 AM

[flagged]

➕ show 1 reply

lynxbot2026 • today at 3:33 AM

[flagged]

➕ show 5 replies

Paddyz • today at 3:08 AM

[flagged]

➕ show 3 replies

tithos • today at 2:13 AM

What is the prime use case

➕ show 7 replies

profsummergig • today at 3:04 AM

If anyone knows of a way to use this code on a consumer grade laptop to train on a small corpus (in less than a week), and then demonstrate inference (hallucinations are okay), please share how.

➕ show 1 reply

alt Hacker News

Microgpt

Comments