Hacker News

Show HN: I built a tiny LLM to demystify how language models work

299 points by armanified today at 12:20 AM | 25 comments

Built a ~9M param LLM from scratch to understand how they actually work. Vanilla transformer, 60K synthetic conversations, ~130 lines of PyTorch. Trains in 5 min on a free Colab T4. The fish thinks the meaning of life is food.
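For a sense of where a figure like ~9M parameters comes from, here's a rough back-of-the-envelope count for a vanilla decoder-only transformer. All hyperparameters below (vocab size, width, depth) are illustrative guesses, not the author's actual configuration:

```python
# Rough parameter count for a small decoder-only transformer.
# Ignores biases, layer norms, and positional embeddings, which
# contribute comparatively little at this scale.
def count_params(vocab_size: int, d_model: int, n_layers: int, ff_mult: int = 4) -> int:
    embedding = vocab_size * d_model           # token embedding (often tied with the output head)
    attention = 4 * d_model * d_model          # Q, K, V, and output projections
    mlp = 2 * d_model * (ff_mult * d_model)    # up- and down-projections of the feed-forward block
    return embedding + n_layers * (attention + mlp)

# An illustrative config that lands near 9M parameters:
print(count_params(vocab_size=6000, d_model=320, n_layers=6))  # 9292800, ~9.3M
```

The takeaway: at this scale the embedding table and the per-layer weight matrices are the same order of magnitude, so vocab size matters as much as depth.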

Fork it and swap the personality for your own character.


Comments

mudkipdev today at 5:55 AM

This is probably a consequence of the training data being fully lowercase:

You> hello
Guppy> hi. did you bring micro pellets.

You> HELLO
Guppy> i don't know what it means but it's mine.

ordinarily today at 2:57 AM

It's genuinely a great introduction to LLMs. I built my own a while ago based on Milton's Paradise Lost: https://www.wvrk.org/works/milton

Alexzoofficial today at 6:14 AM

This is a fantastic educational resource. I've always found that building a "toy" version of a complex system is the best way to actually understand the architecture.

Quick question for the author: did you experiment with different tokenization strategies, or did you stick to a simple character-level/word-level split for this scale? I'm curious if BPE or similar would even be worth the overhead for a 9M parameter model.
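For context on the question above: at this scale a character-level split really is only a few lines of code. A minimal sketch (illustrative, not necessarily what the author did):

```python
# Minimal character-level tokenizer: the vocabulary is simply the set
# of characters seen in the training text.
text = "hi. did you bring micro pellets."
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> id
itos = {i: ch for ch, i in stoi.items()}       # id -> char

def encode(s: str) -> list[int]:
    return [stoi[ch] for ch in s]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

assert decode(encode("micro pellets")) == "micro pellets"
```

BPE would shrink sequence lengths but adds a learned merge table; whether that overhead pays off at 9M parameters is exactly the open question here.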

monksy today at 6:35 AM

Is this a reference from the Bobiverse?

zwaps today at 5:46 AM

I like the idea; it's just that the examples are reproduced from the training data set.

How does it handle unknown queries?

ankitsanghi today at 5:15 AM

Love it! I think it's important to understand how the tools we use (and will only increasingly use) work under the hood.

cbdevidal today at 3:13 AM

> you're my favorite big shape. my mouth are happy when you're here.

Laughed loudly :-D

kubrador today at 5:33 AM

how's it handle longer context, or does it start hallucinating after like 2 sentences? curious where the ceiling is with only 9M params

kaipereira today at 5:22 AM

This is so cool! I'd love to see a write-up on how you made it, and what you referenced, because designing neural networks always feels like a maze ;)

martmulx today at 3:33 AM

How much training data did you end up needing for the fish personality to feel coherent? Curious what the minimum viable dataset looks like for something like this.

gnarlouse today at 3:32 AM

I... wow, you made an LLM that can actually tell jokes?

NyxVox today at 3:47 AM

Hm, I can actually try the training on my GPU. One of the things I want to try next. Maybe a bit more complex than a fish :)

brcmthrowaway today at 5:58 AM

Why are there so many dead comments from new accounts?

AndrewKemendo today at 1:53 AM

I love these kinds of educational implementations.

I want to really praise the (unintentional?) nod to Nagel: by limiting capabilities to the representation of a fish, the user is immediately able to understand the constraints. It can only talk like a fish because it's very simple.

Especially compared to public models, that's a really simple correspondence to grok intuitively (small LLM → only as verbose as a fish, larger LLM → more verbose), so kudos to the author for making it simple and fun.

nullbyte808 today at 2:10 AM

Adorable! Maybe a personality that speaks in emojis?

oyebenny today at 5:17 AM

Neat!

dinkumthinkum today at 5:03 AM

I think this is a nice project because it is end to end and serves its goal well. Good job! It's a good example of how someone might do something similar for a specific purpose. There are other visualizers that explain different aspects of LLMs, but this is a good applied example.

SilentM68 today at 2:22 AM

Would have been funny if it were called "DORY", given the fish's memory-recall issues mirroring LLMs' similar recall issues :)
