Show HN: Z80-μLM, a 'Conversational AI' That Fits in 40KB

159 points • by quesomaster9000 • today at 5:41 AM • 39 comments • view on HN

How small can a language model be while still doing something useful? I wanted to find out, and had some spare time over the holidays.

Z80-μLM is a character-level language model with 2-bit quantized weights ({-2,-1,0,+1}) that runs on a Z80 with 64KB RAM. The entire thing: inference, weights, chat UI, it all fits in a 40KB .COM file that you can run in a CP/M emulator and hopefully even real hardware!

It won't write your emails, but it can be trained to play a stripped down version of 20 Questions, and is sometimes able to maintain the illusion of having simple but terse conversations with a distinct personality.

The extreme constraints nerd-sniped me and forced interesting trade-offs: trigram hashing (typo-tolerant, loses word order), 16-bit integer math, and some careful massaging of the training data meant I could keep the examples 'interesting'.

The key was quantization-aware training that accurately models the inference code limitations. The training loop runs both float and integer-quantized forward passes in parallel, scoring the model on how well its knowledge survives quantization. The weights are progressively pushed toward the 2-bit grid using straight-through estimators, with overflow penalties matching the Z80's 16-bit accumulator limits. By the end of training, the model has already adapted to its constraints, so no post-hoc quantization collapse.

Eventually I ended up spending a few dollars on Claude API to generate 20 questions data (see examples/guess/GUESS.COM), I hope Anthropic won't send me a C&D for distilling their model against the ToS ;P

But anyway, happy code-golf season everybody :)

Comments

jacquesm • today at 10:42 AM

Between this and RAM prices Zilog stock must be up! Awesome hack. Now apply the same principles to a laptop and take a megabyte or so, see what that does.

nineteen999 • today at 7:38 AM

This couldn't be more perfectly timed .. I have an Unreal Engine game with both VT100 terminals (for running coding agents) and Z80 emulators, and a serial bridge that allows coding agents to program the CP/M machines:

https://i.imgur.com/6TRe1NE.png

Thank you for posting! It's unbelievable how someone sometimes just drops something that fits right into what you're doing. However bizarre it seems.

➕ show 3 replies

rahen • today at 10:09 AM

I love it, instant Github star. I wrote an MLP in Fortran IV for a punched card machine from the sixties (https://github.com/dbrll/Xortran), so this really speaks to me.

The interaction is surprisingly good despite the lack of attention mechanism and the limitation of the "context" to trigrams from the last sentence.

This could have worked on 60s-era hardware and would have completely changed the world (and science fiction) back then. Great job.

vedmakk • today at 7:37 AM

If one would train an actual secret (e.g. a passphrase) into such a model, that a user would need to guess by asking the right questions. Could this secret be easily reverse engineered / inferred by having access to models weights - or would it be safe to assume that one could only get to the secret by asking the right questions?

➕ show 2 replies

Dwedit • today at 7:39 AM

In before AI companies buy up all the Z80s and raise the prices to new heights.

➕ show 1 reply

roygbiv2 • today at 7:34 AM

Awesome. I've just designed and built my own z80 computer, though right now it has 32kb ROM and 32kb RAM. This will definitely change on the next revision so I'll be sure to try it out.

➕ show 1 reply

orbital-decay • today at 9:01 AM

Pretty cool! I wish free-input RPGs of old had fuzzy matchers. They worked by exact keyword matching and it was awkward. I think the last game of that kind (where you could input arbitrary text when talking to NPCs) was probably Wizardry 8 (2001).

anonzzzies • today at 8:44 AM

Luckily I have a very large amount of MSX computers, zx, amstrad cpc etc and even one multiprocessor z80 cp/m machine for the real power. Wonder how gnarly this is going to perform with bankswitching though. Probably not good.

Peteragain • today at 9:30 AM

There are two things happening here. A really small LLM mechanism which is useful for thinking about how the big ones work, and a reference to the well known phenomenon, commonly dismissively referred to as a "trick", in which humans want to believe. We work hard to account for what our conversational partner says. Language in use is a collective cultural construct. By this view the real question is how and why we humans understand an utterance in a particular way. Eliza, Parry, and the Chomsky bot at http://chomskybot.com work on this principle. Just sayin'.

Zee2 • today at 6:53 AM

This is super cool. Would love to see a Z80 simulator set up with these examples to play with!

➕ show 1 reply

magicalhippo • today at 8:35 AM

As far as I know, the last layer is very quantization-sensitive, and is typically not quantized, or quantized lightly.

Have you experimented with having it less quantized, and evaluated the quality drop?

Regardless, very cool project.

➕ show 1 reply

a_t48 • today at 8:05 AM

Nice - that will fit on a Gameboy cartridge, though bank switching might make it super terrible to run. Each bank is only 16k. You can have a bunch of them, but you can only access one bank at a time (well, technically two - bank 0 is IIRC always accessible).

vatary • today at 8:49 AM

It's pretty obvious this is just a stress test for compressing and running LLMs. It doesn't have much practical use right now, but it shows us that IoT devices are gonna have built-in LLMs really soon. It's a huge leap in intelligence—kind of like the jump from apes to humans. That is seriously cool.

➕ show 1 reply

pdyc • today at 7:40 AM

interesting, i am wondering how far can it go if we remove some of these limitations but try to solve some extremely specific problem like generating regex based on user input? i know small models(270M range) can do that but can it be done in say < 10MB range?

➕ show 1 reply

dirkt • today at 7:48 AM

Eliza's granddaughter.

alfiedotwtf • today at 7:26 AM

An LLM in a .com file? Haha made my day

➕ show 1 reply

jasonjmcghee • today at 7:10 AM

For future projects and/or for this project, there are many LLMs available more than good enough to generate that kind of synthetic data (20 Qs) with permissive terms of use. (So you don’t need to stress about breaking TOS / C&D etc)

Zardoz84 • today at 8:36 AM

Meanwhile, Eliza was ported to BASIC and was run on many home computers in the 80s.

NooneAtAll3 • today at 9:19 AM

did you measure token/s?

codetiger • today at 7:22 AM

Imagine, this working on a Gameboy, in those days. Would've sounded like magic

➕ show 3 replies

alt Hacker News

Show HN: Z80-μLM, a 'Conversational AI' That Fits in 40KB

Comments