https://www.youtube.com/watch?v=VaeI9YgE1o8
Yes, I know how much a kilobyte is. But cutting down to something like 2 million 3-bit parameters (2,000,000 × 3 bits = 6,000,000 bits, about 750 kB of weights) would definitely be possible.
And a 32-bit processor should be able to pack and unpack parameters just fine.
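For illustration, here's roughly what that pack/unpack would look like in C. This is only a sketch of the 3-bit packing idea; pack3/unpack3 are hypothetical names, and as far as I can tell the linked repo below is a llama2.c port that just runs plain float weights:

```c
#include <stdint.h>
#include <stddef.h>

/* Pack n 3-bit values (one per byte, range 0..7) into a tight bit stream.
   out must be zero-initialized and hold (3*n + 7) / 8 bytes, plus one
   spare byte so unpack3 can always read two bytes safely. */
static void pack3(const uint8_t *vals, size_t n, uint8_t *out) {
    size_t bit = 0;
    for (size_t i = 0; i < n; i++, bit += 3) {
        size_t   byte  = bit >> 3;
        unsigned shift = bit & 7;
        uint16_t chunk = (uint16_t)(vals[i] & 0x7) << shift;
        out[byte] |= (uint8_t)chunk;
        if (shift > 5)                      /* value straddles a byte boundary */
            out[byte + 1] |= (uint8_t)(chunk >> 8);
    }
}

/* Read back the i-th 3-bit value. */
static uint8_t unpack3(const uint8_t *packed, size_t i) {
    size_t   bit   = i * 3;
    size_t   byte  = bit >> 3;
    unsigned shift = bit & 7;
    /* read two bytes so values that straddle a boundary come out whole */
    uint16_t chunk = packed[byte] | ((uint16_t)packed[byte + 1] << 8);
    return (chunk >> shift) & 0x7;
}
```

The shifts and masks here are all cheap single-cycle ops on a 32-bit core, which is the point: the unpacking cost is small next to the matmuls.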
Edit: Hey, look what I just found: https://github.com/DaveBben/esp32-llm, "a 260K parameter tinyllamas checkpoint trained on the tiny stories dataset"
And remember, TinyStories is only about 1 GB of data. Can you train for longer and with more data? Again, certainly, BUT again, there are costs. That Minecraft one is far more powerful than this thing.
Also, remember that these models are not RLHF'd, so you really shouldn't expect one to act like the LLMs you're used to. It's only at stage 0, the "pre-training" stage, what Karpathy calls a "babbler": it just continues whatever text you feed it.
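To make the "babbler" point concrete: pre-training gives you a pure next-token predictor, so inference is nothing but a continuation loop. A rough sketch, where forward(), sample(), decode(), and the vocab size are hypothetical stand-ins for the real model code:

```c
#include <stdio.h>

#define VOCAB 512   /* hypothetical vocabulary size */

/* Stand-ins for the real model: forward() would run the transformer and
   fill logits[], sample() would pick a token, decode() maps it to text. */
extern void forward(int token, int pos, float logits[VOCAB]);
extern int  sample(const float logits[VOCAB]);
extern const char *decode(int token);

void babble(const int *prompt, int prompt_len, int max_new) {
    float logits[VOCAB];
    int token = prompt[0];
    for (int pos = 0; pos < prompt_len + max_new; pos++) {
        forward(token, pos, logits);      /* predict the next token */
        if (pos + 1 < prompt_len)
            token = prompt[pos + 1];      /* still consuming the prompt */
        else
            token = sample(logits);       /* free-running continuation */
        printf("%s", decode(token));
    }
    /* Nothing here knows about "questions" or "answers";
       the model only ever continues the text. */
}
```

Ask it a question and it's just as likely to continue with three more questions as with an answer; that's what the later instruction-tuning and RLHF stages are for.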