This is a fantastic educational resource. I've always found that building a "toy" version of a complex system is the best way to actually understand the architecture.
Quick question for the author: did you experiment with different tokenization strategies, or did you stick to a simple character- or word-level split at this scale? I'm curious whether BPE or similar would even be worth the overhead for a 9M-parameter model.
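For context on the trade-off I'm asking about: a character-level tokenizer is basically free to implement, whereas BPE adds a vocabulary-training step and a merge table. A minimal sketch of the character-level side (my own illustration, not code from the post):

```python
# Minimal character-level tokenizer sketch -- my assumption of what a
# "simple character-level split" looks like, not the author's implementation.
class CharTokenizer:
    def __init__(self, text):
        # Vocabulary is just the set of unique characters in the corpus.
        chars = sorted(set(text))
        self.stoi = {c: i for i, c in enumerate(chars)}  # char -> id
        self.itos = {i: c for c, i in self.stoi.items()}  # id -> char

    def encode(self, s):
        return [self.stoi[c] for c in s]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)

tok = CharTokenizer("hello world")
ids = tok.encode("hello")
assert tok.decode(ids) == "hello"  # round-trips cleanly
```

At 9M parameters the embedding table for a char-level vocab (often <100 entries) is tiny, while a BPE vocab of, say, 8k tokens would eat a noticeably larger share of the parameter budget — which is why I'm curious whether the author found it worthwhile.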