xodn348 • yesterday at 10:43 PM • 0 replies

Really interesting approach — attacking token efficiency at the encoding level is more fundamental than what I did.

Even without retraining BPE from scratch, starting with YUTF-8 and measuring how existing tokenizers handle it would already be a worthwhile experiment.

Hope you find the time to build it, good luck!
