Hacker News

empiko, yesterday at 7:17 PM

Agreed completely. There is a ton of research into how to represent text, and these simple tokenizers consistently perform at SOTA levels. The bitter lesson is that you should not worry about it that much.