
akoboldfrying, today at 6:37 AM

The tokenisation needs to be general -- it needs to be able to encode any possible input. It should also be at least moderately efficient across the distribution of inputs that it will tend to see. Existing tokenisation schemes explicitly target this.
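
To make the two requirements concrete, here is a minimal sketch of a byte-fallback tokeniser. It is an illustration only, not how any particular production tokeniser works: the `COMMON` vocabulary, the id layout, and the greedy longest-match rule are all assumptions chosen for brevity (real schemes such as byte-level BPE learn their merges from data). Generality comes from reserving one id per possible byte; efficiency comes from the extra multi-byte tokens for frequent strings.

```python
# Hypothetical toy vocabulary: ids 0-255 are reserved for raw bytes,
# ids from 256 upward are multi-byte tokens for (assumed) frequent strings.
COMMON = [b"the", b" the", b"ing", b"tion", b" and"]
VOCAB = {tok: 256 + i for i, tok in enumerate(COMMON)}
ID_TO_BYTES = {i: bytes([i]) for i in range(256)}
ID_TO_BYTES.update({tok_id: tok for tok, tok_id in VOCAB.items()})
MAX_LEN = max(len(tok) for tok in COMMON)


def encode(text: str) -> list[int]:
    """Encode any string: greedy longest match, falling back to single bytes."""
    data = text.encode("utf-8")
    ids = []
    i = 0
    while i < len(data):
        # Try the longest multi-byte token first (lengths MAX_LEN down to 2).
        for n in range(min(MAX_LEN, len(data) - i), 1, -1):
            tok_id = VOCAB.get(data[i:i + n])
            if tok_id is not None:
                ids.append(tok_id)
                i += n
                break
        else:
            # Byte fallback: every byte value has an id, so no input is unencodable.
            ids.append(data[i])
            i += 1
    return ids


def decode(ids: list[int]) -> str:
    """Invert encode by concatenating each token's bytes."""
    return b"".join(ID_TO_BYTES[i] for i in ids).decode("utf-8")
```

With this sketch, text that resembles the assumed distribution (for example `"the thing"`) takes fewer tokens than it has bytes, while unusual input such as emoji or CJK text still round-trips, just at one token per byte.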