fooofw yesterday at 6:22 PM

The tokenizer can represent uncommon words with multiple tokens. Entering your example into https://platform.openai.com/tokenizer (GPT-4o) gives me (tokens separated by "|"):

    lower|case|un|se|parated|name
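
To illustrate the idea of splitting one unfamiliar string into several known pieces, here is a toy sketch in Python. It is not the GPT-4o tokenizer: the real one uses byte pair encoding with a learned vocabulary, while this just does greedy longest-match against a tiny vocabulary. `greedy_tokenize` and `TOY_VOCAB` are hypothetical names, the vocabulary is hand-picked so that it reproduces the split above, and the input string is assumed to be the tokens above joined back together.

```python
def greedy_tokenize(text, vocab):
    """Split text by repeatedly taking the longest vocab entry at the current position."""
    tokens = []
    i = 0
    max_len = max(map(len, vocab))
    while i < len(text):
        # Try the longest candidate first; fall back to a single character
        # when nothing in the vocabulary matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy vocabulary, chosen by hand for this example only.
TOY_VOCAB = {"lower", "case", "un", "se", "parated", "name"}

print("|".join(greedy_tokenize("lowercaseunseparatedname", TOY_VOCAB)))
# prints: lower|case|un|se|parated|name
```

Since "separated" is not in the toy vocabulary, it ends up spread over "se" and "parated", and anything with no match at all degrades to single characters.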