Hacker News

bandrami today at 7:39 AM

They're literally text generators so that's... troubling


Replies

eigenspace today at 8:01 AM

They're text generators, but you can think of them as basically operating with a different alphabet than us. When they are given text input, it's not in our alphabet, and when they produce text output it's also not in our alphabet. So when you ask them what letters are in a given word, they're literally just guessing when they respond.

Rather, they use tokens that are usually combinations of 2-8 characters. You can play around with how text gets tokenized here: https://platform.openai.com/tokenizer

_____

For example, the above text I wrote has 504 characters, but 103 tokens.
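
To make the "different alphabet" idea concrete, here is a minimal sketch of a toy tokenizer. Everything in it is an assumption for illustration: the `VOCAB` table and the `tokenize` helper are made up, and the greedy longest-match rule is a stand-in for the byte-pair encoding that real tokenizers use, with vocabularies learned from data and far larger than this one. The point is only that the model receives a short list of integer IDs rather than letters.

```python
# Hypothetical toy vocabulary, for illustration only. Real vocabularies
# are learned from data and contain tens of thousands of entries.
VOCAB = {
    "straw": 0, "berry": 1, "st": 2, "raw": 3,
    "s": 4, "t": 5, "r": 6, "a": 7, "w": 8, "b": 9, "e": 10, "y": 11,
}


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization (a simplification of BPE).

    At each position, take the longest vocabulary entry that matches
    and emit its integer ID.
    """
    ids = []
    i = 0
    max_len = max(len(piece) for piece in vocab)
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


word = "strawberry"
ids = tokenize(word)
print(len(word), "characters ->", len(ids), "tokens:", ids)
# prints: 10 characters -> 2 tokens: [0, 1]
```

With this toy vocabulary the model would see `[0, 1]`, and nothing in those two numbers says how many times "r" appears. The figures quoted above fit the same picture: 504 characters over 103 tokens is roughly 4.9 characters per token.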