One byte equals one character was already incorrect in the pre-Unicode days for East Asian languages...

plorkyeran • last Thursday at 1:44 AM

The assumption that one byte equals one character was already wrong in the pre-Unicode days for East Asian languages. UTF-8 is much easier to parse than something like Shift JIS, where splitting a string between the bytes of a multi-byte character can produce a valid but incorrect string.
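To make that concrete, here is a rough Python sketch, not anything from a real codebase. The sample text "アイ" and the helper names `decodes_cleanly` and `skip_to_boundary` are my own choices for illustration. It cuts one byte into the first character of the same text in both encodings, the way a careless byte-offset substring would.

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def skip_to_boundary(data: bytes) -> bytes:
    """Drop leading UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    i = 0
    while i < len(data) and (data[i] & 0xC0) == 0x80:
        i += 1
    return data[i:]


text = "アイ"  # two katakana characters, picked for illustration

sjis = text.encode("shift_jis")  # b'\x83A\x83C': the trail byte 0x41 is also ASCII 'A'
utf8 = text.encode("utf-8")      # three bytes per character here

# Slice one byte into the first character.
sjis_tail = sjis[1:]
utf8_tail = utf8[1:]

# Shift JIS: the fragment decodes without error, but to the wrong text.
sjis_result = sjis_tail.decode("shift_jis")
print(repr(sjis_result))  # 'Aイ'

# UTF-8: the fragment starts with a continuation byte, so the damage is detectable.
utf8_ok = decodes_cleanly(utf8_tail, "utf-8")
print(utf8_ok)  # False

# UTF-8 also lets you resynchronize by skipping to the next lead byte.
resynced = skip_to_boundary(utf8_tail).decode("utf-8")
print(repr(resynced))  # 'イ'
```

The Shift JIS fragment silently turns into "Aイ" because the second byte of "ア" (0x41) is also a perfectly good ASCII letter. In UTF-8, continuation bytes occupy their own range, so a misaligned slice either fails to decode or can be realigned.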
