Hacker News

zzo38computer, last Tuesday at 6:31 PM

I think Unicode is messy and is not the best way to do i18n, m17n, l10n, a11y, and many other things. Existing implementations also have problems that have nothing to do with the character set, but the character set is one of the problems. I also think it is not good to insist on using one character set for everything, especially if that character set is Unicode.

UTF-8, UTF-16, UTF-32, UTF-EBCDIC, etc. are encodings of Unicode. EUC-JP is not an encoding of a subset of Unicode; it is an encoding of JIS, which is a different character set. The PC character set is not a subset of Unicode; it is the PC character set. Being mappable to Unicode in some cases does not make a character set Unicode. Those mappings are not necessarily "clean", and even if they were, that still would not make a non-Unicode character set a subset (or a superset) of Unicode.
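
To make the distinction concrete, here is a rough sketch in Python (my choice of language, not something the comment depends on). The helper name `encodings_of` is made up for illustration, and the codecs are the ones that ship with CPython, where `cp437` stands in for "the PC character set". Note that Python only reaches EUC-JP and CP437 through a mapping table to and from Unicode, which is exactly the kind of mapping being discussed.

```python
def encodings_of(ch: str) -> dict[str, bytes]:
    """Encode one character with several codecs (hypothetical helper)."""
    return {
        # Three encodings of the same Unicode code point.
        "utf-8": ch.encode("utf-8"),
        "utf-16-be": ch.encode("utf-16-be"),
        "utf-32-be": ch.encode("utf-32-be"),
        # An encoding of JIS, reached here via Python's mapping table.
        "euc_jp": ch.encode("euc_jp"),
    }


if __name__ == "__main__":
    # HIRAGANA LETTER A: U+3042 in Unicode, row 4 cell 2 in JIS X 0208.
    for name, raw in encodings_of("\u3042").items():
        print(f"{name:10} {raw.hex(' ')}")

    # CP437 byte 0xE1 is mapped to U+00DF (sharp s) by Python's table,
    # although the same glyph was commonly used as a Greek beta as well.
    print(repr(bytes([0xE1]).decode("cp437")))
```

The three UTF forms are different byte serializations of the number 0x3042, while the EUC-JP bytes are derived from the JIS row and cell numbers, and only a lookup table relates the two. The CP437 line is one small case where the table had to pick a single Unicode character for a byte that was used for more than one purpose.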


Replies

kortex, last Tuesday at 9:48 PM

So how does going back to multiple incompatible character sets help anything?