Hacker News

chuckadams today at 3:50 PM

> It would have been expensive, but all characters should have been fixed size 64bit values.

It would have been a non-starter, and then we'd all be dealing with Shift-JIS, BIG5, and FSM knows how many different codepages to this day. UTF-8 is about as elegant as it gets, though Java and JS still managed to fuck that up too (they both encode every codepoint outside the BMP as surrogate pairs in UTF-8).


Replies

chrismorgan today at 4:34 PM

> Java and JS […] both encode every codepoint outside the BMP as surrogate pairs in UTF-8

I can’t comment on Java, but JS I know reasonably well and I can’t think of any place it uses CESU-8.
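
For anyone unfamiliar with the distinction: standard UTF-8 encodes a codepoint outside the BMP as one 4-byte sequence, whereas CESU-8 first splits it into a UTF-16 surrogate pair and then encodes each surrogate as its own 3-byte sequence. A rough sketch of the difference, in Python only because it is a neutral choice; the `cesu8_encode` helper is hypothetical, written for illustration, and it demonstrates nothing about what Java or JS do internally:

```python
def cesu8_encode(text: str) -> bytes:
    """Illustrative (hypothetical) CESU-8 encoder.

    Splits the text into UTF-16 code units, then encodes each unit
    on its own using the UTF-8 bit layout.
    """
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # "surrogatepass" allows a lone surrogate to be written in the
        # 3-byte form, which strict UTF-8 would reject.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


emoji = "\U0001F600"  # U+1F600, outside the BMP

utf8 = emoji.encode("utf-8")
cesu8 = cesu8_encode(emoji)

print(utf8.hex(" "))   # f0 9f 98 80
print(cesu8.hex(" "))  # ed a0 bd ed b8 80
```

The two encodings agree for every codepoint inside the BMP; they only diverge for codepoints that need a surrogate pair in UTF-16.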
