Hacker News

Aardwolf | yesterday at 7:56 PM | 4 replies

Now just UTF-16 and non '\n' newline types remaining to go


Replies

syncsynchalt | yesterday at 10:38 PM

Of the two, UTF-16 is much less of a problem: it's trivially[1] and losslessly convertible.

[1] Ok I admit, not trivially when it comes to unpaired surrogates, BOMs, endian detection, and probably a dozen other edge and corner cases I don't even know about. But you can offload the work to pretty well-understood and trouble-free library calls.
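
For illustration, here is a minimal Python sketch of what "offload the work to library calls" can look like. The function name `utf16_to_utf8` is made up for this example, and replacing unpaired surrogates with U+FFFD is an assumed policy, not something the comment prescribes (it makes the conversion lossy for malformed input only).

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Convert UTF-16 bytes to UTF-8 using only built-in codecs."""
    # The "utf-16" codec consumes a leading BOM to pick the byte order;
    # with no BOM it assumes the platform's native byte order.
    # errors="replace" substitutes U+FFFD for unpaired surrogates
    # instead of raising UnicodeDecodeError (an assumed policy).
    text = data.decode("utf-16", errors="replace")
    return text.encode("utf-8")
```

Well-formed input round-trips unchanged, e.g. `utf16_to_utf8("héllo".encode("utf-16")) == "héllo".encode("utf-8")`. If the byte order is known ahead of time and no BOM is expected, `"utf-16-le"` or `"utf-16-be"` can be passed to `decode` instead.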

hypeatei | yesterday at 8:09 PM

UTF-16 will be quite the mountain as Windows APIs and web specifications/engines default to it for historical reasons.

augustk | today at 11:09 AM

> Now just UTF-16 and non '\n' newline types remaining to go

Also ISO 8601 (YYYY-MM-DD) should be the default date format.
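
One practical argument for that format, sketched in Python: ISO 8601 calendar dates sort chronologically when sorted as plain strings. The dates below are arbitrary examples chosen for illustration.

```python
from datetime import date

# date.isoformat() emits the ISO 8601 calendar form YYYY-MM-DD.
d = date(2024, 3, 9)
iso = d.isoformat()

# Arbitrary sample dates: plain string sorting matches chronological
# order because the fields run from most to least significant
# and are zero-padded.
dates = ["2024-03-09", "2023-12-31", "2024-01-15"]
by_string = sorted(dates)
by_date = sorted(dates, key=date.fromisoformat)
```

Here `iso` is `"2024-03-09"`, and `by_string` and `by_date` come out in the same order.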

jeberle | yesterday at 11:04 PM

UTF-16 arguably is Unicode 2.0+. It's how the code point address space is defined. Each code point is either one or two 16-bit code units. Easy. Compare with UTF-8, where a code point may be 1, 2, 3, or 4 8-bit code units.

UTF-16 is annoying, but it's far from the biggest design failure in Unicode.
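
The code unit counts mentioned above are easy to check. A small Python sketch follows; the helper name `code_units` and the sample characters are chosen for illustration only.

```python
def code_units(ch: str) -> tuple[int, int]:
    """Return (UTF-16 code units, UTF-8 code units) for one code point."""
    utf16_units = len(ch.encode("utf-16-le")) // 2  # each unit is 2 bytes
    utf8_units = len(ch.encode("utf-8"))            # each unit is 1 byte
    return utf16_units, utf8_units

# U+0041 'A', U+00E9 e-acute, U+20AC euro sign, U+1F600 (outside the BMP)
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", code_units(ch))
# U+0041 (1, 1)
# U+00E9 (1, 2)
# U+20AC (1, 3)
# U+1F600 (2, 4)
```

Only code points above U+FFFF need a surrogate pair (two units) in UTF-16, while UTF-8 length grows in four steps across the same range.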
