Hacker News

sheept · today at 1:48 AM

UTF-8 is not locale-independent. You cannot correctly render multilingual UTF-8 text without also specifying its locale, and some transformations, such as uppercasing and lowercasing, also depend on the locale.
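A minimal sketch of the case-mapping point, assuming Python 3: `str.upper()` and `str.lower()` apply Unicode's default, locale-independent mappings, so the Turkish dotted and dotless i come out wrong for Turkish text. The `upper_tr` and `lower_tr` helpers are hypothetical illustrations that handle only the Turkish/Azeri i rules; they are not a full locale-aware implementation (a library such as ICU would be the usual choice for that).

```python
# Minimal sketch: default (locale-independent) vs. Turkish-aware case mapping.
# upper_tr / lower_tr are hypothetical helpers for illustration only;
# they cover just the Turkish/Azeri dotted and dotless i, nothing else.

# Turkish pairs: i <-> İ (U+0130) and ı (U+0131) <-> I
_TR_TO_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})
_TR_TO_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})


def upper_tr(s: str) -> str:
    # Map the Turkish-specific letters first, then apply the default mapping.
    return s.translate(_TR_TO_UPPER).upper()


def lower_tr(s: str) -> str:
    # Same idea in the other direction.
    return s.translate(_TR_TO_LOWER).lower()


word = "istanbul"
print(word.upper())        # default Unicode mapping: ISTANBUL
print(upper_tr(word))      # Turkish rules: İSTANBUL
print("\u0130".lower() == "i\u0307")  # default: i + combining dot above -> True
print(lower_tr("\u0130"))  # Turkish rules: i
```

The translate-then-convert approach keeps the default mapping for every other character and only overrides the handful of letters where Turkish differs.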


Replies

Joker_vD · today at 2:37 AM

> You cannot correctly render multilingual UTF-8 text without also specifying its locale

You can render it pretty well: not perfectly, but well enough to actually read it, as opposed to not being able to render it at all or rendering mojibake à la Кракозябры (the Russian slang term for it) instead.
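To illustrate the contrast being drawn here: a missing locale only affects fine details of rendering, while decoding bytes with the wrong encoding garbles every non-ASCII character. A minimal Python sketch, assuming Windows-1251 (`cp1251`) as the mistaken encoding; the specific codec is just an example.

```python
# Mojibake: UTF-8 bytes decoded with the wrong single-byte codec.
original = "Кракозябры"
utf8_bytes = original.encode("utf-8")

# Misinterpret the UTF-8 bytes as Windows-1251 (a Cyrillic code page).
garbled = utf8_bytes.decode("cp1251")
print(garbled)  # unreadable: each Cyrillic letter became two cp1251 characters

# Here the damage is reversible because cp1251 defines every byte involved.
repaired = garbled.encode("cp1251").decode("utf-8")
print(repaired == original)  # True
```

Recovery like this only works when the wrong codec happened to map every byte; once a lossy step (such as replacing undecodable bytes) is involved, the original text is gone.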

numpad0 · today at 3:43 AM

At least, handling Unicode strings under the wrong locale only mildly corrupts them. Plenty of Win32 apps would crash if the system locale were set to UTF-8.

sourcegrift · today at 2:11 AM

E.g., some CJK characters render differently depending on whether the text is meant for mainland China, Taiwan, or Japan. One example is 骨 (from my old notes, so there's a tiny chance this example is incorrect).
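This is a consequence of Han unification: 骨 is a single code point with the same UTF-8 bytes whichever region it is meant for, so the text alone cannot tell a renderer which regional glyph to draw. A minimal Python sketch, assuming a hypothetical renderer that picks a font from an out-of-band language tag; `FONT_FOR_LANG` and `pick_font` are illustrative names, and the Noto Sans CJK family names are just one common choice of region-specific fonts.

```python
from typing import Optional

# The same code point and UTF-8 bytes serve every region; only an external
# language tag can steer glyph selection. FONT_FOR_LANG and pick_font are
# hypothetical, for illustration only.
char = "骨"
print(f"U+{ord(char):04X}", char.encode("utf-8"))  # identical for zh-CN, zh-TW, ja

FONT_FOR_LANG = {
    "zh-CN": "Noto Sans CJK SC",  # Simplified Chinese glyph forms
    "zh-TW": "Noto Sans CJK TC",  # Traditional Chinese (Taiwan) glyph forms
    "ja": "Noto Sans CJK JP",     # Japanese glyph forms
}


def pick_font(lang_tag: Optional[str]) -> str:
    # Without a known tag the renderer has to guess; this sketch falls back
    # to an arbitrary default.
    return FONT_FOR_LANG.get(lang_tag or "", "Noto Sans CJK JP")


print(pick_font("zh-TW"))
```

In practice the tag usually comes from outside the text itself, for example an HTML `lang` attribute or the user's system locale.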