logoalt Hacker News

adgjlsfhk1yesterday at 12:51 AM1 replyview on HN

UTF-16 is the worst of all worlds. Either use UTF32 where code-points are fixed, or if you care about space efficiency use UTF8


Replies

mort96yesterday at 7:22 AM

UTF-32 is arguably even more worst of all worlds. You don't get fixed-size units in any meaningful way. Yes you have fixed sized code points, but those aren't the "units" you care about; you still have variable size grapheme clusters, so you still can't do things like reversing a string or splitting a string at an arbitrary index or anything else like that. Yet it consumes twice the space of UTF-16 for almost everything, and four times the space of UTF-8 for many things.

UTF-32 is the worst of all worlds. UTF-16 has the teeny tiny advantage that pure Chinese text takes a bit less space in UTF-16 than UTF-8 (typically irrelevant because that advantage is outweighed by the fact that the markup surrounding the text takes more space). UTF-8 is the best option for pretty much everything.

As a consequence, never use UTF-32, only use UTF-16 where necessary due to backwards compatibility, always use UTF-8 where possible.

show 1 reply