Hacker News

akersten, today at 2:52 PM

Unicode is both the best thing that's ever happened to text encoding and the worst. The approach I take here is to treat any text coming from the user as toxic waste. Assume it will say "Administrator" or "Official Government Employee" or be 800 pixels tall because it was built only out of decorative combining characters. Then put it in a fixed box with overflow hidden, and use some other UI element to convey things like "this is an official account."

The worst part, which this article doesn't even touch on, is the risk that normalizing and remapping characters happens in your database but not in your login form. Suddenly I can re-register an existing account by using a different sequence of codepoints that the login system doesn't think exists but that the auth system maps to somebody else's record.
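
To make that mismatch concrete, here is a minimal Python sketch. It assumes the storage layer effectively applies NFKC plus case folding while the signup check compares raw strings. The helper name `canonical_username` and the in-memory `existing` set are hypothetical stand-ins, not anything from the article or a particular framework.

```python
import unicodedata


def canonical_username(raw: str) -> str:
    """Hypothetical helper: the single canonical form every layer should share."""
    # NFKC folds compatibility characters (e.g. fullwidth letters) into their
    # plain equivalents; casefold() removes case distinctions.
    return unicodedata.normalize("NFKC", raw).casefold()


# Stand-in for the accounts table, keyed by canonical form (assumption).
existing = {canonical_username("admin")}

# Fullwidth Latin letters: different codepoints from ASCII "admin".
attempt = "\uff41\uff44\uff4d\uff49\uff4e"

# A raw comparison says the name is free...
naive_collision = attempt in existing

# ...but the canonical form collides with the existing account.
canonical_collision = canonical_username(attempt) in existing

print(naive_collision, canonical_collision)
```

The point of the sketch is only that both the "does this name exist?" check and the stored key should go through the same function. Which normalization form to pick is a separate decision that this example does not settle.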


Replies

chuckadams, today at 8:36 PM

> or be 800 pixels tall because it was built only out of decorative combining characters

Also known as Zalgo. But it seems most renderers nowadays overlay multiple combining marks over each other rather than stack them, which makes it look far less eldritch than it used to.
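
For anyone who would rather not rely on the renderer, one possible mitigation is to cap how many combining marks may follow a base character. This is a rough sketch, not a vetted filter: the function name `cap_combining_marks` and the default limit of 2 are assumptions, and some legitimate text may need a higher limit.

```python
import unicodedata


def cap_combining_marks(text: str, max_marks: int = 2) -> str:
    """Hypothetical filter: keep at most max_marks combining marks per base character."""
    out = []
    run = 0  # combining marks seen since the last base character
    for ch in text:
        # Mn = nonspacing mark, Me = enclosing mark. Spacing marks (Mc) are
        # left alone here because many scripts need them for ordinary text.
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run > max_marks:
                continue  # drop the excess mark
        else:
            run = 0
        out.append(ch)
    return "".join(out)


zalgo = "a" + "\u0301" * 10  # "a" followed by ten combining acute accents
print(len(zalgo), len(cap_combining_marks(zalgo)))
```

This only trims the stacking; the fixed box with hidden overflow is still what protects the layout.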

ElectricalUnion, today at 3:55 PM

For some sorts of "confusables", you don't even need Unicode. Depending on the cursed combination of font, kerning, rendering and display, `m` and `rn` are also very hard to distinguish.
