Hacker News

I rendered 1,418 confusables over 230 fonts. Most aren't confusable to the eye

52 points by paultendo, last Wednesday at 12:30 PM, 23 comments

Comments

recursivecaveat today at 7:37 AM

This seems misguided. The fact that 'ρ' isn't a pixel-for-pixel match for 'p' doesn't mean they're not confusable. The threat model is not being unable to solve a spot-the-difference puzzle. Unless you are familiar with every pixel of your system fonts and carefully scrutinize every character on your screen, the lack of an exact match in a URL like jρmorgan[.]com is going to do very little for you. Many English letters have multiple, totally distinct ways of being written, so you can have two 'a' variants that are distinct but equally 'normal' looking. I guess if you get an LLM to write your blog posts they don't have to make much sense to begin with.

Grom_PE today at 7:59 AM

0 and O, or l and I, looking the same within a single font is a crime of modern typography.

Also, I remember that the 8x16 VGA font that came with KeyRus had some slight differences between Cyrillic and Latin lookalikes. That brought a strange sense of comfort when reading, and especially when typing the letter c, because its Cyrillic lookalike is located on the same key.

apothegm last Wednesday at 1:23 PM

Maybe not at super large font sizes. But even lowercase i and l are easy enough to confuse at a glance mid-word in most sans-serif fonts, not to mention uppercase I and lowercase l. You don’t even need “confusable” glyphs to create a domain name that will stand up to a casual visual confirmation from a busy user in a phishing context.

vivid242 today at 8:17 AM

Thanks for the effort!

I'm always intrigued by how the German FE-Schrift ("fälschungserschwerende Schrift", "more-difficult-to-forge font") chooses character shapes that are hard to turn into one another (like a 3 into an 8 or so):

https://en.wikipedia.org/wiki/FE-Schrift

ordu today at 7:41 AM

But what about 'Ы'? It looks like 'bl', doesn't it? 'Ы' is one codepoint and one glyph, whereas 'bl' is a sequence of two letters. I believe that the method described will miss such things. Cyrillic also has 'Ю'; I suppose it is possible to design a font that makes it look like 'lO'? Are there any fonts like this in the wild?
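To make the one-codepoint-versus-sequence point concrete, here is a minimal Python sketch. The table `SEQUENCE_LOOKALIKES` and both function names are hypothetical, made up for illustration; the two entries are simply the pairs suggested in this comment, not data from the article or from any Unicode file. The idea is that a comparison which expands a single character into a look-alike sequence can pair strings of different lengths, which a strictly one-glyph-to-one-glyph comparison cannot.

```python
# Hypothetical table (an assumption for illustration only): a single code
# point on the left, the Latin sequence it might be mistaken for on the right.
SEQUENCE_LOOKALIKES = {
    "\u042b": "bl",  # Cyrillic capital Yeru, 'Ы'
    "\u042e": "lO",  # Cyrillic capital Yu, 'Ю' (font-dependent guess)
}


def skeleton(text: str) -> str:
    """Expand every listed character into its look-alike sequence."""
    return "".join(SEQUENCE_LOOKALIKES.get(ch, ch) for ch in text)


def may_be_confused(a: str, b: str) -> bool:
    """True if both strings expand to the same skeleton."""
    return skeleton(a) == skeleton(b)


if __name__ == "__main__":
    spoof = "ta\u042be"  # 't', 'a', 'Ы', 'e': four code points
    print(len(spoof), len("table"))
    print(may_be_confused(spoof, "table"))
```

Whether any given font actually renders these pairs alike is a separate, empirical question; the sketch only shows why a per-character pairing would never even consider them.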

Oarch last Wednesday at 6:35 PM

This is really cool. I loved the technical breakdown and side-by-side comparisons. Surprised to hear that the Microsoft and macOS default fonts didn't score so well!

chii today at 6:50 AM

> A domain using only Cyrillic characters that happen to spell a Latin word (like “аpple” in all-Cyrillic) may still render in the address bar’s font and look identical.

That is very interesting.

I imagine the browser could take some context clues and switch rendering to Punycode if the user's locale is nowhere near a Cyrillic-script region. But that is only going to patch some edge cases and miss others.

Ideally, the solution is password managers everywhere, which don't have this vulnerability, instead of relying on human eyes to visually recognize URLs, which do.
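As a rough illustration of the "fall back to Punycode" idea, here is a minimal Python sketch. It is not how any browser actually decides; the function names are hypothetical, and the "script" of a character is approximated by the first word of its Unicode name, which is a crude assumption. A label that mixes scripts is shown in its ASCII form via Python's built-in `idna` codec (IDNA 2003); anything else is shown as typed.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Approximate the scripts used in a label.

    Assumption: the first word of a character's Unicode name
    ("LATIN", "CYRILLIC", "GREEK", ...) stands in for its script.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def display_form(domain: str) -> str:
    """Show mixed-script labels as Punycode, leave other labels untouched."""
    shown = []
    for label in domain.split("."):
        if len(scripts_in_label(label)) > 1:
            # Built-in codec; yields the ASCII "xn--..." form of the label.
            shown.append(label.encode("idna").decode("ascii"))
        else:
            shown.append(label)
    return ".".join(shown)


if __name__ == "__main__":
    mixed = "\u0430pple.com"                 # Cyrillic 'а' followed by Latin "pple"
    single = "\u0441\u043e\u0441\u043e.com"  # all-Cyrillic label that reads like "coco"
    print(display_form(mixed))
    print(display_form(single))
```

The second example is the edge case from the quoted passage: a label written entirely in one script passes this check and is displayed as-is, so a mixed-script rule alone patches some cases and misses others.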

Cool_Caribou today at 7:28 AM

Why are all the descending letters truncated in the titles? Not sure if it's a CSS glitch or a terrible font choice. A bit ironic in an article about fonts.

arlattimore today at 6:43 AM

This is very cool, an impressive piece of work, Paul.

doctorpangloss today at 6:04 AM

well, you didn't really do anything, did you? Claude Code rendered these things and wrote the blog post haha

> "This is not theoretical. It is a measured property of the font files shipping on every Mac."

some patterns of speech are so recognizably LLM, i am convinced that the AI detection startups have a very strong chance to succeed on text.
