xg15 • yesterday at 9:57 AM • 1 reply

The "book scanning" hypothesis doesn't sound so bad, but couldn't it simply be OCR bias? I imagine it's pretty easy for OCR software to misrecognize hyphens or other kinds of dashes as em-dashes if the only distinction is a subtle difference in line length.

Replies

flowerthoughts • yesterday at 3:25 PM

You'd think context-less OCR would prefer interpreting it as a simple hyphen, since that's the most common dash. Seems unlikely any bias would go the other way.
