Hacker News

xg15, yesterday at 9:57 AM

The "book scanning" hypothesis doesn't sound so bad, but couldn't it simply be OCR bias? I imagine it's pretty easy for OCR software to misrecognize hyphens or other kinds of dashes as em-dashes if the only distinction is a subtle difference in line length.


Replies

flowerthoughts, yesterday at 3:25 PM

You'd think context-less OCR would prefer interpreting it as a simple hyphen, since that's the most common dash. Seems unlikely any bias would go the other way.