freedomben • yesterday at 6:44 PM • 2 replies

I can't say for Project Gutenberg specifically, but in general a huge issue I see is OCR errors. What do you all do to address them?

Replies

gluejar • yesterday at 7:11 PM

Check out Distributed Proofreaders: https://pgdp.net

lapetitejort • yesterday at 6:47 PM

I uploaded a PDF to archive.org, which auto-OCRs it with plenty of mistakes. I have found no way of updating the entire stack of documents it produces. I wonder if Project Gutenberg is similar.
