Hacker News

dang, last Thursday at 6:34 PM

> happy to run additional documents if people want to share examples

I've got one! The PDF of this out-of-print book is terrible: https://archive.org/details/oneononeconversa0000simo. The text is unreadably faint, and the underlying text layer is full of errors, so copy-paste is almost useless. Can your software extract usable text?

(I'll email you a copy of the PDF for convenience, since the Internet Archive's copy is behind their notorious lending wall.)


Replies

ritvikpandey21, last Thursday at 7:54 PM

Results look pretty good (with the exception of one very faint page) - check it out here! https://platform.runpulse.com/dashboard/extractions/public/f...
