> happy to run additional documents if people want to share examples
I've got one! The PDF of this out-of-print book is terrible: https://archive.org/details/oneononeconversa0000simo. The text is unreadably faint, and the underlying text layer is full of errors, so copy-paste is almost useless. Can your software extract usable text?
(I'll email you a copy of the PDF for convenience, since the Internet Archive's copy is behind their notorious lending wall)
Results look pretty good (with the exception of one very faint page) - check it out here! https://platform.runpulse.com/dashboard/extractions/public/f...