I assume it doesn't work well for books that have non-text structured elements (code, diagrams, etc.) or images (which is expected).
I wonder, is there some open source NN that can consume PDF pages and produce a "pure prose" version of them? Say, a page with mixed text and an image of a car engine would be output as the text followed by a detailed description of the image, or of what it depicts.