Modern OCR tooling is quite good. If the knowledge you are adding to your search database can be OCR'd, then I think the approach we took here can be generalized.