Hacker News

mimim1mi, yesterday at 2:47 PM

By definition, OCR means optical character recognition. Which extraction method works depends on what the PDF actually contains. Many PDFs are just scans of printed documents or handwritten notes, so they hold only images and no text to extract directly. If machine-readable text is available, your approach is great.
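To make that distinction concrete, here is a rough sketch of how one might decide per document. It assumes you already have one string per page from whatever text extractor you use (an empty string or None for pages where nothing came out). The names `needs_ocr` and `choose_method` and the 20-character threshold are made up for illustration, not taken from any library.

```python
def needs_ocr(page_texts, min_chars=20):
    """Return one boolean per page: True if the page probably needs OCR.

    page_texts: list of strings (or None) from a direct text extractor.
    min_chars:  hypothetical threshold; pages with fewer non-whitespace
                characters are treated as image-only scans.
    """
    flags = []
    for text in page_texts:
        # Count only non-whitespace characters; scanned pages usually yield none.
        visible = sum(1 for ch in (text or "") if not ch.isspace())
        flags.append(visible < min_chars)
    return flags


def choose_method(page_texts, min_chars=20):
    """Pick 'text', 'ocr', or 'mixed' for the whole document."""
    flags = needs_ocr(page_texts, min_chars)
    if all(flags):
        # Also covers an empty page list: nothing extractable, fall back to OCR.
        return "ocr"
    if not any(flags):
        return "text"
    return "mixed"


if __name__ == "__main__":
    pages = ["A page with a proper embedded text layer.", "", None]
    print(choose_method(pages))
```

A threshold on character count is a crude heuristic. A scan that already carries a poor OCR layer would pass as "text", so spot-checking the extracted output is still worthwhile.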