Hacker News

sergiotapia, 07/31/2025

Use PyMuPDF to extract the PDF text. Hell, run that nasty business through an LLM as step 2 to get a beautiful, clean Markdown version of the text. Lord knows the PDF format is horribly complex!
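
Roughly what that two-step pipeline could look like, as a minimal sketch rather than anything battle-tested. The function names, the prompt wording, and the `llm` callable are all made up for illustration; plug in whatever model client you actually use. The only PyMuPDF calls are `open()` and `page.get_text()`.

```python
from typing import Callable, Iterable


def extract_pages(pdf_path: str) -> list[str]:
    """Step 1: return the plain text of each page, in page order."""
    # Imported lazily so the prompt helper below works without PyMuPDF.
    try:
        import pymupdf  # module name in recent PyMuPDF releases
    except ImportError:
        import fitz as pymupdf  # legacy module name for PyMuPDF
    with pymupdf.open(pdf_path) as doc:
        return [page.get_text() for page in doc]


def build_cleanup_prompt(pages: Iterable[str]) -> str:
    """Step 2 input: a hypothetical prompt; the wording is an assumption."""
    # Drop empty pages and join the rest with blank lines.
    body = "\n\n".join(p.strip() for p in pages if p.strip())
    return (
        "Rewrite the following raw PDF text as clean Markdown. "
        "Keep the wording; fix only the layout.\n\n" + body
    )


def pdf_to_markdown(pdf_path: str, llm: Callable[[str], str]) -> str:
    """Run both steps. `llm` is any function mapping a prompt to a reply."""
    return llm(build_cleanup_prompt(extract_pages(pdf_path)))
```

Usage would be something like `pdf_to_markdown("paper.pdf", my_llm)`, where `my_llm` wraps your model of choice. For long PDFs you would probably want to send pages in batches instead of one giant prompt.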