Hacker News

Stagnant · yesterday at 5:21 PM · 3 replies

Chrome ships a local OCR model for extracting text from PDFs, and it's better than any of the VLM or open-source OCR models I've tried. I had a few hundred gigs of old newspaper scans, and after trying all the other options I ended up building a wrapper around the DLL Chrome uses to get the text and bounding boxes. Performance and accuracy are on another level compared to Tesseract, and while VLM models sometimes produced good results, they just seemed unreliable.

I've thought of open-sourcing the wrapper but haven't gotten around to it yet. I bet Claude Code can build a functioning prototype if you just point it to the "screen_ai" dir under Chrome's user data.
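As a starting point before any reverse-engineering, you first have to locate that component on disk. The sketch below searches for the "screen_ai" directory under the default Chrome user-data locations; the per-platform paths and the library file names are my assumptions about typical Chrome layouts, not something stated in the comment, so adjust them for your install.

```python
import os
import sys
from pathlib import Path


def screen_ai_candidates(home=None):
    """Return likely locations of Chrome's ScreenAI component directory.

    The per-platform paths below are assumptions based on default
    Chrome user-data layouts; a custom --user-data-dir will differ.
    """
    home = Path(home) if home else Path.home()
    if sys.platform == "win32":
        local = Path(os.environ.get("LOCALAPPDATA",
                                    home / "AppData" / "Local"))
        return [local / "Google" / "Chrome" / "User Data" / "screen_ai"]
    if sys.platform == "darwin":
        return [home / "Library" / "Application Support"
                / "Google" / "Chrome" / "screen_ai"]
    # Linux / other Unix
    return [home / ".config" / "google-chrome" / "screen_ai"]


def find_screen_ai_library():
    """Walk the candidate dirs for a shared library to wrap.

    The file names are guesses at what the component ships as on
    each platform; list the directory contents to confirm.
    """
    names = {"chrome_screen_ai.dll",
             "libchromescreenai.so",
             "libchromescreenai.dylib"}
    for base in screen_ai_candidates():
        if not base.is_dir():
            continue
        for path in sorted(base.rglob("*")):
            if path.name in names:
                return path
    return None
```

From there, `ctypes.CDLL(str(find_screen_ai_library()))` would load the library, though the actual exported entry points would still need to be discovered (e.g. with `dumpbin /exports` or `nm`) before a usable wrapper exists.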


Replies

alvibo · yesterday at 11:28 PM

Is there a chance you'll open-source the wrapper after all? It would help a lot of people like me. No pressure, but now I really want to try it to OCR a bunch of Japanese scans I have lying around. Unfortunately, finding a good OCR for Japanese scans is still a huge problem in 2026.

zzleeper · yesterday at 6:03 PM

Surprisingly, I also have a few hundred gigs of old newspaper scans, so I'm very curious.

How fast was it per page? Do you recall if it's CPU or GPU based? TY!

mwcampbell · yesterday at 6:48 PM

What's the name of this DLL? I assume it's separate from the monster chrome.dll, and that the model is proprietary.
