Hacker News

ritvikpandey21 · last Thursday at 5:28 PM

we disagree! we've found llms by themselves aren't enough and suffer from pretty big failure modes like hallucination and inferring text rather than purely transcribing it. we wrote a blog post about this [1]. the right approach so far seems to be a hybrid workflow that uses very specific parts of the language model architecture.

[1] https://www.runpulse.com/blog/why-llms-suck-at-ocr
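The comment doesn't spell out the hybrid workflow, but one common pattern (purely illustrative here, not necessarily Pulse's actual pipeline) is to cross-check the LLM's transcription against a deterministic OCR pass and flag spans the LLM added or changed as hallucination risks. A minimal sketch, assuming word-level comparison with stdlib `difflib`:

```python
import difflib

def flag_hallucination_risk(ocr_text: str, llm_text: str):
    """Compare a deterministic OCR pass against an LLM transcription.

    Returns (similarity_ratio, suspect_spans), where suspect_spans are
    pieces of the LLM output with no close match in the OCR text --
    candidate hallucinations/inferences rather than transcription.
    """
    ocr_words = ocr_text.split()
    llm_words = llm_text.split()
    matcher = difflib.SequenceMatcher(None, ocr_words, llm_words)
    suspects = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert"/"replace" means the LLM emitted text the OCR never saw.
        if tag in ("insert", "replace"):
            suspects.append(" ".join(llm_words[j1:j2]))
    return matcher.ratio(), suspects

# Hypothetical example: the LLM helpfully "adds" a figure the page never contained.
ocr = "Total revenue: $1,024,567 for fiscal year 2023"
llm = "Total revenue: $1,024,567 for fiscal year 2023 (up 12% YoY)"
ratio, suspects = flag_hallucination_risk(ocr, llm)
# suspects -> ["(up 12% YoY)"]
```

In a real pipeline the suspect spans would be routed to a stricter verifier or human review rather than silently accepted, which is the failure mode the comment is warning about.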


Replies

mritchie712 · last Thursday at 5:40 PM

> Why LLMs Suck at OCR

I paste screenshots into claude code every day and it's incredible. As in, I can't believe how good it is. I send a screenshot of console logs, a UI and some HTML elements and it just "gets it".

So saying they "Suck" makes me not take your opinion seriously.

serjester · last Thursday at 6:46 PM

This is a hand-wavy article that dismisses VLMs without acknowledging the real-world performance everyone is seeing. I think it'd be far more useful if you published an eval.

mikert89 · last Thursday at 6:01 PM

one or two more model releases, and raw documents passed to claude will beat whatever prompt voodoo you guys are cooking
