we disagree! we've found llms by themselves aren't enough: they suffer from pretty big failure modes like hallucination and inferring text rather than purely transcribing it. we wrote a blog post about this [1]. the right approach so far seems to be a hybrid workflow that uses very specific parts of the language model architecture.
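for a sense of what that hybrid split can look like, here's a minimal sketch (assumptions: pytesseract for the classical OCR step and the openai client for the cleanup step; the model name and prompt are illustrative, not our actual pipeline). the key design choice is that the llm never sees pixels, so it can't hallucinate words the OCR engine didn't produce.

```python
# illustrative hybrid pipeline: deterministic OCR does the transcription,
# the LLM only restructures text it was handed (it never reads the image).
import pytesseract
from PIL import Image
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def hybrid_extract(image_path: str) -> str:
    # step 1: classical OCR -- no generation, so no invented words
    raw_text = pytesseract.image_to_string(Image.open(image_path))

    # step 2: LLM constrained to reformatting the OCR output
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system",
             "content": "Reformat the OCR text below as clean markdown. "
                        "Do not add, infer, or correct any words."},
            {"role": "user", "content": raw_text},
        ],
        temperature=0,  # discourage creative rewriting
    )
    return response.choices[0].message.content
```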
This is a hand-wavy article that dismisses VLMs without acknowledging the real-world performance everyone is seeing. I think it'd be far more useful if you published an eval.
one or two more model releases, and raw documents passed to claude will beat whatever prompt voodoo you guys are cooking
> Why LLMs Suck at OCR
I paste screenshots into claude code every day and it's incredible. As in, I can't believe how good it is. I send a screenshot of console logs, a UI, and some HTML elements and it just "gets it".
So saying they "Suck" makes me not take your opinion seriously.