Hacker News

brumar yesterday at 10:06 PM (3 replies)

Tangentially related: I don't think OCR is the right term, and I am generally vocal about that. But seeing it go unquestioned here, I wonder whether I am the one who is wrong. Is it OK to call this OCR? To me, OCR means you end up with text, not visual tokens.


Replies

parsimo2010 yesterday at 10:10 PM

OCR means optical character recognition. The term does not require a direct transcription, although that is mostly what OCR has meant in the past. If you're using an LLM's vision capability to pass in text and the LLM actually understands it, then I would say that it recognized the characters, hence OCR seems okay to use.

TurdF3rguson yesterday at 10:15 PM

It's not. OCR is not what the vision model is doing here. We're used to using OCR as a verb, but it's more accurate to say the model "visioned" it.

Also, some models still do OCR and it's usually way more expensive that way.

devmor yesterday at 10:12 PM

So if I OCR a document, edit it, and print it, OCR didn't happen?