"it is still a LLM and not a "pure" OCR"
When does a character model become a language model?
If you're looking at block text with no connections between letter forms, each character mostly stands on its own. Except capital letters are much more likely at the beginning of a word or sentence than elsewhere, so you probably get a performance boost if you incorporate that.
Now we're considering two-character chunks: cursive script connects the letterforms, and the shape of each connection depends on both the letter it comes from and the letter it goes to. We can definitely get a performance boost from looking at those.
Hmm, you know, these two-letter groupings aren't random. "ng" is much more likely if we just saw an "i". Maybe we need to take that into account.
Hmm, actually, whole words are related to each other! I can make a pretty good guess at what word that four-letter-wide smudge is if I can figure out the words before and after it...
and now it's an LLM.
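To make the two-letter step concrete, here is a minimal Python sketch (not any real OCR engine's pipeline). It assumes a hypothetical per-glyph classifier that outputs a probability for each candidate character, plus a hypothetical table of bigram probabilities; every number and name below (`emissions`, `bigram`, `decode`, `decode_independent`) is made up for illustration. Read alone, the glyphs slightly favor "iuq"; once the bigram prior is folded in with a Viterbi search, the same glyph scores decode to "ing".

```python
import math

# Hypothetical classifier output: P(char | glyph image) for three glyphs.
# The numbers are invented; the last two glyphs are ambiguous smudges.
emissions = [
    {"i": 0.90, "l": 0.10},
    {"n": 0.45, "u": 0.55},
    {"g": 0.45, "q": 0.55},
]

# Hypothetical bigram prior P(next | prev). Pairs not listed get FLOOR.
bigram = {
    ("i", "n"): 0.60,
    ("i", "u"): 0.05,
    ("n", "g"): 0.50,
}
FLOOR = 0.01


def decode_independent(emissions):
    """'Pure' character model: pick the best character for each glyph alone."""
    return "".join(max(scores, key=scores.get) for scores in emissions)


def decode(emissions, bigram, floor=FLOOR):
    """Viterbi search over glyph scores combined with the bigram prior."""
    # best[c] = (log score, text) of the best sequence so far ending in c
    best = {c: (math.log(p), c) for c, p in emissions[0].items()}
    for scores in emissions[1:]:
        step = {}
        for c, p in scores.items():
            step[c] = max(
                (s + math.log(bigram.get((prev, c), floor)) + math.log(p), text + c)
                for prev, (s, text) in best.items()
            )
        best = step
    return max(best.values())[1]


print(decode_independent(emissions))  # glyphs only
print(decode(emissions, bigram))      # glyphs plus bigram prior
```

Swap the bigram table for word-level context and then for a learned model over whole sentences, and the structure stays the same: glyph evidence times a prior over text. That is the sliding scale the comment is describing.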