Most likely, and probably inferring the structure on texts with "similar" writing forms. T...

lccerina • today at 1:11 PM • 4 replies • view on HN

Most likely, and probably inferring the structure on texts with "similar" writing forms. Tried with my handwriting (in italian) and the performance wasn't that stellar. More annoyingly, it is still a LLM and not a "pure" OCR, so some sentences were partially rephrased with different words than the one in the text. This is crucially problematic if they would be used to transcribe historical documents

Replies

embedding-shape • today at 1:15 PM

> Tried with my handwriting (in italian) and the performance wasn't that stellar.

Same here, for diaries/journals written in mixed Swedish/English/Spanish and with absolutely terrible hand-writing.

I'd love for the day where the writing is on the wall for handwriting recognition, which is something I bet on when I started with my journals, but seems that day has yet to come. I'm eager to get there though so I can archive all of it!

pbronez • today at 4:43 PM

"it is still a LLM and not a "pure" OCR"

When does a character model become a language model?

If you're looking at block text with no connections between letter forms, each character mostly stands on its own. Except capital letters are much more likely at the beginning of a word or sentence than elsewhere, so you probably get a performance boost if you incorporate that.

Now we're considering two-character chunks. Cursive script connects the letterforms, and the connection changes based on both the source and target. We can definitely get a performance boost from looking at those.

Hmm you know these two-letter groupings aren't random. "ng" is much more likely if we just saw an "i". Maybe we need to take that into account.

Hmm actually whole words are related to each other! I can make a pretty good guess at what word that four-letter-wide smudge is if I can figure out the word before and after...

and now it's an LLM.

butlike • today at 2:37 PM

So it doesn't work is what you're saying, right?

GaggiX • today at 1:27 PM

Are you sure to have used the Gemini 3.0 pro model? Maybe try increasing the media resolution on the AI studio if the text is small

alt Hacker News

Replies