coolness • today at 9:15 AM • 4 replies

Great post and amazing progress in this field! However, I have to wonder if some of these letters were part of the training data for Gemini, since they are well-known and someone has probably already done the painstaking work of transcribing them...

Replies

lccerina • today at 1:11 PM

Most likely, and it is probably inferring the structure from texts with "similar" writing forms. I tried it with my own handwriting (in Italian) and the performance wasn't that stellar. More annoyingly, it is still an LLM and not a "pure" OCR, so some sentences were partially rephrased with different words than the ones in the text. This is a serious problem if these models are used to transcribe historical documents.

MrSkelter • today at 10:50 AM

I have a personal corpus of letters between my grandparents from WW2, when my grandfather was fighting in Europe and my grandmother was in England. The ability of Claude and ChatGPT to transcribe them is extremely impressive, though I haven’t worked on them in months, so this is based on older models. At that time neither system could properly organize pages, though, and ChatGPT would sometimes skip a paragraph.

dmd • today at 11:54 AM

Possibly, but given that it can also read my handwriting, which is much, MUCH worse than Boole’s, with better accuracy than any human I’ve given it to, that’s probably not the explanation.

suddenlybananas • today at 9:21 AM

Shhhhh no one cares about data contamination anymore.
