myth_drannon • today at 1:51 PM

Right, it can do modern writing, but give it anything older than a century (church records and censuses) and it produces garbage. Yandex Archives figured that out and has a character error rate (CER) in the single digits, but they have the resources to collect immense amounts of training data. I'm slowly building a dataset for fine-tuning a TrOCR model, and the best it can do is a CER of 18%... which is sort of readable.
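For anyone unfamiliar with the metric: CER is usually taken as the character-level edit distance between the model output and the ground-truth transcription, divided by the length of the ground truth. Below is a minimal pure-Python sketch of that common definition. The function names and the sample strings are made up for illustration, and it is not necessarily how the figures quoted above were computed.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


# Hypothetical example: two wrong characters in a 13-character line.
print(round(cer("baptised 1843", "baptized 1848"), 3))  # 0.154
```

At 18%, roughly one character in six is wrong, which matches the "sort of readable" description.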

Replies

coredog64 • today at 3:29 PM

How do you do, fellow TrOCR fine-tuner?

I'm using TrOCR because it's a smaller model that I can fine-tune on a consumer card, but the age of the model and its resources certainly makes that a challenge. The official fine-tuning notebook hasn't been updated in years and has several errors caused by the march of progress in the primary packages.
