Hacker News

fngjdflmdflg yesterday at 7:10 PM

These OCR improvements will almost certainly be brought to Google Books, which is great. Long term, they could make it possible to compress all rare books that have not yet been digitized into a manageable size that can be stored for less than $5,000.[0] It would also be great for archive.org to move to this from Tesseract. I wonder what that would cost, both in raw compute to run it and via a paid API.

[0] https://annas-archive.org/blog/critical-window.html


Replies

levocardia yesterday at 11:05 PM

This is a really interesting "data flywheel": better model >> more usable data >> even better model

kridsdale3 yesterday at 9:16 PM

More Data for the Data Gods!