mrweasel • yesterday at 10:35 AM • 2 replies

> There must be a ton of companies with very large document collections at this point

See, I don't think there are; I don't think they want that expense. It's basically the Linus Torvalds philosophy of data storage: if it's on the Internet, I don't need a backup. While I have absolutely no proof of this, I'd guess that many AI companies just crawl the Internet constantly, never saving any of the data. We're seeing some of these scrapers go to great lengths to circumvent any and all forms of caching; they aren't interested in having a two-week-old copy of anything.

Replies

kelvinjps10 • yesterday at 4:28 PM

Where did Linus Torvalds express this philosophy? I have never seen it.

n1xis10t • yesterday at 3:41 PM

Could be. Can you train a model without saving things though?
