mrweasel yesterday at 10:35 AM

> There must be a ton of companies with very large document collections at this point

See, I don't think there are; I don't think they want that expense. It's basically the Linus Torvalds philosophy of data storage: if it's on the Internet, I don't need a backup. While I have absolutely no proof of this, I'd guess that many AI companies just crawl the Internet constantly, never saving any of the data. We're seeing some of these scrapers go to great lengths to circumvent any and all forms of caching; they aren't interested in having a two-week-old copy of anything.


Replies

kelvinjps10 yesterday at 4:28 PM

Where did Linus Torvalds express this philosophy? I have never seen it.

n1xis10t yesterday at 3:41 PM

Could be. Can you train a model without saving things, though?