Hacker News

derefr, yesterday at 6:59 PM

I wonder if these publishers would be more amenable to a private archiver that only serves registered academic / journalistic research projects (the way most physical private archives do), with a specific provision to never provide data to companies that would resell it or use it for training of generative models.


Replies

eternauta3k, yesterday at 7:54 PM

They already have archives of online and printed articles that they license to libraries, because the libraries take care of rate limiting and preventing abuse.

coffeefirst, yesterday at 9:11 PM

Yes. Most publishers already do syndication deals. This is a fine idea.

The problem with the LLMs is they capture the value chain and give back nothing. It didn’t have to be this way. It still doesn’t.

ninjagoo, yesterday at 7:15 PM

They probably have internal archives if they're smart, but those aren't accessible to the public. I think the issue isn't whether the data is archived, but whether that information will remain available to the public for the foreseeable future.
