I wonder if these publishers would be more amenable to a private archiver that only serves registered academic / journalistic research projects (the way most physical private archives do), with a specific provision to never provide data to companies that would resell it or use it for training of generative models.
Yes. Most publishers already do syndication deals. This is a fine idea.
The problem with LLMs is that they capture the value chain and give back nothing. It didn’t have to be this way. It still doesn’t.
They probably have internal archives if they're smart, but those aren't accessible to the public. I think the issue isn't whether the data is archived, but whether it stays available to the public for the foreseeable future.
They already have archives of online and printed articles, which they license to libraries because the libraries take care of rate limiting and abuse prevention.