> they would likely obey robots.txt
If only... Despite providing a useful service, they are not as nice towards site owners as one would hope.
Internet Archive says:
> We see the future of web archiving relying less on robots.txt file declarations geared toward search engines
https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...
They are not alone in that. The "Archiveteam", a different organization, not to be confused with archive.org, also doesn't respect robots.txt according to their wiki: https://wiki.archiveteam.org/index.php?title=Robots.txt
I think it is safe to say that there is little consideration for site owners from the largest archiving organizations today. Whether there should be is a different debate.
What an absolutely insufferable explanation from ArchiveTeam. What else do you expect from an organization that aggressively crawls websites and brings them to their knees because it couldn't care less?
It seems like the general problem is that robots.txt was originally used to mark the parts of a site that would lead a recursive crawler into an infinite forest of dynamically generated links, which nobody wants. Increasingly it's being used to disallow the fixed content of the site instead. That content is the thing archivers are trying to capture, and fetching it shouldn't be a problem for the site when the bot caches the result and so only ever downloads it once. The more sites do the latter, the harder it is for anyone to distinguish it from the former, which is bad for everyone.
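To make the distinction concrete, here is a rough sketch in Python using the standard library's `urllib.robotparser`. Everything site-specific in it is made up for illustration: the two robots.txt bodies, the `example-archiver` user agent, the `example.org` URLs, and the `CachingCrawler` and `fake_fetch` helpers. It is not how any real archiver works, just the idea of "skip the link trap, fetch fixed pages once".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in the original style: only the dynamically
# generated link space (an endless calendar) is fenced off.
TRAP_ONLY = """\
User-agent: *
Disallow: /calendar/
"""

# Hypothetical robots.txt in the newer style: the fixed content is blocked too.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def make_parser(text):
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


class CachingCrawler:
    """Toy crawler: honors robots.txt and downloads each URL at most once."""

    def __init__(self, robots_text, fetch, agent="example-archiver"):
        self.robots = make_parser(robots_text)
        self.fetch = fetch  # callable(url) -> body, injected for the demo
        self.agent = agent
        self.cache = {}

    def get(self, url):
        if not self.robots.can_fetch(self.agent, url):
            return None  # disallowed by the site's robots.txt
        if url not in self.cache:
            self.cache[url] = self.fetch(url)  # the only real download
        return self.cache[url]


calls = []


def fake_fetch(url):
    """Stand-in for an HTTP request; records every download attempt."""
    calls.append(url)
    return f"<html>{url}</html>"


crawler = CachingCrawler(TRAP_ONLY, fake_fetch)
crawler.get("https://example.org/about.html")        # downloaded
crawler.get("https://example.org/about.html")        # served from the cache
crawler.get("https://example.org/calendar/2031/05/")  # skipped: link trap
print(len(calls))  # 1
```

With `BLOCK_ALL` in place of `TRAP_ONLY`, the same crawler gets `None` for `/about.html` as well, and the robots.txt file alone gives it no way to tell "this is a link trap" from "I don't want to be archived".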
> The "Archiveteam", a different organization, not to be confused with archive.org, also doesn't respect robots.txt according to their wiki
"Archiveteam" exists in a different context. Their usual purpose is to get a copy of something quickly because it's expected to go offline soon. That both a) makes it irrelevant for ordinary sites in ordinary times and b) gives the sites that are about to shut down an obvious thing to do, i.e. just give them a better, more efficient way to make a full archive of the site.