logoalt Hacker News

walskilast Monday at 7:28 PM1 replyview on HN

IA has not honored robots.txt for the better part of a decade now.

https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...


Replies

lxgryesterday at 10:22 AM

Are you sure? The article (from 2017) you've linked only mentions "U.S. government and military web sites", and their wayback machine FAQ still mentions that robots.txt "might" prevent crawling:

https://help.archive.org/help/using-the-wayback-machine/