logoalt Hacker News

joecool1029today at 2:30 AM1 replyview on HN

Which they don’t respect. I’ve had it for my blog for years and they still added it to wayback machine, see my last comment for their official announcement of the ignore robots.txt policy, it is not new.


Replies

socalgal2today at 3:56 AM

robots.txt means they shouldn't auto-scan your site. Any user though can go to the wayback machine and type in a URL and the wayback machine will read that URL. That was the intent of robots.txt (don't scan) not (don't read period). It's spelled out in the spec for robots.txt

show 1 reply