robots.txt means they shouldn't auto-scan your site. Any user though can go to the wayback mach...

socalgal2 • today at 3:56 AM • 1 reply • view on HN

robots.txt means they shouldn't auto-scan your site. Any user though can go to the wayback machine and type in a URL and the wayback machine will read that URL. That was the intent of robots.txt (don't scan) not (don't read period). It's spelled out in the spec for robots.txt

Replies

keane • today at 4:40 AM

The <meta name="robots"> tag and robots.txt serve different roles: robots.txt controls crawling, while the robots meta tag influences indexing and other behavior. https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...

I wonder how archive.org_bot behaves when <meta name="robots" content="noindex, noarchive, nocache" /> is present.

alt Hacker News

Replies