Question: do these bots not respect robots.txt? I haven't added these scrapers to my robots.t...

kerkeslager • 01/16/2025 • 4 replies

Question: do these bots not respect robots.txt?

I haven't added these scrapers to my robots.txt on the sites I work on yet because I haven't seen any problems. I would run something like this on my own websites, but I can't see selling my clients on running this on their websites.

The websites I run generally have a honeypot page which is linked in the headers and disallowed to everyone in the robots.txt, and if an IP visits that page, they get added to a blocklist which simply drops their connections without response for 24 hours.
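For anyone who wants to picture the mechanism, here is a rough sketch of the blocklist logic in Python. It is not the actual setup described above: the path `/honeypot/`, the `Blocklist` class, and the `should_drop` helper are all hypothetical names, and it assumes the honeypot request itself is also dropped. A real deployment would more likely enforce the drop at the firewall or reverse proxy than in application code.

```python
import time

# Hypothetical honeypot path. The same path would be disallowed for every
# crawler in robots.txt ("User-agent: *" / "Disallow: /honeypot/").
HONEYPOT_PATH = "/honeypot/"
BLOCK_SECONDS = 24 * 60 * 60  # 24 hours


class Blocklist:
    """In-memory map of IP address -> time at which its block expires."""

    def __init__(self, clock=time.time):
        self._expires = {}
        self._clock = clock  # injectable so expiry can be tested

    def block(self, ip):
        self._expires[ip] = self._clock() + BLOCK_SECONDS

    def is_blocked(self, ip):
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expires[ip]  # block has lapsed; forget the IP
            return False
        return True


def should_drop(blocklist, ip, path):
    """Return True if the connection from `ip` should be dropped silently."""
    if blocklist.is_blocked(ip):
        return True
    if path == HONEYPOT_PATH:
        # Assumption: the honeypot hit itself is dropped as well.
        blocklist.block(ip)
        return True
    return False
```

A well-behaved crawler never requests the disallowed path, so only clients that ignore robots.txt end up on the list.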

Replies

0xf00ff00f • 01/16/2025

> The websites I run generally have a honeypot page which is linked in the headers and disallowed to everyone in the robots.txt, and if an IP visits that page, they get added to a blocklist which simply drops their connections without response for 24 hours.

I love this idea!


Dwedit • 01/18/2025

Even something like a special URL that auto-bans you can be abused by pranksters. Simply embedding an <img> tag that fetches the offending URL could trigger it, as could tricking people into clicking a link.


throw_m239339 • 01/16/2025

> Question: do these bots not respect robots.txt?

No, they don't, because in most countries there is no potential legal liability for ignoring that file.

jonatron • 01/16/2025

You haven't seen any problems because you created a solution to the problem!

