Hacker News

antonyh, today at 1:54 PM

As facebookexternalhit is listed in the robots.txt, it does look like it's optimistically rechecking in the hope it's no longer disallowed. That rate of request is obscene though, and falls firmly into the category of Bad Bot.


Replies

RobotToaster, today at 3:43 PM

My guess is it's dutifully obeying robots.txt: not storing anything from the site and then exiting, but without clearing the site from the crawl queue.
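
To make that guess concrete, here is a toy sketch of the failure mode. It is not Meta's crawler code: the queue, the function names, the `forget_disallowed` flag and the example.com URL are all made up for illustration. Only `urllib.robotparser` is a real library. The crawler checks robots.txt, correctly skips the disallowed URL, and then (the hypothesised bug) puts it straight back in the queue.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

USER_AGENT = "facebookexternalhit"

# Assumed robots.txt contents; the real site's file is not quoted in the thread.
ROBOTS_LINES = ["User-agent: facebookexternalhit", "Disallow: /"]


def crawl_pass(queue, robots, fetched, forget_disallowed):
    """Process every URL currently queued once; return how many were denied."""
    denied = 0
    for _ in range(len(queue)):
        url = queue.popleft()
        if robots.can_fetch(USER_AGENT, url):
            fetched.append(url)  # stand-in for actually downloading the page
            continue
        denied += 1
        if not forget_disallowed:
            # Hypothesised bug: the crawler obeys robots.txt and stores
            # nothing, but the URL is never dropped from the schedule.
            queue.append(url)
    return denied


def simulate(passes, forget_disallowed):
    """Run several passes; return (total denied checks, URLs still queued)."""
    robots = RobotFileParser()
    robots.parse(ROBOTS_LINES)
    queue = deque(["https://example.com/some-page"])  # hypothetical URL
    fetched = []
    total_denied = 0
    for _ in range(passes):
        total_denied += crawl_pass(queue, robots, fetched, forget_disallowed)
    return total_denied, len(queue)


if __name__ == "__main__":
    print("buggy:", simulate(3, forget_disallowed=False))
    print("fixed:", simulate(3, forget_disallowed=True))
```

With the bug, every pass hits the site again and the queue never shrinks. With the fix, the URL is denied once and dropped. Nothing is ever stored in either case, so from the crawler's side it would look perfectly well behaved.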

mghackerlady, today at 2:23 PM

That is probably the dumbest yet most genius solution to getting your scraper blocked I've ever seen.