jonatron, last Thursday at 5:02 PM

I just looked at the logs for a site, and I saw PerplexityBot fetching the robots.txt and then ignoring it. They don't provide a list of IPs, so there's no way to verify that it is actually them. Anyway, anyone with PerplexityBot in their user agent can get increasingly bad responses until the abuse stops.
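A rough sketch of what "increasingly bad responses" could look like, in Python. Everything here is an assumption for illustration: the names `is_flagged`, `choose_response`, and `TIERS`, the escalation ladder itself, and the case-insensitive substring match on the user agent are all hypothetical, not anything the site above actually runs. Since the user agent can be spoofed and there is no IP list to check against, this only keys off the string the client sends.

```python
from collections import defaultdict

# Assumption: match the token anywhere in the User-Agent, case-insensitively.
FLAGGED_UA_TOKEN = "perplexitybot"

# Hypothetical escalation ladder: (HTTP status, artificial delay in seconds).
TIERS = [(200, 0.0), (200, 2.0), (429, 5.0), (403, 0.0)]

# Number of flagged requests seen so far, per client key (e.g. an IP address).
hits = defaultdict(int)


def is_flagged(user_agent):
    """Return True if the User-Agent header contains the flagged token."""
    return FLAGGED_UA_TOKEN in (user_agent or "").lower()


def choose_response(client_key, user_agent, step=10):
    """Pick a (status, delay) pair; flagged clients move up one tier every `step` requests."""
    if not is_flagged(user_agent):
        return (200, 0.0)
    hits[client_key] += 1
    tier = min((hits[client_key] - 1) // step, len(TIERS) - 1)
    return TIERS[tier]
```

The caller would sleep for the returned delay and send the returned status. Keeping the counter per client key means other visitors are unaffected, and the final tier is a plain 403, which is roughly what a user agent block at the CDN would do.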


Replies

dawnerd, last Thursday at 6:18 PM

Perplexity is exceptionally bad because they say they respect the robots.txt but clearly don't. When pressed on it, they basically shrug and say too bad, don't put stuff in public if you don't want it crawled. They got a UA block in Cloudflare, and it seems like that did the trick.
