Hacker News

onion2k, today at 6:12 AM

So fuzzycanary also checks user agents and won't show the links to legitimate search engines, so Google and Bing won't see them.

Unscrupulous AI scrapers will not be using a genuine UA string. They'll be using Googlebot's. You'll need to do a reverse DNS check instead - https://developers.google.com/crawling/docs/crawlers-fetcher...
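
For anyone wondering what that check looks like, here is a rough Python sketch of a forward-confirmed reverse DNS lookup. It is not fuzzycanary's code. The function name `is_verified_googlebot`, the injectable `reverse_lookup`/`forward_lookup` parameters, and the hostname suffix list are assumptions made for illustration, so check the suffixes against Google's current crawler docs before relying on them. The default forward lookup is IPv4 only.

```python
import socket

# Assumed hostname suffixes for Google crawlers; verify against Google's docs.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True only if ip reverse-resolves to a Google hostname
    and that hostname resolves forward to the same ip."""
    # Default resolvers use the standard library; they can be swapped for tests.
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, ipv4_addresses).
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        # Step 1: PTR lookup on the requesting address.
        host = reverse_lookup(ip).lower().rstrip(".")
        # Step 2: the hostname must sit under one of the expected domains.
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # Step 3: forward lookup must map back to the original address,
        # otherwise anyone controlling their own PTR record could pass step 2.
        return ip in forward_lookup(host)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses.
        return False
```

You would only run this for requests whose UA claims to be Googlebot, and cache the result per IP, since each call costs two DNS lookups.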


Replies

bakugo, today at 6:20 AM

Most AI scrapers use normal browser user agents (usually random outdated Chrome versions, in my experience). They generally don't fake the UAs of legitimate bots like Googlebot, because Googlebot requests coming from non-Google IP ranges would be way too easy to block.
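
To show why that is easy to block, here is a minimal Python sketch of the IP range check. The name `should_block` and the `ALLOWED_NETWORKS` list are hypothetical, and the ranges in it are reserved documentation networks used as placeholders, not Google's real ranges. A real setup would load whatever ranges the crawler operator publishes.

```python
import ipaddress

# Placeholder ranges (reserved documentation networks), NOT real Google ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def should_block(user_agent, ip):
    """Block requests that claim to be Googlebot but come from an
    address outside the allowed ranges."""
    # Requests that don't claim to be Googlebot are left to other rules.
    if "googlebot" not in user_agent.lower():
        return False
    addr = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network and vice versa,
    # so mixed lists are safe to scan.
    return not any(addr in net for net in ALLOWED_NETWORKS)
```

Note that this does nothing against the scrapers described above that send ordinary Chrome user agents. It only catches the ones impersonating Googlebot.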