
jsheard • last Monday at 3:05 PM • 2 replies

In this case it is actually OpenAI: the IP (74.7.175.182) is in one of their published ranges (74.7.175.128/25).
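Checking an address against a published range is a plain CIDR membership test. The following is a minimal POSIX sh sketch. It assumes you already have the CIDR list from the operator's published range file, and `ip_to_int` and `ip_in_cidr` are hypothetical helper names. It masks both addresses down to the prefix length and compares the network bits.

```sh
#!/bin/sh
# Hypothetical helper: convert a dotted-quad IPv4 address to a 32-bit integer.
# The subshell body keeps IFS and the positional parameters local.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Hypothetical helper: succeed (exit status 0) if IPv4 address $1 lies inside CIDR block $2.
ip_in_cidr() (
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  # e.g. /25 -> 0xFFFFFF80; the final & keeps the mask within 32 bits.
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( $ip & $mask )) -eq $(( $net & $mask )) ]
)

# Example: the address from the log line against the /25 it was matched to.
if ip_in_cidr 74.7.175.182 74.7.175.128/25; then
  echo "74.7.175.182 is in 74.7.175.128/25"
fi
```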

I don't know if imitating a major crawler is really worth it. It may work against very naive filters, but it's easy to check definitively whether you're faking, so it just hands ammo to the more advanced filters that do check.

  $ curl -I https://www.cloudflare.com
  HTTP/2 200

  $ curl -I -H "User-Agent: Googlebot" https://www.cloudflare.com
  HTTP/2 403
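The definitive check is usually forward-confirmed reverse DNS, which is the method Google documents for verifying Googlebot. You look up the PTR record for the client IP and check that the name is under googlebot.com or google.com. Then you resolve that name and confirm it maps back to the same IP. The following is a minimal POSIX sh sketch of that check plus the kind of rule that would produce the 403 above. It is not Cloudflare's actual logic, and several details are assumptions: the function names are hypothetical, `dig` is assumed to be installed, and the domain list should be checked against Google's current documentation. The resolver calls are wrapped in functions so they can be swapped for stubs.

```sh
#!/bin/sh
# Hypothetical resolver hooks wrapping dig, so they can be replaced (e.g. stubbed in tests).
reverse_lookup() { dig +short -x "$1"; }
forward_lookup() { dig +short A "$1"; }

# Succeed if IP $1 passes forward-confirmed reverse DNS for Googlebot.
is_verified_googlebot() (
  ip=$1
  # Take the first PTR name and strip the trailing dot dig prints.
  host=$(reverse_lookup "$ip" | head -n 1)
  host=${host%.}
  # Domain list is an assumption based on Google's docs; verify before relying on it.
  case $host in
    *.googlebot.com | *.google.com) ;;
    *) exit 1 ;;
  esac
  # The name must resolve back to the same IP; a PTR record alone can be faked.
  forward_lookup "$host" | grep -qxF "$ip"
)

# Sketch of the filter rule: a client claiming to be Googlebot has to prove it.
status_for() (
  ua=$1 ip=$2
  case $ua in
    *Googlebot*)
      if is_verified_googlebot "$ip"; then echo 200; else echo 403; fi ;;
    *) echo 200 ;;
  esac
)
```

Called with the spoofed `User-Agent: Googlebot` from an ordinary client IP, `status_for` prints 403. The same request without the Googlebot claim gets 200, which matches the two curl results.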

Replies

btown • last Monday at 5:36 PM

I don't have a statistic here, but I'm always surprised by how many websites I come across that do limited user-agent and origin/referrer checks yet don't maintain any kind of active IP-based tracking. If you're trying to build a site-specific scraper and are getting blocked, mimicking headers is an easy and often sufficient step.
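The following is a minimal POSIX sh sketch of the kind of header-only check described above. The site at `https://example.com` and the `header_check` function are hypothetical. The function takes the User-Agent and the Referer (or Origin) value. Nothing in it looks at the client IP, so any client that sends browser-like values for those two headers passes.

```sh
#!/bin/sh
# Hypothetical site this filter protects.
SITE=https://example.com

# Hypothetical header-only filter: $1 is the User-Agent, $2 the Referer/Origin value.
# Succeeds if the UA looks like a browser and the referrer points at the site itself.
header_check() (
  ua=$1 referer=$2
  case $ua in
    Mozilla/5.0*) ;;
    *) exit 1 ;;
  esac
  # Quoted "$SITE" matches literally; the trailing /* allows any path on the site.
  case $referer in
    "$SITE" | "$SITE"/*) ;;
    *) exit 1 ;;
  esac
)
```

Adding per-IP state, such as request counts or the reverse DNS check for clients claiming to be crawlers, is what closes that gap.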


Aurornis • last Monday at 3:06 PM

Thanks for looking it up!
