
Fabricio20, yesterday at 4:05 PM (2 replies)

Since you had the logs for this, can you confirm the IP ranges they were operating from? You mention "Claudebot and GPTBot" but I'm guessing this is based off of the user-agent presented by the scrapers and could easily be faked to shift blame. I genuinely doubt Anthropic and such would be running scrapers that are this badly written/implemented; it doesn't make economic sense. I'd love to see some of the web logs from this if you'd be willing to share! I feel like this is just some of the old scraper bots now advertising themselves as AI bots to shift blame onto the AI companies.


Replies

Tharre, yesterday at 4:25 PM

There are a bit too many IPs to list, but from my logs they're always of the form 74.7.2XX.* for GPTBot, matching OpenAI's published IP ranges[0].

So yes, they are definitely running scrapers that are this badly written.

Also, old scraper bots trying to disguise themselves as GPTBot seems wholly unproductive; they try to imitate users, not bots.

[0] https://openai.com/gptbot.json
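For anyone wanting to run the same check on their own logs, here is a minimal sketch of comparing a client IP against a list of CIDR ranges, assuming you have already copied the prefixes out of the published list. The function names and `EXAMPLE_RANGES` are hypothetical, and the ranges shown are reserved documentation blocks, not OpenAI's actual ranges.

```python
import ipaddress

# Placeholder CIDR blocks reserved for documentation (RFC 5737).
# These are NOT OpenAI's real ranges; substitute the published prefixes.
EXAMPLE_RANGES = ["192.0.2.0/24", "198.51.100.0/25"]


def ip_in_published_ranges(ip, cidr_ranges):
    """Return True if ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    for cidr in cidr_ranges:
        network = ipaddress.ip_network(cidr, strict=False)
        # Only compare addresses of the same family (IPv4 vs IPv6).
        if addr.version == network.version and addr in network:
            return True
    return False


def classify_request(ip, user_agent, cidr_ranges, bot_token="GPTBot"):
    """Label a log entry by whether its claimed bot identity matches its IP."""
    if bot_token.lower() not in user_agent.lower():
        return "not-claimed"
    if ip_in_published_ranges(ip, cidr_ranges):
        return "verified"
    # The user agent claims to be the bot, but the IP is outside the ranges.
    return "possibly-forged"
```

A request whose user agent contains "GPTBot" but whose IP falls outside the ranges comes back as "possibly-forged", which is the case the parent comment is asking about. This only works if the logs keep the full client IP.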

embedding-shape, yesterday at 6:05 PM

> but I'm guessing this is based off of the user-agent presented by the scrapers and could easily be faked to shift blame

Yes, hence the "which was the only two I saw, but could have been forged".

> I'd love to see some of the web logs from this if you'd be willing to share!

Unfortunately not. I delete all logs from the server after one hour, and I don't even log the full IP. I took a look just now, and none of the logs that still exist are from any user agent that looks like one of those bots.