Hacker News

Aachen, yesterday at 3:41 PM

Yet they publish the user agents and IP address ranges they scrape from, and say they respect robots.txt.

I run a web server, so I see a lot of scrapers, but OpenAI is one of the ones that appear to respect the limits you set. A lot of others (if not most) don't even meet that ethical standard. So I wouldn't say "OpenAI scrapes everything they can access. Everything" without qualification, because that doesn't seem to be true, at least not until someone puts a file behind a robots.txt deny rule and finds that ChatGPT (or another of OpenAI's products) has knowledge of it.
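For anyone who wants to see what "respecting robots.txt" means mechanically, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the example.com URLs, and the `is_allowed` helper are all hypothetical; `GPTBot` is used as the crawler token on the assumption that it is the one OpenAI documents, so check their published list before relying on it. This only shows how a well-behaved crawler would interpret the rules. It proves nothing about what any crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: deny one crawler token under /private/,
# leave everything open for all other user agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("GPTBot", "https://example.com/private/secret.html"),
        ("GPTBot", "https://example.com/public/index.html"),
        ("SomeOtherBot", "https://example.com/private/secret.html"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

The test described above would be the other half: put a unique string in a file under the denied path, link to it nowhere else, and see whether it ever surfaces.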


Replies

immibis, yesterday at 7:11 PM

There's no evidence that the barrage of residentially-proxied bot accesses hitting every public website has anything to do with OpenAI, but then again, there's also no evidence it doesn't.