This is a completely in-scope question.
How do we defend against your scraping, OpenAI?
I don't want any of my content scraped or seen by you all. Frankly, fuck you all for thinking my content belongs to you.
I use nginx conditionals with user-agent checking, then respond with 418 or 410.
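A minimal sketch of that approach, assuming the crawler names OpenAI currently documents (GPTBot, ChatGPT-User, OAI-SearchBot); the bot list and `example.com` are placeholders, so check the docs before deploying:

```nginx
# Map known OpenAI crawler user agents to a flag (http-level block).
map $http_user_agent $openai_bot {
    default         0;
    ~*GPTBot        1;
    ~*ChatGPT-User  1;
    ~*OAI-SearchBot 1;
}

server {
    listen 80;
    server_name example.com;

    # Reject matched crawlers outright.
    if ($openai_bot) {
        return 410;  # Gone; use 418 if you prefer the teapot
    }

    location / {
        root /var/www/html;
    }
}
```

`map` is cheaper than chained `if` blocks on `$http_user_agent` and keeps the match list in one place, but none of this stops a scraper that simply lies about its user agent.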
Probably too late now, but my list needs updating.
robots.txt, bro: https://developers.openai.com/api/docs/bots/
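For reference, a robots.txt that opts out of the crawlers named in those docs might look like this; the user-agent tokens are the ones OpenAI publishes at the time of writing, and robots.txt is only honored by crawlers that choose to respect it:

```
# Disallow OpenAI's documented crawlers site-wide.
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /
```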
It's documented here: https://developers.openai.com/api/docs/bots