Indeed, that's why Anthropic, OpenAI and other LLM providers are known to adhere to published APIs to gather the world's data, obeying licensing and ROBOTS.txt.
I was under the impression that they do obey robots.txt now? There are clearly a lot of dumb agents that don’t, but didn’t think it was the major AI labs.
I was under the impression that they do obey robots.txt now? There are clearly a lot of dumb agents that don’t, but didn’t think it was the major AI labs.