If you put something on the open web, as I see it, you only get so much say in what people do with it.
Yes, they can't republish it without attribution and/or compensation (copyright, at least currently, for better or worse). Yes, they shouldn't get to hammer your server with redundant, brainless requests for thousands of copies of the same content that no human will ever read (abuse/DDoS prevention).
No, I don't think you get to decide what user agent your visitors use, or whether that user agent will summarize or otherwise transform your content, whether via LLMs, ad blockers, or 273 artisanal regular expressions enabling dark/bright/readable/pink mode.
> it makes sense for content owners who don't want AI trained on their content to poison it if possible. It's possibly the only way to keep the AI crawlers away.
How would that work? The crawler needs to, well, crawl your site to determine that it's full of slop. At that point, the cost to you has already been incurred.
I'm all for banning spammy, high-request-rate crawlers, but you'd detect those via abusive request patterns, and that detection isn't influenced by which tokens you serve them.
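
For concreteness, here's a minimal sketch of what "detect via abusive request patterns" could mean in practice, assuming a simple per-IP sliding-window rate check. The window size, threshold, and names like `RateTracker` are made up for illustration, not any particular server's implementation:

```python
import time
from collections import defaultdict, deque

# Hypothetical thresholds, chosen only for illustration.
WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 600  # roughly 10 requests/second sustained

class RateTracker:
    """Tracks request timestamps per client and flags abusive request rates."""

    def __init__(self) -> None:
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def is_abusive(self, client_ip: str, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        hits.append(now)
        # Discard timestamps that have slid out of the window.
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        return len(hits) > MAX_REQUESTS_PER_WINDOW

# In a request handler (names here are hypothetical) you'd do something like:
#   if tracker.is_abusive(request.remote_addr):
#       return 429  # Too Many Requests
tracker = RateTracker()
```

Nothing in a check like this ever looks at the bytes being served, which is exactly why poisoning the content wouldn't change the outcome.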