Funny idea, some days ago i was really annoyed again by the idea that these AI crawlers still ignore all code licenses and train their models against any github repo no matter what so i quickly hammerd down this
-> https://github.com/voodooEntity/ghost_trap
basically a github action that extends your README.md with a "polymorphic" prompt injection. I run some "llm"s against it and most cases they just produced garbage.
Thought about also creating a JS variant that you can add to your website that will (not visible for the user) also inject such prompt injections to stop web crwaling like you described