For the "good" bots that at least respect robots.txt, you can use this list to get ahead of them before they pummel your site.
https://github.com/ai-robots-txt/ai.robots.txt
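A minimal sketch of what "blocking via the list" amounts to: one robots.txt group naming each crawler's user-agent token, followed by `Disallow: /`. The three agent names below are a small hand-picked sample (assumption); use the repo's actual list in practice. `build_robots_txt` and `is_allowed` are hypothetical helper names. The check uses Python's standard `urllib.robotparser`, which only models what a *compliant* crawler would do.

```python
from urllib.robotparser import RobotFileParser

# Hand-picked sample of AI crawler tokens (assumption); the real list is in the ai.robots.txt repo.
AI_AGENTS = ["GPTBot", "CCBot", "ClaudeBot"]


def build_robots_txt(agents):
    """Return robots.txt text with a single group that disallows every listed agent."""
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append("Disallow: /")  # block the whole site for this group
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, user_agent, url):
    """Ask the stdlib parser whether a robots.txt-respecting crawler may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots = build_robots_txt(AI_AGENTS)
    print(robots)
    # Listed agents are refused; an unlisted agent falls through to "allowed".
    for agent in AI_AGENTS + ["SomeBrowser"]:
        print(agent, is_allowed(robots, agent, "https://example.com/blog/post"))
```

Serve the generated text at `/robots.txt`. It does nothing against crawlers that ignore the file or spoof their user agent.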
There's no easy solution for bad bots which ignore robots.txt and spoof their UA though.
For those looking, this is the best I've found: https://blog.cloudflare.com/declaring-your-aindependence-blo...
Thanks, will look into that!
Such as OpenAI, who apparently ignore robots.txt and change their user agent to evade blocks.[1]
1: https://www.reddit.com/r/selfhosted/comments/1i154h7/openai_...