I think there's a few things at play here
- AI scrapers will pull a bunch of docs from many sites in parallel (so instead of a human request where someone picks a single Google result, it hits a bunch of sites)
- AI will crawl the site looking for the correct answer which may hit a handful of pages
- AI sends requests in quick succession (big bursts instead of small trickle over longer time)
- Personal assistants may crawl the site repeatedly scraping everything (we saw a fair bit of this at work, they announced themselves with user agents)
- At work (b2b SaaS webapp) we also found that the personal assistant variety tended to hammer really computationally expensive data export and reporting endpoints generally without filters. While our app technically supported it, it was very inorganic traffic
That said, I don't think the solution is blanket blocks. Really it's exposing sites are poorly optimized for emerging technology.