You can't really cache the dynamic content produced by forges like GitLab and, say, web forums like phpBB, so every request goes through the slow path. Media and JS are of course cached on the edge, so those aren't an issue.
Even though the volume of AI requests isn't that high - generally hundreds per second at most across all of our services combined - it's still a load that causes issues for legitimate users and developers. We've seen it grow from somewhat reasonable to pretty much 99% of the responses we serve.
Can it be solved by throwing more hardware at the problem? Sure. But that's not sustainable, and the reasonable approach in our case is to filter out the parasitic traffic.
You kind of can, though. You serve cached assets and then use JavaScript to modify the page for the individual user. The user-specific parts can't be cached, but the rest of it can. Roughly like the sketch below.
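A minimal sketch of that pattern, assuming a hypothetical /api/session endpoint and placeholder element ids - the HTML shell is served from the edge cache for everyone, and only the per-user bits are fetched client-side:

    // Hydrate the cached page with user-specific data after load.
    async function hydrateUserChrome(): Promise<void> {
      // The only uncacheable, per-user request; the rest of the page came from cache.
      const res = await fetch("/api/session", { credentials: "include" });
      if (!res.ok) return; // anonymous visitor: the cached page is already correct

      const user: { name: string; unreadCount: number } = await res.json();

      // Swap the generic cached header for the logged-in view.
      document.querySelector("#login-link")?.classList.add("hidden");
      const nav = document.querySelector("#user-nav");
      if (nav) {
        nav.textContent = `${user.name} (${user.unreadCount} unread)`;
        nav.classList.remove("hidden");
      }
    }

    document.addEventListener("DOMContentLoaded", () => {
      void hydrateUserChrome();
    });

The page itself stays fully cacheable; the origin only has to answer the small session call.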
Thanks, I appreciate the details. 99% is far above what I expected, and if it specifically hits hard-to-cache data, I can see how that brings a system to its knees.