Complex, stateful.
I'm not even sure what that would look like for a huge service like GitHub. Where do you hold many thousands of concurrent HTTP connections and their pending request queues in a way that lets you make decisions on them, while still making more operational sense than a simple rate limit?
A lot of things would be easy if it were viable to have one big all-knowing giga load balancer.
I remember Rap Genius wrote a blog post whining that Heroku routed requests to their dynos at random instead of sending each one to the dyno with the shortest request queue. As opposed to just building an all-knowing, infiniscaling giga load balancer that knows everything about the system.
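For what it's worth, the gap between the two policies is easy to see in a toy simulation (all numbers and the `simulate` helper here are made up for this sketch, not anything Heroku actually ran). Join-shortest-queue only works because the router gets to see every dyno's backlog, which is exactly the global state a random router never has to track:

```python
import random

def simulate(policy, dynos=10, work=9, requests=20000, seed=1):
    """Toy discrete-time model: one request arrives per tick, each dyno
    completes one unit of work per tick, each request costs `work` units
    (load = work/dynos = 0.9 here). Returns the average backlog a request
    sees at the dyno it lands on, a rough proxy for queueing delay."""
    rng = random.Random(seed)
    backlog = [0] * dynos              # outstanding work units per dyno
    total_wait = 0
    for _ in range(requests):
        backlog = [max(b - 1, 0) for b in backlog]
        if policy == "random":
            i = rng.randrange(dynos)   # no state needed about the dynos
        else:                          # join-shortest-queue: needs a
            i = min(range(dynos), key=backlog.__getitem__)  # global view
        total_wait += backlog[i]
        backlog[i] += work
    return total_wait / requests

print("random:", simulate("random"))
print("jsq:   ", simulate("jsq"))
```

Run it and JSQ wins on average wait at the same load; the catch is that the win is bought entirely with that shared, constantly-updated view of every queue.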
> Where do you hold those many thousands of concurrent http connections and their pending request queues in a way that you can make decisions on them while making more operational sense than a simple rate limit?
Holding open an idle HTTP connection is cheap today. That's exactly the use case "async" exists for. Servicing a GitHub fetch is much more expensive.
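As a rough sketch of why the idle part is cheap (a toy asyncio echo server, obviously not how GitHub does it): each parked connection is just a suspended coroutine holding a few KB of state, not a blocked OS thread.

```python
import asyncio

async def handle(reader, writer):
    try:
        while True:
            # An idle connection sits suspended here at ~zero cost.
            line = await reader.readline()
            if not line:
                break
            # The actual work (the "GitHub fetch") is the expensive part.
            writer.write(line)
            await writer.drain()
    finally:
        writer.close()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 8080)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```

One process like this can park tens of thousands of mostly-idle clients; the bill only comes due when a request actually needs servicing.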
A giga load balancer is no less viable than a giga Redis cache or a giga database. Rate limiting is inherently stateful - you can't rate limit a request without knowledge of prior requests, and that knowledge has to be stored somewhere. You can shift the state around, but you can't eliminate it.
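To make that concrete, here's a minimal token-bucket sketch (the class and its parameters are invented for illustration). The bucket, a token count plus a timestamp per client, *is* the state: you can keep it in-process, move it to Redis, or shard it, but it has to live somewhere or you can't decide anything about the current request.

```python
import time

class TokenBucket:
    def __init__(self, rate, burst):
        self.rate = rate                 # tokens refilled per second
        self.burst = burst               # maximum bucket size
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self):
        # Refill based on elapsed time, then spend one token if available.
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Per-client state, e.g. keyed by IP or API token. Swapping this dict for
# Redis shifts where the state lives; it doesn't eliminate the state.
buckets = {}

def allowed(client_id):
    bucket = buckets.setdefault(client_id, TokenBucket(rate=10, burst=20))
    return bucket.allow()
```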
Sure, some solutions are more efficient than others, but those differences typically come down to implementation details rather than fundamental limitations in system design.