logoalt Hacker News

nucleardogtoday at 2:37 PM2 repliesview on HN

Yeah, this is where I landed a while ago. What problem am I _really_ trying to solve?

For some people it's an ideological one--we don't want AI vacuuming up all of our content. For those, "is this an AI user?" is a useful question to answer. However it's a hard one.

For many the problem is simply "there are a class of users that are putting way too much load on the system and it's causing problems". Initially I was playing wack-a-mole with this and dealing with alerts firing on a regular basis because of Meta crawling our site very aggressively, not backing off when errors were returned, etc.

I looked at rate limiting but the work involved in distributed rate limiting versus the number of offenders involved made the effort look a little silly, so I moved towards a "nuke it from orbit" strategy:

Requests are bucketed by class C subnet (31.13.80.36 -> 31.13.80.x) and request rate is tracked over 30 minute windows. If the request rate over that window exceeds a very generous threshold I've only seen a few very obvious and poorly behaved crawlers exceed it fires an alert.

The alert kicks off a flow where we look up the ASN covering every IP in that range, look up every range associated with those ASNs, and throw an alert in Slack with a big red "Block" button attached. When approved, the entire ASN is blocked at the edge.

It's never triggered on anything we weren't willing to block (e.g., a local consumer ISP). We've dropped a handful of foreign providers, some "budget" VPS providers, some more reputable cloud providers, and Facebook. It didn't take long before the alerts stopped--both for high request rates and our application monitoring seeing excessive loads.

If anyone's interested in trying to implement something similar, there's a regularly updated database of ASN <-> IP ranges announced here: https://github.com/ipverse/asn-ip


Replies

embedding-shapetoday at 3:54 PM

> If anyone's interested in trying to implement something similar, there's a regularly updated database of ASN <-> IP ranges announced here: https://github.com/ipverse/asn-ip

What exactly is the source of these mappings? Never heard about ipverse before, seems to be a semi-anonymous GitHub organization and their website has had a failing certificate for more than a year by now.

show 1 reply
sgctoday at 3:13 PM

You ban the ASN permanently in this scenario?