Several people in the comments seem to be blaming GitHub for taking this step for no apparent reason.
Those of us who self-host git repos know that this is not true. Over at ardour.org, we've passed 1M unique IPs banned due to AI trawlers sucking our repository one commit at a time. It was killing our server before we put fail2ban to work.
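For anyone curious what that looks like, here's a minimal sketch of such a jail (the jail name, filter regex, and log path below are illustrative, not our exact production setup):

```ini
# /etc/fail2ban/filter.d/git-trawler.conf (hypothetical filter)
[Definition]
# Ban hosts that hammer per-commit URLs in the web server's access log
failregex = ^<HOST> .*"GET .*/commit/.* HTTP

# /etc/fail2ban/jail.d/git-trawler.conf (hypothetical jail)
[git-trawler]
enabled  = true
port     = http,https
filter   = git-trawler
logpath  = /var/log/nginx/access.log
maxretry = 20
findtime = 60
bantime  = 86400
```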
I'm not arguing that the specific steps GitHub has taken are the right ones. They might be, they might not, but they do help to address the problem. Our choice for now has been based on noticing that the trawlers are always fetching commits, so we tweaked things so that the overall HTTP-facing git repo works, but you cannot access commit-based URLs. If you want that, you need to use our GitHub mirror :)
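The URL-level block itself is just a few lines of web server config. A rough nginx-style sketch, assuming a cgit-like frontend where commit pages live under /commit/ (the real paths may differ):

```nginx
# Deny the per-commit pages that the trawlers walk one by one.
# Clones and fetches over smart HTTP go through /info/refs and
# /git-upload-pack, so they keep working.
location ~ /commit/ {
    return 403;
}
```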
> Several people in the comments seem to be blaming GitHub for taking this step for no apparent reason.
I mean...
* GitHub is owned by Microsoft.
* The reason for this step is AI crawlers.
* The reason AI crawlers exist in such masses is the absurd hype around LLM+AI technology.
* The reason for that is... ChatGPT?
* The main investor in OpenAI, the maker of ChatGPT, happens to be...?
Have you noticed significant slowdown and CPU usage from fail2ban with that many banned IPs? I saw it becoming a huge resource hog with far fewer IPs than that.
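For reference, the usual cause is the default ban action: one iptables rule per banned address, so every packet walks a linear chain. fail2ban ships set-based ban actions that replace the chain with a hash lookup. A sketch, reusing the hypothetical jail name from above (for counts near 1M you would also need to raise the set's maxelem):

```ini
# /etc/fail2ban/jail.local -- use one kernel ipset instead of one
# iptables rule per address; membership tests are hash-based
[git-trawler]
banaction = iptables-ipset-proto6-allports
# on nftables-based systems, a set-backed action does the same job:
# banaction = nftables-allports
```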
You mean AI crawlers from Microsoft, the owner of GitHub?
Surely most AI trawlers have special support for git and just clone the repo once?
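If they did, one mirror clone would move the entire history as a single negotiated packfile instead of millions of per-commit page hits. Something like this (the repository URL is a made-up example):

```sh
# One request/response cycle fetches every commit as a packfile,
# rather than one HTTP hit per commit page.
git clone --mirror https://git.example.org/ardour/ardour.git
```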
Except they didn't just start doing this now. For many years, GitHub has been crippling unauthenticated browsing, doing it gradually to gauge the response. When unauthenticated, code search doesn't work at all, and issue search stops working after about five clicks at best.
This is egregious behavior because Microsoft hasn't been upfront about it while doing it. Many open source projects are probably unaware that their issue tracker has been walled off, creating headaches unbeknownst to them.