Hacker News

PaulDavisThe1st, yesterday at 4:53 PM (5 replies)

Several people in the comments seem to be blaming Github for taking this step for no apparent reason.

Those of us who self-host git repos know that this is not true. Over at ardour.org, we've passed 1M unique IPs banned because of AI trawlers sucking down our repository one commit at a time. It was killing our server before we put fail2ban to work.

I'm not arguing that the specific steps GitHub have taken are the right ones. They might be, they might not, but they do help to address the problem. Our choice for now came from noticing that the trawlers are always fetching commits, so we tweaked things so that the overall HTTP-facing git repo works, but you cannot access commit-based URLs. If you want that, you need to use our GitHub mirror :)
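
To illustrate the kind of rule described above, here is a minimal sketch in Python. It is not ardour.org's actual configuration: the URL patterns (`/commit/`, `/commitdiff/`, `/patch/`) and the function name `is_blocked` are assumptions made for the example. The only parts taken from git itself are the smart-HTTP endpoints `/info/refs` and `/git-upload-pack`, which have to stay reachable for clone and fetch to work.

```python
import re

# ASSUMPTION: web-UI commit views live under path segments like these.
# The real URL layout used by ardour.org is not given in the thread.
COMMIT_PATH = re.compile(r"/(commit|commitdiff|patch)(/|$)")

# Endpoints used by git's smart HTTP protocol; clone/fetch depend on them.
GIT_PROTOCOL_SUFFIXES = ("/info/refs", "/git-upload-pack")


def is_blocked(path: str) -> bool:
    """Return True for commit-view URLs, False for everything else.

    `path` is the URL path only, without the query string.
    """
    if path.endswith(GIT_PROTOCOL_SUFFIXES):
        return False  # never block the git protocol itself
    return bool(COMMIT_PATH.search(path))
```

A front-end server or small middleware could call something like `is_blocked(request_path)` and return a 403 for matches, so that per-commit pages are refused while `git clone` over HTTP keeps working.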


Replies

soraminazuki, today at 12:54 AM

Only they didn't start doing this just now. For many years, GitHub has been crippling unauthenticated browsing, doing it gradually to gauge the response. When unauthenticated, code search doesn't work at all and issue search stops working after, like, 5 clicks at best.

This is egregious behavior because Microsoft hasn't been upfront about it while doing so. Many open source projects are probably unaware that their issue tracker has been walled off, creating headaches unbeknownst to them.

hannob, today at 5:21 AM

> Several people in the comments seem to be blaming Github for taking this step for no apparent reason.

I mean...

* Github is owned by Microsoft.

* The reason for this is AI crawlers.

* The reason AI crawlers exist en masse is an absurd hype around LLM+AI technology.

* The reason for that is... ChatGPT?

* The main investor of ChatGPT happens to be...?

VladVladikoff, today at 1:24 AM

Have you noticed a significant slowdown and high CPU usage from fail2ban with that many banned IPs? I saw it become a huge resource hog with far fewer IPs than that.

knowitnone, today at 12:15 AM

You mean AI crawlers from Microsoft, owners of GitHub?

londons_explore, yesterday at 11:41 PM

Surely most AI trawlers have special support for git and just clone the repo once?
