Hacker News

richardw 10/01/2024

Anyone know how companies like this keep tabs on so many GitHub repos? I assume very distributed crawling/cloning.


Replies

mdaniel 10/02/2024

I'd use the hosts' "firehose" events APIs if I were doing it: <https://docs.github.com/en/rest/activity/events?apiVersion=2...> and <https://docs.gitlab.com/ee/api/events.html#list-a-projects-v...>

I don't have the experience to know if that's cheaper (for the hoster) than just periodically calling the $(git fetch --mirror) endpoint. I could see opening a conversation with the major providers and asking which they would prefer, since it's in everyone's best interest not to unduly hammer them.
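
For anyone curious what polling the events feed might look like, here is a minimal sketch in Python against GitHub's public events endpoint. It is not how any particular company does it. The helper names (pushed_repos, poll_once), the User-Agent string, and the 60-second fallback interval are assumptions made for illustration. The conditional request uses the ETag and X-Poll-Interval headers so that an unchanged feed comes back as a cheap 304.

```python
import json
import urllib.error
import urllib.request

# GitHub's public events feed (unauthenticated requests are heavily rate limited).
EVENTS_URL = "https://api.github.com/events"


def pushed_repos(events):
    """Return unique repo names that received a push, in first-seen order."""
    seen = []
    for event in events:
        if event.get("type") == "PushEvent":
            name = event.get("repo", {}).get("name")
            if name and name not in seen:
                seen.append(name)
    return seen


def poll_once(etag=None):
    """Fetch one page of events. Returns (events, etag, poll_interval_seconds)."""
    request = urllib.request.Request(
        EVENTS_URL,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "events-poll-sketch",  # hypothetical client name
        },
    )
    if etag:
        # Ask the server to answer 304 if nothing changed since the last poll.
        request.add_header("If-None-Match", etag)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            events = json.load(response)
            interval = int(response.headers.get("X-Poll-Interval", "60"))  # fallback is an assumption
            return events, response.headers.get("ETag"), interval
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib surfaces "Not Modified" as an HTTPError
            return [], etag, int(err.headers.get("X-Poll-Interval", "60"))
        raise
```

A crawler would call poll_once in a loop, sleep for the returned interval, and queue the names from pushed_repos for a later fetch. The feed is paginated and may lag, so a sketch like this will miss activity on a busy host.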
