Explain to me how you self-host a git repo which is accessed millions of times a day from CI jobs pulling packages.
Let's assume 3 million. That's about 35 per second.
From a compute POV you can serve that with one server or virtual machine.
Bandwidth-wise, given a 100 MB repo size, that would make it about 3.5 GB/s (roughly 28 Gbit/s) - still easy terrain for a single server with a fast enough NIC.
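The arithmetic above as a quick sanity check (3 million pulls/day and a 100 MB repo are the thread's assumed numbers):

```shell
# Back-of-envelope check of the numbers above.
awk 'BEGIN {
    rps = 3000000 / 86400                   # pulls per day -> per second
    printf "%.1f req/s\n", rps              # -> 34.7 req/s
    printf "%.2f GB/s\n", rps * 100 / 1000  # 100 MB per pull
}'
```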
These days, people solve similar problems by wrapping their data in an OCI container image and distributing it through one of the container registries that have no practically meaningful pull rate limit. Not really a joke, unfortunately.
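What that workaround looks like in practice, sketched with the oras CLI. The registry name is a placeholder, and the push/pull lines are left as comments since they need a real registry and credentials; the tarball step runs as-is:

```shell
# Sketch of the "ship the repo as an OCI artifact" approach mentioned above.
src=$(mktemp -d)                  # stand-in for the real repo checkout
echo "package contents" > "$src/pkg.txt"
tar czf repo.tar.gz -C "$src" .   # one immutable blob per release

# Push once, pull from every CI job (needs oras + registry credentials):
#   oras push registry.example.com/ci/repo:latest repo.tar.gz
#   oras pull registry.example.com/ci/repo:latest && tar xzf repo.tar.gz

tar tzf repo.tar.gz               # archive listing includes pkg.txt
```

The registry then eats the 3.5 GB/s, which is the whole point of the trick.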
FTFY:
Explain to me how you self-host, with no budget and without spending any money, a git repo which is accessed millions of times a day from CI jobs pulling packages.
Is running the git binary (git-http-backend) behind a read-only nginx not good enough? Probably not: hosting tarballs is far more efficient.
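For reference, the stock way to put git behind nginx is git-http-backend via fcgiwrap; a minimal sketch, where the server name, socket path, binary location, and /srv/git root are all assumptions about your setup:

```nginx
server {
    listen 80;
    server_name git.example.com;

    # Smart-HTTP, read-only (no receive-pack exported)
    location ~ /git(/.*) {
        include       fastcgi_params;
        fastcgi_pass  unix:/var/run/fcgiwrap.socket;
        fastcgi_param SCRIPT_FILENAME     /usr/lib/git-core/git-http-backend;
        fastcgi_param GIT_PROJECT_ROOT    /srv/git;
        fastcgi_param GIT_HTTP_EXPORT_ALL "";
        fastcgi_param PATH_INFO           $1;
    }
}
```

Every clone still costs a server-side pack negotiation, which is why plain tarballs win at this request rate.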
You git init --bare on a host with sufficient resources. But I would recommend thinking about your CI flow too.
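A minimal sketch of that, with a throwaway path standing in for the real host; on the CI side a shallow single-branch clone keeps each pull well under the full repo size (the hostname is a placeholder):

```shell
# Bare repo on the serving host (throwaway path for illustration)
repo="$(mktemp -d)/repo.git"
git init --bare "$repo"

# CI side: fetch only the tip of one branch instead of full history
#   git clone --depth 1 --single-branch \
#       https://git.example.com/git/repo.git

git -C "$repo" rev-parse --is-bare-repository   # prints "true"
```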
I'm not sure whether this question was asked in good faith, but it is actually a damn good one.
I've looked into self-hosting a git repo with horizontal scalability, and it is indeed very difficult. I don't have the time to detail it in a comment here, but for anyone who is curious it's very informative to look at how GitLab handled this with Gitaly. I've also seen some clever attempts to use object storage, though I haven't seen any of those solutions put heavily to the test.
I'd love to hear from others about ideas and approaches they've heard about or tried
https://gitlab.com/gitlab-org/gitaly