I’m building Cargo/UV for C. Good article. I’ve thought about this problem very deeply.
Unfortunately, when you’re starting out, the idea of running a registry is a really tough sell. On top of the very hard engineering problem of writing the code and making a world-class tool, plus the social one of getting it adopted, I now need to worry about funding and maintaining something that serves potentially a world of traffic? The git solution is intoxicating through this lens.
Fundamentally, the issue is the sparse checkouts mentioned by the author. You’d really like to use git to version package manifests, so that anyone with any package version can get the EXACT package they built with.
But this doesn’t work, because you need arbitrary commits. Either you do a full checkout, or you somehow track which commit a package version lives in, without knowing what hash git will generate before you create it. You’d have to push the package update and then push a second commit recording its hash. Obviously infeasible, obviously a nightmare.
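To make the chicken-and-egg concrete, here’s a minimal sketch of that two-commit dance, using GitPython with an invented repo layout (paths and names are just for illustration):

    from git import Repo  # pip install GitPython

    repo = Repo("registry")

    # 1. Commit the new manifest. Only now does its hash exist at all.
    repo.index.add(["packages/foo/3.12/manifest.toml"])
    release = repo.index.commit("foo 3.12")

    # 2. Record that hash in the index, which takes a SECOND commit,
    #    because the hash could not be known before the first was made.
    with open("registry/index/foo", "a") as f:
        f.write(f"3.12 {release.hexsha}\n")
    repo.index.add(["index/foo"])
    repo.index.commit(f"index: foo 3.12 -> {release.hexsha[:12]}")

Every publish becomes two pushes, and a crash between them leaves the index pointing at nothing.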
Conan’s solution is, I think, just about the only way. It trades perfect reproduction for conditional logic in the manifest. Instead of 3.12 pointing to a commit, every 3.x points to the same manifest, and a little logic sets the specific config field added in 3.12. If the logic gets to be too much, they let you map version ranges to different manifests for a package. So if 3.13 rewrites the entire manifest, you just remap it.
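Here’s roughly what that looks like, sketched in the shape of a Conan 2 recipe (the package and option names are invented):

    from conan import ConanFile
    from conan.tools.scm import Version

    class FooConan(ConanFile):
        name = "foo"
        # One recipe serves every 3.x release; the version is supplied
        # at export time rather than hardcoded per release.
        options = {"with_zstd": [True, False]}
        default_options = {"with_zstd": False}

        def config_options(self):
            # Say with_zstd was only added in 3.12: one conditional
            # instead of one manifest per version.
            if Version(self.version) < "3.12":
                del self.options.with_zstd

And when the conditionals stop being worth it, conan-center-index maps version ranges to entirely separate recipe folders through a per-package config.yml, which is the remapping escape hatch described above.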
I have not found another git-backed package manager that isn’t a terrible, slow tool. Conan may not be as rigorous as Nix because of this decision, but it is quite pragmatic and useful. The real solution is to use a database, of course, but unless someone wants to wire me ten thousand dollars plus server costs in perpetuity, what’s a guy supposed to do?
How about the Arch Linux AUR approach?
Every package has its own git repository, which for binary packages contains little more than the manifest. Sources and assets, if kept in git at all, usually live in separate repos.
This seems to avoid the issues in the examples given so far, which come from "monorepos" or colocation: the problematic examples either colocate assets with manifests, or colocate manifests with the global index. It also avoids the "nightmare" you mention, since any cross-references live in separate repos.
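For concreteness, every AUR package lives at its own clone URL, so a client only ever pulls the one tiny repo it needs. A sketch using GitPython ("yay" is just an example package name):

    from git import Repo  # pip install GitPython

    # Each AUR package is its own repo: this pulls only yay's
    # PKGBUILD/.SRCINFO history, never a global index or other packages.
    Repo.clone_from("https://aur.archlinux.org/yay.git", "yay", depth=1)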
The alluring thing is storing the repository on S3 (or similar). Recall how early Docker registries made the request flow so complicated that backing image storage with S3 was infeasible without a proxy service in front.
The thing that scales is dumb HTTP that can be backed by something like S3.
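To illustrate just how dumb it can be, here's a client-side lookup sketched in the style of Cargo's sparse index protocol (the registry URL is hypothetical; the sharded path scheme follows Cargo's published layout):

    import json
    import urllib.request

    REGISTRY = "https://registry.example.com/index"  # hypothetical

    def fetch_versions(name: str) -> list[dict]:
        # Cargo's sparse layout: short names get length-based prefixes,
        # longer names are sharded by their first four characters.
        if len(name) in (1, 2):
            path = f"{len(name)}/{name}"
        elif len(name) == 3:
            path = f"3/{name[0]}/{name}"
        else:
            path = f"{name[:2]}/{name[2:4]}/{name}"
        with urllib.request.urlopen(f"{REGISTRY}/{path}") as resp:
            # One JSON object per line, one line per published version.
            return [json.loads(line) for line in resp.read().splitlines()]

Every request is a plain GET for a static file, which is exactly what S3, a CDN, or one nginx box can serve with no application logic at all.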
You don't have to use a cloud; just go with a big single server. And if you become popular, find a sponsor and move to the cloud.
If money and sponsor independence are a huge concern, the alternative would be peer-to-peer.
I haven't seen many package managers do it, but it feels like a huge missed opportunity. You don't need that many volunteers peering in order to have a lot of bandwidth available.
Granted, the real problem that'll drive up hosting costs is CI, or rather careless CI without caching. Unless you require a user login, or limit downloads for IPs without one, caching is hard to enforce.
For popular package repositories you'll likely see extremely degenerate CI systems eating bandwidth as if it were free.
Disclaimer: opinions are my own.
Before you've managed to build a popular tool, it's unlikely you'll need to serve many users. Going straight for something that can serve the world is probably premature.
Is there a reason users must see all of the historical data too? Why not just have a post-commit hook render the current HEAD to static files, published to something like GitHub Pages?
That can be moved elsewhere / mirrored later if needed, of course. And the underlying data is still in git, just not actively used for the API calls.
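A hypothetical render step such a hook might call, assuming an invented packages/<name>/<version>/ layout: walk the manifests at HEAD and emit one static JSON file per package, ready to push to Pages:

    import json
    from pathlib import Path

    SRC = Path("packages")   # manifests tracked in git
    OUT = Path("site/api")   # static output served as dumb HTTP

    def render_head() -> None:
        # One small static file per package; clients never touch git.
        OUT.mkdir(parents=True, exist_ok=True)
        for pkg_dir in sorted(p for p in SRC.iterdir() if p.is_dir()):
            versions = sorted(v.name for v in pkg_dir.iterdir() if v.is_dir())
            (OUT / f"{pkg_dir.name}.json").write_text(
                json.dumps({"name": pkg_dir.name, "versions": versions})
            )

    if __name__ == "__main__":
        render_head()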
It might also be interesting to look at what Linux distros do, like Debian (Salsa), Fedora (Pagure), and openSUSE (OBS). They're instructive here because their historical model is free mirrors hosted by volunteers, since the projects never had the compute resources themselves.
> Unfortunately, when you’re starting out, the idea of running a registry is a really tough sell. On top of the very hard engineering problem of writing the code and making a world-class tool, plus the social one of getting it adopted, I now need to worry about funding and maintaining something that serves potentially a world of traffic? The git solution is intoxicating through this lens.
So you need a decentralized database? Those exist (or you can make your own, if you're feeling ambitious), including ones that scale in different ways than git does.
I wonder how Meson's wraps story fits into this. They didn't use a single repository before, but now they're throwing everything into one [0]. I wonder about the motivation and how it compares to your project.
0: https://github.com/mesonbuild/wrapdb/tree/master/subprojects
> I’m building Cargo/UV for C.
Interesting! Do you mind sharing a link to the project at this point?
Think about the article from a different perspective: several of the most successful and widely used package managers of all time started out using Git, and they successfully transitioned to a more efficient solution when they needed to.