Package managers keep using Git as a database, it never works out

703 points • by birdculture • yesterday at 12:46 PM • 387 comments

Comments

iamwil • yesterday at 5:20 PM

This sounds like a missing piece of software in the OSS world. If you have the inclination, you should write it.

keithgroves • yesterday at 4:14 PM

When building https://enact.tools we considered this. I'm glad we didn't go this route.

miyuru • yesterday at 1:55 PM

Funnily enough, I clicked the homebrew GitHub link in the post, only to get a rate limited error page from GitHub.

pxc • yesterday at 7:24 PM

Loved this article. Just enough detail to make the broad scope compatible with a reasonable length, and well-argued.

I feel sometimes like package management is a relatively second-class topic in computer science (or at least among many working programmers). But a package manager's behavior can be the difference between a grotesque, repulsive experience and a delightful, beautiful one. And there aren't quite yet any package managers that do well everything that we collectively have learned how to do well, which makes it an interesting space imo.

Re: Nixpkgs, interestingly, pre-flakes Nix distributes all of the needed Nix expressions as tarballs, which does play nice with CDNs. It also distributes an index of the tree as a SQLite database to obviate some of the "too many files/directories" problem with enumerating files. (In the meantime, Nixpkgs has also started bucketing package directories by name prefix, too.) So maybe there was a lesson learned here that would be useful to re-learn.

On the other hand, IIRC if you use the GitHub fetcher rather than the Git one, including for fetching flakes, Nix will download tarballs from GitHub instead of doing clones. Regardless, downloading and unpacking Nixpkgs has become kinda slow. :-\
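
A minimal sketch of the "index of the tree as a SQLite database" idea mentioned above, assuming a made-up directory layout and table schema (this is not the actual format Nix ships). It shows how a single indexed query can answer "which files belong to this package?" without enumerating directories:

```python
import sqlite3


def build_index(conn: sqlite3.Connection, paths: list[str]) -> None:
    """Record each file path once, keyed by the package directory it lives in."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, package TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS files_by_package ON files (package)")
    with conn:  # commit all rows in one transaction
        conn.executemany(
            "INSERT OR REPLACE INTO files (path, package) VALUES (?, ?)",
            # Assumption: the parent directory name is the package name.
            [(p, p.split("/")[-2]) for p in paths],
        )


def files_for(conn: sqlite3.Connection, package: str) -> list[str]:
    """Look up a package's files via the index instead of walking the tree."""
    rows = conn.execute(
        "SELECT path FROM files WHERE package = ? ORDER BY path", (package,)
    )
    return [row[0] for row in rows]


# Hypothetical paths, bucketed by name prefix for illustration only.
conn = sqlite3.connect(":memory:")
build_index(
    conn,
    [
        "pkgs/he/hello/default.nix",
        "pkgs/he/hello/fix.patch",
        "pkgs/cu/curl/default.nix",
    ],
)
print(files_for(conn, "hello"))
```

The point of the design is that the index is one file that a CDN can serve, and lookups cost a B-tree search rather than a directory scan.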

weiwenhao • yesterday at 5:20 PM

For package management software that is rarely used, free is the biggest motivation.

grumbel • yesterday at 6:02 PM

Do we have distributed databases that regular users can clone, modify and merge?

sghiassy • yesterday at 3:35 PM

Use git clone’s --depth option (a shallow clone) and you’ll only download the most recent commits. Yeesh

born-jre • yesterday at 2:08 PM

lol I see this as I plan on using Git for my thing store. https://github.com/blue-monads/potatoverse

khc • yesterday at 10:19 PM

seems like the issue isn't with using git as a database, but using github as a distribution mechanism?

0xbadcafebee • yesterday at 3:10 PM

YOLO software engineering, the hallmark of the 21st century

frumplestlatz • yesterday at 2:19 PM

Since ~2002, Macports has used svn or git, but users, by default, rsync the complete port definitions + a server-generated index + a signature.

The index is used for all lookups; it can also be generated or incrementally updated client-side to accommodate local changes.

This has worked fine for literally decades, starting back when bandwidth and CPU power were far more limited.

The problem isn’t using SCM, and the solutions have been known for a very long time.
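
A rough sketch of the index idea described above: an index that serves all lookups and can be updated incrementally on the client. The index structure, the `version` line format, and the function names here are assumptions for illustration, not MacPorts' actual index format or tooling:

```python
import hashlib


def digest(text: str) -> str:
    """Content hash used to detect whether a definition changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def parse_version(text: str):
    """Extract the version from a hypothetical 'version X.Y.Z' line."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "version":
            return parts[1]
    return None


def update_index(index: dict, tree: dict) -> list:
    """Sync `index` with `tree` (name -> definition text).

    Only definitions whose content hash changed are re-parsed.
    Returns the sorted names that were (re)indexed.
    """
    changed = []
    for name, text in tree.items():
        d = digest(text)
        entry = index.get(name)
        if entry is None or entry["digest"] != d:
            index[name] = {"digest": d, "version": parse_version(text)}
            changed.append(name)
    # Drop entries whose definitions no longer exist in the tree.
    for name in set(index) - set(tree):
        del index[name]
    return sorted(changed)


# Hypothetical port definitions.
tree = {
    "curl": "name curl\nversion 8.5.0\n",
    "zlib": "name zlib\nversion 1.3\n",
}
index = {}
print(update_index(index, tree))  # first run indexes everything
tree["zlib"] = "name zlib\nversion 1.3.1\n"  # a local change
print(update_index(index, tree))  # second run touches only the changed port
```

Lookups then read `index` directly, so the cost of a query does not depend on how the definitions themselves are stored or synced.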

venturecruelty • yesterday at 11:25 PM

This is why I don't use programming languages that do that.

juped • yesterday at 9:45 PM

These are actually all problems with using Github as an ersatz CDN.

skywhopper • yesterday at 8:48 PM

Not sure I can agree with the takeaway. It works well at first, but doesn’t scale, so folks found workarounds. That’s how literally every working system grows. There are always bottlenecks eventually. And you address them when they become an issue, not five years earlier.

leoh • yesterday at 7:44 PM

The conclusion reached in this essay is 100% wrong. See "The reftable backend: What it is, where it's headed, and why should you care?"

>With release 2.45, Git has gained support for the “reftable” backend to read and write references in a Git repository. While this was a significant milestone for Git, it wasn’t the end of GitLab’s journey to improve scalability in repositories with many references. In this talk you will learn what the reftable backend is, what work we did to improve it even further and why you should care.

https://www.youtube.com/watch?v=0UkonBcLeAo

Also see Scalar, which Microsoft used to scale their 300GiB Windows repository, https://github.com/microsoft/scalar.

notorandit • yesterday at 7:13 PM

Repsy

stephenlf • yesterday at 6:18 PM

Omarchy

BlueTemplar • yesterday at 4:53 PM

Wait, isn't fossil based on sqlite ?

Or does fossil itself still have the same issues ?

holyknight • yesterday at 4:46 PM

It’s basically the same thing that always happens when you choose a technology because it’s convenient rather than a great fit for your problem. Sooner or later, you’ll hit a wall. Just because you can cook a salmon in your dishwasher doesn’t mean you should.

encom • yesterday at 2:34 PM

>[Homebrew] Auto-updates now run every 24 hours instead of every 5 minutes[...]

That is such an insane default, I'm at a loss for words.

gjvc • yesterday at 2:08 PM

sqlite seems to be ideal for a package manager

aniou • yesterday at 2:18 PM

As a side note: maybe someone knows why the Rust devs chose an already-used name for their language change proposals? "RFC" was already taken and well-established, and I simply refuse to accept that someone wasn't aware of Request For Comments. And if they were aware and the clash was created deliberately, then it was rude and arrogant.

Every, ...king time, when I read something like "RFC 2789 introduced a sparse HTTP protocol." my brain suffers from a short-circuit. BTW: RFC 2789 is a "Mail Monitoring MIB".

eviks • yesterday at 1:39 PM

Indeed, the seductive nature of bad tools lying close to your hand - no need to lift your butt to get them!

unit149 • today at 12:23 AM

[dead]
