Hacker News

Package managers keep using Git as a database, it never works out

703 points by birdculture yesterday at 12:46 PM | 387 comments

Comments

iamwil yesterday at 5:20 PM

This sounds like a missing piece of software in the OSS world. If you have the inclination, you should write it.

keithgroves yesterday at 4:14 PM

When building https://enact.tools we considered this. I'm glad we didn't go this route.

miyuru yesterday at 1:55 PM

Funnily enough, I clicked the Homebrew GitHub link in the post, only to get a rate-limit error page from GitHub.

pxc yesterday at 7:24 PM

Loved this article. Just enough detail to make the broad scope compatible with a reasonable length, and well-argued.

I sometimes feel like package management is a relatively second-class topic in computer science (or at least among many working programmers). But a package manager's behavior can be the difference between a grotesque, repulsive experience and a delightful, beautiful one. And there isn't yet a package manager that does well everything we have collectively learned how to do well, which makes it an interesting space imo.

Re: Nixpkgs, interestingly, pre-flakes Nix distributes all of the needed Nix expressions as tarballs, which does play nice with CDNs. It also distributes an index of the tree as a SQLite database to obviate some of the "too many files/directories" problem with enumerating files. (In the meantime, Nixpkgs has also started bucketing package directories by name prefix.) So maybe there was a lesson learned here that would be useful to re-learn.

On the other hand, IIRC if you use the GitHub fetcher rather than the Git one, including for fetching flakes, Nix will download tarballs from GitHub instead of doing clones. Regardless, downloading and unpacking Nixpkgs has become kinda slow. :-\
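To make the name-prefix bucketing mentioned above concrete, here is a minimal sketch of the general idea. The function name `bucket_path`, the `pkgs/by-name` root, and the two-character lowercase prefix are assumptions for illustration, not a statement of exactly how Nixpkgs lays out its tree.

```python
from pathlib import PurePosixPath


def bucket_path(name: str, root: str = "pkgs/by-name", width: int = 2) -> PurePosixPath:
    """Return root/<prefix>/<name>, where prefix is the first `width`
    characters of the package name, lowercased (hypothetical layout)."""
    if not name:
        raise ValueError("package name must be non-empty")
    prefix = name[:width].lower()
    # Sharding by prefix keeps any single directory from holding every package.
    return PurePosixPath(root) / prefix / name
```

The point of a layout like this is just to bound how many entries live in one directory, which is the "too many files/directories" problem in a different form.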

weiwenhao yesterday at 5:20 PM

For package management software that is rarely used, free is the biggest motivation.

grumbel yesterday at 6:02 PM

Do we have distributed databases that regular users can clone, modify and merge?

sghiassy yesterday at 3:35 PM

Use git clone's --depth option (a shallow clone) and you’ll only download the most recent commits. Yeesh

born-jre yesterday at 2:08 PM

lol I see this as I plan on using Git for my thing store. https://github.com/blue-monads/potatoverse

khc yesterday at 10:19 PM

seems like the issue isn't with using git as a database, but using github as a distribution mechanism?

0xbadcafebee yesterday at 3:10 PM

YOLO software engineering, the hallmark of the 21st century

frumplestlatz yesterday at 2:19 PM

Since ~2002, MacPorts has used svn or git, but users, by default, rsync the complete port definitions + a server-generated index + a signature.

The index is used for all lookups; it can also be generated or incrementally updated client-side to accommodate local changes.

This has worked fine for literally decades, starting back when bandwidth and CPU power were far more limited.

The problem isn’t using SCM, and the solutions have been known for a very long time.
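As a rough illustration of the "incrementally updated client-side" index described above, here is a small sketch. It is not MacPorts' actual index format or code: `update_index`, the `(stamp, entry)` tuples, and the `parse` callback are all hypothetical names chosen for the example.

```python
def update_index(index, ports, parse):
    """Refresh `index` in place from `ports` and return the names re-parsed.

    index: name -> (stamp, parsed entry), the cached lookup table
    ports: name -> (stamp, raw definition), the current port tree
    parse: callable turning a raw definition into an index entry
    """
    reparsed = []
    for name, (stamp, raw) in ports.items():
        cached = index.get(name)
        # Only re-parse definitions that are new or whose stamp changed.
        if cached is None or cached[0] != stamp:
            index[name] = (stamp, parse(raw))
            reparsed.append(name)
    # Drop index entries for ports that no longer exist in the tree.
    for name in list(index):
        if name not in ports:
            del index[name]
    return reparsed
```

After a sync that touches one definition, only that one entry gets re-parsed; every lookup still goes through the index rather than walking the tree.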

venturecruelty yesterday at 11:25 PM

This is why I don't use programming languages that do that.

juped yesterday at 9:45 PM

These are actually all problems with using GitHub as an ersatz CDN.

skywhopper yesterday at 8:48 PM

Not sure I can agree with the takeaway. It works well at first, but doesn’t scale, so folks found workarounds. That’s how literally every working system grows. There are always bottlenecks eventually. And you address them when they become an issue, not five years earlier.

leoh yesterday at 7:44 PM

The conclusion reached in this essay is 100% wrong. See "The reftable backend: What it is, where it's headed, and why should you care?"

>With release 2.45, Git has gained support for the “reftable” backend to read and write references in a Git repository. While this was a significant milestone for Git, it wasn’t the end of GitLab’s journey to improve scalability in repositories with many references. In this talk you will learn what the reftable backend is, what work we did to improve it even further and why you should care.

https://www.youtube.com/watch?v=0UkonBcLeAo

Also see Scalar, which Microsoft used to scale their 300GiB Windows repository, https://github.com/microsoft/scalar.

notorandit yesterday at 7:13 PM

Repsy

stephenlf yesterday at 6:18 PM

Omarchy

BlueTemplar yesterday at 4:53 PM

Wait, isn't Fossil based on SQLite?

Or does Fossil itself still have the same issues?

holyknight yesterday at 4:46 PM

It’s basically the same thing that always happens when you choose a technology because it’s convenient rather than a great fit for your problem. Sooner or later, you’ll hit a wall. Just because you can cook a salmon in your dishwasher doesn’t mean you should.

encom yesterday at 2:34 PM

>[Homebrew] Auto-updates now run every 24 hours instead of every 5 minutes[...]

That is such an insane default, I'm at a loss for words.

gjvc yesterday at 2:08 PM

sqlite seems to be ideal for a package manager
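For what it's worth, a minimal sketch of what that could look like, using Python's standard `sqlite3` module. The `packages` table, its columns, and the `build_index`/`lookup` helpers are made up for illustration and do not correspond to any real package manager's schema.

```python
import sqlite3


def build_index(packages):
    """Create an in-memory index from (name, version, url) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE packages ("
        " name TEXT NOT NULL,"
        " version TEXT NOT NULL,"
        " url TEXT NOT NULL,"
        " PRIMARY KEY (name, version))"
    )
    conn.executemany("INSERT INTO packages VALUES (?, ?, ?)", packages)
    conn.commit()
    return conn


def lookup(conn, name):
    """Return all (version, url) rows for one package name.

    Note: ORDER BY on a TEXT column sorts lexicographically, which is
    not real version ordering; a real index would need its own scheme.
    """
    rows = conn.execute(
        "SELECT version, url FROM packages WHERE name = ? ORDER BY version",
        (name,),
    )
    return rows.fetchall()
```

A lookup by name is then a single indexed query against one file, instead of a walk over thousands of small files in a checkout.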

aniou yesterday at 2:18 PM

As a side note: maybe someone knows why the Rust devs chose an already-used name for language change proposals? "RFC" was already taken and well established, and I simply refuse to accept that no one was aware of Request for Comments. And if they were aware and the clash was created deliberately, then it was rude and arrogant.

Every, ...king time, when I read something like "RFC 2789 introduced a sparse HTTP protocol." my brain suffers from a short-circuit. BTW: RFC 2789 is a "Mail Monitoring MIB".

eviks yesterday at 1:39 PM

Indeed, the seductive nature of bad tools lying close to your hand - no need to lift your butt to get them!

unit149 today at 12:23 AM

[dead]