Hacker News

noirscape today at 9:54 AM

I can understand in theory why they wouldn't want to back up .git folders as-is. Git has a serious object-count bloat problem in any repository with a decent amount of commit history, which adds a lot of unnecessary overhead just to scan the folder for files.

I don't quite understand why it's still like this; it's probably the biggest reason why git tends to play poorly with a lot of filesystem tools (not just backups). If it'd been something like an SQLite database instead (just an example really), you wouldn't get so much unnecessary inode bloat.
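A quick way to see the inode bloat described above is `git count-objects`; here is a minimal sketch using a throwaway repo (names and paths are illustrative):

```shell
# Demo of git's loose-object bloat: every commit adds several
# loose object files (one inode each) under .git/objects/.
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.email demo@example.com
git config user.name demo

# A few commits; each produces new blob, tree, and commit objects.
for i in 1 2 3; do
  echo "revision $i" > file.txt
  git add file.txt
  git commit -qm "commit $i"
done

# "count" is the number of loose object files a backup tool
# would have to stat and read individually.
git count-objects -v
```

Each loose object is its own file under `.git/objects/`, so any tool that walks the tree pays a per-file cost that grows with history.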

At the same time, Backblaze is a backup solution. The need to back up everything is sort of baked in there. They promise to be the third layer in a three-layer strategy (a directly connected backup, a backup in the home, and an external backup), and that third one is probably the single most important of them all, since it's the one you'll touch the least in an ideal scenario. They really can't be excluding any files whatsoever.

The cloud service exclusion is similar, but much worse. Imagine getting hit by a cryptoworm. Your cloud storage tool will dutifully sync everything encrypted, junking up your storage across devices, and because restoring old versions is both painful and near impossible at scale, you need an actual backup solution for exactly that situation. Backblaze excluding files in those folders feels like a complete misunderstanding of what their purpose should be.


Replies

adithyassekhar today at 11:27 AM

I don’t think this is the right way to see this.

Why should a file backup solution adapt to work with git? Or any application? It should not try to understand what a git object is.

I’m paying them to copy files from a folder to their servers; they should just do that, no matter what the file is. Stay at the filesystem level, not the application level.

rmccue today at 11:01 AM

I think it's understandable for both Backblaze and most users, but surely the solution is to add `.git` to their default exclusion list which the user can manage.
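Backblaze's exclusion list lives in its own settings UI, but the effect is the same as an exclude pattern in any file-level backup tool; a tar-based sketch of the idea (names and paths are illustrative):

```shell
# Demo: file-level backup that skips .git the way a default,
# user-manageable exclusion-list entry would.
cd "$(mktemp -d)"
mkdir -p project/.git/objects project/src
echo "loose object" > project/.git/objects/aa
echo "source code"  > project/src/main.c

# --exclude matches the .git directory anywhere in the tree;
# dropping the flag opts the user back in.
tar --exclude='.git' -czf backup.tar.gz project/

# The archive contains the working files but no .git contents.
tar -tzf backup.tar.gz
```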

maalhamdan today at 10:22 AM

I think they shouldn't back up git objects individually because git handles the versioning information. Just compress the .git folder itself and back it up as a single unit.
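Git actually ships a mechanism for exactly this: `git bundle` packs all refs and history into one file, which a backup tool can then treat as a single opaque unit. A sketch with a throwaway repo (names are illustrative):

```shell
# Demo: pack an entire repository's history into ONE file.
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email demo@example.com
git config user.name demo
echo "hello" > file.txt
git add file.txt && git commit -qm "initial commit"

# All refs and all history end up in a single bundle file.
git bundle create ../repo.bundle --all

# A bundle can be checked and restored with plain git commands.
git bundle verify ../repo.bundle
git clone -q ../repo.bundle ../restored
```

Restoring is just a clone from the bundle, so the backup tool never needs to understand git's object store at all.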

rcxdude today at 11:03 AM

It's probably primarily because Linus is a kernel and filesystem nerd, not a database nerd, so he preferred to just use the filesystem, whose performance characteristics he understood well (at least on Linux).

yangm97 today at 11:30 AM

You don’t see ZFS/BTRFS block-based snapshot replication choking on git or any other sort of dataset. Use the right tool for the job, or something.

ciupicri today at 10:20 AM

> If it'd been something like an SQLite database instead (just an example really)

See Fossil (https://fossil-scm.org/)

P.S. There's also SourceGear Vault (https://www.sourcegear.com/vault/):

> SourceGear Vault Pro is a version control and bug tracking solution for professional development teams. Vault Standard is for those who only want version control. Vault is based on a client / server architecture using technologies such as Microsoft SQL Server and IIS Web Services for increased performance, scalability, and security.

grumbelbart2 today at 11:05 AM

Git packs objects into pack-files on a regular basis. If it doesn't, check your configuration, or do it manually with 'git repack'.
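For reference, a sketch of what that manual repack does to the loose-object count (throwaway repo; names are illustrative):

```shell
# Demo: git repack consolidates loose objects into one packfile,
# which fixes the inode-bloat problem for file-level tools.
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email demo@example.com
git config user.name demo
for i in 1 2 3; do
  echo "revision $i" > file.txt
  git add file.txt
  git commit -qm "commit $i"
done

git count-objects -v   # many loose objects, one inode each

# Pack everything reachable into a single packfile (-a) and
# delete the now-redundant loose copies (-d).
git repack -a -d

git count-objects -v   # loose count drops to 0; one pack remains
```

`git gc` runs this (plus pruning) automatically on a heuristic schedule, which is why long-lived repositories usually end up mostly packed on their own.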
