Hacker News

duskwuff last Sunday at 12:58 AM

That's less helpful than you might imagine - gzip isn't seekable by default; if all you know is the seek point, you still have to decompress everything up to that point to start decompressing from there. And if you have to do that, reading the tar headers as you go isn't a serious burden.

What might help is saving the state of the decompressor periodically, rather than just the index in the file. But that's getting pretty far into the weeds for an optimization to an infrequently used feature.
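To make the idea concrete, here is a rough sketch in Python, not how tar or any existing tool does it. It leans on the fact that `zlib` decompressor objects have a `copy()` method; the helper names `build_checkpoints` and `read_at`, the checkpoint interval, and the chunk size are all made up for illustration. Each saved copy carries the sliding window along with it, so this trades memory for seek speed.

```python
import zlib


def build_checkpoints(compressed, interval, chunk=4096):
    """Decompress once, saving a copy of the decompressor state roughly
    every `interval` bytes of output.

    Returns a list of (output_offset, input_offset, saved_state) tuples.
    (Hypothetical helper, for illustration only.)
    """
    d = zlib.decompressobj(wbits=31)  # 31 = 32 KiB window + gzip framing
    checkpoints = [(0, 0, d.copy())]
    out_pos = 0
    next_mark = interval
    for in_pos in range(0, len(compressed), chunk):
        in_end = min(in_pos + chunk, len(compressed))
        out_pos += len(d.decompress(compressed[in_pos:in_end]))
        # Skip checkpointing once the stream has ended; nothing follows it.
        if out_pos >= next_mark and not d.eof:
            checkpoints.append((out_pos, in_end, d.copy()))
            next_mark = out_pos + interval
    return checkpoints


def read_at(compressed, checkpoints, offset, size, chunk=4096):
    """Return `size` bytes of decompressed data starting at `offset`,
    resuming from the nearest checkpoint instead of from byte zero."""
    out_pos, in_pos, state = max(
        (c for c in checkpoints if c[0] <= offset), key=lambda c: c[0]
    )
    d = state.copy()  # copy again so the stored checkpoint stays reusable
    skip = offset - out_pos
    buf = bytearray()
    while len(buf) < skip + size and in_pos < len(compressed):
        buf += d.decompress(compressed[in_pos:in_pos + chunk])
        in_pos += chunk
    return bytes(buf[skip:skip + size])
```

You still pay for one full pass to build the checkpoints, and you have to decide where they live afterwards, which is where it starts getting into the weeds.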


Replies

mikepurvis yesterday at 1:46 PM

Interesting, yeah that makes sense, and I agree: it would be tricky to find the right balance between caching the actual contents somewhere and caching just the decompressor state, and to decide whether that cache goes to RAM or disk. There isn't an obvious right answer for either, nor is there necessarily a right way to expose that option to the user.

Can definitely see why systems like Python's wheel format would choose zip, as it has always been natively seekable out of the box. I believe Nix now does something similar, with flake repo archives being zip files in the store, since they can be seeked into and evaluated without a full decompression, saving a lot of disk space.
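For contrast, a minimal sketch of the zip side using Python's standard `zipfile` module. The archive is built in memory and the helper names `build_archive` and `read_member` are made up for the example. The point is that a member is located through the central directory and only that member's data gets decompressed.

```python
import io
import zipfile


def build_archive(members):
    """Pack a {name: bytes} mapping into an in-memory zip archive.
    (Hypothetical helper, for illustration only.)"""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def read_member(archive, name):
    """Read one member without decompressing the others."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        info = zf.getinfo(name)  # looked up in the central directory
        return zf.read(info)     # seeks to this member and inflates only it
```

Each member is compressed independently, which is what makes this cheap, and also why zip usually compresses a bit worse than a tarball gzipped as a single stream.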