"I/O is the bottleneck" is only true in the loose sense that "reading files" is slow.
Strictly speaking, the bottleneck was latency, not bandwidth.
Same thing applies to other system aspects:
compressing the kernel loads it faster on RAM even if it still has to execute the un compressing operation. Why?
Load from disk to RAM is a larger bottleneck than CPU uncompressing.
Same is applied to algorithms, always find the largest bottleneck in your dependent executions and apply changes there as the rest of the pipeline waits for it. Often picking the right algorithm “solves it” but it may be something else, like waiting for IO or coordinating across actors (mutex if concurrency is done as it used to).
That’s also part of the counterintuitive take that more concurrency brings more overhead and not necessarily faster execution speeds (topic largely discussed a few years ago with async concurrency and immutable structures).
there are a loooot of languages/compilers for which the most wall-time expensive operation in compilation or loading is stat(2) searching for files
Amazing article, thanks for sharing. I really appreciate the deep investigations in response to the comments
Zip with no compression is a nice contender for a container format that shouldn't be slept on. It effectively reduces the I/O, while unlike TAR, allowing direct random to the files without "extracting" them or seeking through the entire file, this is possible even via mmap, over HTTP range queries, etc.
You can still get the compression benefits by serving files with Content-Encoding: gzip or whatever. Though it has builtin compression, you can just not use that and use external compression instead, especially over the wire.
It's pretty widely used, though often dressed up as something else. JAR files or APK files or whatever.
I think the articles complaints about lacking unix access rights and metadata is a bit strange. That seems like a feature more than a bug, as I wouldn't expect this to be something that transfers between machines. I don't want to unpack an archive and have to scrutinize it for files with o+rxst permissions, or have their creation date be anything other than when I unpacked them.