I'm struggling with the caching right now. I'm trying to switch from GitHub Actions to just running stuff in containers, and it works. Except for caching.
BuildKit from Docker is just a pure bullshit design. Instead of the elegant layer-based system, there are now two daemons flinging tar files around, and for no real reason that I can discern. But the worst thing is that the caching is just plain broken.
I went down this rabbit hole before: you have to ignore all the recommended approaches. The real solution is a build server with a global Docker install and a script that prunes the cache when disk usage goes above a certain percentage. The cache is local and instant. Pushing and pulling cache images is an insane solution.
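Roughly something like this, run from cron on the build server (the 80% threshold, the data-root path, and the exact prune commands here are illustrative guesses, not the actual script):

    import shutil
    import subprocess

    THRESHOLD = 0.80                    # prune once the build disk is 80% full (assumed)
    BUILD_DISK = "/var/lib/docker"      # default Docker data root (assumed)

    usage = shutil.disk_usage(BUILD_DISK)
    if usage.used / usage.total > THRESHOLD:
        # Drop unused build cache and dangling images; anything still needed
        # gets rebuilt or re-pulled on the next build.
        subprocess.run(["docker", "builder", "prune", "--force"], check=True)
        subprocess.run(["docker", "image", "prune", "--force"], check=True)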
The layers are tar files; I'm confused about what behavior you actually want that isn't supported.
Huh?
Each layer is a tarball.
So build your tarballs (concurrently!), and then add some metadata to make an image.
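To give a sense of how little metadata that is, here's a rough sketch that packs a single directory into one uncompressed layer and wraps it in a bare OCI image layout (the paths are placeholders, and real tooling also handles compression, timestamps, history, multiple layers, and so on):

    import hashlib
    import io
    import json
    import pathlib
    import tarfile

    def blob(layout, data):
        # Store bytes under blobs/sha256/<digest> and return (digest, size).
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        out = layout / "blobs" / "sha256" / digest.split(":")[1]
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_bytes(data)
        return digest, len(data)

    def build_image(rootfs, layout_dir):
        layout = pathlib.Path(layout_dir)

        # 1. The layer itself is nothing more than a tarball of the filesystem.
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w") as tar:
            tar.add(rootfs, arcname=".")
        layer_digest, layer_size = blob(layout, buf.getvalue())

        # 2. Image config: lists the layer diff_ids and basic platform info.
        config = {
            "architecture": "amd64",
            "os": "linux",
            "config": {},
            "rootfs": {"type": "layers", "diff_ids": [layer_digest]},
        }
        cfg_digest, cfg_size = blob(layout, json.dumps(config).encode())

        # 3. Manifest ties the config and the layer blobs together.
        manifest = {
            "schemaVersion": 2,
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "config": {"mediaType": "application/vnd.oci.image.config.v1+json",
                       "digest": cfg_digest, "size": cfg_size},
            "layers": [{"mediaType": "application/vnd.oci.image.layer.v1.tar",
                        "digest": layer_digest, "size": layer_size}],
        }
        man_digest, man_size = blob(layout, json.dumps(manifest).encode())

        # 4. OCI layout boilerplate so tools like skopeo or crane can read it.
        (layout / "oci-layout").write_text(json.dumps({"imageLayoutVersion": "1.0.0"}))
        (layout / "index.json").write_text(json.dumps({
            "schemaVersion": 2,
            "manifests": [{"mediaType": "application/vnd.oci.image.manifest.v1+json",
                           "digest": man_digest, "size": man_size}],
        }))

    build_image("rootfs", "image-layout")   # hypothetical input/output paths

Because the layer is left uncompressed, its diff_id and its blob digest are the same, which is what keeps the config trivial here.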
From your comment elsewhere, it seems you may be expecting the docker build paradigm of running a container and snapshotting it at various stages.
That is messy and has a number of limitations, not the least of which is cross-compilation; reproducibility is another. But in any case, that's definitely not what these rules are trying to do.
BuildKit can be very efficient at caching, but you need to design your image build around it: once any step encounters a cache miss, all remaining steps will too.
I'd also avoid loading the result back into the Docker daemon unless you really need it there. BuildKit can output directly to a registry or an OCI layout, both of which maintain the image digest and support multi-platform images. (Admittedly, those problems go away with the containerd image store changes that are happening, but it's still an extra export/import that can be skipped.)
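For example, something along these lines skips the daemon entirely (the flags are standard buildx exporter options, but the tag and output path are placeholders, and the OCI export assumes a docker-container builder):

    import subprocess

    subprocess.run([
        "docker", "buildx", "build", ".",
        "--tag", "registry.example/app:dev",        # placeholder tag
        "--output", "type=oci,dest=app-oci.tar",    # write an OCI archive, digest preserved
        # "--output", "type=registry",              # ...or push straight to a registry instead
    ], check=True)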
All that said, I think caching is often the wrong goal. Personally, I want reproducible builds, and those should bypass any cache to verify that each step always produces the same output. Also, when saving the cache, every build caches every step, even steps that will never be reused by future builds. For my own projects, the net effect of adding a cache could be slower builds.
Instead of caching the image build steps, I think we should be spending a lot more effort on creating local proxies of upstream dependencies, removing the network overhead of pulling dependencies on every build. Compute-intensive build steps would still be slow, but a significant number of image builds could be sped up with a proxy at the CI server level, without tuning builds individually.
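To make that concrete, a read-through cache can be as dumb as the sketch below. The upstream URL, port, and cache path are placeholders, and a real proxy would also need TLS, auth, proper headers, error handling, and cache invalidation:

    import hashlib
    import pathlib
    import urllib.request
    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

    UPSTREAM = "https://packages.example.com"         # placeholder upstream artifact host
    CACHE_DIR = pathlib.Path("/var/cache/dep-proxy")  # placeholder cache location

    class CachingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            key = hashlib.sha256(self.path.encode()).hexdigest()
            cached = CACHE_DIR / key
            if not cached.exists():
                # Cache miss: fetch from upstream once, keep the bytes on local disk.
                with urllib.request.urlopen(UPSTREAM + self.path) as resp:
                    data = resp.read()
                CACHE_DIR.mkdir(parents=True, exist_ok=True)
                cached.write_bytes(data)
            body = cached.read_bytes()
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        ThreadingHTTPServer(("0.0.0.0", 8080), CachingHandler).serve_forever()

Point the package manager inside the build at the proxy, and everything after the first build comes off local disk instead of the network.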