Hacker News

We reduced a container image from 800GB to 2GB

69 points by untrimmed · 10/27/2025 · 65 comments

Comments

perlgeek · yesterday at 12:02 PM

The real lesson they should learn is to not rely on running images and then using "docker commit" to turn it into an image, but instead to use proper image building tools.

If you absolutely have to do it that way, be very deliberate about what you actually need. Don't run an SSH daemon, don't run cron, don't run an SMTP daemon, don't run the suite of daemons that run on a typical Linux server. Only run precisely what you need to create the files that you need for a "docker commit".

Each service that you run can potentially generate log files, lock files, temp files, named pipes, unix sockets and other things you don't want in your image.

Taking a snapshot from a working, regular VM and using that as a docker image is one of the worst ways to build one.
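The alternative perlgeek describes can be sketched as a small Dockerfile; the base image, package, and paths below are placeholders, not anything from the article:

```dockerfile
# Build declaratively instead of snapshotting a live system with `docker commit`.
FROM debian:bookworm-slim

# Install only what the workload needs -- no sshd, cron, or MTA --
# and clean the apt cache in the same layer so it never reaches the image.
RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Copy in just the application files, nothing from a running server.
COPY ./app /opt/app

CMD ["/opt/app/run"]
```

Because every step is declared, rebuilding is reproducible and nothing from a live system (logs, sockets, lock files) can leak into a layer.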

cebert · yesterday at 12:40 PM

I’m shocked that a company would share how amazingly bad their layer management had become. This may be a great internal blog, but I wouldn’t share it publicly.

BinaryIgor · yesterday at 11:36 AM

Interesting, although something about the language makes me think it was written by an LLM; I like the ending though:

"The key insight is to treat container images not as opaque black boxes, but as structured, manipulable archives. Deeply understanding the underlying technology, like the OCI image specification, allows for advanced optimization and troubleshooting that goes far beyond standard tooling. This knowledge is essential for preventing issues like Kubernetes disk space exhaustion before they start."

gkfasdfasdf · yesterday at 3:07 PM

> image-manip squash: This is the key to reclaiming disk space and the core of our strategy to squash the image layers. The tool creates a temporary container, applies all 272 layers in sequence to an empty root filesystem, and then exports the final, merged filesystem as a single new layer. This flattens the image's bloated history into a lean, optimized final state.

Wouldn't a multistage Dockerfile have accomplished the same thing? Something like:

    FROM bigimage
    RUN rm bigfile

    FROM scratch
    COPY --from=0 / /
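The squash described in the quoted passage can be modeled in a few lines. This is a toy simulation of the idea, not the actual image-manip code: each layer maps path → contents, with None standing in for an OverlayFS-style whiteout, and squashing replays every layer onto an empty root.

```python
def squash(layers):
    """Replay layers in order onto an empty root filesystem.

    Each layer maps path -> file contents; None marks a deletion
    (analogous to an OverlayFS whiteout). The result is a single
    merged layer containing only the final state.
    """
    root = {}
    for layer in layers:
        for path, contents in layer.items():
            if contents is None:
                root.pop(path, None)   # whiteout: drop the file
            else:
                root[path] = contents  # add or overwrite the file
    return root

# Three layers: a big log file is written, grown, then deleted.
layers = [
    {"/app/bin": "binary", "/var/log/btmp": "x" * 10},
    {"/var/log/btmp": "x" * 20},  # modified -> full copy stored in this layer
    {"/var/log/btmp": None},      # deleted in the final layer
]

print(squash(layers))  # only /app/bin survives into the squashed image
```

The deleted log never reaches the merged layer, which is exactly why squashing (or the multistage trick) reclaims the space that the layered history still holds.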

bmitch3020 · yesterday at 12:31 PM

This whole article could have been much better written as: learn to build images with a Dockerfile/ Containerfile or similar tooling, and store logs in a volume rather than the image filesystem. Everyone that builds a process around `docker commit` is simply in a race against time before they learn this lesson.
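A minimal sketch of the "logs in a volume" advice, assuming a Dockerfile-based build (the path is illustrative):

```dockerfile
# Anything written under /var/log at runtime lands in a volume,
# so log growth never accumulates in a committed image layer.
VOLUME /var/log
```

The same effect can be had at run time with `docker run -v logs:/var/log myimage`.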

trenchpilgrim · yesterday at 11:05 AM

In the comments: People who didn't read the article assuming they were literally building 800GB images (the example in the article is an 11GB image that was amplified by copying behaviors)

zatkin · yesterday at 12:34 PM

272 layers in a single image seems really unusual, is that just due to my lack of experience with containers? I've never seen an image with more than maybe a few dozen in my career...
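For reference, the layer count of a local image can be checked with standard docker tooling (the image name here is hypothetical):

```shell
docker history myimage:latest        # one row per history entry, including empty layers
docker inspect --format '{{len .RootFS.Layers}}' myimage:latest   # filesystem layers only
```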

cannonpalms · yesterday at 12:50 PM

TIL of `docker commit`. What is the use case for this command? Quick debugging or something, to share with a coworker?
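The usual ad-hoc use is exactly that: freezing a container's current state for inspection or handoff (names here are hypothetical):

```shell
# Snapshot a running container's filesystem as a new image...
docker commit my-running-container debug-snapshot:v1
# ...then export it for a colleague to pull apart
docker save debug-snapshot:v1 -o snapshot.tar
```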

SurceBeats · yesterday at 11:12 AM

Fascinating deep dive into OverlayFS CoW behavior. The 11GB btmp file getting copied 271 times is a perfect storm scenario. Did they consider mounting /var/log outside the image layers? Seems like that would prevent any log file from causing this amplification. Also interested in image-manip... Does it handle metadata differently than docker export/import?
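The amplification follows directly from copy-up semantics: a file modified in a layer is copied wholesale into that layer, so a file touched in N successive layers can be stored up to N times. A toy upper-bound calculation (illustrative, not the article's exact accounting, since the file presumably grew over time rather than being 11GB in every layer):

```python
def worst_case_layer_gb(file_size_gb, layers_touching_file):
    """Upper bound on storage when copy-on-write duplicates a file
    into every layer that modifies it (no cross-layer deduplication)."""
    return file_size_gb * layers_touching_file

# An 11GB log file modified in 271 successive layers:
print(worst_case_layer_gb(11, 271))  # -> 2981 (GB, worst case)
```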

untrimmed · 10/27/2025

Our platform is designed to solve a very specific workflow, and the DevBox is only the first step in that process.

Our users need to connect their local VS Code, Cursor, or JetBrains IDEs to the cloud environment. The industry-standard extensions for this only speak the SSH protocol. So, to give our users the tools they love, the container must run an SSHD to act as the host.

We aren't just a CDE like Coder or Codespaces. We're trying to provide a fully integrated, end-to-end application lifecycle in one place.

The idea is that a developer on Sealos can:

1. Spin up their DevBox instantly.

2. Code and test their feature in that environment (using their local IDE).

3. Then, from that same platform, package their application into a production-ready, versioned image.

4. And finally, deploy that image directly to a production Kubernetes environment with one click.

That "release" feature was how we let a developer "snapshot" their entire working environment into a deployable image without ever having to write a Dockerfile.

apexalpha · yesterday at 2:59 PM

Title makes it seem like 800GB images are a normal occurrence; they are not.

2GB is the expected and default size for a docker image. It's a bit bloated even.

andai · yesterday at 1:05 PM

I'm not a sysadmin but doesn't the root cause sound like a missing fail2ban or something? (Sounds like a whole bunch of problems stacked on top of each other honestly.)

hsbauauvhabzb · yesterday at 10:55 AM

This seems very much like a ‘we misconfigured our containers, then we realised, then we fixed it, then we blogged about it’ post of very little value.

KronisLV · yesterday at 11:26 AM

Images don't seem to be working:

https://sealos.io/_next/image?url=.%2Fimages%2Fcontainerd-hi...

https://sealos.io/_next/image?url=.%2Fimages%2Fbloated-conta...

Either way, hope the user was communicated with or alerted to what's going on.

At the same time, someone said that 800 GB container images are a problem in and of themselves no matter the circumstances, and they got downvoted for saying so - yet I mostly agree.

Most of mine are about 50-250 MB at most and even if you need big ones with software that's GB in size, you will still be happier if you treat them as something largely immutable. I've never had odd issues with them thanks to this. If you really care about data persistence, then you can use volumes/bind mounts or if you don’t then just throw things into tmpfs.
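The mount options mentioned here, as plain `docker run` flags (image name and paths are illustrative):

```shell
docker run -v appdata:/data myimage                           # named volume: persists across containers
docker run --mount type=bind,src=/srv/data,dst=/data myimage  # bind mount: a host directory
docker run --tmpfs /scratch myimage                           # tmpfs: in RAM, discarded on exit
```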

I'm not sure whether treating containers as something long-lived with additional commits/layers is a great idea, but if it works for other people, then good for them. It must be a pain to run something so foundational for your clients, though, because you'll be exposed to most of the edge cases imaginable sooner or later.

reddozen · yesterday at 11:05 AM

Is it spooky that they said they looked inside a customer's image to fix this? A bunch of engineers just had access to their customer's intellectual property, security keys, git repos, ...

SJC_Hacker · yesterday at 12:01 PM

I did something on a smaller scale by ripping out large parts of Boost, which was nearly 50% of the image size

thaumasiotes · yesterday at 1:21 PM

What's up with the images that are supposed to appear in the article? They appear to be coded to load from "./images/containerd-high-disk-io-iotop.png", but https://sealos.io/blog/images/containerd-high-disk-io-iotop.... and https://sealos.io/images/containerd-high-disk-io-iotop.png both fail.

(And indeed, the images are broken in Firefox and Edge. Is there another browser where they're not broken?)

BoredPositron · yesterday at 11:01 AM

If your image is 800GB you are doing something wrong in the first place.
