thaumasiotes, yesterday at 1:26 PM

Well, as described...

> Here's how the disaster unfolded:

> 1. A user's container is under a brute-force attack, and /var/log/btmp grows to 11GB.

> 2. The user performs a commit, creating a new image layer.

> 3. A single new failed login is appended to /var/log/btmp.

> 4. Because of CoW, OverlayFS doesn't just write the new line. It copies the entire 11GB file into the new, upper layer.

> 5. This process repeated 271 times.

So the user is creating hundreds of layers for unclear reasons. The article refers to this as "exponential growth", but for that to be the case those commits would need to be triggered in proportion to the number of existing layers, which seems unlikely. Assuming the commits are caused by the user for reasons unrelated to the size of the existing image, this is growth that is quadratic† (in the number of layers, since each new layer carries a full copy of a log file that keeps growing; it's hard to characterize as a function of time or whatever), and it'd be nice to know why there were so many layers.

† Note that while the growth is technically quadratic, I don't think that impacted them. They say that the problem occurred when one 11GB file got copied into each of 272 image layers. That would require 2,992GB, but they also say that the image exhibiting this problem was only 800GB.
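To spell out the arithmetic in that footnote, here is a rough sketch. The figures come from the quoted article; the function name is made up for illustration, and the model (every layer holding a full 11GB copy) is the worst case being ruled out, not a claim about what actually happened.

```python
# Figures taken from the quoted article.
FILE_GB = 11      # size of /var/log/btmp
LAYERS = 272      # the first layer plus 271 repeats
IMAGE_GB = 800    # reported size of the affected image

def full_copy_total_gb(file_gb, layers):
    """Storage needed if every layer held its own full copy of the file."""
    return file_gb * layers

worst_case_gb = full_copy_total_gb(FILE_GB, LAYERS)
print(worst_case_gb)             # 2992
print(worst_case_gb > IMAGE_GB)  # True: the worst case does not fit in 800GB
```

So "one 11GB copy per layer" and "800GB image" can't both be literally true.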

I suspect that the answer here is that only some of the layers modified (and therefore copied) the log file. Probably about 72 of the layers, since 800GB / 11GB ≈ 72. This is more like growth that's linear (still technically slightly superlinear, but probably not quadratic) in the number of failed SSH login attempts. ~75% of layers aren't contributing to the problem at all.
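A back-of-the-envelope version of that estimate, as a sketch only. It assumes the whole 800GB is made up of full 11GB copies of the log file, which is a simplification: the image has other content, and earlier copies of the file would have been smaller. The function name is hypothetical.

```python
FILE_GB = 11      # size of /var/log/btmp, from the article
LAYERS = 272      # total layers, from the article
IMAGE_GB = 800    # reported image size, from the article

def copying_layers(image_gb, file_gb):
    """Whole number of full-size file copies that fit in the image (assumed model)."""
    return image_gb // file_gb

n_copying = copying_layers(IMAGE_GB, FILE_GB)
idle_fraction = 1 - n_copying / LAYERS

print(n_copying)                # 72
print(round(idle_fraction, 2))  # 0.74
```

That puts the share of layers that never touched the log at roughly three quarters, under the assumption above.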