> If your K8s cluster (or etcd) shits the bed, everything dies. When etcd and/or kubelet s...

jchw • yesterday at 3:30 PM • 0 replies • view on HN

> If your K8s cluster (or etcd) shits the bed, everything dies.

When etcd and/or kubelet shits the bed, it shouldn't do anything other than halt scheduling tasks. The actual runtime might vary between setups, but typically containerd is the one actually handling the individual pod processes.

Of course, you can also run Kubernetes pods in a VM if you want to, there have always been a few different options for this. I think right now the leading option is Kata Containers.

Does using Kata Containers improve isolation? Very likely: you have an entire guest kernel for each domain. Of course, the entire isolation domain is subject to hardware bugs, but I think people do generally regard hardware security boundaries somewhat higher than Linux kernel security boundaries.

But, does using Kata Containers improve reliability? I'd bet not, no. In theory it would help mitigate reliability issues caused by kernel bugs, but frankly that's a bit contrived as most of us never or extremely infrequently experience the type of bug that mitigates. In practice, what happens is that the point of failure switches from being a container runtime like containerd to a VMM like qemu or Firecracker.

> The equivalent to that for VMs is the hypervisor dying, but IME it’s far more likely that K8s or etcd has an issue than a hypervisor. If nothing else, the latter as a general rule is much older, much more mature, and has had more time to work out bugs.

The way I see it, mature code is less likely to have surprise showstopper bugs. However, if we're talking qemu + KVM, that's a code base that is also rather old, old enough that it comes from a very different time and place for security practices. I'm not saying qemu is bad, obviously it isn't, but I do believe that many working in high-assurance environments have decided that qemu's age and attack surface is large enough to have become a liability, hence why Firecracker and Cloud Hypervisor exist.

I think the main advantage of a VMM remains the isolation of having an entire separate guest kernel. Though, you don't need an entire Linux VM with complete PC emulation to get that; micro VMs with minimal PC emulation (mostly paravirtualization) will suffice, or possibly even something entirely different, like the way gVisor is a VMM but the "guest kernel" is entirely userland and entirely memory safe.

alt Hacker News