> Unfortunately even podman etc.. are still limited by OCIs decision to copy the Docker model. ...

ragall • yesterday at 6:12 PM • 1 reply

> Unfortunately even podman etc.. are still limited by OCIs decision to copy the Docker model.

Which parts of the model are you referring to?

Replies

OCI container runtimes like runc are "container runtimes" in the sense of the OCI runtime spec [2].

Basically, Docker started out using LXC, but wanted a Go-native option and wrote runc. If you look at [0] you can see how it actually instantiates the container. Here is a random blog post that describes it fairly well [1].

crun is the Podman-related project written in C, which is more efficient than the Go-based runc.

You can try this even as the user nobody (65534:65534), but you may need to create some directories or set some environment variables.

Here is an example pulling an image with podman to make it easier, but you could just make an OCI spec bundle and run it:

    mkdir hello
    cd hello
    podman pull docker.io/hello-world
    podman export $(podman create hello-world) > hello-world.tar
    mkdir rootfs
    tar -C rootfs -xf hello-world.tar
    runc spec --rootless
    sed -i 's;"sh";"/hello";' config.json
    runc run container1
    
    Hello from Docker!

runc doesn't support any form of constraints, like a bounding set on seccomp, SELinux, AppArmor, etc., but it will apply whatever profiles you pass it.

Basically it fails open, and with the current state of AppArmor and SELinux it is trivial to bypass the minimal userns restrictions they place.
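As a rough sketch of what "passing a profile" looks like in practice, the script below writes a minimal seccomp profile that denies only socket(AF_VSOCK, ...). The function name and file name are made up for illustration, 40 is the value of AF_VSOCK on Linux, and the allow-by-default action is there only to keep the example short; it is not something you would want to ship.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal seccomp profile to the path in $1.
# The profile allows every syscall except socket() calls whose first
# argument (the address family) equals 40, i.e. AF_VSOCK on Linux.
write_no_vsock_profile() {
    cat > "$1" <<'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["socket"],
      "action": "SCMP_ACT_ERRNO",
      "args": [
        { "index": 0, "value": 40, "op": "SCMP_CMP_EQ" }
      ]
    }
  ]
}
EOF
}

# Illustrative file name, not a convention of any tool.
write_no_vsock_profile no-vsock.json
```

With podman this could be passed as `--security-opt seccomp=./no-vsock.json`; for a raw runc bundle, the same object would go under `linux.seccomp` in config.json, as far as I know. Either way it only takes effect if the person launching the container remembers to pass it, and a custom profile replaces the runtime's default one instead of adding to it.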

Historically, before rootless containers, this was less of an issue, because you had to be a privileged user to launch a container. But with the holes in the LSMs, no ability to set administrative bounding sets, and the reality that none of the defaults constrain risky kernel functionality like vsock, openat2, etc., there are a million ways to break netns isolation and the like.

Originally the Docker project wanted to keep all the complexity of mutating LSM rules etc. in containerd, and they also fought even basic controls like letting an admin disable the `--privileged` flag at the daemon level.

Unfortunately, due to momentum, opinions, and friction in general, those container runtimes now have no restrictions on callers and cannot set reasonable defaults.

Thus now we have to resort to teaching every person who launches a container to be perfect and disable everything, which they never do.

If you run a k8s cluster with nodes on VMs, try this for example. If it doesn't error out, any pod can talk to any other pod on the node, over a protocol you aren't logging and which has limited ability to log anyway. (This applies if your k8s nodes are running systemd v256+ and you aren't using containerd, which blocked vsock; cri-o, podman, etc. don't, at least as of a couple of weeks ago.)

    socat - VSOCK-LISTEN:3000

You can also play around with other address families, as IPX, AppleTalk, etc. are all available by default, or see if you can use openat2 to use some file in /proc to break out.
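The socat check needs socat in the image. A rough equivalent, assuming python3 happens to be available in the container instead, is to simply try creating a socket of the family in question. `probe_af` is a made-up helper name, and "unavailable" here lumps together "blocked by seccomp" and "not supported by this kernel", so treat it as a quick smoke test only.

```shell
#!/bin/sh
# Hypothetical helper: try to create a socket of a given address family.
#   $1: family name as spelled in Python's socket module (e.g. AF_VSOCK)
#   $2: socket type name (e.g. SOCK_STREAM, SOCK_DGRAM)
# Prints "allowed", "unavailable" (blocked or unsupported), or
# "unknown" when python3 is not installed.
probe_af() {
    command -v python3 >/dev/null 2>&1 || { echo unknown; return 0; }
    if python3 -c 'import socket, sys
fam = getattr(socket, sys.argv[1])
typ = getattr(socket, sys.argv[2])
socket.socket(fam, typ).close()' "$1" "$2" 2>/dev/null; then
        echo allowed
    else
        echo unavailable
    fi
}

printf 'AF_VSOCK: %s\n' "$(probe_af AF_VSOCK SOCK_STREAM)"
printf 'AF_IPX: %s\n' "$(probe_af AF_IPX SOCK_DGRAM)"
printf 'AF_APPLETALK: %s\n' "$(probe_af AF_APPLETALK SOCK_DGRAM)"
```

If AF_VSOCK comes back "allowed" inside an ordinary pod, that is the situation described above.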

[0] https://manpages.debian.org/testing/runc/runc-spec.8.en.html

[1] https://mkdev.me/posts/the-tool-that-really-runs-your-contai...

[2] https://github.com/opencontainers/runtime-spec/blob/main/REA...
