How do you guys who run Docker in production deal with managing the nftables firewall on hosts running containers? By design, the Docker daemon creates and manages a set of firewall rules that forward traffic between containers, route ingress traffic into containers, and masquerade outgoing container traffic. That is all well and good until an admin needs to alter the host's firewall to allow or deny other traffic unrelated to Docker. Restarting nftables, or even just applying new nftables rules, usually purges all the Docker-created rules (because of the flush ruleset in /etc/nftables.conf) and effectively breaks everything until the Docker daemon is restarted and the rules are re-created. I have partially solved this by using nftables filter chains with different names (admin_input/admin_output) and an input hook with negative priority, so that traffic I choose to block is evaluated before the Docker rules are applied. That feels a bit like a hack, but so far it is the only way I have found. It is good practice in this day and age to run local firewalls on all hosts with a default-deny policy, so that only explicitly allowed traffic can pass; that can severely limit the blast radius during a compromise.
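For anyone wanting to try the separate-chain approach, here is a minimal sketch of what /etc/nftables.conf could look like. The chain names admin_input/admin_output are from the post; the table name admin_filter, the priority value -10, and the SSH rule on port 22 are assumptions picked for illustration. The main point is that the file deletes and re-creates only its own table instead of running flush ruleset, so a reload should leave the tables Docker created alone.

```nftables
#!/usr/sbin/nft -f

# Declare the table (no-op if it already exists), then delete it,
# so only OUR table is reset. No "flush ruleset" anywhere in this file.
table inet admin_filter
delete table inet admin_filter

table inet admin_filter {
    chain admin_input {
        # Negative priority: runs before base chains at priority 0 on the input hook.
        type filter hook input priority -10; policy drop;

        iifname "lo" accept
        ct state established,related accept
        ct state invalid drop

        # Assumption: SSH is the only host service that should be reachable.
        tcp dport 22 accept
    }

    chain admin_output {
        type filter hook output priority -10; policy accept;
    }
}
```

Two things to keep in mind with this layout: a drop in admin_input is final, while an accept only ends evaluation of this chain, so the packet still passes through any other chains on the same hook. Also, traffic to published container ports is normally DNATed and then forwarded, so it goes through the forward hook rather than input, and this sketch does not filter it.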
My containers run in dedicated "docker host" VMs. And I never expose ports on 0.0.0.0, just the private internal IP. Most (all) of my docker hosts do not have a public IP anyway. I use wireguard to access them myself. If they need to be public I reverse proxy with caddy from my web server (or use Authentik's embedded proxy). These servers have access to the same private LAN which could be hardened without having the issues you brought up.
By the way, most Docker-based setups do not actually need the userland proxy that Docker runs automatically. Disable it in /etc/docker/daemon.json:
{
  "userland-proxy": false
}
I put a firewall ahead of the Docker host so that they aren't running on the same system. Docker can do what it wants on the host without stepping on my firewall rules.
I use UFW, and this config: github.com/chaifeng/ufw-docker
The only modification is that I pin containers to an IPv4 address so I can limit the forward rule to that address.
I don't. I'd run other workloads on separate hosts.
On my docker hosts there is no other traffic unrelated to docker. Everything goes in containers.
I reverse proxy everything through a Caddy instance running on the same machine, so I avoid the firewall dance entirely by just prefixing all my port assignments in the compose file with the loopback IP (e.g. 127.0.0.1:3000:3000). nftables denies everything but 80 and 443, and I don't have to worry about restarts/flushes breaking things.
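A rough sketch of the host-side ruleset this setup implies, assuming a hypothetical table name host_filter. Because the containers are published only on 127.0.0.1, the input chain just needs to admit the reverse proxy's ports; everything else falls through to the drop policy.

```nftables
#!/usr/sbin/nft -f

# Reset only this table so a reload does not touch Docker's own tables.
table inet host_filter
delete table inet host_filter

table inet host_filter {
    chain input {
        type filter hook input priority 0; policy drop;

        # Loopback must stay open: Caddy reaches the containers via 127.0.0.1.
        iifname "lo" accept
        ct state established,related accept

        # Only the reverse proxy is exposed.
        tcp dport { 80, 443 } accept

        # Not mentioned in the comment above: add a rule for SSH or a VPN
        # here if the host is managed remotely, or this policy locks you out.
    }
}
```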