hmmm-i-wonder • 12/11/2024

>"apt update && apt upgrade",

Across 10k-100k+ servers, all running services, with restarts that have to be orchestrated across the whole fleet, and with zero downtime or impact for thousands of clients who have terabytes of data being processed and analyzed at any given time.

Sure, what's so hard about changing a tire? Well, try doing it on an 18-wheeler while it's driving down the highway, without any impact on its speed.
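To make the orchestration problem concrete, here is a minimal Python sketch of a batched rolling restart that halts at the first failed health check. Everything named here (`rolling_restart`, `restart`, `is_healthy`, `max_unavailable`) is hypothetical and not from the thread. It is an illustration under simplifying assumptions, not how any particular fleet does it.

```python
from typing import Callable, Iterable, List


def batches(hosts: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of at most `size` hosts."""
    for start in range(0, len(hosts), size):
        yield hosts[start:start + size]


def rolling_restart(
    hosts: List[str],
    restart: Callable[[str], None],      # hypothetical: restarts the service on one host
    is_healthy: Callable[[str], bool],   # hypothetical: post-restart health probe
    max_unavailable: float = 0.05,       # assumed cap on the fraction of the fleet down at once
) -> List[str]:
    """Restart hosts batch by batch and stop at the first unhealthy host.

    Returns the hosts that were restarted and passed the health check.
    """
    if not 0 < max_unavailable <= 1:
        raise ValueError("max_unavailable must be in (0, 1]")

    # Never take more than the allowed fraction out of service, but always at least one host.
    batch_size = max(1, int(len(hosts) * max_unavailable))
    done: List[str] = []

    for batch in batches(hosts, batch_size):
        for host in batch:
            restart(host)
        for host in batch:
            if not is_healthy(host):
                # Halt so a bad update does not spread to the rest of the fleet.
                raise RuntimeError(
                    f"{host} unhealthy after restart; halting rollout after {len(done)} hosts"
                )
            done.append(host)

    return done
```

Even this toy version leaves out what makes the real thing hard: draining connections before a restart, retries and rollback, dependencies between services, and coordination across regions.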

> Are you aware of how effective SELinux and systemd containers are? Just a simple firewall at the OS level?

They're part of a layered, defense-in-depth system, but each layer introduces complexity.

>Maybe even just using Tailscale (or the open source Headscale) to introduce zero trust access capabilities.

Tailscale in an enterprise production environment? It's not going to pass any sort of security audit, and it probably violates a number of the certifications customers require at the enterprise level for network access controls, visibility and auditing.

Just managing the Git/Jenkins/Spinnaker/Terraform infrastructure across dozens of locations, deploying to and maintaining tens of thousands of servers/pods, requires a 24x7 team, on top of the hundreds of teams and tens of thousands of devs using it.

If you're small enough that this doesn't make sense, then you might be small enough that one Ops person can handle the load (one is never enough if you're smart, but...). At that point, though, you are dealing with a very small amount of infrastructure and services.

Replies

CRConrad • 12/20/2024

> Across 10k-100k+ servers

If you "need" that many servers (and aren't Google), you've built your systems massively wrong.
