Hacker News

axelriet (today at 10:35 AM)

“No, there is another”—Yoda, The Empire Strikes Back :)

What you describe carries the risk that secrets end up in crash dumps and get exfiltrated.

Imagine an attacker who owns the host to some extent and can trigger such a dump. The secrets land on disk first, and from there they can be copied anywhere.

You probably need per-tenant/per-VM encryption in your cache, since you can never stop someone with elevated privileges from crashing or dumping your process, memory-safe or not.
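A minimal sketch of the per-tenant idea: derive a separate cache-encryption key for each tenant from a host master key, so a dump of one tenant's cache entries is useless without that tenant's key. This is illustrative only; the function name and the use of HMAC-SHA256 as a KDF are my assumptions, and a real system would use HKDF plus AES-GCM from a vetted crypto library.

```python
import hmac
import hashlib

def derive_tenant_key(master_key: bytes, tenant_id: str) -> bytes:
    """Derive a per-tenant cache key from a host master key.

    HMAC-SHA256 stands in for a proper KDF here; real deployments
    would use HKDF and encrypt each cache entry with AES-GCM.
    """
    return hmac.new(master_key, tenant_id.encode(), hashlib.sha256).digest()
```

Each tenant's entries are then sealed under its own derived key, so compromising a dump still requires the master key, which can live in an HSM or enclave rather than in the cache process.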

Then someone can try to DoS you, etc.

Finally, it’s not good practice to mix tenants’ secrets in hostile multi-tenancy environments, so you probably need a cache per VM in separate processes…

IMHO, an alternative is to keep the VM's private data inside the VMs, not on the host.

Then the real wtf is the unsecured HTTP endpoint, an open invitation for “explorations” of the host (or the SoC when they get there) on Azure.

eBPF plus a signing agent helps with legitimate requests but does nothing against attacks on the server itself; say, you send malformed requests hoping to hit a bug. It does not matter whether they are signed.

This is a path to own the host, an unnecessary risk with too many moving parts.

Many VM escapes abuse a device driver, and I trust the kernel folks who write them a lot more than the people who write hostable web servers running in-process on the host.

Removing these was the subject of intense discussion (and pushback from the owning teams), but without leaking any secrets I can tell you that a lot of people didn’t like the idea of a customer-facing web server on the nodes.


Replies

cyberax (today at 5:32 PM)

Of course, putting the metadata service into its own separate system is better. That's how Amazon does it on modern AWS: a separate Nitro card handles all the networking and management.

But if you're within the classic hypervisor model, then it doesn't really matter that much. The attack surface of a simple plain HTTP key-value storage is negligible compared to all other privileged code that needs to run on the host.

Sure, each tenant needs its own instance of the metadata service, and it should be bound to listen on the tenant-specific interface. AWS also used to set the max TTL on these interfaces to 1, so the packets would be dropped by routers.
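The TTL trick can be sketched in a few lines: set IP TTL to 1 on the socket, so the first router on the path decrements it to 0 and drops the packet, keeping metadata replies from ever leaving the local link. This is illustrative; the metadata service itself speaks HTTP over TCP, and on AWS the modern IMDSv2 exposes this idea as a configurable hop limit.

```python
import socket

def make_ttl1_socket() -> socket.socket:
    """Return a UDP socket whose outgoing packets carry IP TTL = 1,
    so any router on the path drops them (link-local only)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, 1)
    return s
```

The same `setsockopt` call works on a TCP listener's accepted sockets, which is where an HTTP metadata service would apply it.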