logoalt Hacker News

ben_slast Thursday at 3:39 PM1 replyview on HN

Fabric Manager itself is not open source. It's NVIDIA-provided software, and today it's required to bring up and manage the NVLink/NVSwitch fabric on HGX systems. What we meant by "open" is that everything around it - the hypervisor, our control plane logic, partition selection, host configuration, etc. - is implemented in the open and available in our repos. You're right that this isn't a fully open GPU stack.

On isolation: in Shared NVSwitch Multitenancy mode, isolation is enforced at multiple layers. Fabric Manager programs the NVSwitch routing tables so GPUs in different partitions cannot exchange NVLink traffic, and each VM receives exclusive ownership of its assigned GPUs via VFIO passthrough. Large providers apply additional hardening and operational controls beyond what we describe here. We're not claiming this is equivalent to AWS's internal threat model, but it does rely on NVIDIA's documented isolation mechanisms.


Replies