Hacker News

Virtualizing Nvidia HGX B200 GPUs with Open Source

100 points by ben_s yesterday at 2:04 PM | 25 comments

Comments

ben_s | yesterday at 2:34 PM

(author of the blog post here)

For me, the hardest part was virtualizing GPUs with NVLink in the mix. It complicates isolation while trying to preserve performance.

AMA if you want to dig into any of the details.

ckastner | yesterday at 5:32 PM

A lot of this coincides with experiments I did to pass consumer AMD GPUs through to VMs [1], which the Debian ROCm Team uses in its CI.

The Debian package rocm-qemu-support ships scripts that facilitate most of this. I've since generalized this by adding NVIDIA support, but I haven't uploaded the new gpuisol-qemu package [2] to the official Archive yet. It still needs some polishing.

Just dumping this here to add more references (the further-reading section in particular; the Gentoo and Arch wikis had a lot of helpful material).

[1]: https://salsa.debian.org/rocm-team/community/team-project/-/...

[2]: https://salsa.debian.org/ckk/gpu-isolation-tools
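
For anyone curious what such scripts boil down to: below is a rough, hypothetical sketch of the core rebind step (the driver_override / unbind / drivers_probe dance in sysfs), not the actual rocm-qemu-support or gpuisol-qemu code. It assumes the vfio-pci module is already loaded, root privileges, and a placeholder PCI address.

    from pathlib import Path

    def rebind_to_vfio(bdf: str) -> None:
        """Detach a PCI device from its current driver and hand it to vfio-pci."""
        dev = Path("/sys/bus/pci/devices") / bdf
        # Tell the PCI core which driver should claim this device on the next probe.
        (dev / "driver_override").write_text("vfio-pci")
        # Unbind from whatever currently drives it (amdgpu, nvidia, snd_hda_intel, ...).
        if (dev / "driver").exists():
            (dev / "driver" / "unbind").write_text(bdf)
        # Trigger a re-probe; vfio-pci claims the device because of the override.
        Path("/sys/bus/pci/drivers_probe").write_text(bdf)

    # Placeholder address -- find the real one with lspci -nn.
    rebind_to_vfio("0000:0b:00.0")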

mindcrash | yesterday at 3:18 PM

In case all of this sounds interesting:

After skimming the article, I noticed that a large chunk of it (specifically the bits on detaching/attaching drivers, QEMU, and VFIO) applies more or less to general GPU virtualization under Linux too!

1) Replace "nvidia" with "amdgpu" for Team Red based setups where needed

2) The PCI IDs are all different, so you'll have to look them up with lspci yourself

3) Note that with consumer GPUs you need to detach and attach a pair of devices (GPU video and GPU audio); otherwise things might get a bit wonky. A sketch of handing such a pair to QEMU follows below.
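
To make point 3 concrete, here is a minimal, hypothetical sketch of passing such a pair to a guest via VFIO. It assumes the card's video function sits at 0000:0b:00.0 and its audio function at 0000:0b:00.1 (look yours up with lspci), that both are already bound to vfio-pci, and a placeholder guest image.

    import subprocess

    # Hypothetical BDFs for a consumer card's two functions; find yours with
    # `lspci -nn` (they usually share bus/device and differ only in function).
    GPU_VIDEO = "0000:0b:00.0"
    GPU_AUDIO = "0000:0b:00.1"

    # Pass both functions of the card to the same guest; handing over only the
    # video function is what tends to make things wonky.
    subprocess.run([
        "qemu-system-x86_64",
        "-machine", "q35,accel=kvm",
        "-cpu", "host",
        "-m", "16G",
        "-device", f"vfio-pci,host={GPU_VIDEO},multifunction=on",
        "-device", f"vfio-pci,host={GPU_AUDIO}",
        "-drive", "file=guest.qcow2,if=virtio",  # placeholder guest image
        "-nographic",
    ], check=True)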

otterley | yesterday at 3:17 PM

Are Nvidia's Fabric Manager and the rest of its control-plane software open source? If so, that's news to me. It's not clear that anything in this article relates to open source at all; publishing how to do VM management doesn't qualify. Maybe "open kimono."

Also, how strong are the security boundaries among multiple tenants when configured in this way? I know, for example, that AWS is extremely careful about how hardware resources are shared across tenants of a physical host to prevent cross-tenant data leakage.

tptacek | yesterday at 4:17 PM

Did you ever manage to get vGPUs working in any other hardware configuration? I know it's not what Hx00 customers want. I bloodied my forehead on that for a month or two with Cloud Hypervisor --- I got to the "light reverse engineering of drivers" stage before walking away.

girfan | yesterday at 3:01 PM

Cool post. Have you looked at slicing a single GPU up for multiple VMs? Is there anything other than MIG that you have come across to partition SMs and memory bandwidth within a single GPU?

moondev | yesterday at 4:00 PM

In Shared NVSwitch Multitenancy Mode, are there any considerations for using InfiniBand devices inside each VM at full performance?

tryauuum | yesterday at 5:16 PM

Can someone explain to me, like I'm 10, what a BAR is?

The article says something about mmapping 256 GB per GPU. But wouldn't that waste 2 TB of RAM? Or do I fail to understand what "mmap" is as well?

EDIT: yes, it seems my understanding of mmap wasn't good; it uses up address space, not RAM.
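
For anyone else puzzled by this, a quick hypothetical way to see the difference on a 64-bit Linux box is to map a 256 GiB anonymous region (standing in for one GPU's BAR, using the per-GPU figure from the article) and watch the resident set size stay tiny:

    import mmap
    import resource

    GiB = 1 << 30
    BAR_SIZE = 256 * GiB  # the per-GPU figure from the article

    # A read-only, private, anonymous mapping: the kernel hands out 256 GiB of
    # virtual address space but allocates no physical pages for it.
    region = mmap.mmap(-1, BAR_SIZE, flags=mmap.MAP_PRIVATE, prot=mmap.PROT_READ)

    # Resident set size stays in the low tens of MiB (mostly the interpreter
    # itself); the only cost of the mapping is a kernel bookkeeping entry.
    # A real BAR mapping is similar, except the addresses resolve to device
    # memory over PCIe rather than to zero pages.
    rss_mib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss // 1024
    print(f"mapped {BAR_SIZE >> 30} GiB of address space, RSS ~{rss_mib} MiB")

    region.close()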
