logoalt Hacker News

ZiiSyesterday at 6:12 PM1 replyview on HN

Full fat VMs with GPU passthough I trust a lot less then CPU ones.


Replies

elsombreroyesterday at 6:37 PM

from my understanding, you can run the inference server (llama.cpp/vllm/whatever) and the agent/harness in different contexts, event different machines.

The risky part is in the agent/harness and what tools it has access to.

You don't need to give GPU passthrough to the VM running the agent/harness.

There is still a risk of a prompt messing with the inference server, but I think that's a much lower risk compared to an agent doing whatever on its own.

show 1 reply