Full fat VMs with GPU passthough I trust a lot less then CPU ones.

ZiiS • yesterday at 6:12 PM • 1 reply • view on HN

Replies

from my understanding, you can run the inference server (llama.cpp/vllm/whatever) and the agent/harness in different contexts, event different machines.

The risky part is in the agent/harness and what tools it has access to.

You don't need to give GPU passthrough to the VM running the agent/harness.

There is still a risk of a prompt messing with the inference server, but I think that's a much lower risk compared to an agent doing whatever on its own.

➕ show 1 reply

alt Hacker News

Replies