Would it be possible to implement "virtual memory" for a GPU this way? Let's say you have GPUs at 30% utilization but they're memory-limited. Could you run 2 workloads by offloading the GPU memory when not in use?
Once you oversubscribe GPU memory, performance usually collapses: host links (PCIe, or even NVLink-to-CPU) are one to two orders of magnitude slower than on-package HBM, so demand-paging working sets back and forth just thrashes. Frameworks like vLLM can explicitly offload things like the KV cache to CPU memory, but that's an application-level tradeoff, not transparent GPU virtual memory.
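For concreteness, a minimal sketch of that application-level approach, assuming a recent vLLM release where `gpu_memory_utilization`, `swap_space`, and `cpu_offload_gb` are accepted engine arguments (names and availability vary by version); the model name is just a placeholder:

```python
# Hedged sketch, not a drop-in recipe: parameter names are version-dependent,
# so check the EngineArgs of your vLLM release before relying on them.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    gpu_memory_utilization=0.30,  # cap this workload's share of GPU memory,
                                  # leaving room for a co-located job
    swap_space=8,                 # GiB of CPU RAM for swapping preempted KV-cache blocks
    cpu_offload_gb=4,             # GiB of weights kept in CPU RAM, streamed in on demand
)

out = llm.generate(
    ["Explain GPU memory oversubscription in one sentence."],
    SamplingParams(max_tokens=64),
)
print(out[0].outputs[0].text)
```

Even configured this way, every offloaded block still has to cross the host link when it's needed, so this buys capacity at the cost of latency and throughput; it doesn't make the second workload free.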