ben_s • yesterday at 8:02 AM

Once you oversubscribe GPU memory, performance usually collapses. Frameworks like vLLM can explicitly offload things like the KV cache to CPU memory, but that's an application-level tradeoff, not transparent GPU virtual memory.
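
To make the tradeoff concrete, here's a rough back-of-the-envelope sketch of how an application might decide how much of the KV cache fits on the GPU and how much has to spill to CPU memory. This is not vLLM's code or API; the function names and the model dimensions (32 layers, 8 KV heads, head dim 128, fp16) are hypothetical numbers picked for illustration.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    # One key vector and one value vector per layer per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def split_kv_cache(total_tokens, gpu_budget_bytes, bytes_per_token):
    """Return (tokens kept on GPU, tokens that must be offloaded to CPU)."""
    on_gpu = min(total_tokens, gpu_budget_bytes // bytes_per_token)
    return on_gpu, total_tokens - on_gpu


# Hypothetical model shape and a 16 GiB GPU budget reserved for KV cache.
per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)
on_gpu, on_cpu = split_kv_cache(200_000, 16 * 1024**3, per_token)
print(per_token, on_gpu, on_cpu)
```

The point is that the application is doing this bookkeeping itself and choosing what to move; nothing here is paged in and out for you by the GPU.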
