Hacker News

hu3 · yesterday at 12:05 AM

> only a small fraction will be needed in VRAM at any given time

I don't think that's true, at least not without heavy performance loss, in which case "just be memory mapped" is doing a lot of work here.

By that logic GPUs could run models much larger than their VRAM would otherwise allow, which doesn't seem to be the case unless heavy quantization is involved.


Replies

zozbot234 · yesterday at 12:31 AM

Existing GPU APIs are sadly not conducive to this kind of memory mapping with automated swap-in. The closest thing you get, AIUI, is "sparse" allocations in VRAM, such that only a small fraction of your "virtual address space" equivalent is mapped to real data, and the mapping can be dynamic.
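For contrast, here is a minimal CPU-side sketch of the "just be memory mapped" idea using Python's `mmap` (the file name and layout are made up for illustration). The OS faults in only the pages you actually touch and can evict them under pressure; the point of the comment above is that VRAM has no equivalent automatic fault-in path, so a GPU kernel can't transparently demand-page weights the same way.

```python
import mmap
import os
import tempfile

# Build a dummy 16 MiB "weights" file with one recognizable chunk
# buried in the middle (hypothetical layout, for illustration only).
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.truncate(16 * 1024 * 1024)
    f.seek(8 * 1024 * 1024)
    f.write(b"layer7")

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Reading this slice faults in only the page(s) backing it; the
    # other ~16 MiB never has to be resident in RAM at once.
    chunk = mm[8 * 1024 * 1024 : 8 * 1024 * 1024 + 6]
    mm.close()

print(chunk)
```

On the GPU side there is no page-fault-driven analogue of this exposed by the common APIs; sparse/tiled resources (Vulkan sparse binding, CUDA's virtual memory management calls) let the application remap VRAM pages explicitly, but the swap-in is the programmer's job, not the driver's.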