> The RAM is split between CPU and GPU at a user-configurable ratio.
I believe the fixed split is a historical remnant. These days the OS can allocate memory for the GPU on the fly.
It's not a fixed split. I don't know whether it can be changed live or requires a reboot, but it's not hardwired.
I want to know if it's possible: 4 GB for Linux, a bit of room for the calculations, and then you could load a 122 GB model entirely into VRAM.
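The budget in the comment above is simple subtraction. Here is a minimal sketch of it, assuming a machine with 128 GB of unified memory and about 2 GB of compute headroom. Both numbers are assumptions (the thread states neither), and `max_model_gb` is a hypothetical helper, not any tool's API.

```python
# Hypothetical unified-memory budget for loading a model entirely into VRAM.
# Assumptions: 128 GB total (not stated in the thread) and ~2 GB of compute headroom.
TOTAL_GB = 128
OS_RESERVE_GB = 4         # "4GB for Linux"
COMPUTE_HEADROOM_GB = 2   # assumed "bit of room for the calculations"

def max_model_gb(total_gb: float, os_gb: float, headroom_gb: float) -> float:
    """Largest model size (GB) that fits after reserving OS and compute memory."""
    budget = total_gb - os_gb - headroom_gb
    if budget <= 0:
        raise ValueError("reservations exceed total memory")
    return budget

print(max_model_gb(TOTAL_GB, OS_RESERVE_GB, COMPUTE_HEADROOM_GB))  # 122
```

For comparison, the 108 GB allocation reported later in the thread would fall short of this 122 GB budget.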
How would that perform in real life? Someone please benchmark it!
Indeed, it can be reallocated, though it needs a reboot. I've gotten up to around 110 GB before running into OOM issues, so I set it at 108 GB to leave a little headroom: https://www.jeffgeerling.com/blog/2025/increasing-vram-alloc...