There have been theories floating around that the 128GB version could be the best value for on-premise LLM inference. The RAM is split between CPU and GPU at a user-configurable ratio.
So this might be the holy grail of "good enough GPU" and "over 100GB of VRAM" if the rest of the system can keep up.
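As a rough sanity check, here's a back-of-the-envelope sketch of what would fit in a ~100GB GPU share. The parameter counts, quantization widths, and the ~10% overhead factor are illustrative assumptions, not measurements:

```python
# Rough weight footprint: params * bits_per_weight / 8, plus an assumed
# ~10% overhead for KV cache, activations, and runtime buffers.

def weight_gb(params_b: float, bits: float, overhead: float = 1.10) -> float:
    return params_b * 1e9 * bits / 8 / 1e9 * overhead

gpu_budget_gb = 100  # assumed GPU share of the 128GB pool

for name, params_b, bits in [
    ("70B  @ ~4.5 bpw", 70, 4.5),
    ("70B  @ ~8.5 bpw", 70, 8.5),
    ("120B @ ~4.5 bpw", 120, 4.5),
    ("180B @ ~4.5 bpw", 180, 4.5),
]:
    need = weight_gb(params_b, bits)
    verdict = "fits" if need <= gpu_budget_gb else "too big"
    print(f"{name}: ~{need:.0f} GB -> {verdict}")
```

By that estimate, 70B-class models fit comfortably even at 8-bit, and ~120B fits at 4-bit, which is where the "over 100GB of VRAM" framing gets interesting.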
> The RAM is split between CPU and GPU at a user-configurable ratio.
I believe the fixed split thing is a historical remnant. These days, the OS can allocate memory for the GPU to use on the fly.
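For what it's worth, on Linux with the amdgpu driver you can see both pools: the fixed carve-out (VRAM) and the system memory the driver can hand to the GPU on demand (GTT). A minimal sketch, assuming an AMD APU exposed at /sys/class/drm/card0 (the card index and the presence of these sysfs files are assumptions about the setup):

```python
# Read amdgpu memory pool sizes from sysfs (assumed present on this system).
# VRAM = the fixed BIOS carve-out; GTT = system RAM the driver can allocate
# to the GPU on the fly, which is why the fixed split matters less now.
from pathlib import Path

dev = Path("/sys/class/drm/card0/device")  # assumed card index

def read_gb(name: str) -> float:
    return int((dev / name).read_text()) / 1e9

for pool in ("vram", "gtt"):
    total = read_gb(f"mem_info_{pool}_total")
    used = read_gb(f"mem_info_{pool}_used")
    print(f"{pool.upper()}: {used:.1f} / {total:.1f} GB")
```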