
zyx321 · last Sunday at 1:54 PM

There have been theories floating around that the 128 GB version could be the best value for on-premises LLM inference. The RAM is split between CPU and GPU at a user-configurable ratio.

So this might be the holy grail of "good enough GPU" and "over 100 GB of VRAM" if the rest of the system can keep up.
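For a rough sense of what fits in roughly 100 GB of GPU-addressable memory, here's a back-of-envelope sketch (a minimal illustration assuming ~4-bit quantization at about 0.55 bytes per parameter; the model sizes are hypothetical examples, not anything from this thread):

```python
# Back-of-envelope weight footprint for quantized LLM inference.
# The 0.55 bytes/param figure is an assumption (roughly Q4 quantization
# plus metadata overhead), not a vendor spec.

def weights_gb(params_billion: float, bytes_per_param: float = 0.55) -> float:
    """Approximate weight size in GB: 1B params * 0.55 B/param ~= 0.55 GB."""
    return params_billion * bytes_per_param

for params in (8, 70, 120):
    print(f"{params}B params ~= {weights_gb(params):.0f} GB of weights")

# A 70B model lands around 39 GB and a 120B around 66 GB, so ~100 GB
# leaves headroom for the KV cache and activations.
```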


Replies

yencabulator · last Sunday at 2:27 PM

> The RAM is split between CPU and GPU at a user-configurable ratio.

I believe the fixed split thing is a historical remnant. These days, the OS can allocate memory for the GPU to use on the fly.
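On Linux with the amdgpu driver you can observe this directly: sysfs exposes both the fixed VRAM carve-out and the GTT pool (system RAM the GPU can map on demand). A minimal sketch, assuming card0 and the standard amdgpu sysfs layout:

```python
from pathlib import Path

# amdgpu reports memory pool sizes in bytes under sysfs. GTT is the
# system-RAM pool the GPU borrows on the fly, which is why the fixed
# split matters less than it once did.
DEV = Path("/sys/class/drm/card0/device")  # card0 is an assumption

for pool in ("vram", "gtt"):
    total = int((DEV / f"mem_info_{pool}_total").read_text())
    used = int((DEV / f"mem_info_{pool}_used").read_text())
    print(f"{pool.upper()}: {used / 2**30:.1f} / {total / 2**30:.1f} GiB used")
```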
