Hacker News

zozbot234 · yesterday at 11:03 PM

Your 120B model likely has way more active parameters, so you can probably only fit a few of its shared layers in your dGPU's VRAM. You might be better off running that model on a unified-memory platform: the memory is slower than dGPU VRAM, but there's a lot more of it.
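
For a rough feel of the arithmetic behind this, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the helper names `weight_gib` and `layers_that_fit` are made up, the 4-bit quantization, 36-layer count, and VRAM figures are placeholders rather than anything from the thread, and it treats all layers as equal in size, which ignores the shared-versus-expert split in an MoE model.

```python
def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB (hypothetical helper)."""
    return params_billions * 1e9 * bits_per_weight / 8 / 2**30


def layers_that_fit(vram_gib: float, reserved_gib: float,
                    total_weight_gib: float, n_layers: int) -> int:
    """Whole layers that fit in memory, assuming equally sized layers.

    reserved_gib is an assumed allowance for KV cache, activations,
    and anything else that has to live in the same memory.
    """
    per_layer_gib = total_weight_gib / n_layers
    usable_gib = max(vram_gib - reserved_gib, 0.0)
    return min(n_layers, int(usable_gib // per_layer_gib))


if __name__ == "__main__":
    total = weight_gib(120, 4)  # assumed 4-bit quantization of 120B params
    print(f"weights: {total:.1f} GiB")
    # Assumed 24 GiB dGPU with 4 GiB reserved, 36 layers (placeholder count)
    print("dGPU layers:", layers_that_fit(24, 4, total, 36))
    # Assumed 128 GiB unified-memory box with 8 GiB reserved
    print("unified layers:", layers_that_fit(128, 8, total, 36))
```

Under these made-up numbers the dGPU holds only about a third of the layers while the unified-memory machine holds all of them, which is the trade-off described above. Real runtimes split models differently, so treat this as an estimate only.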