zozbot234 • today at 4:29 PM

> Unfortunately that is nowhere close to being able to run a 750B param model. For something like that, we're getting closer to 1TB VRAM

You don't have to run a model from VRAM, or even from a sizeable amount of RAM. These choices only ever make sense when serving the model at scale, to hundreds of simultaneous users or more.
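
A minimal sketch of the idea, assuming the weights sit in one flat file on disk: memory-map the file and read one layer at a time, so the OS pages in roughly only the bytes being touched instead of loading the whole model into RAM. The file layout and the names `write_fake_weights` and `read_layer` are hypothetical, just for illustration; real inference engines use their own formats.

```python
import mmap
import os
import struct
import tempfile

FLOAT_BYTES = 4  # float32


def write_fake_weights(path: str, n_layers: int, floats_per_layer: int) -> None:
    """Hypothetical toy layout: layers stored back to back, layer i filled with float(i)."""
    with open(path, "wb") as f:
        for layer in range(n_layers):
            f.write(struct.pack(f"<{floats_per_layer}f", *([float(layer)] * floats_per_layer)))


def read_layer(mm: mmap.mmap, layer: int, floats_per_layer: int) -> tuple[float, ...]:
    """Read one layer's weights out of a memory-mapped file."""
    size = floats_per_layer * FLOAT_BYTES
    start = layer * size
    # Slicing the mmap copies only this byte range; the OS pages in (roughly)
    # just the parts of the file backing it, not the whole file.
    return struct.unpack(f"<{floats_per_layer}f", mm[start:start + size])


def main() -> None:
    n_layers, floats_per_layer = 8, 1024
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        write_fake_weights(path, n_layers, floats_per_layer)
        with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for layer in range(n_layers):
                weights = read_layer(mm, layer, floats_per_layer)
                # A real engine would run this layer's computation here.
                print(f"layer {layer}: {len(weights)} weights, first = {weights[0]}")


if __name__ == "__main__":
    main()
```

Pages that have already been read stay in the OS page cache while memory allows, so whatever RAM you do have still speeds things up; it just isn't a hard requirement.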

Replies

bitmasher9 • today at 5:45 PM

For workstation inference, a unified memory architecture would strike a good cost/performance balance while keeping COGS reasonable.

Macs with 512GB of unified memory are available, with the RAM upgrade costing a few grand.
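
Rough arithmetic for whether 512GB is enough, counting only the weights (no KV cache, activations, or OS overhead) and treating bits per parameter as the only knob. Real quantization formats add a little overhead for scale factors, so treat these numbers as lower bounds.

```python
GB = 10**9  # decimal gigabytes


def weight_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, in GB."""
    return n_params * bits_per_param / 8 / GB


def footprint_table(
    n_params: float, bit_widths: tuple[int, ...], capacity_gb: float
) -> list[tuple[int, float, bool]]:
    """For each bit width: (bits, weight size in GB, whether it fits in capacity_gb)."""
    rows = []
    for bits in bit_widths:
        size = weight_gb(n_params, bits)
        rows.append((bits, size, size <= capacity_gb))
    return rows


if __name__ == "__main__":
    # 750B parameters (the figure from the quoted comment) vs. a 512GB machine.
    for bits, size, fits in footprint_table(750e9, (16, 8, 4), 512):
        print(f"{bits:>2}-bit: {size:,.0f} GB of weights, fits: {fits}")
```

At 16 and 8 bits per weight the weights alone come to about 1,500 GB and 750 GB, which don't fit. At 4 bits they come to about 375 GB, so a 4-bit quant of a 750B model plausibly fits in 512GB with some room left for context.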
