Hacker News

program_whiz, today at 3:01 PM

"approaching" is doing some work there. $30K today will get you 90-144GB usable VRAM with solid system RAM and disk and CPU. A single B200 chip at 180GB is $40K. Unfortunately that is nowhere close to being able to run a 750B param model. For something like that, we're getting closer to 1TB VRAM (8+ H200/B200), and then 1M context KV cache is many more GBs on top of that.

That's a $500K-$1M+ rig as of now. That's a lot of $200 subscriptions to break even, but reasonable if you are paying Anthropic $25/M tokens. Then of course there's the power, cooling, and maintenance to consider...
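The break-even comparison can be made concrete with the comment's own numbers: a $500K-$1M rig, $200/month subscriptions, and $25 per million tokens. The 36-month hardware lifetime is a hypothetical assumption. Power, cooling, and maintenance are deliberately left out, so these figures understate the real break-even point.

```python
# Break-even sketch using the prices quoted in the comment.
# Power, cooling, and maintenance are NOT included.


def subscription_months(rig_cost: float, monthly_fee: float = 200.0) -> float:
    """Number of subscription-months the rig's price would buy."""
    return rig_cost / monthly_fee


def seats_over_lifetime(rig_cost: float, monthly_fee: float = 200.0,
                        lifetime_months: int = 36) -> float:
    """Concurrent subscriptions needed to spend the same amount over a
    (hypothetical) hardware lifetime."""
    return rig_cost / (monthly_fee * lifetime_months)


def breakeven_tokens(rig_cost: float, usd_per_million: float = 25.0) -> float:
    """API tokens the rig's price would buy at a flat per-million rate."""
    return rig_cost / usd_per_million * 1_000_000


if __name__ == "__main__":
    for cost in (500_000, 1_000_000):
        print(f"${cost:,}: {subscription_months(cost):,.0f} subscription-months, "
              f"~{seats_over_lifetime(cost):.0f} seats over 36 months, "
              f"{breakeven_tokens(cost) / 1e9:.0f}B tokens at $25/M")
```

At $25/M, the rig's price buys roughly 20-40 billion tokens. That is why the hardware only starts to look reasonable for very heavy API usage.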

But yeah, I can see that if prices come down 10x in a few years, or crater after the bubble, $30-40K might get you a decent machine.


Replies

zozbot234, today at 4:29 PM

> Unfortunately that is nowhere close to being able to run a 750B param model. For something like that, we're getting closer to 1TB VRAM

You don't have to run a model from VRAM, or even from a sizeable amount of RAM. These choices only ever make sense when serving the model at scale, to hundreds of simultaneous users or more.
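Here is a rough way to see the trade-off behind this reply. For single-user decoding, each generated token requires reading every active weight once. Throughput is therefore capped by the bandwidth of whatever memory tier the weights are streamed from, for example via a memory-mapped weight file. The bandwidth figures, the 4-bit quantization, and the 40B-active mixture-of-experts case are hypothetical assumptions chosen for illustration. Caching hot weights in a faster tier would beat these bounds.

```python
# Upper-bound estimate of batch-1 decode speed when weights are streamed from a
# given memory tier. All bandwidths and model shapes are hypothetical examples.

GB = 1e9


def decode_tokens_per_sec(active_params: float, bytes_per_param: float,
                          bandwidth_gb_s: float) -> float:
    """Tokens/s if every active weight is read once per token at full bandwidth.

    Ignores compute time, KV-cache reads, and any caching in a faster tier.
    """
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_gb_s * GB / bytes_per_token


if __name__ == "__main__":
    tiers = {"NVMe SSD (~7 GB/s)": 7.0, "system RAM (~200 GB/s)": 200.0}
    models = {
        "dense 750B": 750e9,
        "MoE, 40B active (hypothetical)": 40e9,
    }
    for model, active in models.items():
        for tier, bw in tiers.items():
            tps = decode_tokens_per_sec(active, 0.5, bw)  # 4-bit weights
            print(f"{model:32s} from {tier:24s}: {tps:6.2f} tok/s")
```

When serving many users, one pass over the weights can be shared by every request in a batch. That is what makes keeping everything in fast VRAM worthwhile at scale. For a single user under these assumptions, a slower tier can still give usable speeds when the active parameter count is small.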
