Nvidia will sell you an entire server rack ready for inference. Or maybe you can roll out your own Blackwell based system.
We’re approaching a world where running a primer frontier model is possible on a workstation, probably will have something under $30k that looks like a desktop for Nvidia’s next generation. It sounds expensive, until you look at your Anthropic bill.
It’s similar unit economics as could computing for the open models. You can save a ton on the expenses by buying the hardware, but it requires a lot of in-house expertise, and you get the most value if you keep the system operating around the clock. The big kink is open models are usually 2 quarters behind frontier, and your competitors are probably trying to get access to mythos.
"approaching" is doing some work there. $30K today will get you 90-144GB usable VRAM with solid system RAM and disk and CPU. A single B200 chip at 180GB is $40K. Unfortunately that is nowhere close to being able to run a 750B param model. For something like that, we're getting closer to 1TB VRAM (8+ H200/B200), and then 1M context KV cache is many more GBs on top of that.
That's a $500K-$1M+ rig as of now. That's a lot of $200 subscriptions to break even, but reasonable if you are paying Anthropic $25/M tokens. Then of course there's the power, cooling, and maintenance to consider...
But yeah, I can see if the prices come down 10x in a few years, or crater after the bubble, $30-40k might get you a decent machine.