> doesn't make financial sense to self-host
I guess that's debatable. I regularly run out of quota on my claude max subscription. When that happens, I can sort of kind of get by with my modest setup (2x RTX3090) and quantized Qwen3.
And this does not even account for privacy and availability. I'm in Canada, and as the US is slowly consumed by its spiral of self-destruction, I fully expect at some point a digital iron curtain will go up. I think it's prudent to have alternatives, especially with these paradigm-shattering tools.
Self-hosting training (or gaming) makes a lot of sense, and once you have the hardware self-hosting inference on it is an easy step.
But if you have to factor in hardware costs self-hosting doesn't seem attractive. All the models I can self-host I can browse on openrouter and instantly get a provider who can get great prices. With most of the cost being in the GPUs themselves it just makes more sense to have others do it with better batching and GPU utilization
> I regularly run out of quota on my claude max subscription. When that happens, I can sort of kind of get by with my modest setup (2x RTX3090) and quantized Qwen3.
When talking about fallback from Claude plans, The correct financial comparison would be the same model hosted on OpenRouter.
You could buy a lot of tokens for the price of a pair of 3090s and a machine to run them.
Did the napkin math on M3 Ultra ROI when DeepSeek V3 launched: at $0.70/2M tokens and 30 tps, a $10K M3 Ultra would take ~30 years of non-stop inference to break even - without even factoring in electricity. Clearly people aren't self-hosting to save money.
I've got a lite GLM sub $72/yr which would require 138 years to burn through the $10K M3 Ultra sticker price. Even GLM's highest cost Max tier (20x lite) at $720/yr would buy you ~14 years.
Your $5,000 PC with 2 GPUs could have bought you 2 years of Claude Max, a model much more powerful and with longer context. In 2 years you could make that investment back in pay raise.
Unless you already had those cards, it probably still doesn’t make sense from a purely financial perspective unless you have other things you’re discounting for.
Doesn’t mean you shouldn’t do it though.
How does your quantized Qwen3 compares in code quality to Opus?
I think AI may be the only place you could get away with calling a 2x350W GPU rig "modest".
That's like ten normal computers worth of power for the GPUs alone.