I was looking into self-hosting deekseek v4 pro since frankly cache reads are an absolute scam and they're 90% of the cost, but then I looked at the ROI and it will never pay off fast enough because the hardware will become obsolete faster even if you were running 10 token generation streams 24/7.
The napkin math resulted that renting is around 27 times cheaper than owning (not including power). I think we're really screwed when it comes to having owned access to AI unless intel comes out swinging with a c series card that has 128gb vram so we can run them in a 4x128gb configuration, but seems unlikely since nvidia has a large share in them.
This was calculated expecting around 30tok/s, of course you can get 2-5tok/s much much cheaper, but it's unusable for my workflow.
The only way to profitable serve AI is to have large batch sizes - run 500 requests at the same time.
If you serve a single user you'll never get your electricity price back, nevermind hardware costs.
Ironically the few people not scamming you for cache reads are Deepseek.
Everyone else charges a ridiculous amount but Deepseeks API is $0.003625 / M tok.
I'm surprised no one talks about this because of how significant it is. GPT 5.5 for example costs a ridiculous $0.50 / M tok cached. It's literally almost 140 times cheaper which matters a lot for tool calls.