himata4113 • yesterday at 5:14 PM • 3 replies

I was looking into self-hosting DeepSeek v4 pro, since frankly cache reads are an absolute scam and they're 90% of the cost. But then I looked at the ROI: it will never pay off fast enough, because the hardware will become obsolete first, even if you were running 10 token generation streams 24/7.

My napkin math says renting is around 27 times cheaper than owning (not including power). I think we're really screwed when it comes to having owned access to AI unless Intel comes out swinging with a C-series card that has 128 GB of VRAM, so we can run these models in a 4x128 GB configuration. That seems unlikely, though, since Nvidia has a large share in them.

This was calculated assuming around 30 tok/s. Of course you can get 2-5 tok/s much, much cheaper, but that's unusable for my workflow.

Replies

kingstnap • yesterday at 5:36 PM

Ironically, one of the few providers not scamming you for cache reads is DeepSeek.

Everyone else charges a ridiculous amount, but DeepSeek's API is $0.003625 / M tok.

I'm surprised no one talks about this, given how significant it is. GPT 5.5, for example, costs a ridiculous $0.50 / M tok cached. DeepSeek is literally almost 140 times cheaper, which matters a lot for tool calls.
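
For anyone who wants to check the ratio, here is a rough sketch of the arithmetic. The two prices are the ones quoted above; the workload (200 tool calls, each re-reading a 100k-token cached prefix) and the helper name `cached_cost` are made-up assumptions purely for illustration, not measurements.

```python
# Cached-input prices in USD per million tokens, as quoted in this comment.
DEEPSEEK_CACHED = 0.003625
GPT_CACHED = 0.50


def cached_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD of reading `tokens` cached tokens at the given price."""
    return tokens / 1_000_000 * price_per_million


# How many times cheaper the lower price is.
ratio = GPT_CACHED / DEEPSEEK_CACHED

# Hypothetical workload (an assumption, not real usage data):
# 200 tool calls, each re-reading a 100k-token cached prefix.
calls = 200
prefix_tokens = 100_000
total_tokens = calls * prefix_tokens

deepseek_total = cached_cost(total_tokens, DEEPSEEK_CACHED)
gpt_total = cached_cost(total_tokens, GPT_CACHED)

print(f"price ratio: {ratio:.1f}x")
print(f"DeepSeek cached reads: ${deepseek_total:.4f}")
print(f"GPT cached reads:      ${gpt_total:.2f}")
```

Under those assumed numbers the cached reads come to roughly 7 cents versus 10 dollars, which is why the gap adds up quickly in agent-style workflows that re-read the same context on every tool call.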

dist-epoch • yesterday at 7:40 PM

The only way to profitably serve AI is to have large batch sizes: run 500 requests at the same time.

If you serve a single user you'll never get your electricity cost back, never mind hardware costs.

varispeed • yesterday at 7:23 PM

Would you mind sharing the napkin maths?
