Yeah, I actually looked into something similar recently. I was looking at how we could replace Cursor, and I found that for ~10 people we'd need a half-dozen H100s or something on that scale, which would run ~$1,500 per developer per month to build and maintain on cloud infra; buying the hardware outright would cost roughly three developers' yearly salaries (this aligns with your numbers).

We don't use that much inference, so we decided paying Cursor ~$200-300 per dev per month is the better deal for now, though we may regret that once prices normalize. That said, we don't use cloud agents or autonomous agents; we basically use AI as a pair programmer, so if we had to drop AI coding assistants entirely, our process wouldn't break too badly.

I wish I could put my 3080 gaming card to work on some inference, but I can only fit ~10B models on it, so it's close to worthless except for tasks a small model can handle.
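For anyone wondering where that ~10B ceiling comes from: it's basically just weight size against the 3080's 10 GB of VRAM. Here's a rough back-of-envelope sketch; the flat ~20% overhead for KV cache and activations is an assumption of mine and varies a lot with context length:

```python
# Rough VRAM fit check: weights = params * bits/8, plus an assumed
# ~20% overhead for KV cache and activations (varies with context).
def fits_in_vram(params_b: float, bits: int, vram_gb: float, overhead: float = 1.2) -> bool:
    weights_gb = params_b * bits / 8  # 1B params at 8-bit ~= 1 GB of weights
    return weights_gb * overhead <= vram_gb

for params_b, bits in [(10, 8), (10, 4), (13, 4), (34, 4)]:
    ok = fits_in_vram(params_b, bits, vram_gb=10.0)  # 3080 has 10 GB
    print(f"{params_b}B @ {bits}-bit: {'fits' if ok else 'does not fit'}")
```

By that math, 4-bit quants of ~7-13B models are about the practical ceiling on that card, which matches your experience.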
The best deal is arguably to buy as much on-prem inference capacity as you can reasonably expect to use by running the hardware around the clock, even at slower throughput, and use third-party inference for things that are genuinely latency-sensitive. I just don't see how this resolves to needing a half-dozen H100s; surely you're not using that much compute? You don't need to fit the entire model on the GPU: engines for on-prem inference generally support CPU/RAM offload.
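For example, llama.cpp (via llama-cpp-python here) splits the model between GPU and system RAM with a single parameter. A minimal sketch, assuming you have a local GGUF file; the model path and layer count are placeholders you'd tune to your VRAM:

```python
# Partial GPU offload with llama-cpp-python (pip install llama-cpp-python).
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model-q4_k_m.gguf",  # any GGUF quant you have locally
    n_gpu_layers=20,  # first 20 transformer layers go to the GPU;
                      # the rest stay in system RAM and run on the CPU
    n_ctx=4096,       # context window; larger contexts need more memory
)

out = llm("Write a docstring for a binary search function.", max_tokens=128)
print(out["choices"][0]["text"])
```

Throughput drops as more layers land on the CPU, but for batch-style or overnight work that tradeoff is often fine, and it lets much smaller hardware serve much larger models.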