Hacker News

justrunitlocal · yesterday at 6:48 PM

We've been running our 10-dev org on 8 H100s with open models (plus some tweaks). Sure, they aren't as good as the big providers, but they (1) don't go down and (2) have pretty damn high tok/s. It pays for itself.

Posting from a fresh account because I'm not supposed to share these details, for obvious reasons. If you want help setting this up, just reply with a way to reach you.


Replies

kgeist · yesterday at 9:00 PM

We're planning to do the same thing: buy something like 8x H100 and run all our coding there. The CTO has almost agreed to find the budget for it, but I need to make sure there are no risks before we buy (i.e., that it's a viable, usable setup for professional AI-assisted coding).

Can you share which models you run and find best-performing for this setup? That would help a lot. I already run a smaller AI server in the office, but only 32B models fit there. I have experience optimizing inference; I'm just interested in which models you think are great on 8x H100 for coding, and I'll figure out the details of how to fit them :)
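One way to narrow down which models fit before buying: a back-of-the-envelope VRAM estimate. The sketch below is hypothetical (the 70B/FP16 example and the KV-cache shape are my assumptions, not from the thread), and real serving adds activation memory, CUDA overhead, and framework buffers, so treat the numbers as lower bounds.

```python
# Rough VRAM-fit estimate for an 8x H100 (80 GB each) node.
# Lower-bound sketch: ignores activations, CUDA context, and
# framework buffers.

def weights_gib(params_b: float, bytes_per_param: float) -> float:
    """Memory for the model weights alone, in GiB."""
    return params_b * 1e9 * bytes_per_param / 2**30

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context: int, bytes_per_val: float) -> float:
    """KV cache for one sequence: 2 (K and V) x layers x heads x dim x tokens."""
    return 2 * layers * kv_heads * head_dim * context * bytes_per_val / 2**30

TOTAL_GIB = 8 * 80  # 640 GiB across the node

# Example: a 70B-parameter model in FP16 (2 bytes per parameter)
w = weights_gib(70, 2)  # ~130 GiB of weights, fits comfortably
print(f"70B fp16 weights: {w:.0f} GiB, fits: {w < TOTAL_GIB}")

# Example KV cache: 80 layers, 8 KV heads, head_dim 128, 32k context, fp16
kv = kv_cache_gib(80, 8, 128, 32768, 2)  # 10 GiB per 32k-token sequence
print(f"KV cache per sequence: {kv:.1f} GiB")
```

The same arithmetic shows why only 32B-class models fit a smaller single-GPU box, and why quantization (1 byte per parameter or less) roughly doubles what fits.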

ok_dad · yesterday at 6:52 PM

yea, just buy $300k worth of hardware and Bob's your uncle

johndough · yesterday at 7:57 PM

> Sure they aren't as good as the big providers

If you haven't done so already, fine-tune the model on all of your company's code that you can get your hands on. This is one of the great advantages of running local models. I like the style of the generated code much better now, I have to rewrite much less, and my prompts can be shorter too. But maybe these are already the "tweaks" you mentioned.
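The first step of that fine-tuning workflow is collecting the company code into a training set. A minimal data-prep sketch follows; the file extensions, chunk size, and JSONL record shape are my assumptions, not a recipe from the comment, and the actual training run (e.g. LoRA on the served model) would sit downstream of this.

```python
# Hypothetical data-prep sketch: walk a repo, keep source files,
# and emit JSONL records sized for fine-tuning examples.
import json
import pathlib

EXTS = {".py", ".ts", ".go", ".java"}  # adjust to your stack
MAX_CHARS = 8_000                      # rough per-example cap

def iter_examples(repo: str):
    """Yield {"file": ..., "text": ...} records from source files in repo."""
    for path in pathlib.Path(repo).rglob("*"):
        if path.suffix not in EXTS or not path.is_file():
            continue
        text = path.read_text(errors="ignore")
        # split long files into training-sized chunks
        for i in range(0, len(text), MAX_CHARS):
            yield {"file": str(path), "text": text[i:i + MAX_CHARS]}

def write_jsonl(repo: str, out: str) -> int:
    """Write all examples to a JSONL file; return the record count."""
    n = 0
    with open(out, "w") as f:
        for ex in iter_examples(repo):
            f.write(json.dumps(ex) + "\n")
            n += 1
    return n
```

Keeping the file path in each record also lets you filter out vendored or generated code before training, which tends to matter as much as the model choice for output style.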

2ndorderthought · yesterday at 6:59 PM

This is the actual answer. Man, I hope to find a company like yours sometime soon. I'm sick of all the issues that come with third-party IP generation.