echelon, today at 7:10 PM

I would rather we give up the idea of running open models on RTX cards and instead focus on running much bigger open models on H200s.

1. The hardware will eventually catch up.

2. This keeps the delta between open models and frontier models smaller.

3. We can still fine tune and own the weights.

4. The models will be more useful, faster, and more reliable.

RTX is hobbyist tier, not professional tier.

Gated cloud models from hyperscalers treat us like hobbyists in their own right.

We need equivalent scale models, but open.


Replies

zozbot234, today at 7:53 PM

H200s and other enterprise datacenter GPUs are complete overkill in any realistic single-user or few-user inference scenario. They're hugely unbalanced towards compute capacity, which will go almost entirely unused (i.e. wasted) unless you're running huge batches continuously. I've argued many times that local inference engines should support batched inference on a somewhat smaller scale, for a variety of reasons (especially given the unexpected effectiveness of SSD-streamed inference with larger-than-RAM models). But even I don't think we can realistically go to a batch size of 300x or so for real-time inference, which is the range that pencils out quite consistently from a simple roofline model of these datacenter cards.
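To make the roofline argument concrete, here is a rough sketch of the break-even batch size, i.e. the batch at which a card stops being memory-bound during decoding. Everything in it is an assumption rather than something from the comment: the function names are made up for illustration, the hardware figures (roughly 1e15 FLOP/s of dense 16-bit compute and 4.8e12 bytes/s of memory bandwidth) are approximate, and the "2 FLOPs per parameter per token" rule ignores attention and KV-cache traffic.

```python
def break_even_batch(peak_flops, mem_bandwidth, bytes_per_param,
                     flops_per_param_per_token=2.0):
    """Batch size at which compute time equals weight-streaming time.

    Per decode step for a model with P parameters:
      memory time  = P * bytes_per_param / mem_bandwidth   (weights read once)
      compute time = B * P * flops_per_param_per_token / peak_flops
    Setting them equal, P cancels out and we can solve for B.
    """
    return (peak_flops / mem_bandwidth) * bytes_per_param / flops_per_param_per_token


def compute_utilization(batch, break_even):
    """Fraction of peak compute used at a given batch size (capped at 1)."""
    return min(1.0, batch / break_even)


if __name__ == "__main__":
    # ASSUMED, approximate datacenter-card figures (not from the thread).
    peak = 1.0e15        # FLOP/s, dense 16-bit
    bandwidth = 4.8e12   # bytes/s
    b_star = break_even_batch(peak, bandwidth, bytes_per_param=2)
    print(f"break-even batch: about {b_star:.0f}")
    for users in (1, 4, 16):
        util = compute_utilization(users, b_star)
        print(f"batch {users:>2}: about {util:.1%} of peak compute used")
```

Under these assumptions the break-even batch comes out in the low hundreds, the same order of magnitude as the figure quoted above, and a single user leaves well over 99% of the compute idle. Different precisions or hardware numbers shift the result, so treat it as a back-of-the-envelope check only.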

dofm, today at 10:16 PM

Pressure on small model quality and design is absolutely what is needed. There are still gains to be made.

SR2Z, today at 7:30 PM

That GPU costs 25k, which means you really should have a rack to put it in. It's not realistic.

MrLeap, today at 7:21 PM

There are a lot more professionals who have RTX cards than H200s. You'll inevitably see more development and experimentation on things actual humans have lmao.