Hacker News

CubsFan1060 today at 11:26 AM (6 replies)

Knowing very little about how to run these, how close are we to medium or larger businesses starting to buy hardware to run models like this to keep the models local?

It’s expensive, and not as capable as the frontier models, but would have some pretty big benefits around privacy and agency.


Replies

wongarsu today at 11:54 AM

I know of multiple businesses in Europe that have been doing that for a while with 70B models, and are upgrading hardware to run the new crop of 700B-1T models (this really started around Kimi K2, but buying and hosting that kind of hardware takes time).

Not everyone is willing (or even legally able) to send their trade secrets to OpenAI or Anthropic.

MikhailTal today at 11:58 AM

This is not a new situation. The same thing happened when good vision models like AlexNet were coming through, especially for OCR. Companies had a choice between cloud and self-hosting with GPUs. But it turns out the problem is usage patterns.

Your usage will peak during work hours in certain time zones (even if you are a huge multinational company, most of your engineers/users tend to be in only a few locations), so you have a bunch of GPUs doing nothing the rest of the day. Especially with latency-sensitive stuff, this is a decades-old tradeoff; it's not unique to LLMs.
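
A rough sketch of that idle-GPU arithmetic, assuming the fleet is sized for peak load. The hourly load profile below is made up for illustration (8 busy hours at full load, 16 quiet hours at 10%), not measured data:

```python
def average_utilization(hourly_load):
    """Fraction of a peak-sized GPU fleet that is busy, averaged over the period.

    hourly_load: relative demand per hour (any consistent unit).
    Assumes capacity is provisioned to exactly cover the busiest hour.
    """
    peak = max(hourly_load)
    return sum(hourly_load) / (peak * len(hourly_load))


# Hypothetical day: quiet night, 8 busy work hours, quiet evening.
hourly_load = [0.1] * 8 + [1.0] * 8 + [0.1] * 8
print(f"Average utilization: {average_utilization(hourly_load):.0%}")
```

With these assumed numbers the hardware sits mostly idle; a cloud provider can fill those quiet hours with other customers' traffic, which a single self-hosting company usually cannot.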

Havoc today at 11:44 AM

It’s a ~750B model, so still a hell of a lot of VRAM.

Would need to be a pretty determined medium biz
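
Back-of-the-envelope sketch of why, counting weights only (no KV cache, activations, or runtime overhead, so real requirements are higher). The ~750B figure is the estimate above; the precisions are just common choices, not what this model necessarily ships in:

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Weights-only footprint of a ~750B-parameter model at a few precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(750, bits):.0f} GB")
```

Even at 4-bit that is several hundred GB of weights, so it means a multi-GPU server rather than a single card.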

moffkalast today at 11:32 AM

So far there seems to be one major use case for complete privacy, and that is legal work. You don't need top-of-the-line models to search vast amounts of text in discovery, and it needs to be completely confidential. There are quite a few lawyers over on r/LocalLLaMA showing off their multi-GPU builds. Coincidentally, they also have the vast funding required for it.

petesergeant today at 11:31 AM

Unless you have genuine national security concerns, you’d be better off just negotiating a commercial agreement with privacy protections with a couple of existing vendors.

re-thc today at 11:28 AM

> how close are we to medium or larger businesses starting to buy hardware to run models like this to keep the models local?

Years.

Even Microsoft said they don't have enough capacity for GitHub and need to call Amazon.

Getting even a few at decent prices is hard. Unless the shortage eases...