9dev • today at 7:16 AM • 3 replies

It would be awful if running models locally became the primary way of using LLMs. On dedicated servers sharing GPUs across requests, energy usage and environmental impact are way lower overall than if everyone and their mother suddenly needs beefy GPUs. It’s the equivalent of everyone commuting alone in their own car instead of a train picking up hundreds at once.

Replies

zozbot234 • today at 7:29 AM

You can batch requests when running locally too, if you're using a model with low-enough requirements for KV-cache; essentially targeting the same resource efficiencies that the big providers rely on. This is useful since it gives you more compute throughput "for free" during decode, even when running on very limited hardware.
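
To put rough numbers on the "for free" part, here is a minimal back-of-the-envelope sketch, assuming a simple roofline-style model where each decode step reads the full weights once plus one KV-cache per sequence in the batch. Every figure below (model size, KV-cache size, bandwidth, FLOP rate) is a made-up illustrative assumption, not a measurement of any real hardware or model, and the function name is hypothetical.

```python
def decode_tokens_per_second(batch_size, weight_bytes, kv_bytes_per_seq,
                             mem_bandwidth, flops_per_token, peak_flops):
    """Estimate aggregate decode throughput under a simple roofline model."""
    # Each decode step streams the weights once, plus the KV-cache of every sequence.
    mem_time = (weight_bytes + batch_size * kv_bytes_per_seq) / mem_bandwidth
    # Compute cost grows linearly with the number of sequences in the batch.
    compute_time = batch_size * flops_per_token / peak_flops
    # The step is limited by whichever resource is slower.
    step_time = max(mem_time, compute_time)
    # One token is produced per sequence per step.
    return batch_size / step_time


# Hypothetical modest local machine (all values are assumptions for illustration).
ASSUMED = dict(
    weight_bytes=3.5e9,      # roughly a 7B-parameter model at 4 bits per weight
    kv_bytes_per_seq=0.25e9, # assumed KV-cache footprint per sequence
    mem_bandwidth=100e9,     # bytes per second
    flops_per_token=14e9,    # about 2 FLOPs per parameter per token
    peak_flops=10e12,        # FLOP per second
)

single = decode_tokens_per_second(1, **ASSUMED)
batched = decode_tokens_per_second(8, **ASSUMED)
print(f"batch=1: {single:.1f} tok/s, batch=8: {batched:.1f} tok/s, "
      f"speedup: {batched / single:.2f}x")
```

Under these assumptions the step time is dominated by reading the weights, so a batch of 8 costs only a little more time per step than a batch of 1. The speedup falls short of 8x because the per-sequence KV-cache reads add up, which is why the trick works best when KV-cache requirements are low.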

duskdozer • today at 8:55 AM

Maybe people would target their use more appropriately, then.

amelius • today at 8:24 AM

It's even more awful if the compute capital is owned by only a handful of players.
