LLMs don't use much energy at all to run; they use it all up front for training, which is happening constantly right now.
TLDR this is, intentionally or not, an industry puff piece that completely misunderstands the problem.
Also, even if everyone is effectively running a dishwasher cycle every day, that's not a problem we can just ignore; it's still a massive increase in ecological impact.
You underestimate the amount of inference and very much overestimate how much compute training takes.
Training is more or less the same as doing inference on an input token twice (forward and backward pass). But because it's offline and predictable, it can be done fully batched with very high utilization (i.e. efficiently).
Training is, guesstimating, maybe 100 trillion total tokens, but these guys apparently do inference at the scale of a quadrillion tokens per month.
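Plugging those rough numbers into the usual FLOPs-per-token rules of thumb makes the point concrete. This is a sketch, not a measurement: the token counts are the guesses above, and the 2*P (forward) / 6*P (forward + backward) per-token factors are the standard approximations, with the parameter count P cancelling out of the ratio.

```python
# Back-of-envelope: one full training run vs. ongoing inference compute.
# Token counts are the rough figures from the comment above; the per-token
# FLOP factors are the usual 2*P (forward pass) and ~6*P (forward + backward)
# rules of thumb. Parameter count P cancels out of the ratio.

TRAIN_TOKENS = 100e12               # ~100 trillion tokens for one training run
INFERENCE_TOKENS_PER_MONTH = 1e15   # ~a quadrillion tokens served per month

FORWARD_FLOPS_PER_TOKEN = 2   # * P, inference
TRAIN_FLOPS_PER_TOKEN = 6     # * P, forward + backward pass

train_compute = TRAIN_TOKENS * TRAIN_FLOPS_PER_TOKEN                   # in units of P
monthly_inference_compute = INFERENCE_TOKENS_PER_MONTH * FORWARD_FLOPS_PER_TOKEN

print(f"training run ≈ {train_compute / monthly_inference_compute:.2f} "
      "months of inference compute")
# => training run ≈ 0.30 months of inference compute,
#    i.e. serving overtakes the training run within weeks
```

And since training runs fully batched at much higher utilization than latency-bound serving, the energy ratio is likely even more lopsided than the raw FLOP ratio.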
Training is pretty much irrelevant in the scheme of global energy use. The global airline industry burns the energy needed to train a frontier model every three minutes, and unlike AI training, the energy for air travel is 100% straight-into-your-lungs fossil carbon.
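If you want to sanity-check that comparison yourself, the arithmetic is short. Every input below is a rough, swappable assumption: pre-pandemic global jet fuel burn on the order of 95 billion gallons a year, roughly 36 kWh of thermal energy per gallon, and a frontier training run in the low tens of GWh (e.g. ~31M H100-hours at ~0.7 kW, roughly what Meta published for Llama 3 405B).

```python
# Sanity check of the airline comparison with rough, swappable inputs.
# All figures are ballpark assumptions, not measurements.

JET_FUEL_GALLONS_PER_YEAR = 95e9   # rough pre-pandemic global jet fuel burn
KWH_PER_GALLON_JET_FUEL = 36.4     # ~131 MJ of thermal energy per gallon
TRAINING_RUN_GWH = 20              # order-of-magnitude frontier-run estimate
                                   # (e.g. ~31M GPU-hours at ~0.7 kW each)

aviation_gwh_per_year = JET_FUEL_GALLONS_PER_YEAR * KWH_PER_GALLON_JET_FUEL / 1e6
aviation_gwh_per_minute = aviation_gwh_per_year / (365 * 24 * 60)

print(f"aviation ≈ {aviation_gwh_per_year:,.0f} GWh/year "
      f"≈ {aviation_gwh_per_minute:.1f} GWh/minute")
print(f"one training run ≈ {TRAINING_RUN_GWH / aviation_gwh_per_minute:.1f} "
      "minutes of global air travel")
```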
I'm not convinced that LLM training uses so much energy that it really matters in the big picture. You can train a (terrible) LLM on a laptop[1], and frankly that's less energy efficient than just training it on a rented cloud GPU.
Most of the innovation happening today is in post-training rather than pre-training, which is good for people concerned with energy use because post-training is relatively cheap (I was able to post-train a ~2b model in less than 6 hours on a rented cluster[2]; a minimal sketch of that kind of run is below the footnotes).
[1]: https://github.com/lino-levan/wubus-1
[2]: https://huggingface.co/lino-levan/qwen3-1.7b-smoltalk
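For a sense of how small a post-training job is, here is a minimal supervised fine-tuning sketch of a ~2B model on a chat dataset with Hugging Face TRL. This is a hypothetical setup, not the exact recipe behind [2]: the model and dataset names, hyperparameters, and the SFTTrainer call are assumptions, and the TRL API details vary by version.

```python
# Minimal supervised fine-tuning (post-training) sketch using Hugging Face TRL.
# Model/dataset names and hyperparameters are illustrative, not the recipe
# actually used for the linked run; the SFTTrainer API changes between versions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Chat-style SFT data (assumed config name "all", split "train")
dataset = load_dataset("HuggingFaceTB/smoltalk", "all", split="train")

config = SFTConfig(
    output_dir="qwen3-1.7b-smoltalk",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    logging_steps=50,
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-1.7B",   # ~2B-parameter base model
    train_dataset=dataset,
    args=config,
)
trainer.train()
```

A run like this is measured in single GPU-hours to low tens of GPU-hours (roughly consistent with the under-6-hours figure above), which is why post-training barely registers next to pre-training on the energy ledger.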
The training cost for a model is a fixed, one-time cost. The more use that model gets, the lower the training cost per inference query, since that one-time training cost is amortized across every inference prompt.
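Spelled out with illustrative placeholder numbers (none of these are measurements), the amortization looks like this:

```python
# Amortized energy per query: one-time training cost spread over lifetime queries.
# All numbers are illustrative placeholders, not measurements.

def energy_per_query(training_wh, total_queries, inference_wh_per_query):
    return inference_wh_per_query + training_wh / total_queries

train_wh = 20e9       # assume a ~20 GWh training run, expressed in Wh
per_query_wh = 0.3    # assumed marginal inference energy per query, in Wh

for queries in (1e9, 10e9, 100e9):
    total = energy_per_query(train_wh, queries, per_query_wh)
    print(f"{queries:.0e} lifetime queries -> {total:.2f} Wh/query")
# The training term shrinks from 20 Wh to 0.2 Wh per query as usage grows.
```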
It is true that there are always more training runs going on, and I don't think we'll ever find out how much energy was spent on experimental or failed runs.