I've seen the Microsoft Aurora team make a compelling argument that weather forecasting is an interesting counterexample to the AI-energy-waste narrative. Once deployed at scale, inference with these models is actually a sizable energy/compute improvement over classical simulation and forecasting methods. Of course it is energy-intensive to train the model, but the usage itself is more energy efficient.
These were obviously much simpler neural nets, but we did have some models in my domain whose role was to speed up design evaluation.
E.g. you want to find a really good design. Designs are fairly easy to generate but expensive to evaluate and score: we can quickly generate millions of designs, but evaluating one takes 100ms-1s, with simulations that are not easy to parallelize on GPUs. We ended up training models that try to predict that score. They don't predict it perfectly, but you can be 99% sure that a design's actual score is within a certain distance of the predicted one.
So if normally you want to get the 10 best designs out of your 1 million, we can now first have the model pick the predicted best 1000, and you can be reasonably certain your top 10 is a subset of those 1000. So you only need to run the real simulation on those 1000.
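In code, the pattern is roughly this (a minimal sketch of the shortlist-then-verify idea; `surrogate_score`, `true_score`, and the exact shortlist size are hypothetical stand-ins, not the commenter's actual models or simulation):

```python
import heapq

def top_k_designs(designs, surrogate_score, true_score, k=10, shortlist_size=1000):
    """Shortlist with a cheap learned predictor, then verify with the
    expensive simulation.

    surrogate_score: fast learned approximation of the score (hypothetical)
    true_score:      slow, accurate simulation (100ms-1s per design)
    """
    # Cheap pass over all (possibly millions of) designs.
    shortlist = heapq.nlargest(shortlist_size, designs, key=surrogate_score)

    # Expensive pass only over the shortlist; if the surrogate's error
    # bound is small relative to the score gaps, the true top-k is
    # almost certainly in here.
    return heapq.nlargest(k, shortlist, key=true_score)
```

The shortlist size is the knob: it has to be large enough, given the surrogate's 99% error bound, that the true top 10 almost certainly survives the cheap pass.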
It's definitely interesting that some neural nets can reduce compute requirements, but that's certainly not making a dent in the LLM part of the pie.
And an LLM can be more energy efficient than a human -- and that's precisely when you should use it.
This jumped out at me as well - very interesting that it actually reduces necessary compute in this instance
"it's more efficient if you ignore the part where it's not"
There's also the efficiency argument from new capability: even a slightly better weather forecast is highly economically valuable (and saves a lot of wasted energy) if it means that one city doesn't have to evacuate because of an erroneous hurricane forecast, say. But how much would it cost to get that improvement with the rival classical methods? I don't know, but I would guess quite a lot.
And one of the biggest ironies of AI scaling is that where scaling succeeds the most in improving efficiency, we realize it the least, because we don't even think of it as an option. An example: a Transformer (or RNN) is not the only way to predict text. We have scaling laws for n-grams and text perplexity (most famously, from Jeff Dean et al at Google back in the 2000s), so you can actually ask the question, 'how much would I have to scale up n-grams to achieve the necessary perplexity for a useful code writer competitive with Claude Code, say?' This is a perfectly reasonable, well-defined question, as high-order n-grams could in theory write code with enough data and big enough lookup tables, and so it can be answered.

The answer will look something like 'if we turned the whole Earth into computronium, it still wouldn't be remotely enough'. The efficiency ratio is not 10:1 or 100:1 but closer to ∞:1. The efficiency gain is so big no one even thinks of it as an efficiency gain, because you just couldn't do it before using AI! You would have humans do it, or not do it at all.
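To put rough numbers on that intuition, here's a naive back-of-envelope sketch (the ~50k-token vocabulary and the ~10^50 atoms in Earth are my illustrative assumptions; real n-gram tables only store observed n-grams, so this is just an upper-bound illustration of the combinatorial blow-up, not a real scaling-law calculation):

```python
import math

# Back-of-envelope: how many distinct n-grams exist over a ~50k-token
# vocabulary, versus the rough number of atoms in Earth (~10^50)?
VOCAB = 50_000
LOG10_ATOMS_IN_EARTH = 50  # Earth is on the order of 10^50 atoms

for n in (3, 5, 12, 50, 500):
    log10_possible = n * math.log10(VOCAB)  # log10 of V^n possible table entries
    print(f"n={n:>3}: ~10^{log10_possible:.0f} possible n-grams, "
          f"~10^{log10_possible - LOG10_ATOMS_IN_EARTH:.0f}x the atoms in Earth")
```

Even a modest order-12 table's space of possible entries already outruns the planet's atom count, and a code writer needs context hundreds of tokens deep, which is far beyond that.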