Scaling Laws, Carefully

65 points • by tehnub • last Friday at 5:55 PM • 16 comments

Comments

When I first saw scaling laws in that deep speech experiment notebook, I didn’t believe it could be real. I was worried for months that we made a mistake, or that it only worked for that one dataset.

I started to believe it after we (Joel Hestness in particular) reproduced it across many experiments in “Deep Learning Scaling is Predictable, Empirically”.

The OpenAI work replicated it in a completely different environment, and at that point I was sure it was real.

Sometimes people ask me why I was so surprised by it. Prior work like Banko and Brill and “The Unreasonable Effectiveness of Data” argued for more data. ML theory had similar models for toy problems, e.g. coin flips.

At the time I thought deep learning was supposed to be complex. Speech and language datasets seemed much more complex than toy problems. Optimization of deep transformers was complex.

The idea that it was possible for the whole thing to be governed by a 3 term equation seemed too simple. The implication was that it was simple to manufacture intelligence.

Ten years later, I still think it is the most interesting observation I have seen. We are still learning what it looks like to live in a world where it is possible to manufacture intelligence.
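
For readers wondering what a "3 term equation" might look like: one common reading is a power law plus a constant floor, roughly error(m) = alpha * m^beta + gamma, where m is the training set size. That form, the function names, and every number below are assumptions chosen for illustration, not something the comment spells out. A minimal Python sketch of why such a curve is easy to fit: once the floor is subtracted, it becomes a straight line in log-log space.

```python
import math

def power_law_error(m, alpha, beta, gamma):
    """Hypothetical three-parameter curve: alpha * m**beta + gamma."""
    return alpha * m ** beta + gamma

def fit_power_law(sizes, errors, gamma):
    """Ordinary least squares on log(error - gamma) vs. log(m).

    Assumes the floor gamma is already known; returns (alpha, beta).
    """
    xs = [math.log(m) for m in sizes]
    ys = [math.log(e - gamma) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope

# Synthetic, noise-free data generated from made-up parameters.
sizes = [10 ** k for k in range(3, 8)]
errors = [power_law_error(m, alpha=5.0, beta=-0.3, gamma=0.1) for m in sizes]

alpha_hat, beta_hat = fit_power_law(sizes, errors, gamma=0.1)
print(alpha_hat, beta_hat)
```

On real measurements the floor is not known in advance and the points are noisy, so all three parameters would normally be fit jointly with a nonlinear least-squares routine.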

aspenmartin • last Friday at 10:36 PM

I really wish more people skeptical of AI capabilities would read about scaling laws -- Lilian is always so marvelous at giving a deep overview of the technical side but the whole point of this is: there are scaling laws, and they hold and continue to hold. This is such a huge basis for the predictions about AI capabilities for the past like 5 years.

ekelsen • today at 6:02 AM

Jeff Dean has a paper from 2007 with proto scaling-law plots for n-gram language models.

https://aclanthology.org/anthology-files/anthology-files/pdf...
