Hacker News

On the slow death of scaling

106 points by sethbannon yesterday at 3:48 AM | 25 comments

pdf: https://download.ssrn.com/2026/1/6/5877662.pdf?response-cont...


Comments

bicepjai yesterday at 6:28 AM

Hooker’s argument lands for me because it ties the technical scaling story to institutional incentives: as progress depends more on massive training runs, it becomes capital-intensive, less reproducible and more secretive; so you get a compute divide and less publication.

I’m trying to turn that into something testable with a simple constraint: “one hobbyist GPU, one day.” If meaningful progress is still possible under tight constraints, it supports the idea that we should invest more in efficiency/architecture/data work, not just bigger runs.
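As a rough sketch of what that budget buys (using the common C ≈ 6·N·D approximation for training FLOPs; the sustained-throughput number for a hobbyist GPU is my assumption, not a measurement):

    # Back-of-the-envelope: what fits in "one hobbyist GPU, one day"?
    # C ~= 6 * N * D (N = parameters, D = training tokens).
    SUSTAINED_FLOPS = 50e12        # assumed ~50 TFLOP/s sustained throughput
    SECONDS_PER_DAY = 24 * 3600
    budget = SUSTAINED_FLOPS * SECONDS_PER_DAY   # ~4.3e18 FLOPs

    for n_params in (125e6, 350e6, 1.3e9):
        tokens = budget / (6 * n_params)
        print(f"{n_params/1e6:>5.0f}M params -> ~{tokens/1e9:.1f}B tokens/day")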

My favorite line >> Somewhat humorously, the acceptance that there are emergent properties which appear out of nowhere is another way of saying our scaling laws don’t actually equip us to know what is coming.

Regarding this paragraph >> 3.3 New algorithmic techniques compensate for compute. Progress over the last few years has been as much due to algorithmic improvements as it has been due to compute. This includes extending pre-training with instruction finetuning to teach models instruction following ..., model distillation using synthetic data from larger more performant "teachers" to train highly capable, smaller "students" ..., chain-of-thought reasoning ..., increased context-length ..., retrieval augmented generation ... and preference training to align models with human feedback ...
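For concreteness, here is a minimal sketch of one technique from that list, distilling a "teacher" into a smaller "student" by matching temperature-softened logits (the temperature value and the PyTorch framing are my assumptions, not the paper's):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # KL divergence between softened teacher and student distributions;
        # the t**2 factor keeps gradient scale comparable across temperatures.
        t = temperature
        soft_teacher = F.softmax(teacher_logits / t, dim=-1)
        log_student = F.log_softmax(student_logits / t, dim=-1)
        return F.kl_div(log_student, soft_teacher, reduction="batchmean") * t**2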

I would consider algorithmic improvements to be the following: 1. architecture changes like RoPE and MLA, 2. efficiency gains from custom kernels.
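RoPE, for example, is only a few lines; a simplified, untested sketch:

    import torch

    def rope(x, base=10000.0):
        # x: (seq_len, dim) with dim even; rotate each (x1, x2) feature pair
        # by a position-dependent angle instead of adding position embeddings.
        seq_len, dim = x.shape
        half = dim // 2
        freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
        angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[:, :half], x[:, half:]
        return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)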

The errors in the paper: 1. "Transformers for language modeling (Vaswani et al., 2023)" => this should be 2017.

Disclosure: my proposed experiments: https://ohgodmodels.xyz/

charcircuit yesterday at 8:43 AM

>the acceptance that there are emergent properties which appear out of nowhere is another way of saying our scaling laws don’t actually equip us to know what is coming.

Is this actually accepted? Ever since [0], I thought people recognized that they don't appear out of nowhere.

[0] https://arxiv.org/pdf/2304.15004
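The core of that paper's argument fits in a toy example: a smooth gain in per-token accuracy looks like a discontinuous jump under an all-or-nothing metric like exact match (the numbers below are illustrative only):

    # Probability of getting an entire 30-token answer exactly right:
    for per_token_acc in (0.80, 0.90, 0.95, 0.99):
        exact_match = per_token_acc ** 30
        print(f"per-token {per_token_acc:.2f} -> exact-match {exact_match:.3f}")
    # 0.80 -> 0.001, 0.90 -> 0.042, 0.95 -> 0.215, 0.99 -> 0.740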

gdiamos yesterday at 7:50 AM

It was an interesting read, Sara, thanks for sharing it.

I especially agree with your point that scaling laws really killed open research. That's a shame and I personally think we could benefit from more research.

I originally didn't like calling them scaling laws.

In addition to the law part seeming a bit much, I've found that researchers often overemphasize the scale part. If scaling is predictable, then you don't need to do most experiments at very large scale. However, that doesn't seem to stop researchers from starting there.
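That predictability is easy to operationalize: fit a power law to a handful of small pilot runs and extrapolate before committing large compute. A sketch with made-up numbers:

    import numpy as np

    compute = np.array([1e15, 1e16, 1e17, 1e18])  # FLOPs of small pilot runs
    loss    = np.array([4.2, 3.6, 3.1, 2.7])      # hypothetical eval losses

    # A power law L(C) = a * C**b is linear in log-log space.
    b, log_a = np.polyfit(np.log(compute), np.log(loss), 1)
    a = np.exp(log_a)
    print(f"L(C) ~= {a:.1f} * C^{b:.3f}")
    print(f"extrapolated loss at 1e21 FLOPs: {a * 1e21**b:.2f}")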

Once you find something good, and you understand how it scales, then you can pour system resources into it. So I originally thought it would encourage research. I find it sad that it seems to have had the opposite effect.

empiko yesterday at 6:24 PM

Scaling works; the problem is that it is practically impossible to scale much further. There is only so much energy, text data, GPUs, etc. The folly of scaling is that we are living in a finite world. The huge investments in AI over the past few years are probably hitting the practical limits of scaling for now.

I also feel like most insiders were fully aware of this fact, but it was a neat sales pitch.

drob518 yesterday at 1:38 PM

I suspect scaling will not die a slow death, but rather slow for a while and then all at once. Further, I think we’re at the knee. We know scaling doesn’t work for resolving the fundamental issues we have with large models at this point. If it did, the latest models would have solved the issues. Now, we’re in the acceptance phase. That’s not technical, it’s human psychology. People who made bold claims and huge promises that things would get better if we just spent a few more billion dollars on data centers and GPUs need to unwind those claims and find a way to save face.

ironbound yesterday at 4:37 AM

It's a mistake to publish papers without a code repo; the majority of new ML papers are noise at best.

FuriouslyAdrift yesterday at 3:12 PM

Iron law of efficiency: a system will always expand to use all resources available to it.

If you want to make an existing system more efficient, take away resources.

wiz21c yesterday at 9:10 AM

FTA:

"One thing is certain, is the less reliable gains from compute makes our purview as computer scientists interesting again. We can now stray from the beaten path of boring, predictable gains from throwing compute at the problem."

Wasn't it Ilya Sutskever who said some months ago that we were going back to research?

officialchicken yesterday at 10:18 AM

Let's not forget about the myriad of basic problems that still remain - like deploying data, caching/distribution, and server resilience.

There is absolutely NO reason why that PDF shouldn't load today.

Haaargio yesterday at 1:33 PM

It's not dying slowly right now at all.

Compute is a massive driver for everything in ML: the number of experiments you can run in parallel, how much RL you can try out, how long everything takes to run, etc.

ML is pushing scaling on dimensions we haven't had before (number of datacenters, amount of energy we put into them), and ML is currently seen as the holy grail.

But I'm definitely very, very curious how this compute and the current progress play out over the next few years. It could be that we hit a hard ceiling where every single percentage point becomes tremendously costly before we reach the benchmark achievements that make all of this usable daily. Or we will see a significant change to our society.

I do not think it's something in between, tbh, because it definitely feels like we are currently on an exponential progress curve.

tbrownaw yesterday at 3:52 PM

> A pervasive belief in scaling has resulted in a massive windfall in capital for industry labs and fundamentally reshaped the culture of conducting science in our field.

People spend money on this because it works. It seems odd to call observable reality a "pervasive belief".

> Academia has been marginalized from meaningfully participating in AI progress and industry labs have stopped publishing.

Firstly, I still see news items about new models that are supposed to do more with less. If these are neither from academia nor industry, where are they coming from?

Secondly, "has been marginalized"? Really? Nobody's going to be uninterested in getting better results with less compute spend; attempts have just had limited effectiveness.


> However, it is unclear why we need so many additional weights. What is particularly puzzling is that we also observe that we can get rid of most of these weights after we reach the end of training with minimal loss

I thought the extra weights were because training takes advantage of high-dimensional bullshit to make the math tractable. And that there's some identifiable point where you have "enough" and more doesn't help.

I hadn't heard that anyone had a workable way to remove the extra ones after training, so that's cool.
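(A common approach, for anyone curious, is magnitude pruning: rank weights by absolute value and zero the smallest. A rough sketch, with an arbitrary sparsity level:)

    import torch

    def magnitude_prune(model, sparsity=0.9):
        # Zero the smallest `sparsity` fraction of weights model-wide.
        flat = torch.cat([p.detach().abs().flatten()
                          for p in model.parameters()])
        threshold = torch.quantile(flat, sparsity)
        with torch.no_grad():
            for p in model.parameters():
                p.mul_((p.abs() >= threshold).float())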


The impression I had is that there's a somewhat-fuzzy "correct" number of weights and amount of training for any given architecture and data set / information content. And that when you reach that point is when you stop getting effort-free results by throwing hardware at the problem.

octoberfranklin yesterday at 4:06 AM

> Academia has been marginalized from meaningfully participating in AI progress and industry labs have stopped publishing

Exactly like semiconductor wafer processing.

newsoftheday yesterday at 4:06 PM

I read "scaling" and assumed it would be about scaling in general, but it seems to be about AI; I didn't read further.
