I think what everyone underestimated was the absolutely bonkers amount of compute it will take, and how that compute must scale to keep up with larger and larger models.
I was involved in three efforts in the 2010s to commercialize foundation models before they were ready, so I have a good picture of how progress works in this area. The pace a lot of the industry has been talking about is unrealistic. People were disappointed with the rate of development of Apple Intelligence, for example, but it has actually progressed at about the rate I expected.
Is that a problem for Meta, though? They recently announced they're going to sell their excess compute, so I imagine the actual problem is that they're resorting to that because AI isn't having nearly the effect or usage it was supposed to, and now Zuck is being a sore winner about it.
It will scale inefficiently until efficiency breakthroughs occur, but it's really hard to predict when those breakthroughs will happen. Plan for the worst, but be ready and able to capitalize when they happen!
That seems like such an easy thing to estimate with a bit of basic napkin math.
I thought that's exactly what everyone anticipates? "Scaling laws" are all about exponential increases in compute and all that.
Altman was trying to get $1T of infra investment years ago
And yet this doesn't turn out to be Meta's problem at all.
https://uk.pcmag.com/ai/165970/meta-exploring-option-to-sell...
Meta bought too many GPUs, has spare GPU capacity and they are exploring renting that capacity out.
The problem is not that the models need too much to do the job. If that were the case, Meta would not have spare capacity.
The problem is that the models currently can't be made to do the job.
Did we? Many of us have been saying for over a year that the amount of compute going into the models is unsustainable and that the models aren't improving enough to justify it. "The emperor has no clothes" holds true yet again.
They also believed they would be able to build that compute without restrictions. Between hardware costs and massive public opposition, scaling as they had anticipated is in jeopardy.
The compute is only bonkers in the beginning. Over time it'll come down as models are made more efficient.
No, I don't think there was any systemic underestimation of compute. I see the opposite: every company understands compute is important and tries to get hold of it.
More than that, I think people overestimate how much AI will progress as you throw more compute at it. It’s the “9 women can’t deliver a baby in a month” equivalent of AI. Additional compute won’t magically give you AGI.