
bccdee · last Thursday at 3:47 PM

The creation of a model which is "co-state-of-the-art" (assuming it wasn't trained on the benchmarks directly) is not a win for scaling laws. I could just as easily claim that xAI's failure to significantly outperform existing models, despite "throwing more compute at Grok 3 than even OpenAI could," is further evidence that hyper-scaling is a dead end which will only yield incremental improvements.

Obviously more computing power makes the model better. That is a completely banal observation. The rest of this 2000-word article is groping around for a way to take an insight about the difference between '70s symbolic AI and the neural networks of the 2010s and apply it to the difference between GPT-4 and Grok 3, off the back of a single set of benchmarks. It's a bad article.


Replies

starspangled · yesterday at 1:30 AM

> The creation of a model which is "co-state-of-the-art" (assuming it wasn't trained on the benchmarks directly) is not a win for scaling laws.

Just based on the comparisons linked in the article, it's not "co-state-of-the-art"; it's the clear leader. You might argue those numbers are wrong or not representative, but you can't accept them and then claim it's not outperforming existing models.

horsawlarway · last Thursday at 5:33 PM

I agree.

A lot of attention is being paid to metrics that often don't align all that well with actual production use cases, and frankly the metrics are good but hardly breathtaking.

They have an absolutely insane outlay of additional compute, which appears to have given them a relatively paltry increase in capabilities.

15 times the compute for 5-15% better performance is basically the exact opposite of the bitter lesson.
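To put rough numbers on that (a back-of-envelope sketch assuming a generic power-law scaling relation, loss ∝ compute^-α; the exponent is purely illustrative, not a measured value for any of these models):

    # Under a power-law scaling assumption, loss ~ C^(-alpha),
    # multiplying compute C by 15x yields only a modest relative
    # improvement -- diminishing returns are built into the curve.
    alpha = 0.05          # illustrative exponent, not a measured value
    compute_multiplier = 15

    relative_improvement = 1 - compute_multiplier ** (-alpha)
    print(f"{relative_improvement:.1%} loss reduction for {compute_multiplier}x compute")
    # -> 12.7% loss reduction for 15x compute

Which is to say: gains in the single-digit-to-teens percentage range from 15x compute are exactly what a power law predicts, so each additional multiple of compute buys less and less.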

Hell - it genuinely seems like the author didn't even read the actual bitter lesson.

The lesson is not "scale always wins." The lesson is: "We have to learn the bitter lesson that building in how we think we think does not work in the long run."

And somewhat ironically, the latest advances seem to genuinely undermine the lesson. It turns out that building in reasoning/thinking (a heuristic that copies human behavior) produced the biggest performance jump we've seen in the last year.

Does that mean we won't scale out of the current capabilities? No, we definitely might. But we also definitely might not.

The diminishing returns we're seeing for scale hint strongly that just throwing more compute at the problem is not enough by itself. Possibly still required, but definitely not sufficient.