Hacker News

cheesecompiler · yesterday at 3:32 PM

The reverse is possible too: throwing massive compute at a problem can mask the existence of a simpler, more general solution. General-purpose methods tend to win out over time—but how can we be sure they’re truly the most general if we commit so hard to one paradigm (e.g. LLMs) that we stop exploring the underlying structure?


Replies

willvarfar · today at 10:14 AM

I think the "bitter lesson" is that while startup A is busy tuning and optimising so it can train its model with hardware quantity B, there's another startup C lucky enough to have B*2 hardware (credits etc.) that won't try so hard to optimise and will reach the end quicker.

Of course, DeepSeek was forced to take the optimisation approach and still got to the end in time to stake a claim. So YMMV.

falcor84 · yesterday at 4:34 PM

The way I see this, from the explore-exploit point of view it's pretty rational to put the vast majority of your effort into the one action that has shown itself to bring the most reward, while spending a small amount of effort exploring other ones. Then, if and when that action is no longer as fruitful compared to the others, you shift more effort to exploring, now having the significant resources obtained from that earlier exploitation to help you explore faster.
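
A minimal epsilon-greedy sketch of that policy in Python (the arm payout rates and the 5% exploration share are made-up numbers, just to illustrate mostly exploiting the best-known action while keeping a little exploration going):

    import random

    # Hypothetical arm payout rates -- made-up numbers purely for illustration.
    TRUE_MEANS = [0.2, 0.5, 0.55]

    def pull(mean):
        # Simulated reward: 1 with probability `mean`, else 0.
        return 1.0 if random.random() < mean else 0.0

    def epsilon_greedy(true_means, steps=10_000, epsilon=0.05):
        counts = [0] * len(true_means)       # pulls per arm
        estimates = [0.0] * len(true_means)  # running mean reward per arm
        total = 0.0
        for _ in range(steps):
            if random.random() < epsilon:
                arm = random.randrange(len(true_means))                 # explore a little
            else:
                arm = max(range(len(true_means)), key=lambda i: estimates[i])  # exploit the best-known arm
            reward = pull(true_means[arm])
            counts[arm] += 1
            estimates[arm] += (reward - estimates[arm]) / counts[arm]   # incremental mean
            total += reward
        return estimates, counts, total

    print(epsilon_greedy(TRUE_MEANS))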

api · yesterday at 5:43 PM

CS is full of trivial examples of this. You can sort a huge list of ten trillion records with an optimised parallel SIMD merge sort, or you can get the same result with a naive bubble sort if you're willing to throw enough hardware at it.
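
As a toy illustration of that trade-off (nowhere near ten trillion records), a rough Python sketch; the list size is an arbitrary choice and the timings only show how much extra compute the quadratic algorithm burns for the same answer:

    import random
    import time

    def bubble_sort(xs):
        # Quadratic sort: same answer as the built-in sort, vastly more work.
        xs = list(xs)
        for i in range(len(xs)):
            for j in range(len(xs) - 1 - i):
                if xs[j] > xs[j + 1]:
                    xs[j], xs[j + 1] = xs[j + 1], xs[j]
        return xs

    data = [random.random() for _ in range(5_000)]  # small, made-up size

    t0 = time.perf_counter()
    slow = bubble_sort(data)
    t1 = time.perf_counter()
    fast = sorted(data)
    t2 = time.perf_counter()

    assert slow == fast
    print(f"bubble sort: {t1 - t0:.2f}s   built-in sort: {t2 - t1:.4f}s")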

The real bitter lesson in AI is that we don't really know what we're doing. We're hacking on models looking for architectures that train well but we don't fully understand why they work. Because we don't fully understand it, we can't design anything optimal or know how good a solution can possibly get.

logicchains · yesterday at 4:14 PM

We can be sure via analysis based on computational theory, e.g. https://arxiv.org/abs/2503.03961 and https://arxiv.org/abs/2310.07923. This lets us know what classes of problems a model is able to solve, and sufficiently deep transformers with chain of thought have been shown to be theoretically capable of solving a very large class of problems.
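
For a rough sense of what those results say (paraphrasing rather than quoting the theorems; the exact assumptions such as log precision, uniformity and depth/width growth vary by paper):

    % Paraphrase only -- not the exact theorem statements.
    % Without intermediate reasoning steps, fixed-depth log-precision
    % transformers are limited to a small circuit class:
    \[ \textsf{TF}_{\text{no CoT}} \subseteq \mathsf{TC}^0 \]
    % Allowing intermediate chain-of-thought tokens buys extra power; with
    % polynomially many steps the solvable problems reach roughly polynomial time:
    \[ \textsf{TF}_{\mathrm{poly}(n)\text{ CoT}} \approx \mathsf{P} \]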
