Justifiable.
There are a lot more degrees of freedom in world models.
LLMs are fundamentally capped because they only learn from static text -- human communications about the world -- rather than from the world itself, which is why they can remix existing ideas but find it all but impossible to produce genuinely novel discoveries or inventions. A well-funded and well-run startup building physical world models (grounded in spatiotemporal understanding, not just language patterns) would be attacking what I see as the actual bottleneck to AGI. Even if they succeed only partially, they may unlock the kind of generalization and creative spark that current LLMs structurally can't reach.
The sum of human knowledge is more than enough to come up with innovative ideas and not every field is working directly with the physical world. Still I would say there's enough information in the written history to create virtual simulation of 3d world with all ohysical laws applying (to a certain degree because computation is limited).
What current LLMs lack is inner motivation to create something on their own without being prompted. To think in their free time (whatever that means for batch, on demand processing), to reflect and learn, eventually to self modify.
I have a simple brain, limited knowledge, limited attention span, limited context memory. Yet I create stuff based what I see, read online. Nothing special, sometimes more based on someone else's project, sometimes on my own ideas which I have no doubt aren't that unique among 8 billions of other people. Yet consulting with AI provides me with more ideas applicable to my current vision of what I want to achieve. Sure it's mostly based on generally known (not always known to me) good practices. But my thoughts are the same way, only more limited by what I have slowly learned so far in my life.
I don't understand this view. How I see it the fundamental bottleneck to AGI is continual learning and backpropagation. Models today are static, and human brains don't learn or adapt themselves with anything close to backpropagation. World models don't solve any of these problems; they are fundamentally the same kind of deep learning architectures we are used to work with. Heck, if you think learning from the world itself is the bottleneck, you can just put a vision-action LLM on a reinforcement learning loop in a robotic/simulated body.
Honestly, how do people who know so little have this much confidence to post here?
I have a pet peeve with the concept of "a genuinely novel discovery or invention", what do you imagine this to be? Can you point me towards a discovery or invention that was "genuinely novel", ever?
I don't think it makes sense conceptually unless you're literally referring to discovering new physical things like elements or something.
Humans are remixers of ideas. That's all we do all the time. Our thoughts and actions are dictated by our environment and memories; everything must necessarily be built up from pre-existing parts.
Whether it is text or an image, it is just bits for a computer. A token can represent anything.
The term LLM is confusing your point because VLMs belong to the same bin according to Yann.
Using the term autoregressive models instead might help.
why LLMs (transformers trained on multimodal token sequences, potentially containing spatiotemporal information) can't be a world model?
There will be no "unlocking of AGI" until we develop a new science capable of artificial comprehension. Comprehension is the cornucopia that produces everything we are, given raw stimulus an entire communicating Universe is generated with a plethora of highly advanceds predator/prey characters in an infinitely complex dynamic, and human science and technology have no lead how to artificially make sense of that in a simultaneous unifying whole. That's comprehension.
A lot more justifiable than say, Thinking Machines at least. But we will "see".
World models and vision seems like a great use case for robotics which I can imagine that being the main driver of AMI.
Was Alphago's move 37 original?
In the last step of training LLMs, reinforcement learning from verified rewards, LLMs are trained to maximize the probability of solving problems using their own output, depending on a reward signal akin to winning in Go. It's not just imitating human written text.
Fwiw, I agree that world models and some kind of learning from interacting with physical reality, rather than massive amounts of digitized gym environments is likely necessary for a breakthrough for AGI.