As a machine learning researcher, I don't get why these are called world models.
Visually, they are stunning. But they're nowhere near physically accurate. I mean, look at that video with the girl and the lion. The tail teleports between legs and then becomes attached to the girl instead of the lion.
Just because the visuals are high quality doesn't mean it's a world model or that it has learned physics. I feel like we're conflating these things. I'd be much happier to call something a world model if its visual quality is dogshit but it is consistent with its world. And I say its world because it doesn't need to be consistent with ours.
> As a machine learning researcher, I don't get why these are called world models.
They're called "world models" because it's a grift. An out-in-the-open, shameless grift. Investors, pile on.
> Visually, they are stunning.
The input images are stunning; the model's result is another disappointing trip to the uncanny valley. But we feel OK as long as the sequence doesn't horribly contradict the original image or sound. That is the world model.