in-silico yesterday at 6:51 PM

Everyone here seems too caught up in the idea that Genie is the product, and that its purpose is to be a video game, movie, or VR environment.

That is not the goal.

The purpose of world models like Genie is to be the "imagination" of next-generation AI and robotics systems: a way for them to simulate the outcomes of potential actions in order to inform decisions.


Replies

benlivengood yesterday at 7:40 PM

Agreed; everyone complained that LLMs have no world model, so here we go. The next logical step is to backfill the weights with encoded real-world video at some reasonable frame rate to ground the imagination. Then branch the inference on possible interventions (actions) in the near future of the simulation, throw the results into a goal evaluator, and send the winning action predictions to the motors. Getting the timing right will probably require a bit more work than literally gluing them together, but probably not much more.
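
The branch-evaluate-act loop described above can be sketched in a few lines. This is only a toy under loud assumptions: `world_model` is a hand-written stand-in for a learned model (it is not Genie or any real API), the state is a single integer position, `ACTIONS` and `goal_score` are hypothetical, and exhaustive enumeration stands in for whatever search a real planner would use.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # hypothetical motor commands: left, stay, right


def world_model(state, action):
    """Stand-in for a learned model: predicts the next state (assumption)."""
    return state + action


def goal_score(state, target):
    """Goal evaluator: higher is better (negative distance to the target)."""
    return -abs(target - state)


def plan(state, target, horizon=3):
    """Imagine every action sequence of length `horizon`, score each imagined
    trajectory by its summed goal score, and return the first action of the
    best one."""
    best_score, best_first = None, None
    for seq in product(ACTIONS, repeat=horizon):
        s, score = state, 0
        for a in seq:
            s = world_model(s, a)          # imagined step, nothing moves yet
            score += goal_score(s, target)
        if best_score is None or score > best_score:
            best_score, best_first = score, seq[0]
    return best_first


def run(state, target, steps=10):
    """Replan at every step and execute only the first planned action."""
    trajectory = [state]
    for _ in range(steps):
        action = plan(state, target)
        # A real system would send `action` to the motors and observe the
        # result; this toy reuses the model as the environment.
        state = world_model(state, action)
        trajectory.append(state)
    return trajectory
```

Calling `run(0, 5)` walks the state toward 5 and then holds it there. Summing the score over the whole imagined trajectory, instead of scoring only the final state, keeps the planner from putting off moves it could make now.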

avaer yesterday at 7:01 PM

Soft disagree; if you want imagination, you don't need to make a video model. You probably don't need to decode the latents at all. That seems pretty far from information-theoretic optimality, the kind that you want in a good+fast AI model making decisions.

The whole reason for LLMs inferencing human-processable text, and "world models" inferencing human-interactive video, is precisely so that humans can connect in and debug the thing.

I think the purpose of Genie is to be a video game, but it's a video game for AI researchers developing AIs.

I do agree that the entertainment implications are kind of the research exhaust of the end goal.

oceanplexian yesterday at 9:33 PM

Yeah and the goal of Instagram was to share quirky pictures you took with your friends. Now it’s a platform for influencers and brainrot; arguably it has done more damage than drugs to younger generations.

As soon as this thing is hooked up to VR and reaches a tipping point with the general public we all know exactly what is going to happen. The creation of the most profitable, addictive and ultimately dystopian technology Big Tech has ever come up with.

rzmmm yesterday at 11:42 PM

I feel that this is too costly for that kind of usage. Probably quite a different architecture is needed for robotics.

pizzafeelsright yesterday at 7:54 PM

Environment mapping to AI generated alternative outcomes is the holodeck.

I prefer real danger as living in the simulation is derivative.

whytaka yesterday at 8:33 PM

I think this is the key component of developing subjective experience.

reactordev yesterday at 7:49 PM

Still cool though…

echelon yesterday at 7:30 PM

Whoa, whoa, whoa. That's just one angle. Please don't bin that as the only use case for "world models"!

First of all, there are a variety of different types of world models. Simulation, video, static asset, etc. It's a loaded term, just as the use cases are widespread.

There are world models you can play in your browser inferred entirely by your CPU:

https://madebyoll.in/posts/game_emulation_via_dnn/ (my favorite, from 2022!)

https://madebyoll.in/posts/world_emulation_via_dnn/ (updated, in 3D)

There are static asset generating world models, like WorldLabs' Marble. These are useful for video games, previz, and filmmaking.

https://marble.worldlabs.ai/

I wrote open source software to leverage Marble for filmmaking (I'm a filmmaker, and this tech is extremely useful for scene consistency):

https://www.youtube.com/watch?v=wJCJYdGdpHg

https://github.com/storytold/artcraft

There are playable video-oriented models, many of which are open source and will run on your 3080 and above:

https://diamond-wm.github.io/

https://github.com/Robbyant/lingbot-world

There are things termed "world models" that really shouldn't be:

https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0

There are robotics training oriented world models:

https://github.com/leggedrobotics/robotic_world_model

Genie is not strictly robotics-oriented.

dyauspitr yesterday at 7:17 PM

That’s part of it but if you could actually pull out 3D models from these worlds, it would massively speed up game development.

cyanydeez yesterday at 8:48 PM

Like LLMs, though: do you really think a simulation will get them to all the corner cases robots/AI need to know about? Or will it be largely the same problem: they'll be just good enough to fool the engineers and make the business ops drool, they'll be put into production, and in a year or two we'll see stories about robots crushing people's hands, stepping in drains and falling over, or falling off roofs because of some bizarre miscommunication between training and reality?

So, like, it's very important to understand the lineage of training and not just the "this is it".

slashdave yesterday at 7:52 PM

This is a video model, not a world model. Start learning on this, and cascading errors will inevitably creep into all downstream products.

You cannot invent data.
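
One narrow, checkable version of the cascading-error worry is what happens when a slightly wrong one-step model is rolled out on its own predictions. The sketch below is only an illustration under assumptions that are not from the thread: the "world" is a scalar linear system `x[t+1] = a * x[t]`, and the names `rollout` and `rollout_error` are hypothetical. It says nothing about any particular video model.

```python
def rollout(a, x0, steps):
    """Iterate x[t+1] = a * x[t], feeding each prediction back in as input."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1])
    return xs


def rollout_error(a_true, a_model, x0, steps):
    """Absolute gap between the true trajectory and the model's rollout."""
    true = rollout(a_true, x0, steps)
    pred = rollout(a_model, x0, steps)
    return [abs(t - p) for t, p in zip(true, pred)]


# Assumed numbers: the model's one-step coefficient is off by 1%.
errors = rollout_error(a_true=1.0, a_model=1.01, x0=1.0, steps=50)
```

In this toy, a 1% one-step error grows with every step of the rollout, because each prediction starts from the previous, already-wrong prediction rather than from ground truth.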
