Hacker News

yummypaint, yesterday at 10:03 PM

> By leveraging Genie’s immense world knowledge, it can simulate exceedingly rare events, from a tornado to a casual encounter with an elephant, that are almost impossible to capture at scale in reality. The model’s architecture offers high controllability, allowing our engineers to modify simulations with simple language prompts, driving inputs, and scene layouts. Notably, the Waymo World Model generates high-fidelity, multi-sensor outputs that include both camera and lidar data.

How do you know the generated outputs are correct? Especially for unusual circumstances?

Say the scenario is a patch of road densely covered with 5 mm ball bearings. I'm sure the model will happily spit out numbers, but are they reasonable? How do we know they are reasonable? Even if that prediction is OK, how do we fundamentally know that the prediction for 4 mm ball bearings won't be completely wrong?

There seems to be a lot of critical information missing.


Replies

IMTDb, yesterday at 10:28 PM

The idea is that, over time, the quality and accuracy of world-model outputs will improve. That, in turn, lets autonomous driving systems train on a large amount of “realistic enough” synthetic data.

For example, we know from experience that Waymo is currently good enough to drive in San Francisco. We don’t yet trust it in more complex environments like dense European cities or Southeast Asian “hell roads.” Running the stack against world models can give a big head start in understanding what works, and which situations are harder, without putting any humans in harm’s way.

We don’t need perfect accuracy from the world model to get real value. And, as usual, the more we use and validate these models, the more we can improve them, creating a virtuous cycle.

joshfee, yesterday at 10:11 PM

Isn't that true for any scenario previously unencountered, whether it is a digital simulation or a human? We can't optimize for the best possible outcome in reality (since we can't predict the future), but we can optimize for making the best decisions given our knowledge of the world (even if it is imperfect).

In other words, it is a gradient from "my current prediction" to "best prediction given my imperfect knowledge" to "best prediction with perfect knowledge", and you can improve the outcome by shrinking the gap between the first and second, the gap between the second and third, or both.

ses1984, yesterday at 10:25 PM

You could train it in simulation and then test it in reality.

fooker, yesterday at 10:10 PM

> from a tornado to a casual encounter with an elephant

A Sims-style game with this technology would be pretty nice!

aaaalone, yesterday at 10:06 PM

They probably just look at the results of the generation.

I mean, would I like an in-depth tour of this? Yes.

But it's a marketing blog article, what do you expect?
