sailingparrot • yesterday at 9:26 PM

> you don't need to make a video model. You probably don't need to decode the latents at all.

If you don't decode, how do you judge quality, given that generative metrics are famously hard and imprecise? And how do you integrate RLHF/RLAIF into your pipeline without decoding? That's not something you can skip anymore if you want SotA.

Just look at the companies explicitly aiming for robotics/simulation: they *are* doing video models.
