If I had to use the models as they exist right now, I'd use them in a procedural Myst-like where I incorporate the temporal inconsistency into the setting. The player's actions and state would affect the prompts used for conditioning the video generation. It would probably be weird and buggy, but it could be fun.
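To make the "state affects the prompt" idea a bit more concrete, here's a rough Python sketch. Everything in it is made up for illustration: `PlayerState`, `apply_action`, `build_prompt`, and the `instability` counter are hypothetical names, and it stops at producing the prompt string because the actual call into a video model would depend on whichever model you use.

```python
from dataclasses import dataclass, field


@dataclass
class PlayerState:
    # All fields are hypothetical; a real game would track whatever it needs.
    location: str = "lighthouse"
    inventory: list[str] = field(default_factory=list)
    instability: int = 0  # counts actions taken; used to lean into the drift


def apply_action(state: PlayerState, action: str) -> PlayerState:
    """Return a new state after a player action (toy rules, assumed)."""
    inventory = list(state.inventory)
    location = state.location
    if action.startswith("take "):
        inventory.append(action[len("take "):])
    elif action.startswith("go "):
        location = action[len("go "):]
    return PlayerState(location, inventory, state.instability + 1)


def build_prompt(state: PlayerState) -> str:
    """Turn the game state into a text prompt for conditioning generation."""
    parts = [f"first-person view of a deserted {state.location}"]
    if state.inventory:
        parts.append("holding " + ", ".join(state.inventory))
    if state.instability >= 3:
        # Fold the model's temporal inconsistency into the fiction.
        parts.append("the architecture subtly shifting between glances")
    return ", ".join(parts)


state = PlayerState()
for action in ["take brass key", "go observatory", "take star chart"]:
    state = apply_action(state, action)
print(build_prompt(state))
```

The point of the `instability` bit is that once the world is "supposed" to shift, the model failing to keep a room consistent reads as part of the setting instead of a bug.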
You could also use these models to generate assets for a game during development, whether that's simple cutscenes or assets produced through Gaussian splatting or some other process.
If these models and others can be run cost-effectively on a cloud service, or even locally at some point, then you could do some interesting things by combining them with 3D mesh generation, img2img, vid2vid, etc. Just think about even simple games like Papers, Please and the whole genre it spawned, which uses short episodes where you have to make a guess based on what you see. There's a lot of potential for creating new mechanics around generative imagery.