What you're thinking of is much closer to the Genie model from DeepMind [0]. It's like Veo, but interactive (though not publicly available).
[0] https://deepmind.google/discover/blog/genie-2-a-large-scale-...