I really, really want to see how these images start forming into videos. The stills are clearly getting better and better, but what happens when you need the stills to conform organically to a keyed script?
I'm seeing more and more AI video memes, and they are getting really good. It's still just a bunch of short clips, since long shots aren't working well enough, but typical Hollywood movies use cuts only a few seconds long anyway, so this is almost good enough to make a Marvel fanfic.
The workflow right now would be to take these images, make a sequence of them for the key "shots", and send them to an I2V model. LTX-2 is the model the r/stablediffusion folks are playing with right now, but there are a fair few others; a rough sketch of that step is below.
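For anyone who wants to try the I2V step, here's a minimal sketch using Hugging Face diffusers' LTXImageToVideoPipeline against the original Lightricks/LTX-Video checkpoint (LTX-2 may need a different loader, and the still path and prompt here are made-up placeholders):

    import torch
    from diffusers import LTXImageToVideoPipeline
    from diffusers.utils import export_to_video, load_image

    # Load the LTX-Video image-to-video pipeline onto the GPU.
    pipe = LTXImageToVideoPipeline.from_pretrained(
        "Lightricks/LTX-Video", torch_dtype=torch.bfloat16
    ).to("cuda")

    # "shot_01.png" is a hypothetical still from your key-shot sequence.
    image = load_image("shot_01.png")

    video = pipe(
        image=image,
        prompt="slow dolly-in on the character, cinematic lighting",
        width=704,               # dimensions must be divisible by 32
        height=480,
        num_frames=121,          # ~5 s at 24 fps, i.e. a short Hollywood-style cut
        num_inference_steps=50,
    ).frames[0]

    export_to_video(video, "shot_01.mp4", fps=24)

Run this once per still, then cut the resulting clips together; the short num_frames is deliberate, since these models hold up much better on few-second shots than on long takes.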
Check out Seedance 2: https://seed.bytedance.com/en/seedance2_0
Nano Banana was technically impressive the first time around, but after Seedance it doesn't really hold up. It's all just an internet pollution machine anyway.