With real art you can start from somewhere and keep building on that foundation. Say you pick an angle to shoot from and test different actors and scenes from that angle. With AI you’re re-rolling the dice for every iteration. If you’re happy that it looks 80% correct then sure, it’s maybe passable.
I think people are getting way ahead of their skis here. Even in 2D I can’t yet generate, for example, a matching set of inventory images for weapons and items for a game, which is a test case orders of magnitude simpler than video. They all come out in slightly different styles. If I don’t care that they all look different in strange ways then it’s useful - but any consumer will think it looks like crap.