> Again, tell me how exactly your amazing magical AI director will know which wardrobe to choose, which camera angles to set up, which typography to use, which sound effects to make just from the script you hand in?
I think you're still assuming I always want to choose those things. That's why we're talking past each other. A good movie-making model would choose for me unless I give explicit directions. Today we don't see long-range coherence in the results of movie (or game engine) models, but the range is increasing, and I'm willing to bet we will see movie-length coherence in the next decade or so.
By the way, I also bet that if I pasted the exact No Country for Old Men script scene description from up this thread into Midjourney today, it would produce at least some compelling images with decent choices of wardrobe, lighting, set dressing, camera angle, exposure, etc. That's what these models do, because they're extrapolating and interpolating between the billion images they've seen that contained these human choices.
AFAIK Midjourney produces single images, so the relevant scope of consistency is within a single image only, not between images. A movie model needs coherence across ~160,000 images, which is beyond the state of the art today, but I don't see why it's impossible or unreasonable in the long run.
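For a rough sense of where a number like ~160,000 comes from, here's a back-of-the-envelope sketch. The 24 fps frame rate and the 110-minute runtime are my assumptions for a typical feature, not exact figures for any particular film.

```python
# Rough frame count for a feature film.
# Both inputs are illustrative assumptions, not measured values.
FPS = 24               # standard cinema frame rate
RUNTIME_MINUTES = 110  # assumed typical feature length


def frame_count(runtime_minutes: int, fps: int = FPS) -> int:
    """Number of individual frames in a film of the given length."""
    return runtime_minutes * 60 * fps


print(frame_count(RUNTIME_MINUTES))  # 158400
```

So a model would need to keep wardrobe, sets, and faces consistent across something on the order of 10^5 frames, versus exactly one for a still-image generator.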
> A general prompt will give you a general result that is usually very far from what you actually have in mind.
Which is only a problem if I have something in mind. Alternatively I can give no guidance, or loose guidance, make half a dozen variations, and pick the one I like best. Maybe iterate a couple of times into that variation tree, just like people do with the image generators.