So, we went from "just hand off the movie script to an automated director/DP/editor" to what we're now rapidly approaching:
- you have to provide correct detailed instructions on lighting
- you have to provide correct detailed instructions on props
- you have to provide correct detailed instructions on clothing
- you have to provide correct detailed instructions on camera position and movement
- you have to provide correct detailed instructions on blocking
- you have to provide correct detailed instructions on editing
- you have to provide correct detailed instructions on music
- you have to provide correct detailed instructions on sound effects
- you have to provide correct detailed instructions on...
- ...
- repeat that for literally every single scene in the movie (up to 200 in extreme cases)
There's a reason I provided a few links for you to look at. I highly recommend the talk by Annie Atkins. Watch it, then open any movie script and try to find any of the things she talks about there. (You can find actual movie scripts at https://imsdb.com.)
It’s the same with digital art: even the most effortless form (matte painting) involves a plethora of decisions and techniques to achieve a coherent result. There’s a reason people go to school or train themselves for years to build the needed expertise. If it were just data, someone would have written a guide that others could mindlessly follow.
Not sure why you jumped there. I was thinking more like ‘make it look like Blade Runner if Kurosawa directed it, with a score like Zimmer’s.’
You’re really failing to let go of the idea that you need to prescribe every little thing. As with Midjourney today, you’ll be able to give general guidance.
Now, I don’t expect we’ll get the best movies this way. But paint-by-numbers stuff like many movies already are? A Hallmark Channel weepy? I bet we will.
There are two reasons to be hopeful about it, though. First, AI/LLMs are very good at filling in all those little details, so humans can cherry-pick the parts they like. I think that's where the real value is for the masses - once these models can generate coherent scenes, people can start using them to explore the creative space and figure out what they like. Sort of like Segment Anything and masking in inpainting, but for the rest of the scene assembly. Second, the models can probably be architected to figure out environmental/character/light/etc. embeddings and use those to build up other coherent scenes, the same way we use language embeddings for semantic similarity.
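To make the embedding idea concrete, here's a minimal sketch of the retrieval side of it. Everything in it is hypothetical - the scene names, the tiny vectors, and the per-attribute split are stand-ins for whatever a real video model would actually learn:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Standard cosine similarity, same as with language embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical per-attribute embeddings for scenes already generated;
# in practice these would come from the model itself.
scene_library = {
    "alley_night":  {"lighting": np.array([0.9, 0.1, 0.3]), "palette": np.array([0.2, 0.8, 0.5])},
    "rooftop_dawn": {"lighting": np.array([0.1, 0.9, 0.4]), "palette": np.array([0.7, 0.2, 0.6])},
}

def most_coherent_match(new_scene: dict, library: dict) -> str:
    """Pick the library scene whose lighting/palette embeddings best match,
    so the next generated scene stays visually coherent with what exists."""
    def score(existing: dict) -> float:
        return sum(cosine_similarity(new_scene[k], existing[k]) for k in new_scene)
    return max(library, key=lambda name: score(library[name]))

new_scene = {"lighting": np.array([0.85, 0.2, 0.25]), "palette": np.array([0.25, 0.75, 0.55])}
print(most_coherent_match(new_scene, scene_library))  # -> "alley_night"
```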
That's how I've been using the image generators - lots of experimentation and throwing out the stuff that doesn't work. Then, once I've collected enough good generated images out of the tons of garbage, I fine-tune a model and build a workflow that more consistently gives me those styles.
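The cherry-picking step can be partly automated, too. Here's a rough sketch of filtering a pile of generations down to a fine-tuning set, using CLIP (via Hugging Face transformers) as the style scorer - the prompt, directory, and 10% cutoff are all placeholders, not a recipe:

```python
import torch
from pathlib import Path
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Score a batch of generated images against a style prompt and keep the
# top fraction; the kept set becomes training data for a style fine-tune.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

style_prompt = "moody neon-lit rain-slicked city street, cinematic"
paths = sorted(Path("generations/").glob("*.png"))
images = [Image.open(p).convert("RGB") for p in paths]

inputs = processor(text=[style_prompt], images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    scores = model(**inputs).logits_per_image.squeeze(1)  # one score per image

keep = int(len(paths) * 0.1)  # keep the best 10%, discard the garbage
for score, path in sorted(zip(scores.tolist(), paths), reverse=True)[:keep]:
    print(f"{score:.2f}  {path}")  # these go into the fine-tuning set
```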
Now, the models and UX to do this at cinematic quality are probably 5-10 years away for video (and the studios are probably the only ones with the data to do it), but I'm relatively bullish on AI in cinema. I don't think AI will be doing everything end-to-end, but it might be a shortcut for people who can write a script and figure out the UX to execute the rest of the creative process by trial and error.