> but the result is not naively going to understand the level of reality the script is going for…
We can already get detailed style guidance into picture generation. Declaring you want Picasso cubist, Warner Bros. cartoon, or hyperrealistic works today. So do lighting instructions, color palettes, on and on.
These future models will not be large language models; they will be multi-modal. Large movie models, if you like. They will have tons of context about how scenes within movies cohere, just as LLMs do within documents today.
This is such an incredibly confident comment. I'm in awe.
So, we went from "just hand off movie script to automated director/DP/editor" and we're now rapidly approaching:
- you have to provide correct detailed instructions on lighting
- you have to provide correct detailed instructions on props
- you have to provide correct detailed instructions on clothing
- you have to provide correct detailed instructions on camera position and movement
- you have to provide correct detailed instructions on blocking
- you have to provide correct detailed instructions on editing
- you have to provide correct detailed instructions on music
- you have to provide correct detailed instructions on sound effects
- you have to provide correct detailed instructions on...
- ...
- repeat that for literally every single scene in the movie (up to 200 in extreme cases)
There's a reason I provided a few links for you to look at. I highly recommend the talk by Annie Atkins. Watch it, then open any movie script, and try to find any of the things she is talking about there (you can find actual movie scripts here: https://imsdb.com).