Hacker News

throwup238 · 12/10/2024

There are two reasons to be hopeful about it, though. First, AI/LLMs are very good at filling in all those little details so humans can cherry pick the parts that they like. I think that's where the real value is for the masses: once these models can generate coherent scenes, people can start using them to explore the creative space and figure out what they like. Sort of like SegmentAnything and masking in inpainting, but for the rest of the scene assembly. Second, the models can probably be architected to figure out environment/character/lighting/etc. embeddings and use those to build up other coherent scenes, the way we use language embeddings for semantic similarity.

That's how I've been using the image generators: lots of experimentation and throwing out the stuff that doesn't work. Then once I've got enough good generated images collected out of the tons of garbage, I fine tune a model and create a workflow that more consistently gives me those styles.

The models and UX to do this at cinematic quality are probably 5 to 10 years away for video (and the studios are probably the only ones with the data to do it), but I'm relatively bullish on AI in cinema. I don't think AI will be doing everything end-to-end, but it might be a shortcut for people who can write a script and figure out the UX, letting them execute the rest of the creative process by trial and error.


Replies

troupo · 12/10/2024

> AI/LLMs are very good at filling in all those little details so humans can cherry pick the parts that they like.

Where did you find AI/ML that is good at filling in the actual, required, and consistent details?

I beg you to watch the Annie Atkins presentation I linked (https://www.youtube.com/watch?v=SzGvEYSzHf4) and tell me how much intervention AI/ML would need to create all that and keep it consistent throughout the movie.

> once these models can generate coherent scenes, people can start using them to explore the creative space and figure out what they like.

Define "coherent scene" and "explore". A scene must be coherent and consistent, conform to the overall style of the movie, and...

Even something as simple as shot/reverse shot involves about a million details and can be shot in a million different ways. Here's an exploration of just shot/reverse shot: https://www.youtube.com/watch?v=5UE3jz_O_EM

All of those are coherent scenes, but the coherence comes from a million decisions: lighting, camera position, lens choice, wardrobe, what surrounds the characters, what's happening in the background, makeup... There's no coherence without all these choices being made beforehand.

Around the 4:00 mark: "Think about how well you know this woman just from her clothes, and workspace". Now watch that scene. Then read its description in the script (https://imsdb.com/scripts/No-Country-for-Old-Men.html):

--- start quote ---

    Chigurh enters. Old plywood paneling, gunmetal desk, litter
    of papers. A window air-conditioner works hard.
    A fifty-year-old woman with a cast-iron hairdo sits behind
    the desk.

--- end quote ---

Right after that, there's a section on the rhythm of editing: another piece of the puzzle of coherence in a scene.

> Then once I've got enough good generated images collected out of the tons of garbage, I fine tune a model and create a workflow that more consistently gives me those styles.

So, literally what I wrote here: https://news.ycombinator.com/item?id=42375280 :)