The adage "a picture is worth a thousand words" has the nice corollary "A thousand words isn't enough to be precise about an image".
Now expand that to movies and games and you can get why this whole generative-AI bubble is going to pop.
“A frame is worth a billion rays”
The last production I worked on averaged 16 hours per frame for the final rendering. The amount of information encoded in lighting, models, texture, maps, etc. is insane.
The point is not to be precise. It's to be "good enough".
Trust me, even if you work with human artists, you'll keep saying "it's not quite what I initially envisioned, but we don't have the budget/time for another revision, so it's good enough for now" all the time.
Corollary: I couldn't create an original visual piece of art to save my life, so prompting is infinitely better than what I could do myself (or than what I'm willing to invest the time to learn to do). The gen-AI bubble isn't going to burst. Pareto always wins.
If you can build a system that can generate engaging games and movies, from an economic (bubble popping or not popping) point of view it's largely irrelevant whether they conform to fine-grained specifications by a human or not.
Maybe your AI bubble! If you define AI to be something like just another programming language, then yes, you will be sadly disappointed. You see it as an employee with its own intuitions and ways of doing things that you're trying to micromanage.
I have a bad feeling that you'd be a horrible manager if you ever were one.
(2020) https://arxiv.org/abs/2010.11929 : An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
(2021) https://arxiv.org/abs/2103.13915 : An Image is Worth 16x16 Words, What is a Video Worth?
(2024) https://arxiv.org/abs/2406.07550 : An Image is Worth 32 Tokens for Reconstruction and Generation
You are half right. It's funny, because I use the same one. Mine is "A picture is worth a thousand words. That's why it takes 1000 words to describe the exact image that you want! Much better to just use image-to-image instead".
That's my full quote on this topic, and I think it stands. Sure, people won't describe a picture. Instead, they will take an existing picture or video and modify it using AI. That is much, much simpler and more useful if you can film a scene and then animate it later with AI.
> Now expand that to movies and games and you can get why this whole generative-AI bubble is going to pop.
The prior sentence does not imply the conclusion.
A picture is worth a thousand words.
A word is worth a thousand pictures. (E.g., love.)
It is abstraction all the way.
Actually, I've gotten some great results with image2text2image using fewer than a thousand words. Maybe not enough for a video, but for images that aren't too crazy, it is enough!
Sure, it's going to pop. But when is the important question.
Being too early about this and being wrong are the same.
The comment was probably more about things like heads turning 360 degrees, etc.
I agree that people who want any meaningful precision in their visual results will inevitably be disappointed.
> Now expand that to movies and games and you can get why this whole generative-AI bubble is going to pop.
What will save it is that, no matter how picky you are as a creator, your audience will never know exactly what it was that you dreamed up, so any half-decent approximation will work.
In other words, a corollary to your corollary is, "Fortunately, you don't need them to be, because no one cares about low-order bits".
Or, as we say in Poland, "What the eye doesn't see, the heart doesn't mourn."