You are half right. Its funny because I use the same same. Mine is "A picture is worth a thousand words. thats why it takes 1000 words to describe the exact image that you want! Much better to just use Image to Image instead".
Thats my full quote on this topic. And I think it stands. Sure, people won't describe a picture. instead, they will take an existing picture or video, and do modifications of it, using AI. That is much much simpler and more useful, if you can file a scene, and then animate it later with AI.