If you have a specific vision, you have to get the detailed information of that vision into the digital realm somehow. You can use (more) direct tools like Premiere if you are fluent enough in their "language". Or you can use natural language to express the vision to an AI. Either way you have to get the same amount of information into a digital format.
Also, AI sucks at understanding detail expressed in symbolic communication, because it doesn't understand symbols the way linguistic communication expects the receiver to understand them.
My own experience is that all the AI tools are great for shortcutting the first 70-80% or so. But the last 20% climbs an exponential curve of required detail, and that detail gets easier and easier to express directly using tooling and my human brain than by describing it to an AI.
Consider the analogy of a contract worker building or painting something for you. If all you have is a vague description, they'll make a good guess and you'll just have to live with that. But the more time you spend communicating with them (through descriptions, mood boards, rough sketches, etc.), the closer the result will get to your detailed vision. But you only REALLY get exactly what you want if you do it yourself, or sit beside them as they work and direct almost every step. And that last option is almost impossible if they can't understand symbolic meaning in language.