Something like a white paper with a mood board, color scheme, and concept art as the input might work. That could be fed into an LLM "expander" that adds length and specificity, followed by multiple rounds of review to nudge things in the right direction.
And I think this is realistically going to be the shape of the tools to come for the foreseeable future.
I expect this is actually how it's going to work in the longer term, with AI as a copilot to a human artist. The human artist does the storyboarding, sketching in backdrops and character poses as keyframes, and then the AI steps in and "paints" the details over the top. It might draw on some prior training about the characters and settings so that there's consistency throughout a given work.
The real trick is that the AI needs to be able to participate in iteration cycles, where the human can say "okay, this is mostly good, but I've circled some areas that don't look quite right and described what needs to be different about them." From what I've played with, current AIs aren't very good at revisiting their own work: you're basically just tweaking the original inputs and starting over from scratch each time.