It can generate something well-produced, but it's really bad at applying taste or direction the way a human does.
The workflow feels wrong. It should be closer to a DAW with chat, where the model outputs stems, samples, and arrangement parts instead of one finished track. Then you could target a specific sound, section, or idea and actually develop it.
I very much agree with your DAW UX suggestion. I think the writing is on the wall, and Suno is doing exactly that with Suno Studio.