I've read several comments in the last few months citing this kind of approach. What I'm trying to do is to make the implementation phase more reliable.
By the way, if you're not doing that yet, something that can really help when doing UI/UX work is to have the agent create some mockups, and then tests based on those - I'm using Cucumber with some extra sauce for this. It's a very nice way to guide the agent in a falsifiable way.
I have experimented with that a bit, but not in a rigorous way. It's good to know that there is value in doing it, so I'll try to integrate it into my process. Thanks for the tip!