You have to think of the LLM as a genie that tries to trick you.
First make it write a contract (REQ/ARCH/IMPL documents). Skim through those for any mistakes.
Then based on those ask it to write tests. Again skim through them.
Now you have a context full of guardrails. It’s less likely to surprise you.
I find a second LLM can usually do this skimming at least as well as I can, so I just ask the harness to surface anything the two models can't agree on.
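
A rough sketch of that last step, in case it helps. This is not any real harness's API: `Reviewer`, `cross_review`, and the two stand-in reviewers are hypothetical names, and each model is assumed to be reducible to a function that takes a document and returns a set of issue strings. Only the issues flagged by one side and not the other get shown to a human.

```python
from dataclasses import dataclass
from typing import Callable, Set

# Hypothetical: a reviewer takes a document's text and returns a set of issues.
# In practice each reviewer would wrap a call to a different LLM.
Reviewer = Callable[[str], Set[str]]


@dataclass
class Verdict:
    agreed: Set[str]    # issues both reviewers raised
    disputed: Set[str]  # issues only one reviewer raised


def cross_review(document: str, first: Reviewer, second: Reviewer) -> Verdict:
    """Run both reviewers and split their findings into agreed and disputed."""
    a, b = first(document), second(document)
    return Verdict(agreed=a & b, disputed=a ^ b)


# Stand-in reviewers using simple string checks instead of real model calls.
def reviewer_a(text: str) -> Set[str]:
    issues = set()
    if "TODO" in text:
        issues.add("unresolved TODO")
    if "timeout" not in text:
        issues.add("no timeout requirement")
    return issues


def reviewer_b(text: str) -> Set[str]:
    issues = set()
    if "TODO" in text:
        issues.add("unresolved TODO")
    if "retry" not in text:
        issues.add("no retry requirement")
    return issues


req_doc = "REQ-1: the client must retry failed requests. TODO: define limits."
verdict = cross_review(req_doc, reviewer_a, reviewer_b)

# Only the disagreements need a human to look at them.
for issue in sorted(verdict.disputed):
    print("needs a human:", issue)
```

Exact string matching on issues is the crude part; a real setup would need some way to decide that two differently worded findings are the same one.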