I've added an instruction: "do not implement anything unless the user approves the plan using the exact word 'approved'".
That fixed it: the agent now waits until I explicitly approve.
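For anyone who would rather enforce this outside the prompt, here is a rough sketch of the same gate done in code. It assumes a harness that sees the user's reply before the agent is allowed to act. The names `is_approved` and `contains_word_naive` are made up for illustration, and ignoring case and trailing punctuation is my own assumption rather than part of the instruction.

```python
import re

APPROVAL_WORD = "approved"  # the exact word named in the instruction


def is_approved(reply: str) -> bool:
    """Return True only when the whole reply is the approval word.

    Assumption: surrounding whitespace, trailing '.' or '!' and letter
    case are ignored; anything else in the reply means no approval.
    """
    normalized = reply.strip().strip(".!").lower()
    return normalized == APPROVAL_WORD


def contains_word_naive(reply: str) -> bool:
    """Looser check, shown for contrast: matches the word anywhere."""
    return re.search(r"\bapproved\b", reply, re.IGNORECASE) is not None


if __name__ == "__main__":
    for reply in ["approved", "Approved.", "NOT approved!", "looks good"]:
        print(repr(reply), is_approved(reply), contains_word_naive(reply))
```

The whole-reply comparison is the point here: a check that only looks for the word somewhere in the message would also accept a refusal that happens to contain it.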
There’s an extension to this problem that I haven’t got past: more generally, I’d like the agent to stop and ask questions when it encounters ambiguity that it can’t reasonably resolve itself. If someone can get agents doing this well, it’d be a massive improvement (and would also solve the above).
"NOT approved!"
"The user said the exact word 'approved'. Implementing plan."