My experience is that the first iteration output from a single agent is not what I want to be on the hook for. What squares it for me with "not writing code anymore" is the iterative process to improve outputs:
1) Having review loops between agents (spawn separate "reviewer" agents) and clear tests / eval criteria improved results quite a bit for me. 2) Reviewing manually and giving instructions for improvements is necessary to have code I can own
Is that… actually faster than just doing it yourself, tho? Like, “I could write the right thing, or I could have this robot write the wrong thing and then nag it til it corrects itself” seems to suggest a fairly obvious choice.
I’ve yet to see these things do well on anything but trivial boilerplate.