I absolutely cannot trust Claude code to independently work on large tasks. Maybe other people work on software that's not significantly complex, but for me to maintain code quality I need to guide more of the design process. Teams of agents just sounds like adding a lot more review and refactoring that can just be avoided by going slower and thinking carefully about the problem.
I agree, but I've found that making an "adversarial" model within claude helps with the quality a lot. One agent makes the change, the other picks holes in it, and cycle. In the end, I'm left with less to review.
This sounds more like an automation of that idea than just N-times the work.
Humans can't handle large tasks either, which is why you break them into manageable chunks.
Just ask claude to write a plan and review/edit it yourself. Add success criteria/tests for better results.
Exactly, one out of four or three prompts require tuning, nudging or just stopping it. However it takes seniority to see where it goes astray. I suspect that lots of folks dont even notice that CC is off. It works, it passes the tests, so it is good.
There is research[0] currently being done on how to divide tasks and combine the answers to LLMs. This approach allows LLMs reach outcomes (solving a problem that requires 1 million steps) which would be impossible otherwise.
you need a reviewer agent for every step of the process - review the plan generated by the planner, the update made by the task worker subagent, and a final reviewer once all tasks are done.
this does eat up tokens _very_ quickly though :(
You definitely have to create some sort of PLAN.md and PROGRESS.md via a command and an implement command that delegates work. That is the only way that I can get bigger things done no matter how „good“ their task feature is.
You run out of context so quickly and if you don’t have some kind of persistent guidance things go south
You write a generic architecture document on how you want your code base to be organized, when to use pattern x vs pattern y, examples of what that looks like in your code base, and you encode this as a skill.
Then, in your prompt you tell it the task you want, then you say, supervise the implementation with a sub agent that follows the architecture skill. Evaluate any proposed changes.
There are people who maximize this, and this is how you get things like teams. You make agents for planning, design, qa, product, engineering, review, release management, etc. and you get them to operate and coordinate to produce an outcome.
That's what this is supposed to be, encoded as a feature instead of a best practice.