Sort of, except I think the future of LLMs will be to have the LLM make 5 separate attempts at a fix in parallel, since LLM time is cheaper than human time... and once you introduce this into the workflow, you'll want to spin up multiple containers, and the benefits of the terminal aren't as strong anymore.
Who or what will review the 5 PRs (including their updates to automated tests)? If it's just yet another agent, do we need 5 of these reviews for each PR too?
In the end, you either concede control over 'details' and just trust the output or you spend the effort and validate results manually. Not saying either is bad.
Having command line tools to spin up multiple containers and then to collect their results seems like it would be a pretty natural fit.
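To make the fan-out-and-collect idea concrete, here's a minimal Python sketch. Everything in it is an assumption for illustration: `fake_agent` is a hypothetical stand-in for whatever would actually launch a container and run the agent, and `fan_out` and `Attempt` are made-up names, not any particular tool's API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attempt:
    index: int
    ok: bool
    output: str


def fake_agent(prompt: str, index: int) -> Attempt:
    # Hypothetical stand-in: a real runner would start a container,
    # run the agent inside it, and capture the resulting patch and test status.
    return Attempt(index=index, ok=(index % 2 == 0),
                   output=f"patch {index} for: {prompt}")


def fan_out(prompt: str, n: int,
            run_one: Callable[[str, int], Attempt] = fake_agent) -> List[Attempt]:
    # Start n independent attempts, then collect results in submission order.
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(run_one, prompt, i) for i in range(n)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    for attempt in fan_out("fix the flaky test", 5):
        status = "ok" if attempt.ok else "failed"
        print(attempt.index, status, attempt.output)
```

Swapping `fake_agent` for a function that shells out to a container runtime keeps the collection logic the same, which is roughly why this feels like a natural fit for a command line tool.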
Dagger does this: https://www.youtube.com/watch?v=C2g3vdbffOI
Why would spinning up containers remove the benefits? Presumably there is still a terminal interacting with the containers.
Nah, if parallelism will help, it'll be abstracted away from the user.
Tmux?
I feel like the better approach would be to throw away PRs when they're bad, edit your prompt, and then let the agent try again using the new prompt. Throwing lots of wasted compute at a problem seems like a luxury take on coding agents, as these agents can be really expensive.
So the process becomes: Read PR -> Find fundamental issues -> Update prompt to guide agent better -> Re-run agent.
Then your job becomes proof-reading and editing specification documents for changes, reviewing the result of the agent trying to implement that spec, and then iterating on it until it is good enough. This comes from the belief that a better, more expensive agent will usually produce better code than 5 cheaper agents running in parallel with some LLM judge to choose between or combine their outputs.
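A rough sketch of that Read PR -> Find issues -> Update prompt -> Re-run loop, in Python. All names here are hypothetical: `run_agent` and `review` are placeholders for the real agent call and the human (or automated) review step, and the toy versions exist only so the loop can run.

```python
from typing import Callable, List, Tuple


def refine(spec: str,
           run_agent: Callable[[str], str],
           review: Callable[[str], List[str]],
           max_rounds: int = 3) -> Tuple[str, str, bool]:
    # Returns (final spec, last result, whether the result was accepted).
    result = ""
    for _ in range(max_rounds):
        result = run_agent(spec)      # the agent produces a PR from the spec
        issues = review(result)       # the reviewer lists fundamental issues
        if not issues:
            return spec, result, True
        # Fold the feedback into the spec instead of patching the PR by hand.
        spec = spec + "".join(f"\n- {issue}" for issue in issues)
    return spec, result, False


def toy_agent(spec: str) -> str:
    # Toy stand-in: only writes tests when the spec asks for them.
    return "patch+tests" if "tests" in spec.lower() else "patch"


def toy_review(pr: str) -> List[str]:
    # Toy stand-in: rejects any PR that lacks tests.
    return [] if "tests" in pr else ["include unit tests"]


if __name__ == "__main__":
    spec, result, accepted = refine("Add a retry helper", toy_agent, toy_review)
    print(accepted, result)
    print(spec)
```

The point of the design is that the spec accumulates the reviewer's corrections, so the discarded PRs still leave something behind.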