Not an engineer, but I think this is where my mind was going after reading the post. What seems useful is continuously generated "decision documentation," so the system has dynamic access to what came before. (Some mix of RAG with a knowledge graph + MCP?) Maybe even pre-outlining "decisions to be made," so an agent checking in could see that something needs to be figured out but hasn't been decided yet.
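Roughly the shape I'm imagining, sketched in Python. Every name here (DecisionRecord, the statuses, the fields) is made up to illustrate the idea, not any real tool's schema:

```python
# Hypothetical "decision record" an agent could query before acting.
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"          # pre-outlined: flagged but not yet decided
    RESOLVED = "resolved"  # decided, with the rationale recorded


@dataclass
class DecisionRecord:
    id: str
    question: str                # e.g. "Which queue: SQS or Kafka?"
    status: Status
    rationale: str = ""          # filled in once resolved
    related_ids: list[str] = field(default_factory=list)  # knowledge-graph edges


def open_decisions(records: list[DecisionRecord]) -> list[DecisionRecord]:
    """What a checking-in agent would pull first: unresolved decisions."""
    return [r for r in records if r.status is Status.OPEN]
```

The point is that OPEN records act as placeholders, so an agent sees the gap exists even before anyone has filled it.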
I actually have an "LLM as a judge" loop on all my codebases. An architecture panel debates improvements given an optimization metric and convergence criteria, and I feed its findings into a deterministic spec generator (CUE, with validation) that can emit unit/e2e tests and scaffold Terraform. It's pretty magical.
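A minimal sketch of what such a panel-debate loop could look like. The `call_llm` stub, the panelist roles, the prompts, and the score-delta convergence test are all my own assumptions for illustration, not the actual setup:

```python
# Panel-debate sketch: critique -> revise -> judge, until the score converges.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your model client here")


def panel_debate(design: str, metric: str, max_rounds: int = 5,
                 epsilon: float = 0.5) -> str:
    prev_score = None
    for _ in range(max_rounds):
        # Each panelist critiques the current design against the metric.
        critiques = [
            call_llm(f"As a {role}, critique this design for {metric}:\n{design}")
            for role in ("performance reviewer", "security reviewer", "cost reviewer")
        ]
        # Revise the design to address the panel's findings.
        design = call_llm(
            "Revise the design to address these critiques:\n"
            + "\n---\n".join(critiques)
            + f"\n\nDesign:\n{design}"
        )
        # A judge scores the revision; stop when the score stops moving.
        score = float(call_llm(f"Score 0-100 this design for {metric}:\n{design}"))
        if prev_score is not None and abs(score - prev_score) < epsilon:
            break
        prev_score = score
    return design  # findings then feed the deterministic CUE spec generator
```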
This CUE spec gets decomposed into individual tasks by an orchestrator that does research per ticket and bundles it with the task.
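The orchestrator step could be as simple as the sketch below (reusing the `call_llm` stub from the previous snippet). The `TicketBundle` shape and the research prompt are assumptions, just to show the decompose-research-bundle flow:

```python
# Orchestrator sketch: one focused research pass per ticket, bundled with it.
from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire up your model client here")


@dataclass
class TicketBundle:
    task: str
    research: str  # background a worker agent gets alongside the ticket


def orchestrate(spec_tasks: list[str]) -> list[TicketBundle]:
    bundles = []
    for task in spec_tasks:
        notes = call_llm(f"Research prior art and constraints for: {task}")
        bundles.append(TicketBundle(task=task, research=notes))
    return bundles
```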