The general rule seems to be that the more layers you automate with LLMs, the worse each successive layer gets. Pipe LLM output as input into new LLM calls and you quickly notice how things fall apart and get lost.
If you have the idea and more or less the implementation plan, and you let the LLM do the coding, you can end up with something maintainable and nice; it's basically up to you.
Strip away one layer, so you still have the idea but let the LLM come up with the implementation plan and then the implementation, and things end up a lot less than ideal.
Remove another layer, let the LLM do it all, and it's all a mess.
It's like those sequences of images where we ask the LLM to reproduce the same image exactly, and we just get some kind of grotesque collapse after a few dozen iterations. The same happens with text and code. I call this "semantic collapse".
I conjecture that after some years of LLMs reading a SharePoint site, producing summaries, then summaries of those summaries, and so on, we will end up with a grotesque slurry.
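A toy sketch of the loop I mean, where each generation only ever sees the previous generation's output. The summarize() function here is a hypothetical stand-in for whatever LLM call actually runs, not any particular API:

```python
# Toy illustration of "semantic collapse": feed each summary back in as the
# next input and the signal degrades generation by generation.

def summarize(text: str) -> str:
    # Placeholder for an LLM request such as "Summarize the following document".
    raise NotImplementedError

def iterate_summaries(document: str, generations: int) -> list[str]:
    history = [document]
    for _ in range(generations):
        # Each pass only sees the previous output, never the original source,
        # so omissions and distortions compound.
        history.append(summarize(history[-1]))
    return history
```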
At some point, fresh human input is needed to inject something meaningful into the process.
People like to make the comparison to file compression, where repeatedly re-encoding with a lossy codec like JPEG or MP3 degrades the result a little more each generation. But I prefer the analogy of the game "Telephone" (also called "Chinese Whispers"). I think it better highlights how fraught natural language is and just how quickly it can degrade. A lot of people are insufficiently impressed by how well we manage to communicate at all.
I think this principle applies only if you lack feedback. Yes, when you go through multiple layers of open-loop control, you're going to get less precise at each layer. It's less clear that the situation is as dire if each level has metrics and can self-adjust to optimize its performance.
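As a rough sketch of what I mean by closing the loop, assume a hypothetical generate() and a score() metric (tests, a linter, an eval); the layer retries with feedback until the metric clears a threshold instead of passing whatever it got downstream:

```python
# Sketch of a closed-loop layer: check output against a metric and retry
# before handing it to the next stage. generate() and score() are
# hypothetical placeholders, not any specific tool's API.

def generate(prompt: str, feedback: str | None = None) -> str:
    raise NotImplementedError  # stand-in for an LLM call

def score(output: str) -> float:
    raise NotImplementedError  # stand-in for tests, a linter, an eval, etc.

def closed_loop_layer(prompt: str, threshold: float = 0.9, max_tries: int = 5) -> str:
    best, best_score = "", float("-inf")
    feedback = None
    for _ in range(max_tries):
        candidate = generate(prompt, feedback)
        s = score(candidate)
        if s >= threshold:
            return candidate  # good enough: pass downstream
        if s > best_score:
            best, best_score = candidate, s
        feedback = f"Previous attempt scored {s:.2f}; improve it."
    return best  # fall back to the best attempt after max_tries
```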
It's all about how full the context is, right? For a task that can be completed within 20% of the context window it doesn't matter, but you don't want to fill your context with exploration before you get to the hard part.
I have actually found something close to the opposite. I work on a large codebase and, for complex tasks, I often use the LLM to generate artifacts before performing the task. I use a prompt along the lines of "go explore this area of the code and write about it". It documents concepts and includes pointers to specific code. Then a fresh session can use that without reading the stuff that doesn't matter. It uses more tokens overall, but captures important details that can get totally missed when you just let it go.
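Roughly, the workflow looks like this; the prompts and the ask_llm() helper are placeholders of mine, not any particular tool's API:

```python
# Two-phase workflow: an "exploration" session writes a durable artifact,
# and a fresh session does the real task with only that artifact in context.
from pathlib import Path

def ask_llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for whatever LLM client you use

def explore(area: str, artifact_path: Path) -> None:
    notes = ask_llm(
        f"Go explore the {area} part of the codebase. Document the key "
        "concepts and include pointers to specific files and functions."
    )
    artifact_path.write_text(notes)

def perform_task(task: str, artifact_path: Path) -> str:
    # Fresh context: only the distilled notes, not the exploration transcript.
    return ask_llm(
        f"Using these notes:\n{artifact_path.read_text()}\n\nNow do this: {task}"
    )
```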