This article was more true than not a year ago, but the harnesses are now so far past the simple agent loop that I'd argue it's no longer close to an accurate mental model of what Claude Code is doing.
But does that extra complexity actually improve performance?
https://www.tbench.ai/leaderboard/terminal-bench/2.0 says yes, but not by as much as you'd think. "Terminus" is basically just a tmux session and an LLM in a loop.
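For anyone curious what "a tmux session and an LLM in a loop" looks like, here's a rough sketch of the idea, not Terminus's actual code: the tmux subcommands are real, but call_llm, the prompt format, and the session name are placeholders.

    import subprocess, time

    SESSION = "agent"  # hypothetical session name

    def tmux(*args):
        # Thin wrapper around the real tmux CLI; returns captured stdout.
        return subprocess.run(["tmux", *args], capture_output=True, text=True).stdout

    def call_llm(prompt):
        # Placeholder: swap in whichever model/client you actually use.
        raise NotImplementedError

    subprocess.run(["tmux", "new-session", "-d", "-s", SESSION])  # detached shell for the agent

    task = "find the largest file under /var/log"
    for _ in range(20):  # hard cap on turns
        screen = tmux("capture-pane", "-t", SESSION, "-p")   # what's on screen right now
        reply = call_llm(f"Task: {task}\nTerminal:\n{screen}\nNext shell command, or DONE:").strip()
        if reply == "DONE":
            break
        tmux("send-keys", "-t", SESSION, reply, "Enter")     # type the command and hit Enter
        time.sleep(2)                                        # crude wait for the command to finish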
The article was also published a year ago, in January 2025.
(Should it have 2025 in the title? Time flies.)
Less true than you might think. A lot of the progress in the last year has come from tightening agentic prompts/tools and getting out of the way so the model can flex. Subagents/MCP/Skills are all pretty mid, and while there has been some context-pruning optimization to avoid carrying tool output along forever, that mainly benefits long-running agents; for short tasks you won't notice it.
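That kind of pruning is conceptually simple. A sketch, assuming a chat-style message list with a "tool" role (real harnesses are smarter about what they keep):

    # Keep full output only for the most recent tool calls; truncate older ones.
    def prune_tool_output(messages, keep_last=3, max_chars=200):
        tool_indices = [i for i, m in enumerate(messages) if m["role"] == "tool"]
        keep = set(tool_indices[-keep_last:])  # most recent tool results stay intact
        pruned = []
        for i, m in enumerate(messages):
            if m["role"] == "tool" and i not in keep and len(m["content"]) > max_chars:
                m = {**m, "content": m["content"][:max_chars] + "\n[...output truncated...]"}
            pruned.append(m)
        return pruned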
It seems to have changed a ton in recent versions too. I'd love more details on what exactly has changed.
I now fairly frequently find it doing things that I used to have to interrupt it and tell it to do.
Agreed. You can get a better mental model by digging through the codex-cli repo and having an agent help you analyze the core functionality.
Obviously modern harnesses have better features, but I wouldn't say that invalidates the mental model. Simpler agents aren't that far behind in performance if the underlying model is the same, including very minimal ones with basic tools.
I'd say it's similar to how a "make your own relational DB" article might feature a basic B-tree with merge-joins. Yeah, obviously real engines have sophisticated planners, multiple join methods, bloom filters, etc., but the underlying mental model is still accurate.
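To make "very minimal with basic tools" concrete, the whole loop is roughly the sketch below. call_llm and the tool-call format are placeholders, and read_file/run_shell are just illustrative tool names, not any particular harness's tools.

    import json, pathlib, subprocess

    def call_llm(messages):
        # Placeholder: returns either {"tool": name, "args": {...}} or {"answer": text}.
        raise NotImplementedError

    TOOLS = {
        "read_file": lambda args: pathlib.Path(args["path"]).read_text(),
        "run_shell": lambda args: subprocess.run(
            args["cmd"], shell=True, capture_output=True, text=True).stdout,
    }

    def agent(task):
        messages = [{"role": "user", "content": task}]
        while True:
            reply = call_llm(messages)
            if "answer" in reply:                           # model says it's done
                return reply["answer"]
            result = TOOLS[reply["tool"]](reply["args"])    # run the requested tool
            messages.append({"role": "assistant", "content": json.dumps(reply)})
            messages.append({"role": "tool", "content": result})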