logoalt Hacker News

prodigycorplast Thursday at 8:19 PM7 repliesview on HN

This article was more true than not a year ago but now the harnesses are so far past the simple agent loop that I'd argue that this is not even close to an accurate mental model of what claude code is doing.


Replies

qsortlast Thursday at 8:36 PM

Obviously modern harnesses have better features but I wouldn't say it invalidates the mental model. Simpler agents aren't that far behind in performance if the underlying model is the same, including very minimal ones with basic tools.

I'd say it's similar to how a "make your own relational DB" article might feature a basic B-tree with merge-joins. Yeah, obviously real engines have sophisticated planners, multiple join methods, bloom filters, etc., but the underlying mental model is still accurate.

show 1 reply
alright2565last Thursday at 8:31 PM

But does that extra complexity actually improve performance?

https://www.tbench.ai/leaderboard/terminal-bench/2.0 says yes, but not as much as you'd think. "Terminus" is basically just a tmux session and LLM in a loop.

show 1 reply
lukanlast Thursday at 8:29 PM

The article was also published one year ago on january 2025.

(Should have 2025 in the title? Time flies)

show 1 reply
CuriouslyClast Thursday at 9:28 PM

Less true than you think. A lot of the progress in the last year has been tightening agentic prompts/tools and getting out of the way so the model can flex. Subagents/MCP/Skills are all pretty mid, and while there has been some context pruning optimization to avoid carrying tool output along forever, that's mainly a benefit to long running agents and for short tasks you won't notice.

show 1 reply
dkdciolast Thursday at 8:22 PM

it seems to have changed a ton in recent versions too — I would love more details on what exactly

I find it doing what I in the past had to interrupt and tell it to do fairly frequently now

show 1 reply
pamalast Thursday at 10:54 PM

Agreed. You can get a better model using the codex-cli repo and having an agent help you analyze the core functionality.

splikelast Thursday at 8:21 PM

I'm interested, could you expand on that?

show 1 reply