jackfranklyn • last Thursday at 8:46 PM • 1 reply

The benchmark point is interesting but I think it undersells what the complexity buys you in practice. Yes, a minimal loop can score similarly on standardised tasks - but real development work has this annoying property of requiring you to hold context across many files, remember what you already tried, and recover gracefully when a path doesn't work out.

The TODO injection nyellin mentions is a good example. It's not sophisticated ML - it's bookkeeping. But without it, the agent will confidently declare victory three steps into a ten-step task. Same with subagents - they're not magic, they're just a way to keep working memory from getting polluted when you need to go investigate something.
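
To make the bookkeeping point concrete, here is a rough sketch of what TODO injection could look like. Everything in it is hypothetical: `TodoList`, `build_prompt`, `run_agent`, and the `complete:` / `DONE` action strings are names invented for illustration, and `model` is any callable standing in for an LLM call. It is not how any particular agent implements it, just the general shape: re-state the checklist on every turn, and refuse a "done" claim while items are still open.

```python
from dataclasses import dataclass, field


@dataclass
class TodoList:
    """Hypothetical checklist: maps each task to whether it is finished."""
    items: dict = field(default_factory=dict)

    def add(self, task):
        self.items[task] = False

    def complete(self, task):
        # Ignore unknown tasks so a confused model cannot invent progress.
        if task in self.items:
            self.items[task] = True

    def pending(self):
        return [task for task, done in self.items.items() if not done]

    def render(self):
        return "\n".join(
            f"[{'x' if done else ' '}] {task}" for task, done in self.items.items()
        )


def build_prompt(history, todos):
    """Append the current checklist to the prompt on every single turn."""
    return history + ["TODO status:\n" + todos.render()]


def run_agent(model, todos, max_steps=20):
    """Minimal loop; `model` is a stand-in callable, not a real LLM client."""
    history = []
    for _ in range(max_steps):
        action = model(build_prompt(history, todos))
        history.append(action)
        if action == "DONE":
            if not todos.pending():
                return history
            # The agent declared victory early: push back with what is left.
            history.append(
                "Rejected: unfinished items remain: " + ", ".join(todos.pending())
            )
        elif action.startswith("complete:"):
            todos.complete(action.split(":", 1)[1].strip())
    return history
```

Feed it a scripted `model` that says `DONE` after finishing one of two tasks and the loop rejects the claim and keeps going until the list is empty. A subagent would plausibly follow the same pattern with its own fresh `history`, which is the "keep working memory clean" part.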

The 200-line version captures the loop. The production version captures the paperwork around the loop. That paperwork is boring but turns out to be load-bearing.

Replies

dfajgljsldkjag • last Thursday at 8:52 PM

[flagged]
