Hacker News

dgb23 | today at 12:48 PM | 0 replies

I would push back on that question a little, because it has a baked-in assumption about how these things work that conflicts with my mental model and experience with them.

The reason is that sometimes it spits out something or does a workflow that's pretty sophisticated, and sometimes it fails spectacularly in the most basic ways.

I don't think there is a complexity or domain knowledge limit as there would be with a human. Or at least not in the same sense. As long as it can repeat and remix patterns it was trained on, it will do its thing well. The same seems to be true for "reasoning" loops and workflows. It can spit out code that has been done N times before in a similar manner, for a large N.

They can still break down because of very trivial issues and assumptions that happen to be baked in, go off the rails, and get stuck in long loops that are completely insane if you think of them as imitating human programmers.

When I use an agent, I always interview it first about the task: ask how it would go about it, probe it, give it info that it lacks.

Never go from prompt to action. Have the agent define its approach first, then split that approach into pieces, from gathering data to cleaning it up and so on. If applicable, front-load work that can be achieved with scripts, so you have testable and repeatable steps rather than letting it go wild.
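The staging above can be sketched as a small loop. This is a minimal illustration, not a real client: `ask` and the `agent` callable are placeholders for whatever LLM API you actually use; the point is the ordering (interview, split, execute step by step), not the API.

```python
def ask(agent, prompt):
    # Placeholder for a real LLM call; `agent` is any callable
    # taking a prompt string and returning a response string.
    return agent(prompt)

def run_task(agent, task):
    # 1. Interview: have the agent state its approach before acting.
    plan = ask(agent, f"How would you approach this task? Do not act yet.\n{task}")
    # 2. Split the approach into discrete, checkable pieces.
    steps = ask(agent, f"Split this approach into numbered steps:\n{plan}")
    # 3. Execute one step at a time, so each result is testable
    #    and the agent never runs ahead of you.
    results = []
    for step in steps.splitlines():
        if step.strip():
            results.append(ask(agent, f"Do only this step and report the result:\n{step}"))
    return plan, results
```

In practice you'd review `plan` yourself before letting step 3 run, and replace any step that a plain script can do deterministically.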

So the TLDR is: I think the limitation is simply that it's a non-deterministic token machine that produces useful results often enough to appear reasonable.