This is true, but also: everything I try works!
I simply cannot come up with tasks the LLMs can't do when they're running in agent mode with a feedback loop available to them. Giving a clear goal, and giving the agent a way to measure its progress towards that goal, is incredibly powerful.
With the problem in the original article, I might have asked it to generate 100 test cases and run them with the original Perl. Then I'd tell it, "ok, now port that to TypeScript and make sure these test cases pass".
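A rough sketch of what that harness could look like: assume the Perl original dumps each case as an input/expected-output pair into a `cases.json` file, and the TypeScript port exposes a `portedFn` function (both names are made up here). The non-zero exit code on failure is exactly the kind of measurable signal an agent can loop on.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical shape of the file dumped by the original Perl script:
// [{ "input": "...", "expected": "..." }, ...]
interface GoldenCase {
  input: string;
  expected: string;
}

// Placeholder for the TypeScript port under test.
import { portedFn } from "./ported";

const cases: GoldenCase[] = JSON.parse(readFileSync("cases.json", "utf8"));

let failures = 0;
for (const [i, c] of cases.entries()) {
  const actual = portedFn(c.input);
  if (actual !== c.expected) {
    failures++;
    console.error(
      `case ${i}: expected ${JSON.stringify(c.expected)}, got ${JSON.stringify(actual)}`
    );
  }
}

console.log(`${cases.length - failures}/${cases.length} cases passed`);
process.exit(failures === 0 ? 0 : 1);
```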
> I simply cannot come up with tasks the LLMs can't do when they're running in agent mode with a feedback loop available to them. Giving a clear goal, and giving the agent a way to measure its progress towards that goal, is incredibly powerful.
It's really easy to come up with plenty of algorithmic tasks that they can't do.
Like: implement an algorithm / data structure that takes a sequence of priority queue instructions (insert element, delete smallest element) in the comparison model, and returns the elements that would be left in the priority queue at the end.
This is trivial to do in O(n log n). The challenge is doing this in linear time, or proving that it's not possible.
(Spoiler: it's possible, but it's far from trivial.)
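For reference, here's a sketch of the trivial O(n log n) baseline, not the linear-time construction: just simulate the instructions with a binary min-heap and return whatever is left. The `Instruction` type and comparison callback are my own framing of the problem statement.

```typescript
type Instruction<T> =
  | { op: "insert"; value: T }
  | { op: "deleteMin" };

// Trivial O(n log n) simulation in the comparison model. The hard part of
// the exercise is beating this bound, which is not attempted here.
function survivors<T>(instrs: Instruction<T>[], less: (a: T, b: T) => boolean): T[] {
  const heap: T[] = [];

  const siftUp = (i: number) => {
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (!less(heap[i], heap[parent])) break;
      [heap[i], heap[parent]] = [heap[parent], heap[i]];
      i = parent;
    }
  };

  const siftDown = (i: number) => {
    for (;;) {
      let smallest = i;
      const l = 2 * i + 1;
      const r = 2 * i + 2;
      if (l < heap.length && less(heap[l], heap[smallest])) smallest = l;
      if (r < heap.length && less(heap[r], heap[smallest])) smallest = r;
      if (smallest === i) break;
      [heap[i], heap[smallest]] = [heap[smallest], heap[i]];
      i = smallest;
    }
  };

  for (const instr of instrs) {
    if (instr.op === "insert") {
      heap.push(instr.value);
      siftUp(heap.length - 1);
    } else if (heap.length > 0) {
      // deleteMin: move the last element to the root and restore heap order.
      heap[0] = heap[heap.length - 1];
      heap.pop();
      if (heap.length > 0) siftDown(0);
    }
  }
  return heap; // elements still in the queue, in heap order
}

// Example: after insert 5, insert 1, insert 3, deleteMin, the survivors are 3 and 5.
console.log(survivors(
  [
    { op: "insert", value: 5 },
    { op: "insert", value: 1 },
    { op: "insert", value: 3 },
    { op: "deleteMin" },
  ],
  (a, b) => a < b,
));
```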
Really, you haven't found a single task they can't do? I like agents, but this seems a little unrealistic. Recently, I asked Codex and Claude both to "give me a single command to capture a performance profile while running a Playwright test". Codex worked on this one for at least 2 hours and never succeeded, even though it really isn't that hard.