
alkonaut · today at 12:01 PM

I don't understand how to get even bad results. Or any results at all. I'm at a level where I'm going "This can't just be me not having read the manual".

I get the same change applied multiple times, the agent having some absurd method of applying changes that conflict with what I tell it, like some git merge from hell, and so on. I can't get it to understand even the simplest of contexts, etc.

It's not really that the code it writes might not work. I just can't get past the actual tool use. In fact, I don't think I'm even at the stage where the AI output is the problem yet.


Replies

kace91 · today at 1:50 PM

>I don't understand how to get even bad results. Or any results at all. I'm at a level where I'm going "This can't just be me not having read the manual".

>I get the same change applied multiple times, the agent having some absurd method of applying changes that conflict with what I tell it, like some git merge from hell, and so on. I can't get it to understand even the simplest of contexts, etc.

That is weird. Results have a ton of variation, but not that much.

Say you get a Claude subscription, point it to a relatively self-contained file in your project, hand it the command to run the relevant tests, and tell it to find quick-win refactoring opportunities, making sure the business outcome of the tests is maintained even if mocks need to change.

You should get relevant refactoring suggestions, the changes should be applied reasonably, and the tests should pass after a few iterations of the agent running and fixing them by itself. At most you might need to check that it doesn't cheat by getting a false positive in a test or something similar.
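
To make that concrete, here's roughly what that exercise could look like from a terminal, assuming the Claude Code CLI; the file path and test command below are made-up placeholders, swap in your own:

  # hypothetical sketch -- file path and test command are placeholders
  cd ~/projects/my-app
  claude   # start an interactive Claude Code session in the repo root

  # then, at the prompt, something like:
  #   "Look at src/billing/invoice.py and find quick-win refactoring
  #    opportunities. Run the tests with `pytest tests/billing` after each
  #    change; the business outcome of the tests must stay the same, even
  #    if mocks need to change."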

Is such an exercise not working for you? I'm genuinely curious.

TeMPOraL · today at 12:42 PM

> I'm at a level where I'm going "This can't just be me not having read the manual".

Sure it can, because nobody is reading manuals anymore :).

It's an interesting exercise to try: take your favorite tool, one you use often (that isn't some recent webshit, devoid of any documentation), find a manual (not a man page), and read it cover to cover. Say, GDB or Emacs or even coreutils. It's surprising just how many powerful features good software tools have, and how much you'll learn in a short time that most software people don't know is possible (or worse, decry as "too much complexity") just because they couldn't be arsed to read some documentation.

> I just can't get past the actual tool use. In fact, I don't think I'm even at the stage where the AI output is the problem yet.

The tools are a problem because they're new and a moving target. They're both dead simple and somehow complex around the edges. AI, too, is tricky to work with, particularly when people aren't used to communicating clearly. There are a lot of surprising problems (such as "absurd method of applying changes") that come from the fact that AI is solving a very broad class of problems, everywhere at the same time, by virtue of being a general tool. It still needs a bit of hand-holding if your project/conventions stray from what's obvious or popular in a particular domain. But it's getting easier and easier as the months go by.
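
One cheap way to reduce that hand-holding, if you're on Claude Code, is to write the non-obvious conventions down once in a CLAUDE.md at the repo root, which it picks up as project context. A made-up sketch (every detail here is a placeholder, not a recommendation):

  # CLAUDE.md (hypothetical example)
  ## Build & test
  - Build: ./scripts/build.sh (not make; the Makefile is legacy)
  - Run unit tests: ./scripts/test.sh --unit
  ## Conventions
  - Apply changes as ordinary file edits; never emit .patch files or use git apply.
  - Don't touch anything under src/generated/.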

FWIW, I too haven't developed a proper agentic workflow with CLI tools for myself just yet; depending on the project, I either get stellar results or garbage. But I recognize this is only a matter of time investment: I didn't have much time to set aside and do it properly.