I'm so happy someone else says this, because I'm doing exactly the same. I tried to use agent mode in VS Code and the output was still bad. You read simple claims like: "We use it to write tests". I gave it a very simple repository, told it to write tests, and the result wasn't usable at all. I really wonder if I'm doing it wrong.
You didn't actually just say "write tests" though, right? What was the actual prompt you used?
I feel like that matters more than the tooling at this point.
I can't really understand letting LLMs decide what to test or not, they seem to completely miss the boat when it comes to testing. Half of them are useless because they duplicate what they test, and the other half doesn't test what they should be testing. So many shortcuts, and LLMs require A LOT of hand-holding when writing tests, more so than other code I'd wager.
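To make the "duplicate what they test" complaint concrete, here is a minimal sketch. `apply_discount` is a made-up function for illustration, not from anyone's repository. The first test recomputes the expected value with the same formula as the implementation, so a bug in the formula would pass unnoticed; the second pins known input/output pairs that a human decided were correct.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: price after a percentage discount."""
    return round(price * (1 - percent / 100), 2)


def test_duplicates_implementation():
    # Weak: the expected value is computed with the same formula as the code,
    # so this test can only fail if the function disagrees with itself.
    price, percent = 80.0, 25.0
    assert apply_discount(price, percent) == round(price * (1 - percent / 100), 2)


def test_known_values():
    # Stronger: expected values are written down independently of the code.
    assert apply_discount(80.0, 25.0) == 60.0
    assert apply_discount(19.99, 0) == 19.99
    assert apply_discount(50.0, 100) == 0.0
```

Both tests pass against the current code, which is exactly the problem: only the second one would catch a wrong formula.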
“Write tests” may not be enough; provide it with a test harness, and instruct it to “write tests until they pass”. Next would be “your feature isn’t complete without N% coverage”. These require the ‘agentic’ piece, which is at its simplest some prompts run in a loop until an exit condition is met.
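A rough sketch of that "prompts in a loop until an exit condition is met" idea, under some loud assumptions: `propose` is a stand-in for whatever model call you use, `check` is a stand-in for your exit condition (for example, running the test harness and reporting failures), and `agent_loop` is a hypothetical name. No real tool's API is shown here.

```python
from typing import Callable, Optional, Tuple


def agent_loop(
    propose: Callable[[str], str],
    check: Callable[[str], Tuple[bool, str]],
    task: str,
    max_iters: int = 5,
) -> Optional[str]:
    """Request an attempt, check it, feed failures back; stop on success or budget."""
    feedback = ""
    for _ in range(max_iters):
        # After the first attempt, include the previous failure in the prompt.
        if feedback:
            prompt = f"{task}\n\nPrevious attempt failed:\n{feedback}"
        else:
            prompt = task
        attempt = propose(prompt)        # assumed: model returns a candidate
        ok, feedback = check(attempt)    # assumed: harness returns (passed, details)
        if ok:
            return attempt
    return None  # budget exhausted without meeting the exit condition
```

In practice `check` might shell out to the test runner and return its output as feedback, and a coverage threshold would just be a stricter `check`. The iteration cap matters: without it a loop that never meets the condition runs forever.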
> I gave it a very simple repository, said to write tests, and the result wasn't usable at all. Really wonder if I'm doing it wrong.
I think so. The humans should be writing the spec. The AI can then (try to) make the tests pass.
No, your experience is similar to what a lot of people have had.
LLMs just fail (hallucinate) in lesser-known fields of expertise.
Funny: today I asked Claude for the syntax to run Claude Code, and its answer was totally wrong :) So you go to the documentation… and parts of it are obsolete as well.
LLM development is done in the “move fast and break things” style.
So in a few years there will be so many repos with gibberish code because “everybody is a coder now”, even basketball players or taxi drivers (no offense, ofc, just an example).
It is like giving an F1 car to me :)
You need to write a test suite to check its test generation (soft /s)
I’m not particularly pro-AI, but I struggle with the mentality some engineers seem to apply to trying these tools.
If someone said “I don’t know what the big deal is with vim, I ran it and pressed some keys and it didn’t write text at all”, they’d be mocked for it.
But with these tools there seems to be an attitude of “if I don’t get results straight away it’s bad”. Why the difference?