
cadamsdotcom today at 3:53 AM, 3 replies

If you reject AI code that works then your mindset is still too hands-on. Put another way - there are still some loops you need to take yourself out of. The agent should’ve delivered code that was acceptable as a first pass.

Agents respond really well to feedback! They have no ego and they’ll happily improve code if told where and how. But you need to provide the tools that provide that feedback without your involvement - otherwise you can’t scale.

All the linting and autoformatting you can put in is a good start. Next, create custom scripts that check for every single dumb AI-ism you can think of, tell the agent about them, tell it to use them to check its work, and put them in hooks so the harness refuses to let the agent stop until all your linters show no errors.

Then, keep iterating basically forever. Any dumb AI-ism you see, make a linter for it, give it to the agent, and enforce it using the harness.
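
To make this concrete, here is a minimal sketch of one such check in Python. The rule names and regexes are hypothetical examples, not the actual linters described above. The script prints one line per finding and returns a non-zero exit code, so a hook could refuse to let the agent stop while findings remain. How the hook is wired up depends on the harness and is left out.

```python
import re
import sys
from pathlib import Path

# Hypothetical rules: each name maps to a regex for one "AI-ism".
# These patterns are illustrative assumptions; grow the table over time.
RULES = {
    "placeholder-comment": re.compile(
        r"#\s*(TODO: implement|\.\.\. rest of (the )?code)", re.IGNORECASE
    ),
    "narrating-comment": re.compile(
        r"#\s*(Now we|Here we|First,? we)\b", re.IGNORECASE
    ),
    "swallowed-exception": re.compile(
        r"^\s*except(\s+Exception)?\s*:\s*pass\s*$"
    ),
}


def lint_text(text, filename="<string>"):
    """Return a list of (filename, line number, rule name) findings."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                findings.append((filename, lineno, rule))
    return findings


def main(paths):
    """Lint each file, print findings, return 1 if any were found else 0."""
    findings = []
    for path in paths:
        findings.extend(lint_text(Path(path).read_text(encoding="utf-8"), path))
    for filename, lineno, rule in findings:
        print(f"{filename}:{lineno}: {rule}")
    return 1 if findings else 0


# Only act when file paths are given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Each new AI-ism becomes one more entry in `RULES`, which keeps the "keep iterating" step cheap. Regexes only catch surface patterns; checks for duplication or architecture conformance would need something heavier, such as AST-based analysis.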

I’ve spent months doing this. When I review a PR - which was built by the agent with TDD so it definitely works - I’m no longer asking if it did dumb stuff, conformed to the architecture, duplicated code, or missed opportunities for reuse. That’s all linted for. I don’t worry about duplication or outdated docstrings/comments because the self review caught all that. I mostly read it to look for opportunities to make the feature even better & more useful.

If this makes no sense or you disagree it’s possible, my contact details are on my profile and I’ll be happy to give a demo.


Replies

royal__ today at 4:35 AM

The problem I have with this kind of approach is 1) it emphasizes scaling up as much as possible, which I don't believe is necessarily the most valuable thing, and 2) I really don't want my job to be band-aiding agent problems, because it's like herding cats and there will never be an end to it. I'd rather just...get hands-on and be involved in the code I am working to create.

equinumerous today at 3:56 AM

I am very curious what some of your lint rules look like in practice. In my mind a lot of the AI-isms in my code that I hate are stylistic or a matter of taste, not necessarily something I could write a deterministic rule to check. But I want to hear more. Like, what kind of linters did you create and which were highest impact?

unknownfuture today at 4:05 AM

Frankly, if that's truly your flow, then you cannot possibly know if the code really does what you expect it to do.

"TDD" isn't some magic trick. The tests codify the expected behavior. But if you don't review them for correctness, if you let the LLM build them blindly, then you have no idea what those tests assert and can make no claims about whether the code then does what you expect.
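
A small, made-up illustration of that point in Python (the function and test names are hypothetical): the test below is green, yet the code is wrong, because the one asserted case happens to agree with the bug.

```python
def apply_discount(price, percent):
    """Intended behavior: reduce price by the given percentage."""
    # Bug: subtracts the percentage as an absolute amount.
    return price - percent


def test_apply_discount():
    # Passes, but only because 10% of 100 happens to equal 10.
    assert apply_discount(100, 10) == 90


test_apply_discount()  # green

# A reviewer who reads the test would ask for another case:
# 10% off 200 should be 180, but this implementation gives 190.
print(apply_discount(200, 10))
```

Nothing in the passing run reveals the problem; it only shows up when someone checks what the test actually asserts.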

That's fine. That's your choice.

But you have to acknowledge you've chosen to accept that you personally cannot vouch for the quality or correctness of that code.

I fully expect this to be the direction the industry goes, where increasingly complex systems exist that no human actually understands or can reason about.

I think it's bad for the industry. Very bad.

But I'm not making those decisions, so... it is what it is, I guess.
