
dakiol yesterday at 10:11 PM

Honest question: if you're using multiple agents, it's usually not to produce a dozen lines of code. It's to produce a big enough feature spanning multiple files, modules and entry points, with tests and all. So far so good. But once that feature is written by the agents... wouldn't you review it? Like reading line by line what's going on and detecting if something is off? And wouldn't that part, the manual reviewing, take an enormous amount of time compared to the time it took the agents to produce it? (you know, it's more difficult to read other people's/machine code than to write it yourself)... meaning all the productivity gained is thrown out the door.

Unless you don't review every generated line manually, and instead rely on, let's say, UI e2e testing, or perhaps unit testing (that the agents also wrote). I don't know, perhaps we are past the phase of "double check what agents write" and are now in the phase of "ship it. if it breaks, let agents fix it, no manual debugging needed!" ?


Replies

Leynos yesterday at 10:50 PM

Here's what I suggest:

Serious planning. The plans should include constraints, scope, escalation criteria, completion criteria, and a test and documentation plan.

Enforce single responsibility, CQRS, domain segregation, etc. Make the code as easy for you to reason about as possible. Enforce domain naming and function/variable naming conventions to make the code as easy to talk about as possible.

Use code review bots (Sourcery, CodeRabbit, and CodeScene). They catch the small things (contract violations, antipatterns, etc.) and the large (UX concerns, architectural flaws, etc.).

Go all in on linting. Make the rules as strict as possible, and tell the review bots to call out rule subversions. Write your own lints for the things the review bots complain about regularly that aren't caught by existing lints.
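To make "write your own lints" concrete: a custom lint can be as small as a standalone AST walk. This is a minimal sketch in Python (not tied to any particular linter's plugin API) that flags bare `except:` handlers:

```python
import ast

def find_bare_excepts(source: str) -> list[int]:
    """Return line numbers of bare `except:` handlers in the given source."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        # An ExceptHandler whose `type` is None is a bare `except:`.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            hits.append(node.lineno)
    return hits

snippet = """
try:
    risky()
except:
    pass
"""
print(find_bare_excepts(snippet))  # → [4]
```

Real projects would wrap this as a plugin for their linter of choice, but the core of most custom rules is exactly this kind of pattern match over the syntax tree.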

Use BDD alongside unit tests; read the .feature files before the build and give feedback. Use property testing as part of your normal testing strategy, plus snapshot testing, e2e testing with MITM proxies, etc. For functions of any non-trivial complexity, consider bounded or unbounded proofs, model checking, or undefined-behaviour testing.
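The value of property testing here is that you assert invariants over generated inputs instead of hand-picking cases, which is a good fit for code you didn't write yourself. A stdlib-only sketch (a library like Hypothesis automates the generation and shrinking; the function under test is made up for illustration):

```python
import random

def dedupe_preserving_order(items):
    """Hypothetical function under test: drop duplicates, keep first occurrence."""
    seen, out = set(), []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

# Property-based check: generate random inputs and assert invariants that
# must hold for *every* input, rather than a few hand-written examples.
rng = random.Random(0)
for _ in range(200):
    xs = [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]
    ys = dedupe_preserving_order(xs)
    assert set(ys) == set(xs)                 # same elements survive
    assert len(ys) == len(set(xs))            # no duplicates remain
    assert dedupe_preserving_order(ys) == ys  # idempotent

print("all properties held")
```

When a property fails on AI-written code, the failing input usually tells you more about the bug than a single broken unit test would.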

I'm looking into mutation testing and fuzzing too, but I am still learning.

Pause for frequent code audits. Ask an agent to audit for code duplication, redundancy, poor assumptions, and architectural, domain, or TOCTOU violations. Give yourself maintenance sprints where you pay down debt before resuming new features.

The beauty of agentic coding is that suddenly you have time for all of this.

Salgat yesterday at 10:21 PM

This is the biggest bottleneck for me. What's worse is that LLMs have a bad habit of being very verbose and rewriting things that don't need to be touched, so the surface area for change is much larger.

jwilliams today at 6:31 AM

It’s a blend. There are plenty of changes in a production system that don’t necessarily need human review. Adding a help link. Fixing a typo. Maybe upgrades with strong CI/CD, or simple UI improvements, or safe experiments.

There are features you can ship safely behind feature flags or staged releases. As you push further in, with the right tooling, that can cover a lot.

If you break it down, often quite a bit can be deployed safely with minimal human intervention (it depends on the domain, naturally, but that holds for a lot of systems).
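The feature-flag gating described above can be sketched in a few lines. This is illustrative only: the flag names and in-memory store are made up, and a real system would use a flag service or database instead:

```python
# Hypothetical flag store; real deployments back this with a flag service
# (LaunchDarkly, Unleash, etc.) or a config/database lookup.
FLAGS = {"new_help_link": True, "experimental_search": False}

def is_enabled(flag: str, default: bool = False) -> bool:
    return FLAGS.get(flag, default)

def render_help() -> str:
    # The agent-written change ships dark; the flag controls user exposure,
    # so review can happen (or not) independently of deployment.
    if is_enabled("new_help_link"):
        return "<a href='/help/v2'>Help</a>"
    return "<a href='/help'>Help</a>"

print(render_help())
```

The point is that deploying agent output and exposing it to users become separate decisions, which is what makes "minimal human intervention" tolerable.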

I’m aiming to revamp the whole process - I wrote a little on it here: https://jonathannen.com/building-towards-100-prs-a-day/

browningstreet yesterday at 10:39 PM

I use coding agents to produce a lot of code that I don’t ship. But I do ship the output of the code.

keeda yesterday at 11:39 PM

> you know, it's more difficult to read other people's/machine code than to write it yourself

Not at all, it's just a skill that gets easier with practice. Generally, if you're in a position to review a lot of PRs, you get proficient at it pretty quickly. It's even easier when you know the context of what the code is trying to do, which is almost always the case when, e.g., reviewing your teammates' PRs or the code you asked the AI to write.

As I've said before (e.g. https://news.ycombinator.com/item?id=47401494), I find reviewing AI-generated code very lightweight because I tend to decompose tasks to a level where I know what the code should look like, and so the rare issues that crop up quickly stand out. I also rely on comprehensive tests and I review the test cases more closely than the code.

That is still a huge amount of time savings, especially as the scope of tasks has gone from single functions to entire modules.

That said, I'm not slinging multiple agents at a time, so my throughput with AI is way higher than without, but not nearly as high as some credible reports I've heard. I'm not sure whether they personally review the code (perhaps they have agents review it?), but they do have strategies for correctness.

MattGaiser yesterday at 11:17 PM

Yep. In many cases I am just reviewing test cases it generated now.

> if it breaks, let agents fix it, no manual debugging needed!" ?

Pretty trivial now to have every Sentry issue get an immediate first pass by AI to attempt to solve the bug.
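The glue for that first pass is usually just a webhook handler that turns the error event into a prompt. A rough sketch, where the payload shape is an assumption (not the real Sentry webhook schema) and the agent call itself is left out:

```python
# Illustrative only: `event` mimics a simplified error payload; the real
# Sentry webhook schema differs, and the prompt would be handed to whatever
# LLM client or agent framework you actually use.
def build_first_pass_prompt(event: dict) -> str:
    """Turn an error event into a debugging prompt for an agent's first pass."""
    frames = event.get("stacktrace", [])
    trace = "\n".join(
        f"  {f['file']}:{f['line']} in {f['function']}" for f in frames
    )
    return (
        f"Error: {event['type']}: {event['message']}\n"
        f"Stacktrace:\n{trace}\n"
        "Propose a minimal fix as a patch, or explain what extra context you need."
    )

event = {
    "type": "KeyError",
    "message": "'user_id'",
    "stacktrace": [{"file": "app/views.py", "line": 42, "function": "profile"}],
}
print(build_first_pass_prompt(event))
```

Even when the agent's patch isn't shippable, the triage (which frame, which assumption broke) tends to be useful on its own.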