Hacker News

while1 · today at 3:12 PM

Great post. We're building AI testing tools at QA.tech and this matches my experience. The hard part was never generating code; it's figuring out whether what came out is actually correct. Our team runs multiple AI agents in parallel writing code, and honestly we spend way more time on verification than generation at this point. The ratio keeps getting worse as the models get better at producing plausible-looking output.

The codebase growth numbers feel right to me. Even a conservative 2x productivity gain breaks most review processes. We ended up building our own internal review bot to check the AI output, because human review just doesn't keep up. But it has to be narrow and specific, not another general model doing vibes-based review.
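
To give a concrete sense of what I mean by "narrow and specific": the useful checks are dumb, deterministic rules over the diff. A minimal sketch in Python (this is illustrative, not our actual bot; the module allowlist and heuristics are made up):

    import re
    import sys

    # Two narrow checks for AI-generated diffs:
    # 1) added imports of modules the project doesn't know about
    #    (a common hallucination pattern)
    # 2) source files changed without any test files touched
    KNOWN_MODULES = {"os", "re", "sys", "json", "requests"}  # example allowlist

    def review(diff_text):
        findings = []
        touched_src, touched_tests = False, False
        for line in diff_text.splitlines():
            if line.startswith("+++ "):
                path = line[4:]
                touched_tests |= "test" in path
                touched_src |= path.endswith(".py") and "test" not in path
            # added lines only (skip the "+++" file header itself)
            if line.startswith("+") and not line.startswith("+++"):
                m = re.match(r"\+\s*(?:import|from)\s+(\w+)", line)
                if m and m.group(1) not in KNOWN_MODULES:
                    findings.append("unknown import: " + m.group(1))
        if touched_src and not touched_tests:
            findings.append("source changed but no tests touched")
        return findings

    if __name__ == "__main__":
        for finding in review(sys.stdin.read()):
            print("REVIEW:", finding)

You'd pipe a diff into it (git diff main | python review_check.py). The point is that every finding comes from a yes/no rule you can debug and tune, instead of a second model grading the first one.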


Replies

sosnsbbs · today at 3:30 PM

> way more time on verification than generation

Was generation a bottleneck previously? My experience has been that verification is always the slow part. Often it’s quicker to do it myself than to try to provide the perfect context (via agents.md, skills, etc.) to the agent.

The things it’s able to one-shot also tend to be the code that would have taken me the least time to write myself.