
ed_mercer • yesterday at 10:41 PM • 5 replies

I feel like OP is still in the year 2025.

> The AI will have gone off the rails multiple times and you will only notice it later when you actually try to use the software.

Except that the AI can now use your software itself, find and fix bugs on its own, and even drive new features.

> Your agent might go “off the rails” and start doing something you don’t want it to do

This happens but far less often than it used to, and the case for full autonomous agents is getting stronger, not weaker.

> It is humanly impossible to build your own understanding of a codebase

This again feels outdated. I think we're moving towards humans no longer needing to understand a codebase, and letting AI drive it.

Replies

alexsmirnov • today at 3:15 AM

> This happens but far less often than it used to, and the case for full autonomous agents is getting stronger, not weaker.

This is what I do not see. My experience from just a couple of weeks ago, with Claude Code + Opus 4.8: the task was not too complicated, 4 new API endpoints plus events streamed from the client over a websocket.

1. Multiple iterations on the API definitions: refining request/response models, the database schema, and the whole flow. A lot of corrections, removing contradictions, and manual changes in the document. Opus went off the rails all the time. The final document was 500+ lines.

2. API integration tests. Once again, back and forth. The AI was unable to create tests directly from the document, so it took 2 iterations. The first was to create placeholders with Given-When-Then comments, then review and correct them by hand. The second was to implement the tests. A lot of mistakes were corrected after review.

3. Implementation. CC got the API document, working tests (modifications blocked by a hook), 6+ "best practices" skills (most promptly ignored), "rubber duck" and "code simplifier" agents, and pre-cooked scripts to run tests, run the linter, and check for compilation errors. Plan + execution + review, with multiple corrections along the way. Feature implemented, all tests passed.

4. Code review. On average, I found one issue per 20 lines of code. Not counting code style, these were things like: using an in-memory semaphore in a Kubernetes service (the deployment is described in CLAUDE.md), and 8 database calls to update the same record during a single request. One column at a time! Read-modify-save without a transaction. Mistakes in business logic, failure recovery, and authorization.
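To make the database complaints in point 4 concrete, here is a minimal sketch of the two shapes of update. It is not the code from the review: the `orders` table, its columns, and both function names are hypothetical, and SQLite from the Python standard library stands in for whatever database the service actually used.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    # Hypothetical schema, for illustration only.
    # isolation_level=None puts the connection in autocommit mode,
    # so any transaction has to be opened explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, status TEXT, total INTEGER, note TEXT)"
    )
    conn.execute(
        "INSERT INTO orders (id, status, total, note) VALUES (1, 'new', 0, '')"
    )
    return conn


def update_column_by_column(conn, order_id, status, extra_total, note) -> int:
    """Anti-pattern: one call per column, read-modify-save, no transaction."""
    calls = 0
    conn.execute("UPDATE orders SET status = ? WHERE id = ?", (status, order_id))
    calls += 1
    # Read-modify-save: another writer can change `total` between
    # this SELECT and the UPDATE that follows it.
    (total,) = conn.execute(
        "SELECT total FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    calls += 1
    conn.execute(
        "UPDATE orders SET total = ? WHERE id = ?", (total + extra_total, order_id)
    )
    calls += 1
    conn.execute("UPDATE orders SET note = ? WHERE id = ?", (note, order_id))
    calls += 1
    return calls


def update_in_one_transaction(conn, order_id, status, extra_total, note) -> int:
    """One UPDATE in an explicit transaction; the increment happens in SQL."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(
            "UPDATE orders SET status = ?, total = total + ?, note = ? WHERE id = ?",
            (status, extra_total, note, order_id),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return 1
```

The second version touches the row once and never holds a stale value in application memory, which is the property the read-modify-save version lacks.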

The result: almost one workweek, $100+ in tokens, and one thought: was it worth the effort? P.S. I have a team of 2 developers. I just got a PR to review from one of them. 80% slop.


not_a_bot_4sho • today at 2:51 AM

> I think we're moving towards humans no longer needing to understand a codebase, and letting AI drive it.

I can see this being true for non-critical software like entertainment, media, and so on.

Definitely not true for systems where the security stakes are high, like banking, aviation, defense, etc. AI will surely contribute, but not independently of human engineering understanding.


CodingJeebus • yesterday at 11:10 PM

> I think we're moving towards humans no longer needing to understand a codebase, and letting AI drive it.

Hard disagree. Even the best frontier models generate output that's not what I asked for. Sometimes I realize that I get lazy in my prompting and the lack of specificity winds up showing up in the output. Just the other day, a coworker built a huge feature using frontier models and it slipped an IDOR in.

I just don't see a world in which we completely cede control of the codebase to AI because it's still my ass on the line if I ship something that completely borks production. If I'm not reading code regularly, then I lose the ability to read code, and if I lose that ability, then I'm no longer a developer.
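For anyone unfamiliar with the term, an IDOR (insecure direct object reference) is roughly the following shape. This is a minimal sketch with made-up names (`Invoice`, `INVOICES`, `get_invoice_vulnerable`, `get_invoice_checked`), not the coworker's actual code: the handler looks up an object by a client-supplied ID and never checks that the caller is allowed to see it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    id: int
    owner_id: int
    amount: int


# Hypothetical in-memory store standing in for a real database.
INVOICES = {
    1: Invoice(1, owner_id=10, amount=120),
    2: Invoice(2, owner_id=20, amount=999),
}


class Forbidden(Exception):
    """Raised when the caller may not access the requested object."""


def get_invoice_vulnerable(current_user_id: int, invoice_id: int) -> Invoice:
    # IDOR: the client-supplied ID is trusted and ownership is never checked,
    # so any authenticated user can read any invoice by guessing IDs.
    return INVOICES[invoice_id]


def get_invoice_checked(current_user_id: int, invoice_id: int) -> Invoice:
    invoice = INVOICES.get(invoice_id)
    # Treat "missing" and "not yours" the same way, so the response
    # does not reveal which IDs exist.
    if invoice is None or invoice.owner_id != current_user_id:
        raise Forbidden(f"invoice {invoice_id} is not accessible")
    return invoice
```

Both functions pass a happy-path test where a user fetches their own invoice, which is part of why this kind of bug slips through unless someone reads the code.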


fatata123 • today at 1:41 AM

[dead]
