It's starting to become obvious that if you can't effectively use AI to build systems, it's a skill issue. At the moment it's a mysterious, occasionally fickle tool, but if you provide the right feedback mechanisms, plus small tweaks and extra context around its idiosyncrasies, it's possible to get agents to reliably build very complex systems.
Say I buy into your mysticism-based take: is it a useful tool if it blows up in damn near every professional's face?
Let's say I accept that you, and you alone, have the deep majiks required to use this tool correctly, when major platform devs so far could not. What makes this tool useful? Worth billions of dollars and environment-ruining levels of energy?
I'd say the only real uses for these tools to date have been mass surveillance and, sometimes, semi-useful boilerplate.
> It's starting to become obvious that if you can't effectively use AI to build systems it is a skill issue.
I think it's fair to say that you can get a long way with Claude very quickly if you're an individual or part of a very small team working on a greenfield project. Certainly at project sizes up to around 100k lines of code, it's pretty great.
But I've been working at startups off and on since 2024.
My last "big" job was with a company that had a codebase well into the millions of lines of code. And whilst I keep in contact with a bunch of the team there, and I know they do use Claude and other similar tools, I don't get the vibe it's having quite the same impact. And these are very talented engineers, so I don't think it's a skill either.
I think it's entirely possible that Claude is a great tool for bootstrapping and/or for solo devs or very small teams, but becomes considerably less effective when scaled across very large codebases, multiple teams, etc.
For me, on that last point, the jury is out. Hopefully the company I'm working with now grows to a point where that becomes a problem I need to worry about but, in the meantime, Claude is doing great for us.
Partially agree, but I think "skill issue" undersells the genuine reliability problem the original post is describing.
The skill part is real — giving the agent the right context, breaking tasks into the right size, knowing when to intervene. Most people aren't doing that well and their results reflect it.
But the latent bug problem isn't really a skill issue. It's a property of how these systems work: the agent optimises for making the current test pass, not for building something that stays correct as requirements change. Round 1 decisions get baked in as assumptions that round 3 never questions — and no amount of better prompting fixes that.
The fix isn't better prompting. It's treating agent-generated code with the same scepticism you'd apply to code from a contractor who won't be around to maintain it — more tests, explicit invariants, and not letting the agent touch the architecture without a human reviewing the design first.
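As a concrete sketch of that scepticism in practice (the function and its invariants here are hypothetical illustrations, not something from this thread): instead of only keeping the tests the agent wrote for itself, you state invariants that must hold no matter how the implementation changes between rounds, and hammer them with random inputs.

```python
import random

def apply_discount(price: float, pct: float) -> float:
    """Stand-in for an agent-written function under review (hypothetical)."""
    return price * (1 - pct / 100)

def check_invariants(trials: int = 1000) -> None:
    """Invariants asserted regardless of how the implementation evolves:
    the result is never negative and never exceeds the original price."""
    rng = random.Random(42)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        price = rng.uniform(0, 10_000)
        pct = rng.uniform(0, 100)
        result = apply_discount(price, pct)
        assert 0 <= result <= price, (price, pct, result)

check_invariants()
print("invariants held")
```

The point is that an invariant like "never exceeds the original price" survives requirement changes that would silently invalidate an example-based test the agent wrote in round 1.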
According to the post https://blog.katanaquant.com/p/your-llm-doesnt-write-correct... previously discussed on HN, that may be at least partially true:
> The vibes are not enough. Define what correct means. Then measure.
> if you provide the correct feedback
And how do you define "correct feedback"? By whether the output is correct?
Truth Nuke
I'd agree, I've been building a personal assistant (https://github.com/skorokithakis/stavrobot) and I'm amazed that, for the first time ever, LLMs manage to build reliably, with much fewer bugs than I'd expect from a human, and without the repo devolving to unmaintainability after a few cycles.
It's really amazing, we've crossed a threshold, and I don't know what that means for our jobs.
> At the moment it is a mysterious, occasionally fickle, tool - but if you provide the correct feedback mechanisms and provide small tweaks and context at idiosyncrasies, it's possible to get agents to reliably build very complex systems.
This sounds like arguing you can use these models to beat a game of whack-a-mole if you just know all the unknown unknowns and prompt it correctly about them.
This is an assertion that is impossible to prove or disprove.