I think the issue goes even deeper than verification. Verification is technically possible. You could, in theory, build a C compiler or a browser and use existing tests to confirm it works.
The harder problem is discovery: how do you build something entirely new, something that has no existing test suite to validate against?
Verification works because someone has already defined what "correct" looks like. There may be a spec, a reference implementation, or a set of expected behaviours. The system just has to match them.
But truly novel creation has no ground truth to compare against and no predefined finish line. You are not just solving a problem. You are figuring out what the problem even is.
Because of the sheer volume of generated code, it is often the AI verifying the AI's work.
At the end of the day, you need humans who understand the business-critical (or safety-critical) systems that underpin the enterprise.
Someone needs to be held accountable when things go wrong. Someone needs to be able to explain to the CEO why this or that is impossible.
If you want AI to generate all the code for your business-critical software, fine, but you'd better make sure you understand it well. Sometimes the fastest path to deep understanding is just coding things out yourself; so be it.
This is why truly critical software doesn’t get developed much faster when AI tools are introduced. The bottleneck isn’t how fast the code can be created; it’s how fast humans can build their understanding before they put their careers on the line by deploying it.
Ofc… this doesn’t apply to prototypes, hackathons, POCs, etc. For those “low-stakes” projects, vibe code away if you wish.
I'm in the process of building v2.0 of my app using Opus 4.6 and largely agree with this.
It's pretty awesome but still does a lot of basic, idiotic stuff. I was implementing a feature that required a global keyboard shortcut and asked Opus to define it, taking care not to clash with common shortcuts. It built a field where only one modifier key was required. I pointed out that this wasn't safe, since users could just set CTRL+C as the shortcut, and that we needed more safeguards, such as requiring at least two modifier keys. I got the usual "you're absolutely right" and it went on to require two modifier keys. But then it also added a huge blacklist of common shortcuts like copy, cut, paste, print, select all, etc., basically a bunch of single-modifier shortcuts. Once I mentioned that this was useless since we were already forcing two modifier keys, it said I was right again and fixed it.
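A minimal sketch of why that blacklist was redundant, written in Python purely for illustration (the app itself is .NET). The function name `is_valid_global_shortcut` and the modifier set are hypothetical, not taken from the app. Once the rule is "at least two distinct modifiers plus exactly one regular key", every single-modifier shortcut such as Ctrl+C is rejected before any blacklist would be consulted.

```python
# Hypothetical set of modifier keys; a real app would use its UI framework's key enum.
MODIFIERS = {"ctrl", "alt", "shift", "meta"}


def is_valid_global_shortcut(keys: list[str]) -> bool:
    """Accept a shortcut only if it has >= 2 distinct modifiers and exactly one non-modifier key."""
    normalized = [k.strip().lower() for k in keys]
    modifiers = {k for k in normalized if k in MODIFIERS}  # set: repeated modifiers count once
    others = [k for k in normalized if k not in MODIFIERS]
    return len(modifiers) >= 2 and len(others) == 1


# Single-modifier shortcuts (copy, paste, select all, ...) fail the rule on their own,
# so a blacklist containing only them could never match anything.
print(is_valid_global_shortcut(["Ctrl", "C"]))           # False
print(is_valid_global_shortcut(["Ctrl", "Shift", "K"]))  # True
```

A blacklist would only add value for combinations that already pass the two-modifier rule (for example, Ctrl+Alt+Delete); a list of single-modifier shortcuts is dead weight.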
The counterpoint to this idiocy is that it's very good overall at a lot of what is (in my mind) much more complicated stuff. It's a .NET app, and for things like creating models, viewmodels, and user controls, or setting up the entire hosting and DI with pretty much all the .NET best practices, it does a pretty awesome job.
tl;dr: training wheels are still mandatory, imho.
No one does currently, and it's going to take a few very painful and high-profile failures of vital systems for this industry to RELEARN its lesson about the price of speed.
In fact it will probably need to happen a few times PER org for the dust to settle. It will take several years.
No one, really. Code is for humans to read and for machines to compile and execute. LLMs are enabling people to just write the code and not have anyone read it. It’s solving a problem that didn’t really exist (we already had code generators before LLMs).
It’s such an intoxicating copyright-abuse slot machine that a buddy who is building an OCaml + htmx tree editor told me, “I always get stuck and end up going to the LLM to generate code. Usually when I get to the HTML part.” I asked if he used a debugger before that; he said, “That’s a good idea.”
No one wants to believe this, but there will soon come a point when an AI code review meets your compliance requirements for going to production. Is that 2026? No. But it will come.
In his latest essay, Leonardo de Moura makes a compelling case that if AI is going to write a significant portion of the world’s software, then verification must scale alongside generation. Testing and code review were never sufficient guarantees, even for human-written systems; with AI accelerating output, they become fundamentally inadequate. Leo argues that the only sustainable path forward is machine-checked formal verification — shifting effort from debugging to precise specification, and from informal reasoning to mathematical proof checked by a small, auditable kernel. This is precisely the vision behind Lean: a platform where programs and proofs coexist, enabling AI not just to generate code, but to generate code with correctness guarantees. Rather than slowing development, Lean-style verification enables trustworthy automation at scale.
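For anyone who hasn't seen Lean, here is a tiny illustration of what "programs and proofs coexist" means in practice: a function and a theorem about it live in the same file, and the theorem is checked by Lean's kernel when the file is compiled. The names `double` and `double_eq` are made up for this sketch; it is not an example from the essay.

```lean
-- Hypothetical function: doubles a natural number.
def double (n : Nat) : Nat := n + n

-- A machine-checked property of `double`.
theorem double_eq (n : Nat) : double n = 2 * n := by
  unfold double     -- goal becomes: n + n = 2 * n
  rw [Nat.two_mul]  -- rewrites 2 * n to n + n; the goal then closes by rfl
```

If someone later changed `double` to `n + n + 1`, the proof would stop compiling, which is the kind of guarantee the essay is arguing should scale alongside generated code.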
You do. Even the latest models still frequently write really weird code. The problem is that some developers now submit code for review that they didn't bother to read. You can tell. Code review is more important than ever, imho.