If you have decent unit and functional tests, why do you care how the code is written?
This feels like the same debate assembly programmers had about C in the early 70s: "You don’t understand what the compiler is doing, therefore it’s dangerous." Eventually we realised the important thing isn’t how the code was authored but whether the behaviour is correct, testable, and maintainable.
If code generated by an LLM:
- passes a real test suite (not toy tests),
- meets performance/security constraints,
- goes through review like any other change,
then the acceptance criteria haven’t changed. The test suite is part of the spec. If the spec is enforced in CI, the authoring tool is secondary.

The real risk isn’t "LLMs as compilers"; it’s letting changes bypass verification and ownership. We solved that with C, with large dependency trees, and with codegen tools. The same playbook applies here.
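To make "the test suite is part of the spec" concrete, here’s a minimal sketch in Python. Everything in it is hypothetical (the `pricing` module and `apply_discount` function are made up for illustration); the point is that the assertions define correctness, and CI runs them on every change regardless of who or what wrote the implementation.

```python
# test_pricing.py -- a behavioural contract, enforced in CI on every change.
# The implementation behind apply_discount (hypothetical) can be hand-written,
# LLM-generated, or rewritten wholesale; these assertions are the spec.
import pytest

from pricing import apply_discount  # hypothetical module under test


def test_zero_discount_leaves_price_unchanged():
    assert apply_discount(100.0, percent=0) == 100.0


def test_full_discount_is_free():
    assert apply_discount(100.0, percent=100) == 0.0


def test_over_discounting_is_rejected():
    # The spec says over-discounting is an error, not silently clamped.
    with pytest.raises(ValueError):
        apply_discount(100.0, percent=150)
```

If the merge gate is "these must pass", the authoring tool genuinely is secondary: any change, however generated, has to satisfy the same contract.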
If you give expected input and get expected output, why does it matter how the code was written?
Because testing at this level is likely impossible across all domains of programming. You can narrow the set of inputs and get relatively far, but the more complex the system, the broader the space of problems becomes. And even a simple CRUD app on an EC2 instance has a lot more failure modes than people are able to test for with current tools.
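To illustrate how quickly the input space outgrows example tests, here’s a hedged sketch using Python’s hypothesis library (`parse_age` is a made-up function): it passes the expected-input/expected-output checks, yet a property-based test that samples the wider space finds behaviour the examples never pinned down.

```python
# parse_age (hypothetical) passes the obvious example-based tests...
def parse_age(s: str) -> int:
    return int(s.strip())


def test_examples():
    assert parse_age("42") == 42
    assert parse_age(" 7 ") == 7


# ...but sampling the broader input space exposes cases no example
# covered, e.g. parse_age("-5") == -5, an "age" of minus five.
from hypothesis import given
from hypothesis import strategies as st


@given(st.text())
def test_parsed_ages_are_sane(s):
    try:
        assert parse_age(s) >= 0  # fails: negative strings parse "fine"
    except ValueError:
        pass  # rejecting unparseable input is acceptable behaviour
```

And that’s a single pure function; a CRUD app stacks networks, disks, concurrency, and partial failures on top of it.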
> passes a real test suite (not toy tests)
“not toy tests” is doing a lot of heavy lifting here. Like an immeasurable amount of lifting.