
sarchertech yesterday at 12:31 PM | 3 replies

A compiler uses rigorous modeling and testing to ensure that generated code is semantically equivalent. It can do this because it is translating from one formal language to another.

Translating a natural-language prompt, on the other hand, requires the LLM to make thousands of small decisions that will be different each time you regenerate the artifact. Even ignoring non-determinism, prompt instability means that any small change to the spec will result in a vastly different program.

A natural language spec and test suite cannot be complete enough to encode all of these differences without being at least as complex as the code.

Therefore each time you regenerate large sections of code without review, you will see scores of observable behavior differences that will surface to the user as churn, jank, and broken workflows.

Your tests will not encode every user workflow, not even close. Ask yourself if you have ever worked on a nontrivial piece of software where you could randomly regenerate 10% of the implementation, while keeping to the spec, without seeing a flurry of bug reports.
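
A minimal sketch of that point, with every name hypothetical: suppose the spec only says "remove duplicates" and the tests check exactly that. Two regenerations can both pass while behaving differently in a way users would notice, here the order of a recent-files list.

```python
# Hypothetical example: two implementations that satisfy the same loose
# spec ("remove duplicates") and the same tests, yet differ observably.

def dedupe_v1(items):
    """First regeneration: keeps the first-seen order of items."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def dedupe_v2(items):
    """Second regeneration: returns the unique items in sorted order."""
    return sorted(set(items))


def passes_spec_tests(dedupe):
    """The test suite only encodes what the spec says: no duplicates,
    nothing lost. It says nothing about ordering."""
    result = dedupe(["b", "a", "b", "c", "a"])
    return len(result) == 3 and set(result) == {"a", "b", "c"}


recent_files = ["report.txt", "notes.txt", "report.txt", "budget.txt"]

print(passes_spec_tests(dedupe_v1), passes_spec_tests(dedupe_v2))
print(dedupe_v1(recent_files))  # order the user opened them in
print(dedupe_v2(recent_files))  # alphabetical order
```

Both versions are "correct" against the spec, but a user who relied on the first ordering sees the second as a regression. Pinning the order down in a test fixes this one case; the claim above is that there are too many such cases to pin down.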

This may change if LLMs improve to the point where they can reason about code changes as well as a human can. As of today they cannot, and they require tests and human code review to keep them from spinning out. But I suspect that at that point they'll be doing our job, as well as the CEO's, and we'll have bigger problems.


Replies

LogicFailsMe yesterday at 1:22 PM

I don't see a world where a motivated soul can build a business from a laptop and a token service as a problem. I see it as opportunity.

I feel similarly about Hollywood and the creation of media. We're not there in either case yet, but we will be. That's pretty clear. And when I look at the feudal society that is the entertainment industry here, I don't understand why so many of the serfs are trying to perpetuate it in its current state. And I really don't get why engineers think this technology is going to turn them into serfs unless they let that happen to themselves. If you can build things, AI coding agents will let you build more, and faster, for the same amount of effort.

I am assuming, given the rate of advance of AI coding systems in the past year, that there is plenty of improvement to come before this plateaus. I'm sure that will include AI-generated systems that do security reviews at a human level or better. I've already seen Claude find 20-plus-year-old bugs in my own code. They weren't particularly mission-critical, but they were there the whole time. I've also seen it do amazingly sophisticated reverse engineering of assembly code, only to fall flat on its face on the simplest tasks.

petcat yesterday at 12:56 PM

> A compiler uses rigorous modeling and testing to ensure that generated code is semantically equivalent.

Here are the miscompilation bugs reported in GCC so far in 2026, the ones labeled "wrong-code":

https://gcc.gnu.org/bugzilla/buglist.cgi?chfield=%5BBug%20cr...

I count 121 of them.

raw_anon_1111 yesterday at 12:46 PM

As if humans were deterministic when you delegate tasks to them. I would hope that your test cases cover the requirements. If not, your implementation is just as brittle when other developers come on board, or even when you come back to a project after six months.
