Hacker News

lunar_mycroft · last Wednesday at 9:08 PM · 1 reply

> if at all realistic numbers are mentioned, I see people claiming 20 - 50%

IME most people claim small integer multiples, 2-5x.

> all the largest studies I've looked at mention this clearly and explain how they try to address it.

Yes, but I think pre-AI virtually everyone reading this would have been very skeptical about their ability to do so.

> My current hypothesis, based on the DORA and DX 2025 reports, is that quality is largely a function of your quality control processes (tests, CI/CD etc.)

This is pretty obviously incorrect, IMO. To see why, let's pretend it's 2021 and LLMs haven't come out yet. Someone is suggesting no longer using experienced (and expensive) first world developers to write code. Instead, they suggest hiring several barely trained boot camp devs (from low cost of living parts of the world, so they're dirt cheap) for every current dev, and having the experienced devs just do review. They claim that this won't impact quality because of the aforementioned review and their QA process. Do you think that's a realistic assessment? If, on the off chance, you think it is, why didn't this happen on a larger scale pre-LLM?

The resolution here is that while quality control is clearly important, it's imperfect, ergo the quality of the code before it passes through that process still matters. Pass worse code in, and you'll get worse code out. As such, any team using the method described above might produce more code, but it would be worse code.

> the largest study in the link above explicitly discuss it and find that proxies for quality (like approval rates) indicate more improvement than a decline

Right, but my point is that that's a sanity check failure. The fact that shoving worse code at your quality control system will lower the quality of the code coming out the other side is IMO very well established, as is the fact that LLM generated code is still worse than human generated code (where the human knows how to write the code in question, which they should if they're going to be responsible for it). It follows that more LLM code generation will result in worse code, and if a study finds the opposite, it's very likely that it made some mistake.

As an analogy, when a physics experiment appeared to find that neutrinos travel faster than the speed of light in a vacuum, the correct conclusion was that there had almost certainly been a problem with the experiment, not that neutrinos actually travel faster than light. That was indeed the explanation. (Note that I'm not claiming that "quality control processes cannot completely eliminate the effect of input code quality" and "LLM generated code is worse than human generated code" are as well established as relativity.)


Replies

keeda · last Thursday at 12:20 AM

> Yes, but I think pre-AI virtually everyone reading this would have been very skeptical about their ability to do so.

That's not quite true: while everybody acknowledged it was folly to measure absolute individual productivity, there were aggregate metrics that many in the industry were aligning on, like DORA or the SPACE framework, not to mention studies like https://dl.acm.org/doi/abs/10.1145/3540250.3558940

Similarly, many of these AI coding studies do not look at productivity at the individual level at a point in time, but in aggregate and over an extended period, using a randomized controlled trial. It's not saying Alice is more productive than Bob; it's saying Alice and Bob with AI are on average more productive than themselves without AI.

> They claim that this won't impact quality because of the aforementioned review and their QA process. Do you think that's a realistic assessment? If and on the off chance you think it is, why didn't this happen on a larger scale pre-LLM?

Interestingly, I think something similar did happen pre-LLM at industry scale! My hypothesis (based on observations when personally involved) is that this is exactly what allowed offshoring to boom. The earliest attempts at offshoring were marked by high-profile disasters that led many to scoff at the whole idea. However, companies quickly learned and instituted better processes that made failures the exception rather than the norm.

I expand a bit more and draw parallels to coding with AI here: https://news.ycombinator.com/item?id=44944717

> ... as is the fact that LLM generated code is still worse than human generated...

I still don't think that can be assumed as a fact. The few studies I've seen find comparable outcomes, with LLMs actually having a slight edge in some cases, e.g.

- https://arxiv.org/abs/2501.16857

- https://arxiv.org/html/2508.00700v1
