Hearing people on tech twitter say that LLMs always produce better code than they do by hand was pretty enlightening for me.
LLMs can produce better code for languages and domains I’m not proficient in, at a much faster rate, but damn it’s rare I look at LLM output and don’t spot something I’d do measurably better.
These things are average text generation machines. Yes, you can improve the output by writing a good prompt that activates the right weights and steers it toward higher-quality text. But if you’re seeing output that is consistently better than what you produce by hand, you’re probably just below average at programming. And yes, it matters sometimes. Look at the number of software bugs we’re all subjected to.
And let’s not forget that code is a liability. Utilizing code that was “cheap” to generate has a cost, which I’m sure will be the subject of much conversation in the near future.
After a certain experience level though, I think most of us get to the point of knowing when that difference in quality actually matters.
Some seniors love to bikeshed PRs all day because they can do it better, but generally that activity has zero actual value. Sometimes it matters; often it doesn't.
Stop with the "I could do this better by hand" and ask "is it worth the extra 4 hours to do this by hand, or is this actually good enough to meet the goals?"
Claude Code is way better than I am at rummaging through Git history, handling merge conflicts, renaming things, writing SQL queries whose syntax I always forget (window functions and the like). But yeah. If I give it a big, non-specific task, it generates a lot of mediocre (or worse) code.
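For anyone else who blanks on that syntax, here is roughly what the window-function stuff looks like; a minimal sketch using Python's built-in sqlite3, with a made-up orders table that has nothing to do with the parent's actual code (window functions need SQLite 3.25+):

    import sqlite3

    # In-memory database with a hypothetical orders table, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (customer TEXT, amount REAL, ordered_at TEXT);
        INSERT INTO orders VALUES
            ('alice', 30.0, '2024-01-01'),
            ('alice', 50.0, '2024-02-01'),
            ('bob',   20.0, '2024-01-15');
    """)

    # ROW_NUMBER() numbers each customer's orders by date;
    # SUM(...) OVER (...) keeps a per-customer running total.
    rows = conn.execute("""
        SELECT customer,
               ordered_at,
               amount,
               ROW_NUMBER() OVER (PARTITION BY customer ORDER BY ordered_at) AS nth_order,
               SUM(amount)  OVER (PARTITION BY customer ORDER BY ordered_at) AS running_total
        FROM orders
    """).fetchall()

    for row in rows:
        print(row)

Both functions partition by customer, which is exactly the part that's easy to forget and easy to hand to the tool.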
As someone who:
1. Has a brain that is not wired to think like a computer and write code.
2. Falls asleep at the keyboard while writing code for more than an hour or two.
3. Has a lot of passion for sticking with an idea and making it happen, even if that means writing code and knowing the code is crap.
So, in short, LLMs write better code than I do. I'm not alone.
I've been playing with vibe coding a lot lately, and I think in most cases the current SOTA LLMs don't produce code that I'd be satisfied with. I kind of feel like LLMs are really, really good at hacking on a messy and fragile structure, because they can "keep track of many things in their head".
BUT
An LLM can write a PNG decoder that works in whatever language I choose in one or a few shots. I can do that too, but it will take me longer than a minute!
(and I might learn something about the PNG format that might be useful later...)
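To make the "learn something about the format" point concrete: a PNG is an 8-byte signature followed by length/type/data/CRC chunks, and walking that chunk layer is only a few lines of Python. This is just a sketch of the container format, not a full decoder (the file path comes from the command line):

    import struct
    import sys
    import zlib

    PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

    def walk_chunks(path):
        """Print every chunk in a PNG file and sanity-check its CRC."""
        with open(path, "rb") as f:
            if f.read(8) != PNG_SIGNATURE:
                raise ValueError("not a PNG file")
            while True:
                header = f.read(8)
                if len(header) < 8:
                    break  # end of file
                length, ctype = struct.unpack(">I4s", header)
                data = f.read(length)
                crc, = struct.unpack(">I", f.read(4))
                ok = zlib.crc32(ctype + data) == crc  # CRC covers type + data
                print(ctype.decode("ascii"), length, "crc ok" if ok else "crc BAD")
                if ctype == b"IHDR":
                    # IHDR data: width, height, bit depth, colour type,
                    # then compression / filter / interlace bytes
                    w, h, depth, colour = struct.unpack(">IIBB", data[:10])
                    print(f"  {w}x{h}, {depth}-bit, colour type {colour}")
                if ctype == b"IEND":
                    break

    walk_chunks(sys.argv[1])  # e.g. python png_chunks.py image.png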
Also, we engineers can talk about code quality all day, but does this really matter to non-engineers? Maybe objectively it does, but can we convince them that it does?
LLMs are not "average text generation machines" once they have context. LLMs learn a distribution.
The moment you start the prompt with "You are an interactive CLI tool that helps users with software engineering at the level of a veteran expert" you have biased the LLM such that the tokens it produces are from a very non-average part of the distribution it's modeling.
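A toy way to see that: treat the model as a conditional next-token distribution and the system prompt as the thing you condition on. The tokens and numbers below are made up purely for illustration; the point is that the same "model" sampled under a different context produces text from a different slice of its distribution:

    import random

    # Toy "language model": next-token probabilities conditioned on a context tag.
    # Hypothetical values, only to illustrate conditioning on the prompt.
    NEXT_TOKEN = {
        "average":        {"foo = foo": 0.5, "TODO: fix": 0.3, "assert ok": 0.2},
        "veteran expert": {"assert ok": 0.6, "handle err": 0.3, "TODO: fix": 0.1},
    }

    def sample(context, n=5):
        tokens, weights = zip(*NEXT_TOKEN[context].items())
        return random.choices(tokens, weights=weights, k=n)

    # Same model, different context -> samples from a different part of the distribution.
    print(sample("average"))
    print(sample("veteran expert"))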
I worked for a relatively large company (around 400 of the employees there are programmers). The people who embraced LLM-generated code clearly share one trait: they are feature pushers who love to say "yes!" to management. You see, management is always right, and these programmers are always so eager to put their requirements, however incomplete, into a Copilot session and open a pull request as fast as possible.
The worst case I remember happened a few months ago when a staff (!) engineer gave a presentation about benchmarks they had done between Java and Kotlin concurrency tools and how to write concurrent code. There was a very large and strange difference in performance favoring Kotlin that didn't make sense. When I dug into their code, it was clear everything had been generated by an LLM (lots of comments with emojis, for example) and the Java code was just wrong.
The competent programmers I've seen there use LLMs to generate some shell scripts, small Python automations, or to explore ideas. Most of the time they are unimpressed by these tools.
In my view they’re great for rough drafts, iterating on ideas, throwaway code, and pushing into areas I haven’t become proficient in yet. I think in a lot of cases they write ok enough tests.
It'd be rather surprising if you could train an AI on a bunch of average code, and somehow get code that's always above average. Where did the improvement come from?
We should feed the output code back in to get even better code.
> Hearing people on tech twitter say that LLMs always produce better code than they do by hand was pretty enlightening for me.
That's hilarious. LLM code is always very bad. Its only merit is that it occasionally works.
> LLMs can produce better code for languages and domains I’m not proficient in.
I am sure that's not true.
This has been my experience as well.
It's let me apply my general knowledge across domains, and do things in tech stacks or languages I don't know well. But that has also cost me hours debugging a solution I don't quite understand.
When working in my core stack though it's a nice force multiplier for routine changes.
How are you judging that you'd write "better" code? More maintainable? More efficient? Does it produce bugs in the underlying code it's generating? Genuinely curious where you see the current gaps.
> if you’re seeing output that is consistently better than what you produce by hand, you’re probably just below average at programming
Even though this statement doesn't quite make sense mathematically / statistically, the vast majority of SWEs are "below average." Therein lies the crux of this debate. I've been coding since the 90's, and:
- LLM output is better than mine from the 90’s
- LLM output is better than mine from early 2000’s
- LLM output is worse than any of mine from 2010 onward
- LLM output (in the right hands) is better than 90% of human-written code I have seen (and I’ve seen a lot)
The most prolific coders are also more competent than average, and their prolific output is what trained these models. These models are trained on incredibly successful projects written by masters of their fields. Where I usually find the most pushback is from the most competent SWEs, who see it as theft and also useless to them, since they have already spent years honing skills to work relentlessly and efficiently towards solutions -- sometimes at great expense.
If firing up old coal plants and skyrocketing RAM prices and $5000 consumer GPUs and violating millions of developers' copyrights and occasionally coaxing someone into killing themselves is the cost of Brian Who Got Peter Principled Into Middle Management getting to enjoy programming again instead of blaming his kids for why he watches football and drinks all weekend instead of cultivating a hobby, I guess we have no choice but to oblige him his little treat.
I saw someone refer to it as future digital asbestos.
> These things are average text generation machines.
Funny... seems like about half of devs think AI writes good code, and half think it doesn't. When you consider that it is designed to replicate average output, that makes a lot of sense.
So, as insulting as OP's idea is, it would make sense that below-average devs are getting gains by using AI, and above-average devs aren't. In theory, this situation should raise the average output quality, but only if the training corpus isn't poisoned with AI output.
I have an anecdote that doesn't mean much on its own, but supports OP's thesis: there are two former coworkers in my LinkedIn feed who are heavy AI evangelists and have drifted over the years from software engineering into senior business development roles at AI startups. Both of them are unquestionably among the top 5 worst coders I have ever worked with in 15 years, one of them having been fired for code quality and testing practices. Their coding ability, transition to less technical roles, and extremely vocal support for the power of vibe coding definitely align with OP's uncharitable character evaluation.