Great article, and a good description of LLM code quality problems and the problems that derive from them. And it is fair to not want a tidal wave of slop to displace your entire craft.
But this article is strangely lacking in foresight about rapidly evolving model capabilities and output. One visual way to see this is to compare successive generations of SOTA video generation models. Look at outputs from Sora, to Veo, to Seedance 2.0, and now the just-released Seedance 2.5.
Or compare LLMs/VLMs as they have progressed: GPT-2, GPT-3, GPT-4, Opus, Fable/Mythos.
You can see the output improve as sloppiness and poor world understanding recede: from comical nonsense, to a junior, to a senior with a few holes in their brain, to an engineer you can almost trust to produce clean code if you mention the right guidelines in your instructions (such as avoiding overly local code).
As model size/complexity increases, the intelligence increases, and so does code quality. We will also start specifically putting more high-level code quality tasks into training datasets and training harnesses. I mean, Karpathy will probably see this article and make a huge dent in the issues even without larger models.
One thing people may not be aware of is that there is still a lot of room for hardware efficiency to improve and for model size to grow. The compute-in-memory paradigm is, in a way, just getting started. Look at companies like Tensordyne and Mythic AI, though they are going to get blown out of the water by fully in-memory approaches.
For example, look at the recent wurtzite ferroelectric nitrides breakthrough from the University of Michigan team (one of them tragically jumped from height after intense interrogation regarding national security concerns). The military is providing significant funding to move this towards development and scaling out of the lab.
A truly new paradigm of that kind is going to boost efficiency by multiple orders of magnitude.
I know there are people who think Fable 5 was the end of the public LLM/VLM frontier moving, or that it is impossible to scale models further due to energy consumption. But there is zero chance that every high level VLM/LLM research team on the planet is going to stop publishing models or that the rapid progress in compute efficiency will stop.
Point being, within a year or two, the code coming out will be much cleaner. And within five or six years what you may see is that the leading models are 100+ trillion parameters and have sophisticated persistent context management etc. and they do not even produce application source code.
Instead, the database is in the context and is neurally rendered at 24 fps into whatever UI, schema and business logic you prompt it with in a broad way. The whole application is just precise thinking in an artificial brain ten times the complexity of an equivalent human brain.
And if you are disturbed by the current level of outsourcing of thinking to AI, it is just getting started. In a way it will be incredible, from another perspective horrific, but what I think we are seeing is the evolution of an ExoCortex. There will be an AI glasses stage where the integration is closer but still somewhat low bandwidth.
But sooner rather than later we are headed towards high-bandwidth brain-computer interfaces that make AI into an actual new cognitive layer.
So the waves of slop might make you feel sick, but that is nothing compared to the transhuman cyborgs powered by superhuman AI that are around the corner.