An AI’s ability to meaningfully write software autonomously has changed hugely even in the last 6 months. They might still require a human in the loop, but for how long?
It's probably an 80/20 or 90/10 problem. Tesla FSD also seems amazing to some percentage of the population, but the more widely it get used, the more cracks are appearing.
This LLM ability is directly proportional to the quantity of encoded (i.e. documented) knowledge about software development. But not all of the practice has thus been clearly communicated. Much of mastery resides in tacit knowledge, the silent intuitive part of a craft that influences the decision making process in ways that sometimes go counter to (possibly incomplete or misguided) written rules, and which is by definition very difficult to put into language, and thus difficult for a language model to access or mimic.
Of course, it could also be argued that some day we may decide that it's no longer necessary at all for code to be written for a human mind to understand. It's the optimistic scenario where you simply explain the misbehavior of the software and trust the AI to automatically fix everything, without breaking new stuff in the process. For some reason, I'm not that optimistic.
I am not saying AI's abilities are the shortcoming here. The problem is that people need to trust that software has certain attributes. For now, that requires someone with knowledge to be part of it. It's quite possible development becomes detached from human trust. As I said that would reduce the number of developers but the ones who are left would have to have deep knowledge to oversee it and even that may be gone. Whatever happens in the future, for now I think people will have to level up their knowledge/skills or get a new career and that's probably true for most professions.
And then you let them train themselves and no one notices when they "accidentally" remove the guardrail prompts from the next version. And another 10 years later, almost no one remembers how "The Guardian" learns new things or how to stop it from being evil.
Quantitative measures of this are very poor, and even those are mixed.
My subjective assessment is that agents like Copilot got better because of better harnesses and fine tuning of models to use those harnesses. But they are not improving in the direction of labor substitution, but rather in the direction of significant, but not earth-shaking, complementarity. That complementarity is stronger for more experienced developers.