So? The point is that humans do it much less often.
Let's say there are 10 subtasks that need to be done.
Let's say a human has a 99% chance of getting each one of them right, by doing the proper testing etc. And let's say the AI has a 95% chance of getting each one right (being very generous here).
0.99^10 ≈ 0.90, so the human has a 90% chance of getting the whole thing to work properly. 0.95^10 ≈ 0.60, so the AI has only a 60% chance. Almost a coin toss.
Even with a 98% per-subtask success rate, the compounded success rate still drops to about 82%.
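A quick back-of-the-envelope sketch of that compounding, assuming each subtask is independent and uses the same per-task rate (the 10-subtask count and the rates are just the illustrative numbers above):

```python
# Probability that ALL subtasks succeed, assuming each subtask is
# independent with the same per-task success rate.
def chance_all_succeed(per_task_rate: float, subtasks: int = 10) -> float:
    return per_task_rate ** subtasks

for rate in (0.99, 0.98, 0.95):
    print(f"{rate:.0%} per subtask -> {chance_all_succeed(rate):.0%} overall")
# 99% per subtask -> 90% overall
# 98% per subtask -> 82% overall
# 95% per subtask -> 60% overall
```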
The thing is that LLMs aren't just "a little bit" worse than humans. In comparison they're cavemen.
So humans do it much less often, yet we have 30 years of evidence to the contrary? After 25 years, humans still can't figure out how to write code that isn't subject to SQL injection, or how to commit code to GitHub without exposing admin credentials.