So? The point is that humans do it much less often.
Let's say there are 10 subtasks that need to be done.
Let's say a human has a 99% chance of getting each one of them right (by doing the proper testing, etc.), and the AI has a 95% chance (being very generous here).
0.99^10 ≈ 0.90, so the human has about a 90% chance of getting the whole thing to work properly.
0.95^10 ≈ 0.60, so the AI has only about a 60% chance. Almost a coin toss.
Even with a 98% per-step success rate, the compounded success rate still drops to roughly 82%.
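A quick sketch of that compounding math, using the illustrative numbers above and assuming the subtasks succeed or fail independently:

```python
# Probability that all n subtasks succeed, given an independent
# per-subtask success rate p: P(all succeed) = p ** n
def compound_success(p: float, n: int = 10) -> float:
    return p ** n

for p in (0.99, 0.98, 0.95):
    print(f"per-step {p:.0%} -> overall {compound_success(p):.1%}")
# per-step 99% -> overall 90.4%
# per-step 98% -> overall 81.7%
# per-step 95% -> overall 59.9%
```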
The thing is, LLMs aren't just "a little bit" worse than humans at this. By comparison, they're cavemen.