"I also encounter edge cases where they fail surprisingly badly multiple times per day. "
If 80% of the time they 10x my output, and the other 20% I can say "well they failed, I guess this one I have to do manually" - that's still an absolutely massive productivity boost.
But they don’t 10x my output - they write some code for a problem I/you have already thought about. The hard part isn’t writing the code, it never has been. It’s always been solving and breaking down the problem.