Just last week Opus 4.5 decided that the way to fix a test was to change the code so that everything except the test broke.
When people say "fix stuff" I always wonder if it actually means fix, or just make it look like it works (which is extremely common in software, LLM or not).
What did Opus do when you told it that it shouldn't have done that?
Sure, I get an occasional bad result from Opus - then I revert and try again, or ask it for a fix. Even with a couple of restarts, it's going to be faster than me on average. (And that's ignoring the situations where I have to restart myself.)
Basically, you're saying it's not perfect. I don't think anyone is claiming otherwise.