Claude Opus 4.6 regularly makes up shit and hallucinates. I'm not a detractor by any means but "exceptionally rare" is fantasyland.
Can vouch for this, plus, when it does work, stuff can take forever. Then, if I let it unsupervised, higher risk of doing the wrong thing. If I supervise it, then I become agent nanny.
I have been experiencing it too.
I honestly am finding Codex considerably better, as much as I despise OpenAI.
Can vouch for this, plus, when it does work, stuff can take forever. Then, if I let it unsupervised, higher risk of doing the wrong thing. If I supervise it, then I become agent nanny.