I try GitHub Copilot every once in a while, and just last month it still managed to produce diffs with unbalanced curly braces, or tried to insert (what should be) a top-level function into the middle of another function, screwing up everything. This wasn’t on a free model like GPT-4.1 or 5-mini; IIRC it was 5.2 Codex. What the actual fuck? The only explanation I can come up with is that their pay-per-request model made Copilot really stingy with using tokens for context: even when you explicitly ask it to read certain files, it ends up grepping and adding a couple of lines.
I had my first go at using it (GitHub Copilot) last week, for a simple refactoring task. I'd say I specified it reasonably well, yet it still managed to fail to delete a closing brace when it removed the opening of the block as specified.
That was using the Claude Sonnet 4.5 model; I wonder if the Opus 4.5 model would have managed to avoid that.
You're not using the good models and then blaming the tool? Just use Claude models.
Copilot's main problem seems to be that people don't know how to use it. They need to delete all their plugins except the VS Code and CLI ones, and disable all models except the Anthropic ones.
The Claude Code reputation diff is greatly exaggerated beyond that.