Have you considered the opposite? Reflected on your own biases?
I’m listening to my own experience. Just today I gave it another fair shot. GitHub Copilot agent mode with GPT-4.1. Still unimpressed.
This is a really insightful look at why people perceive the usefulness of these models differently. It is fair to both sides without being dismissive as one side just not “getting it” or how we should be “so far” past debate:
https://ferd.ca/the-gap-through-which-we-praise-the-machine....
Do either of these impress you?
https://alexgaynor.net/2025/jun/20/serialize-some-der/ - using Claude Code to compose and have a PR accepted into llvm that implements a compiler optimization (more of my notes here: https://simonwillison.net/2025/Jun/30/llvm/ )
https://lucumr.pocoo.org/2025/6/21/my-first-ai-library/ - Claude Code for writing and shipping a full open source library that handles sloppy (hah) invalid XML
Examples from the past two weeks, both from expert software engineers.