Based on quite a few comments recently, it also looks like many people have tried LLMs in the past but haven't seriously revisited either the modern or the more expensive models. And I get it. Not everyone wants to keep up to date every month, or burn cash on experiments. But at the same time, people seem to have opinions formed in 2024. (Especially if they only talk about hallucinations and broken code: tell the agent to search for docs and fix stuff.) I'd really like to give them Opus 4.5 as an agent to refresh their views. There's lots to complain about, but the world has moved on significantly.
Just last week Opus 4.5 decided that the way to fix a test was to change the code so that everything else but the test broke.
When people say "fix stuff" I always wonder if it actually means fix, or just make it look like it works (which is extremely common in software, LLM or not).
This has been the argument since day one: you just have to try the latest model, that's where you went wrong. For the record, I use Claude Code quite a bit and I can't see much meaningful improvement from the last few models. It is a useful tool, but its shortcomings are very obvious.