It's interesting to me how ineffective LLMs are at refactoring, but when you think closely about how they work, it makes sense.
They are good at searching for things that have been done 10,000 times before, and slightly changing them. This is the majority of all "new" features.
Almost nothing is "new"...
Refactors are not this. If you can't just write a gsub to do the work, the LLM essentially has to break the refactor up into N separate problems, each of them pretty slow and expensive to solve. Sure, none of these problems is individually "new", which is why LLMs can do it at all. But they can't do it as effectively as you'd think.
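To make the "just write a gsub" case concrete, here is a minimal sketch in Python, using `re.sub` as the rough equivalent of gsub. The function names `get_user` and `fetch_user` and the helper `rename_call` are made up for illustration; nothing here comes from a real codebase.

```python
import re


def rename_call(source: str, old: str, new: str) -> str:
    """Rename calls to `old` as `new` with a single regex substitution.

    Hypothetical helper: it only matches `old` as a whole identifier
    that is immediately followed by an opening parenthesis.
    """
    pattern = re.compile(rf"\b{re.escape(old)}\b(?=\s*\()")
    return pattern.sub(new, source)


# Illustrative input: one call to rename, one similar name to leave alone.
snippet = "user = get_user(42)\nlog(get_user_name(user))\n"
print(rename_call(snippet, "get_user", "fetch_user"))
```

This kind of rename is mechanical because one pattern covers every call site. Once the change depends on what each call site means (two unrelated functions sharing a name, a signature change that needs different arguments per caller), a single substitution no longer works, and that is the N-problems situation described above.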
Exactly my experience. I always refactor myself first, then delegate the boring tasks to AI. It saves me energy, time, and tokens. If the code is not prepared for easy implementation, agents always fail.