We had the same issue until we created a review skill that we run after an LLM has finished implementing a feature. We give it a checklist based on problems we have observed before, such as overly verbose code, and ask it to report issues and suggest improvements. The developer can then give feedback and let the LLM fix the issues, or just address them manually. It’s still early, but I’ve been much happier with the results. It also makes human review much easier, since there’s a report covering what the change does, why it was made, and what to keep an eye on. You can do this with any harness you may be using, and there’s nothing to buy; it’s just a suggestion from someone trying to make the best use of this insane technology.
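For anyone who wants a starting point, here is a rough sketch of how such a review prompt could be assembled. It is not our actual skill: the function name `build_review_prompt`, the prompt wording, and every checklist item other than verbose code are made up for illustration, and how you hand the prompt to the model depends on your harness.

```python
# Hypothetical checklist. Only "overly verbose code" comes from the comment
# above; the other items are illustrative placeholders.
CHECKLIST = [
    "Overly verbose code",
    "Unused helpers or dead code",
    "Missing or weak tests",
]


def build_review_prompt(diff: str, checklist: list[str]) -> str:
    """Assemble a review prompt from a diff and a checklist of known problems."""
    items = "\n".join(f"- {item}" for item in checklist)
    return (
        "Review the change below. Do not modify any code.\n\n"
        "Report on:\n"
        "1. What the change does and why.\n"
        "2. Issues found, checked against this list:\n"
        f"{items}\n"
        "3. Suggested improvements.\n"
        "4. Things a human reviewer should keep an eye on.\n\n"
        f"Change:\n{diff}\n"
    )


if __name__ == "__main__":
    # Paste the real diff here, then pass the printed prompt to your harness.
    print(build_review_prompt("<paste diff here>", CHECKLIST))
```

The idea is to keep the checklist as plain data, so that each time a new kind of problem shows up in generated code you add one line and every later review checks for it.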