I’ve found recent Claude versions to be much better in this regard. I think a lot rests on the quality of the harness and the behind-the-scenes work to pull in up-to-date docs via RAG, or to search for docs proactively, rather than guessing.
I also don’t have issues with the quality of the Python it generates. It takes a bit of nudging to get list comprehensions and generators rather than imperative loops, but it tends to mimic code already in context. So if the codebase is decent, the output is better too.
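
To make the list comp vs imperative point concrete, here is a rough sketch of the kind of rewrite I mean. The `orders` data and the function names are made up for illustration, not taken from any real codebase.

```python
# Hypothetical sample data, for illustration only.
orders = [
    {"id": 1, "total": 40.0, "paid": True},
    {"id": 2, "total": 15.5, "paid": False},
    {"id": 3, "total": 24.5, "paid": True},
]


def paid_totals_imperative(orders):
    """Imperative form: build the list with an explicit loop and append."""
    result = []
    for order in orders:
        if order["paid"]:
            result.append(order["total"])
    return result


def paid_totals_comprehension(orders):
    """Same result as above, written as a list comprehension."""
    return [order["total"] for order in orders if order["paid"]]


def paid_revenue(orders):
    """Generator expression: sum the totals without building a list first."""
    return sum(order["total"] for order in orders if order["paid"])


print(paid_totals_imperative(orders))
print(paid_totals_comprehension(orders))
print(paid_revenue(orders))
```

All three are fine Python. The point is only that if the surrounding code already looks like the second and third functions, the model tends to follow suit without being asked.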