Same experience here! As an analogy, suppose the model knows both Arabic and Roman numerals. But in an alternate universe, it has been trained so heavily on Roman numerals ("Bad Code") that it won't give you Arabic ones ("Good Code") unless you prompt for them directly, even though they are clearly superior.
I also believe that overall repository code quality is important for AI agents: the more "beautiful" the existing code is, the more of that "beauty" the agent can mimic.