Chasing test-passing code is basically an invitation for models to learn all sorts of ugly workarounds and accidental patterns that humans would never tolerate for long. If you optimize only for "does it make CI go green," you'll eventually get code that's impossible to reason about and a codebase that keeps accumulating landmines. The metrics sure look pretty for a while, though.