Tests can't tell you if the design of the code is fit for purpose, or about requirements you completely missed or punted on, or that a core new piece that's going to be built upon next is barely-coherent, poorly-performing slop that "works" but is going to need to be actually designed while being rewritten by the next person instead, or that you skipped trying to understand how the feature should work or thinking about the performance characteristics of the solution before you started and just let the LLM drive, so you never designed anything, arriving at something which "works" on your machine and passes the tests which were generated for it, but will hammer production under production loads. Neither will running it on your own machine or in Dev.
No amount of telling the LLM to "Dig up! Make no mistakes!" will help with non-designed slop code actively poisoning the context, but you have to admire the attempt when you see comments added while removing code, referring to the code that's being removed.
It's weird to see tickets now effectively go from "ready for PR" to 0% progress, but at least you're helping that person meet whatever the secret AI* usage quota is for their performance review this year.