Red-Green TDD is one of the main "agent patterns" Simon proposes, so it seemed relevant.
The same applies to feedback loops with compilers and linters: they provide objective feedback that the AI then goes and fixes, verifiably resolving each reported issue.
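That loop is easy to sketch. The following is a minimal, hypothetical version: `feedback_loop`, `fake_model`, and `check` are all placeholder names I'm inventing for illustration, with a toy "model" standing in for a real LLM call and a toy assertion standing in for a real test suite, compiler, or linter.

```python
def feedback_loop(write_code, run_check, max_iters=5):
    """Red-green style loop: generate code, run an objective check
    (tests, compiler, linter), feed failures back until it passes."""
    feedback = None
    for _ in range(max_iters):
        code = write_code(feedback)     # "model" call (placeholder)
        ok, feedback = run_check(code)  # objective verifier
        if ok:
            return code                 # green: feedback resolved
    return None                         # still red after the budget

# Toy stand-ins: the "model" fixes its code once it sees the failure.
def fake_model(feedback):
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"  # first attempt is buggy

def check(code):
    ns = {}
    exec(code, ns)
    try:
        assert ns["add"](2, 2) == 4
        return True, ""
    except AssertionError:
        return False, "add(2, 2) != 4"
```

Here `feedback_loop(fake_model, check)` converges on the fixed code in two iterations. The point is that the terminating condition is an objective check, not the model's own judgment.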
Even with less verifiable things like specifications, relying on less objective grounding metrics doesn't mean there's no change in the model's behavior. If you compared the code a model produced, and the amount of intervention needed to get there, with and without a specification, I'm sure you'd see an objective difference on average. We're already getting objective studies regarding AGENTS.MD.