I think it's nonsensical to insist that it would only be a subjective improvement. The tests either exist and ensure that there aren't bugs in certain areas, or they don't. The agent is either in a feedback loop with those tests and continues to work until it has satisfied them or it doesn't.
That sounds like a very specific implementation strategy related to TDD