Well, yes, ideally we would eventually also have metrics on error rates and rejection rates. Ideally, at some point someone could run a study like "for every 100 PRs Gas Town generates, how many are accepted after code review and how many are rejected?" or "for every 100 lines of code Gas Town generates, how many coding errors do human reviewers catch?"
Unfortunately, I think things are moving so fast that by the time such a study was done, we would already have moved on to newer models and newer versions of Gas Town.