We have human reviews on every PR.
Quality and consistency are going up, not down, partially because the agents follow the guidance much more closely than humans do and there is far less variance. A human might take a shortcut ("I'll just write a one-off here"); the agent does not, so long as our rules guide it properly ("Let me find existing patterns in the codebase.").
Part of it is the investment we've made in docs. Part of it is that we were already meticulous about commenting code. It turns out that when the agents come across this code incidentally, they read the comments (we can tell because they also update those comments in their PRs when they change the code).
We are also delivering the bulk of our team-level capabilities via remote MCP over HTTP, so we have centralized telemetry via OTEL on tool activations, on which docs the agents read, and on "phantom docs" the agents try to find but that don't exist yet (we then go and write those docs).
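A minimal sketch of the "phantom docs" idea, assuming the agent reads docs through a single lookup tool that takes a path. `TelemetryRecorder`, `DocsTool`, `read_doc`, and `phantom_doc_report` are hypothetical names for illustration only; the recorder is an in-memory stand-in for what would be OTEL spans or metrics in a real remote MCP server, and none of this uses the actual MCP SDK or OpenTelemetry API.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TelemetryRecorder:
    """Hypothetical in-memory stand-in for an OTEL exporter."""
    events: List[Tuple[str, dict]] = field(default_factory=list)

    def record(self, name: str, **attributes) -> None:
        # Store each event as (event name, attributes), like a span with attributes.
        self.events.append((name, attributes))


@dataclass
class DocsTool:
    """Hypothetical docs-lookup tool, as a remote MCP server might expose it."""
    docs: Dict[str, str]
    telemetry: TelemetryRecorder

    def read_doc(self, agent_id: str, path: str) -> Optional[str]:
        # Every call counts as a tool activation.
        self.telemetry.record("tool.activated", tool="read_doc", agent=agent_id)
        if path in self.docs:
            self.telemetry.record("doc.read", path=path, agent=agent_id)
            return self.docs[path]
        # A "phantom doc": the agent expected this doc to exist, but it doesn't.
        self.telemetry.record("doc.missing", path=path, agent=agent_id)
        return None


def phantom_doc_report(recorder: TelemetryRecorder) -> List[Tuple[str, int]]:
    """Rank missing doc paths by how often agents asked for them."""
    counts = Counter(
        attrs["path"] for name, attrs in recorder.events if name == "doc.missing"
    )
    return counts.most_common()


if __name__ == "__main__":
    recorder = TelemetryRecorder()
    tool = DocsTool(docs={"docs/testing.md": "How we write tests..."}, telemetry=recorder)
    tool.read_doc("agent-1", "docs/testing.md")
    tool.read_doc("agent-1", "docs/feature-flags.md")
    tool.read_doc("agent-2", "docs/feature-flags.md")
    print(phantom_doc_report(recorder))  # [('docs/feature-flags.md', 2)]
```

Ranking the misses by frequency is one way to decide which docs to write first; a real setup would more likely run that aggregation as a query in whatever backend receives the OTEL data.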
> Partially because the agents follow the guidance much more closely than humans do and there is far less variance.
Ouch. Managing human coders has been described as herding cats (with some justice). Getting humans to follow standards is... challenging. And exhausting.
Getting AIs to do so... if you get the rules right, and if the tool doesn't ignore the rules, then you should be good. And if it does ignore them, you still have human reviews. And the AI doesn't get offended if you reject the PR because it didn't follow the rules.
This is actually one of the best arguments for AIs that I have seen.
> We have human reviews on every PR.
There are studies on maintaining attention over long periods when no action is required (the "vigilance decrement" literature). It will be difficult to keep that up forever, so beware of review fatigue and build in some measures to ensure that attention does not diminish over time.