Well, no: one of the first things it says is that reviewers were blind to human vs. AI.
The comment you're replying to is talking about a hypothetical scenario.
In any case, the blinding didn't stop Reviewer #2 from calling out obvious AI slop (Figure 5).
They might have tried, but this would be pretty hard to achieve for real, especially with the older/worse models. For changes that do more than alter a couple of lines, LLM output can be very obvious. Stripping all comments from the changeset might go a long way toward blinding it, but then you're missing context that you kinda need to review the code properly.