Yeah, Pangram does not provide any concrete proof, but it does confirm many people's suspicions about their reviews. More importantly, it flags reviews for a human to take a closer look and see if the review is flawed, low-effort, or contains major hallucinations.
> does not provide any concrete proof, but it confirms many people's suspicions
Without proof there is no confirmation.
Was there any analysis of flawed, low-effort reviews at similar conferences before generative AI models existed?
From what I remember (long before generative AI), you would still occasionally get very crappy reviews as an author. When I served on review committees (a couple of times), and there was high variance between reviews, the crappy ones were rather easy to spot and eliminate.
Now, it's not bad to detect crappy (or AI-generated) reviews, but I wonder if it would change the end result much compared to other potential interventions.