It doesn't have to be reliable! It just has to flag things: "hey, these graphs look like they were generated from a formula" or "these graphs don't seem to represent realistic values / real-world entropy". It only needs to be a tool that stops very advanced fraud from slipping through after it has already passed human peer review.
The only reason this is helpful is that humans have natural biases (and AI's biases are roughly the inverse), so a tool like this can catch patterns reviewers tend to miss, like the same graph being reused and scaled up 5 to 10 times.
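To be concrete about what I mean by "flag things", here's a rough sketch of the two kinds of checks I have in mind, not any existing tool: the function names, thresholds, and the assumption that you already have numeric series extracted from the figures are all made up for illustration. One check looks for terminal digits that don't behave like real measurements, the other for curves that are near-perfect rescalings of each other.

```python
# Minimal sketch (hypothetical, not a real tool). Assumes the numeric series
# have already been extracted from the published figures; that extraction step
# is the hard part and is not shown here.
import numpy as np
from scipy import stats

def terminal_digit_test(values, alpha=0.01):
    """Flag a series whose terminal digits are suspiciously non-uniform.
    Genuinely measured data usually has roughly uniform last digits."""
    digits = []
    for v in values:
        s = f"{abs(v):.6f}".replace(".", "").rstrip("0")
        if s:
            digits.append(int(s[-1]))
    counts = np.bincount(digits, minlength=10)
    _, p = stats.chisquare(counts)
    return p < alpha  # True = worth a skeptical human look

def looks_like_rescaled_copy(series_a, series_b, threshold=0.999):
    """Flag pairs of curves that are near-perfect affine rescalings of each
    other, e.g. one graph reused at 5-10x scale across 'different' experiments."""
    a = (np.asarray(series_a) - np.mean(series_a)) / np.std(series_a)
    b = (np.asarray(series_b) - np.mean(series_b)) / np.std(series_b)
    r, _ = stats.pearsonr(a, b)
    return abs(r) > threshold

# Example: fabricated round numbers trip the digit test, and a curve that is
# just the old one rescaled with no fresh noise trips the copy check.
print(terminal_digit_test([12.5, 25.0, 37.5, 50.0, 62.5] * 10))  # True
x = np.linspace(0, 10, 50)
original = 3.0 * np.exp(-0.4 * x) + np.random.normal(0, 0.05, 50)
suspect = original * 7.5 + 2.0   # "new" figure that is the old one rescaled
print(looks_like_rescaled_copy(original, suspect))               # True
```

Neither check is anywhere near reliable on its own; the whole point is just to route a paper back to a skeptical human when something trips.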
I hope I'm wrong, but I haven't seen anything like this in practice. I'd imagine we hit the same problem as before: we could use it as an extra filter, but the amount of shit that comes out means the process isn't actually any more accurate, just faster.
Having seen up close how these reviews go, I get why people use tools like this, unfortunately. It doesn't make me very hopeful for the near future of reviewing.