You can easily do this with normal GPT 5.2 in ChatGPT: just turn on thinking (extended is better) and web search, point the model at a Wikipedia page, and tell it to check the claims for errors. I've tried it before and, surprisingly, it finds errors quite often, sometimes small, sometimes medium. The less popular the linked page is, the more likely it is to contain errors.
This works because GPT 5.x actually uses web search properly.
Can you describe some of the errors you have found this way?
I am sure that could be useful, with proper follow-up research to verify the findings.
As a technique, though, never ask an LLM only to find errors. Ask it either to find errors or to verify that there are none. That way it has an easy way to answer honestly without being pushed to hallucinate problems.
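A minimal sketch of that framing, assuming you're building the prompt programmatically (the function name and wording are illustrative, not from any particular library):

```python
# Sketch of the prompt framing described above: give the model an explicit
# "no errors found" exit so it isn't pressured to invent problems.

def build_fact_check_prompt(article_text: str) -> str:
    """Build a fact-checking prompt that allows a 'no errors' answer.

    The key is offering both outcomes explicitly, rather than only
    asking for errors (which biases the model toward finding some).
    """
    return (
        "Check the following text for factual errors.\n"
        "Either list each error you find with a brief explanation, "
        "or state clearly that you checked the claims and found no errors.\n\n"
        f"Text:\n{article_text}"
    )

prompt = build_fact_check_prompt("The Eiffel Tower was completed in 1889.")
print(prompt)
```

The same idea applies when prompting in the ChatGPT UI directly: phrase the request as "find errors or confirm there are none" rather than "find the errors."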
Have you verified those errors?