I feel like this is such a tragedy of the commons for the LLM providers. Wikipedia probably makes up a huge chunk of their training data, so why taint it? It would be interesting if they adopted some kind of "you shall not use our platform on Wikipedia" stance.
I don’t think it’s the providers doing this, it’s the awful users. They’re doing the same thing on GitHub. It’s maddening.
It would be random individuals.
Wikipedia having incorrect citations is way older than LLMs. As many other people have pointed out in this thread, if you start pulling on threads, a lot of what people write starts falling apart.
It's not even unique to Wikipedia. It's really not difficult to find very misleading statements backed by a citation that doesn't even support the claim when you check the original.