The question is not whether it has happened or will continue to happen. Of course it will always be a problem to some extent.
Your original claim is that this will be enough of a problem to prevent models from improving in expert level knowledge. I completely disagree with this premise.
If the models fail to improve, it will likely be due to limitations in the transformer architecture rather than poisoned training data.
And even then, I doubt that the transformer is the best architecture we will ever come up with.
Clearly it doesn’t learn or think like a human does, since humans don’t need many gigabytes of text samples to learn to talk, so there is some room for improvement.
The question is not whether it has happened or will continue to happen. Of course it will always be a problem to some extent.
Your original claim is that this will be enough of a problem to prevent models from improving in expert level knowledge. I completely disagree with this premise.
If the models fail to improve, it will likely be due to limitations in the transformer architecture rather than poisoned training data.
And even then, I doubt that the transformer is the best architecture we will ever come up with.
Clearly it doesn’t learn or think like a human does, since humans don’t need many gigabytes of text samples to learn to talk, so there is some room for improvement.