I don't really understand why model collapse would happen.
I understand that if I have an AI model and then feed it its own responses, its performance will degrade. But that's not what's happening in the wild - there are extra filtering steps in between. Users upvote and downvote posts, people post the "best" AI generated content (that they prefer), the more human sounding AI gets more engagement etc. All of these things filter AI output, so it's not the same thing as:
AI out -> AI in
It is:
AI out -> human filter -> AI in
And at that point the human filter starts acting like a fitness function for a genetic algorithm. Can anyone explain how this still leads to model collapse? Does the signal in the synthetic data just overpower the human filter?
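For what it's worth, you can sketch this selection loop in a few lines and watch what happens. Below is a toy simulation (my own construction, not from any paper): the "model" is just a Gaussian fitted to its training data, and the "human filter" is a hypothetical preference function that keeps the best half of each generation's samples. Even with the filter in the loop, the fitted distribution narrows every generation - selection plus refitting compresses variance, which is one common framing of collapse as a loss of diversity rather than a loss of quality.

```python
import random
import statistics

random.seed(0)

def fit(data):
    # "Train" the toy model: fit a Gaussian to the current dataset.
    return statistics.mean(data), statistics.stdev(data)

def preference(x):
    # Hypothetical human preference: likes values near 0.
    # Any consistent preference plays the same role here.
    return -abs(x)

# Generation 0: real "human" data.
data = [random.gauss(0, 1) for _ in range(1000)]
stds = []

for gen in range(10):
    mu, sigma = fit(data)
    stds.append(sigma)
    # Model generates synthetic content from what it learned...
    samples = [random.gauss(mu, sigma) for _ in range(1000)]
    # ...the human filter keeps the "best" half...
    samples.sort(key=preference, reverse=True)
    # ...and that filtered output becomes the next training set.
    data = samples[:500]

# The spread of the learned distribution shrinks every generation,
# even though a filter sits between model output and model input.
print(stds[0], stds[-1])
```

The point of the toy isn't that filtering does nothing - it's that a filter acts as selection *pressure*, and selection pressure on a model's own outputs concentrates the distribution around whatever the filter rewards, throwing away the tails. A genetic algorithm converging on its fitness function's optimum is exactly the failure mode here: convergence is the goal for a GA, but for a generative model it means losing the diversity of the original data.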
> Users upvote and downvote posts, people post the "best" AI generated content (that they prefer), the more human sounding AI gets more engagement etc. All of these things filter AI output
At the same time, though, AI generated content can be produced much, much faster than human generated content, so eventually AI slop drowns out everything else. You only have to check the popular social media platforms to see this in action: AI generated posts are widely promoted and pushed on users, the same way most web searches return results with AI generated pages ranked highly.
Humans can't keep up, and companies are actively working to bypass the human filter and intentionally promote AI generated content.