>You should check out "model collapse". It seems that an abundance of content that is increasingly AI-generated may not be a viable option.
Doom-saying about "model collapse" is kind of funny when OpenAI and Anthropic are mad at Chinese model makers for "distilling" their models, i.e., using their outputs to train their own models.
Isn't there a difference between distilling a specific model's outputs for prompts you chose, versus scraping whatever random AI output happens to be on the web (with unknown inputs)?
Totally different use cases. If you have nothing, getting 90% of a SOTA model's capability is very valuable. If you already have a SOTA model, training on its own outputs just gets you a worse model.
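For anyone unfamiliar with the term: "distillation" in the classic sense (Hinton-style) trains a student to match a teacher's full output distribution, not just its top answers. When the teacher is a closed API you only get sampled text, so in practice labs fine-tune on teacher outputs instead; the sketch below shows the classic logit-matching loss. This is a minimal illustration, not any lab's actual pipeline, and the logit values are made up:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; temperature > 1 softens them."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    Minimizing this trains the student to mimic the teacher's whole
    distribution over tokens, which carries more signal than hard labels.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over a 4-token vocabulary.
teacher = [3.0, 1.0, 0.5, -1.0]
student = [2.0, 1.5, 0.0, -0.5]

print(distillation_loss(teacher, teacher))  # 0.0: identical distributions
print(distillation_loss(teacher, student))  # positive: student diverges
```

The point upthread follows directly: the loss is bounded below by matching the teacher, so a student distilled from a SOTA model can approach but not exceed it on this objective.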