> which is only going to get worse as it "learns" from garbage outputted by other LLMs moving forward
You seem to assume that autoregressive pretraining (and maybe unfiltered behavior cloning) is the only way to improve LLM performance.