It may not be mainly or solely due to LLM pollution, but rather to the fact that nearly every publisher, (social) media company, newspaper, etc. clammed up and started charging (licensing) fees sometime in the last couple of years.
So maybe there's just not much new, openly available content worth training on that wasn't already available prior to 2025.