Well, my experts disagree with your experts :). Sure, the supply of fresh data is running out, but at the same time there's far more data out there than models actually need - most of it is low-quality noise anyway. New models aren't just old models with more tooling: the entire training pipeline has been evolving, as researchers and model vendors focus on making better use of the data they already have and on refining the training datasets themselves.
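To give a flavor of what "refining training datasets" looks like in practice, here's a toy quality filter in the spirit of C4/FineWeb-style heuristics. The specific rules, weights, and threshold below are made up for illustration - real pipelines layer dozens of such rules plus model-based scorers and deduplication:

```python
def quality_score(doc: str) -> float:
    """Toy document-quality heuristic (illustrative only)."""
    words = doc.split()
    if len(words) < 50:          # drop very short fragments
        return 0.0
    alpha_ratio = sum(w.isalpha() for w in words) / len(words)
    mean_len = sum(len(w) for w in words) / len(words)
    ends_cleanly = doc.rstrip().endswith((".", "!", "?"))
    score = 0.4 * alpha_ratio                           # mostly real words
    score += 0.3 * (1.0 if 3 <= mean_len <= 10 else 0)  # plausible word lengths
    score += 0.3 * (1.0 if ends_cleanly else 0)         # complete sentences
    return score

# Keep only documents above a (made-up) quality threshold.
corpus = ["...raw web documents go here..."]
kept = [doc for doc in corpus if quality_score(doc) > 0.6]
```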
There's also more to LLM training than the pre-training stage :). Post-training stages like supervised fine-tuning and preference tuning (RLHF, DPO, and friends) run on much smaller, heavily curated datasets, so they don't depend on an endless supply of fresh web text - see the sketch below.
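As a concrete example of one post-training stage, here's the DPO preference loss (Rafailov et al., 2023) in a few lines of PyTorch. The variable names are mine, and in a real setup the log-probabilities would come from the policy model and a frozen reference model rather than hand-typed tensors:

```python
import torch
import torch.nn.functional as F

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO: push the policy to prefer 'chosen' over 'rejected' responses
    more strongly than a frozen reference model does."""
    pi_logratios = pol_chosen - pol_rejected    # policy's preference margin
    ref_logratios = ref_chosen - ref_rejected   # reference's preference margin
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()

# Dummy per-response log-probabilities (sums of token log-probs):
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-13.5]))
print(loss)  # ~0.576: policy already leans toward 'chosen' more than the reference
```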