I'm a little more optimistic than that. I suspect that the open-weight models we already have are going to be enough to support incremental development of new ones, using reasonably-accessible levels of compute.
The idea that every new foundation model needs to be pretrained from scratch, using warehouses of GPUs to crunch the same 50 terabytes of data from the same original dumps of Common Crawl and various Russian pirate sites, is hard to justify on an intuitive basis. I think the hard work has already been done. We just don't know how to leverage it properly yet.
I don't think it's Common Crawl anymore; it's Common Crawl++, using paid human experts to generate and verify new content, whether it's code or research.
I believe the US is building this on the cost difference with other countries, using companies like Scale, Outlier, etc., while China has the internal population to do it domestically.
Change layer size and you have to retrain. Change number of layers and you have to retrain. Change tokenization and you have to retrain.
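To make that coupling concrete, here's a minimal sketch using a hypothetical toy model (not any specific LLM): a checkpoint's tensor shapes are fixed by the architecture that produced it, so widening layers or adding layers leaves you with weights you can't reuse directly.

```python
# Hypothetical toy example: checkpoint shapes are baked in by the architecture.
import torch.nn as nn

def tiny_model(hidden_size: int, n_layers: int, vocab_size: int = 1000) -> nn.Module:
    layers = [nn.Embedding(vocab_size, hidden_size)]
    for _ in range(n_layers):
        layers.append(nn.Linear(hidden_size, hidden_size))
    return nn.Sequential(*layers)

# Pretend this is the expensive pretrained model.
pretrained = tiny_model(hidden_size=256, n_layers=4)
checkpoint = pretrained.state_dict()

# Same depth, wider layers: every weight matrix now has the wrong shape.
wider = tiny_model(hidden_size=512, n_layers=4)
try:
    wider.load_state_dict(checkpoint)
except RuntimeError as e:
    print("hidden-size change breaks the checkpoint:", str(e).splitlines()[0])

# Same width, more layers: the extra layers have nothing to load from.
deeper = tiny_model(hidden_size=256, n_layers=6)
missing = set(deeper.state_dict()) - set(checkpoint)
print("layer-count change leaves untrained parameters:", sorted(missing))
```

There are research tricks for growing or distilling models across such changes, but none of them let you just drop old weights into a new shape; that's the sense in which the pretraining work is hard to carry over.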