> The science behind these models are being worked on IN PUBLIC. The research is not secret. The implementations will all catch up.
Only to a limited extent - the US companies stopped sharing research a long time ago, other than Anthropic's interpretability research (which also seems to have dried up?). Interestingly, most of the sharing is now coming from the Chinese side, largely DeepSeek. Zhipu/Z.ai (GLM) is also a partner in the Slime RL training framework.
I wouldn't call much, if any, of this "science" - it's all empiricism. Throw spaghetti at the wall and see what sticks. There's a famous quote from Noam Shazeer:
"We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence"
https://arxiv.org/abs/2002.05202v1
Jakob Uszkoreit has also talked about the empiricism it took to make what would become the Transformer, and any complex neural network architecture, work.
While OpenAI and Anthropic have not provided any useful information for a long time, there are still some research publications from a few US companies, e.g. NVIDIA on its Nemotron models, or Google and IBM on their small LLMs.