Hacker News

HarHarVeryFunny today at 2:36 PM

> The science behind these models are being worked on IN PUBLIC. The research is not secret. The implementations will all catch up.

Only to a limited extent - the US companies stopped sharing research a long time ago, other than Anthropic's interpretability research (which also seems to have dried up?). Interestingly, most of the sharing is now coming from the Chinese side, largely DeepSeek. Zhipu/Z.ai (GLM) is also a partner in the Slime RL training framework.

I wouldn't call much, if any, of this "science" - it's all empiricism. Throw spaghetti at the wall and see what sticks. There's a famous quote from Noam Shazeer:

"We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence"

https://arxiv.org/abs/2002.05202v1

Jakob Uszkoreit has also talked about the empiricism it took to make what would become the Transformer, or any complex neural network architecture, work.


Replies

adrian_b today at 6:38 PM

While OpenAI and Anthropic have not provided any useful information for a long time, there are still some research publications from a few US companies, e.g. NVIDIA on its Nemotron models, and Google and IBM on their small LLMs.