christina97 • yesterday at 10:49 PM • 9 replies

The Chinese models will not overtake the frontier US ones given the current way things are going. The US models derive their lead from incredible efforts to source more and higher-quality (mostly synthetic) data via great feats (e.g. generating with humongous teacher models that could never feasibly serve interactive traffic). The Chinese models advance via heroic efforts to optimize models and great feats to secure more and higher-quality training data from the US frontier models.

For a (Chinese) open-weight model to surpass the (US lab) frontier models, this equation must flip and the Chinese labs must entirely retool from harvesting frontier model data to producing the data systems and efforts to produce novel data, as well as procuring latest-generation hardware en masse for this. This does not happen easily. Also, training a frontier-scale model is actually not such an unimaginable feat: doing all the inference with the teacher models is where the hardware goes.

Replies

throwawayffffas • yesterday at 11:30 PM

Unless you are working at one of these companies you don't know what they are doing.

You don't know what's happening at z.ai or Alibaba. And you don't know what's happening at Anthropic or OpenAI.

I don't know what they are all doing, but I find it extremely unlikely that they are not all collecting data from one another. I am confident Anthropic has a team going over GLM 5.2 weights, even if it's just to see where the competition is.

Just because some labs are getting data from Anthropic does not mean they are not also doing their own research.

They were focused on optimization because they could not get the best hardware. The only reason their top labs are behind may be that they did not have H200s and MI350s. And now they do.

Plus you are discounting other risks, Anthropic is currently sitting on "the best" models in the world because they got in a pissing match with the US administration.

btw: This could be the case in China as well. Their administration has been surprisingly open on AI exports and open-weight models, as far as we know. There is a very small but not trivial chance they are hogging a better version of GLM 5.2, for example, and no one is allowed to talk about it. Now, I am not saying that is the case; I am saying the two cases (Chinese labs are 6 months behind, or they are forced to suppress their best models) are indistinguishable.

andy99 • yesterday at 11:01 PM

> Chinese labs must entirely retool from harvesting frontier model data to producing the data systems and efforts to produce novel data

Even if your characterization is accurate, they could do this tomorrow and are not so myopic that they wouldn’t have thought about it. I don’t see this as a barrier, and I see a lot of the same underestimation of Asia that’s been happening for 50 years. There’s not some innate American advantage to building LLMs, and personally I think whatever head start the US has is going to be squandered on delays from the export control “too dangerous for release” LARPing we’re seeing.

yorwba • yesterday at 11:57 PM

The amount of data Anthropic has claimed was extracted for distillation is tiny in comparison to the entire internet, which is right there for the taking and holds most of the knowledge people expect models to have.

Distilling even small amounts of data from a better model is still helpful, not in the sense of transferring capabilities the raw internet-trained model doesn't have at all, but for identifying those capabilities that are compatible with the servile assistant persona and suppressing others that are undesirable (e.g. trolling). A primitive version of this was the instruction-tuning datasets generated with ChatGPT, as used e.g. for Alpaca.

Without a clear target to emulate, competitors might have to rely more on human raters, but there are plenty of data labeling companies in China, so that's hardly a hurdle.

bradishungry • yesterday at 11:27 PM

“China can only copy the US” is a very short-sighted and uninformed opinion. There is more coming out of China than just new ways to distill models.

CuriouslyC • yesterday at 11:13 PM

Coding is a case where it's possible to programmatically generate large amounts of data relatively cheaply. China could realistically surpass the US in coding while still being behind in many other areas.

kulahan • yesterday at 11:20 PM

How so? You'll soon have your choice of a very old OAI model or a new Chinese model, because the USG has no interest in letting you access the newest models without explicit permission.

danny_codes • today at 1:56 AM

This seems wildly naive. This entire field is like 4 years old. We have quite frankly no idea about what things will look like in 4 more years.

elisbce • yesterday at 11:05 PM

Chinese frontier models don't need to catch up in every category. They just need to win in coding, and that's exactly where they are going. The gap went from 12+ months to 1-2 months with the latest release of GLM 5.2. Coding is a task where you don't need heroic efforts to find rare, long-tail training data; you can just outsmart your competitor by optimizing algorithms and training recipes. This is something they can do at scale with the money and talent pool.

jmyeet • yesterday at 11:31 PM

Yeah, this is, to be perfectly blunt, cope, for several reasons:

1. It's unclear if there is a law of diminishing returns with ever-larger models. They're more expensive to run and for many applications, you'll probably find smaller models are sufficient;

2. There's an inbuilt market for local LLMs. This is an effective limit on how large models can get. Case law hasn't been established yet on, for example, if a law firm using ChatGPT breaks privilege. Specifically, chat logs may be discoverable. Medical applications have this issue too and I think you'll find that financial firms are going to be leery about this as well;

3. Better, larger models will bleed into smaller, open source models. The chat logs themselves are training data. There's a whole market in China for Claude tokens around this;

4. China has a national security interest in not being beholden to US tech giants when it comes to AI. China has a history of being able to commit to large-scale long-term projects and Anthropic just won't be able to compete with a national project by one of the world's superpowers, if it comes down to it;

5. Winning doesn't necessarily mean being the best. Often it's just being good enough;

6. As an example of a national project, China is busy replicating EUV because of the US ban on ASML and Nvidia exporting their best stuff. I don't think many in the West are prepared for how rapid this will be. I'm reminded of the policy debate in 1945, when many in American policy and military circles thought the USSR would never catch up on the atomic bomb or, if they did, it would take 20+ years. It took 4 years. For the hydrogen bomb, it took 1. The US hardware advantage is a lot more tenuous than many realize.
