logoalt Hacker News

0xbadcafebeetoday at 12:54 AM7 repliesview on HN

There's two basic kinds of distillation: 1) the massive [and dumb] method where you ask a question and use the answer as reinforcement (Black Box), and 2) more targeted distillation where you use one model to directly inform/train/guide another model (RLAIF).

The latter is basically fine-tuning the model with direction from another model. Thousands of businesses do this every day to fine-tune. This is almost certainly what the Chinese labs are doing, since it has a much better effect on the end result than just getting simple answers to simple questions.

These complaints of distillation are inflating the problem to make it sound worse than it is, because they want the USG to block/ban Chinese model providers as protectionism. They have already called for more export controls on chips (which is funny because DeepSeek v4 was designed to run on Huawei chips and now the other Chinese providers are following suit). But they can't come right out and say that, so their claim is that they're asking for more export controls because distilled models might not be as safe as their own. But if you show them a jailbreak of their model that bypasses their safety, they'll tell you that any model can eventually be jailbroken so don't worry about safety.


Replies

anon373839today at 5:42 AM

> These complaints of distillation are inflating the problem to make it sound worse than it is

Unfortunately, the Reuters piece itself is complicit in this dramatization. The lede paragraph parrots Anthropic's talking point that distillation is an "attack", without using quotes that would alert the reader that this framing is a corporate talking point. Distillation is NOT an attack.

show 1 reply
lemaxtoday at 5:48 AM

I've used RLAIF to build out heuristic based non-LLM models for various decision systems and achieved like, 95% F1 on certain projects. We're in a place where models can be used to fine tune a lot of stuff via loops.

gmerctoday at 4:57 AM

https://research.nvidia.com/labs/lpr/slm-agents/ - Distillation data is a natural byproduct of using these models. There's no effective defence against it. Anthropic is degrading thinking blocks to summaries to slow it down and hide model internals, but in the end, the math says you're SOL and it works on MNC/Large Corporate scale well enough that the moment cost becomes a priority, you're left without the lock in you need to keep customers paying.

dannywtoday at 3:31 AM

If you’re doing evals, you’re basically doing RLAIF without training a model; just looking at the results.

Fundamentally it is very difficult to stop this while still making your AI models useful.

janalsncmtoday at 4:37 AM

Yeah I think the technical term is something more like “pseudo-labeling”. The OG distillation requires logits which Anthropic doesn’t provide.

fnord77today at 5:12 AM

Can you reach into the model and "transplant" weights directly?

show 3 replies
mannanjtoday at 4:43 AM

>But if you show them a jailbreak of their model that bypasses their safety, they'll tell you that any model can eventually be jailbroken so don't worry about safety.

Yes this is in line with what Anthropic said in their public statements about their Fable access restriction by the government directive. The hypocrisy and inconsistency in their statements and behavior feels quite childish and controlling. I believe our companies and their leaders, friends among our other influential leaders and leaders from rich social classes, want to actively hurt most people as this behavior looks to be quite self-interested.

show 1 reply