sulam • yesterday at 6:43 PM • 4 replies

That’s misunderstanding why these models are behind. A large part of why they’re behind is that they aren’t able to do the reinforcement learning post-training steps that take a pre-trained model and turn it into a frontier model like GPT 5 or Opus. Instead they do their best to recreate these models using distillation.

Fundamentally, you can never distill your way to being the teacher, so these approaches will not advance the frontier.

[edit: after thinking about it, I think my phrasing is unfair. It's not necessarily that they aren't able to do it, but they haven't yet shown that they are willing to do it.]

Replies

computerex • yesterday at 7:04 PM

That’s not remotely true. They did distillation as a cheap solution to the cold start problem. You need data/trajectories to hill climb to higher capabilities. All large Chinese labs do RLAIF.

TurdF3rguson • today at 2:05 AM

I think GLM 5.2 having a higher Elo than Opus 4.8 shows that they did it [https://gptbased.com]

DANmode • yesterday at 9:12 PM

> you can never distill your way to being the teacher

Are you sure?

What if you distill from 10 teachers?

FpUser • yesterday at 7:04 PM

>"they aren’t able to do the reinforcement learning post-training steps"

Not yet.

If there is a need, someone will come along and fill it. Personally, I don't even want to use the top models right now. Professionally, I use AI to help with coding through the Junie agent that comes with JetBrains IDEs. Junie is set to use Gemini Flash and works fine for what I (emphasis on "I") ask it to do. I tried more advanced models and different vendors, only to discover credits going down the toilet without any extra benefit.
