
sulam yesterday at 6:43 PM (4 replies)

That’s misunderstanding why these models are behind. A large part of why they’re behind is they aren’t able to do the reinforcement learning post-training steps that take a pre-trained model and turn it into a frontier model like GPT 5 or Opus. Instead they do their best to recreate these models using distillation.

Fundamentally, you can never distill your way to being the teacher, so these approaches will not advance the frontier.

[edit: after thinking about it, I think my phrasing is unfair. It's not necessarily that they aren't able to do it, but they haven't yet shown that they are willing to do it.]


Replies

computerex yesterday at 7:04 PM

That’s not remotely true. They did distillation as a cheap solution to the cold start problem. You need data/trajectories to hill climb to higher capabilities. All large Chinese labs do RLAIF.

TurdF3rguson today at 2:05 AM

I think GLM 5.2 having a higher Elo than Opus 4.8 shows that they did it [https://gptbased.com]

DANmode yesterday at 9:12 PM

> you can never distill your way to being the teacher

Are you sure?

What if you distill from 10 teachers?

FpUser yesterday at 7:04 PM

>"they aren’t able to do the reinforcement learning post-training steps"

Not yet.

If there is a need, someone will come along and fulfill it. Personally, I don't even want to use top models right now. Professionally, I use AI to help with coding through the Junie agent that ships with JetBrains IDEs. Junie is set to use Gemini Flash, and it works fine for what I (emphasis on "I") ask it to do. I tried more advanced models and different vendors, only to watch credits go down the toilet without any extra benefit.
