logoalt Hacker News

NitpickLawyertoday at 4:57 PM0 repliesview on HN

> the distil pipeline

I don't have inside info, but everything we've seen about gemini3.0 makes me think they aren't doing distillation for their models. They are likely training different arch/sizes in parallel. Gemini 3.0-flash was better than 3.0-pro on a bunch of tasks. That shouldn't happen with distillation. So my guess is that they are working in parallel, on different arches, and try out stuff on -flash first (since they're smaller and faster to train) and then apply the learnings to -pro training runs. (same thing kinda happened with 2.5-flash that got better upgrades than 2.5-pro at various points last year). Ofc I might be wrong, but that's my guess right now.