Hacker News

littlestymaar (last Monday at 7:07 AM)

> careful layering of well-understood optimizations—RoPE, SwiGLU, GQA, MoE

They basically cloned Qwen3 on that, before adding the few tweaks you mention afterwards.


Replies

Voloskaya (last Monday at 8:19 AM)

You seem to be conflating when you first heard about those techniques with when they first appeared. None of those techniques first appeared in Qwen, and neither did this specific combination of them.

NitpickLawyer (last Monday at 7:38 AM)

> They basically cloned Qwen3 on that

Oh, come on! GPT-4 was rumoured to be an MoE well before Qwen even started releasing models. OpenAI didn't have to "clone" anything.
