> I get the feeling that it was trained very differently from the other models
It's actually based on the DeepSeek architecture, just with larger experts, if I recall correctly.
It was notably trained with the Muon optimizer, for what it's worth, but I don't know how much can be attributed to that alone.
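For anyone curious what makes Muon different: its core idea is to take the momentum buffer for each weight matrix and orthogonalize it (push all its singular values toward 1) before applying the update, rather than using raw momentum like SGD or per-coordinate scaling like Adam. Here's a rough sketch, with the caveat that this is illustrative, not the actual training recipe: the real Muon uses a tuned quintic Newton-Schulz polynomial for speed, while this sketch uses the classical cubic iteration, and `muon_step`, its `lr`/`beta` values, and the shape-scaling heuristic are all my own simplified stand-ins.

```python
import numpy as np

def orthogonalize(G, steps=15):
    """Approximate the nearest orthogonal matrix to G via Newton-Schulz
    iteration (classical cubic variant; Muon itself uses a tuned quintic)."""
    X = G / np.linalg.norm(G)        # Frobenius-normalize so singular values <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:                   # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = 1.5 * X - 0.5 * A @ X    # each singular value s -> 1.5*s - 0.5*s**3
    return X.T if transposed else X

def muon_step(W, grad, momentum, lr=0.02, beta=0.95):
    """One hypothetical Muon-style update: accumulate momentum,
    orthogonalize it, then apply a shape-scaled step (values illustrative)."""
    momentum = beta * momentum + grad
    update = orthogonalize(momentum)
    scale = max(W.shape[0] / W.shape[1], 1.0) ** 0.5
    return W - lr * scale * update, momentum
```

The intuition often given is that orthogonalizing the update equalizes the "rare direction" and "dominant direction" components of the gradient, which plain momentum would weight very unevenly.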
As far as I'm aware, they all are. There are only five important foundation models in play -- Gemini, GPT, Grok (X.ai), Claude, and DeepSeek. (edit: forgot Claude)
Everything from China is downstream of DeepSeek, which some have argued is itself basically a protégé of ChatGPT.