Hacker News

3abiton last Sunday at 4:47 PM

> I get the feeling that it was trained very differently from the other models

It's actually based on the DeepSeek architecture, just with larger experts, if I recall correctly.
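The "architecture with larger experts" point refers to mixture-of-experts (MoE) layers, where a router sends each token to a few expert MLPs and the expert hidden size is a tunable knob. A minimal sketch of top-k MoE routing, with all names and shapes here being illustrative rather than taken from any specific model:

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token through a top-k mixture-of-experts layer (sketch).

    x:       (d_model,) token activation
    gate_w:  (n_experts, d_model) router weights
    experts: list of (W_in, W_out) per-expert MLP weights; the rows of
             W_in set the expert hidden size ("bigger experts" = more rows)
    """
    logits = gate_w @ x
    top = np.argsort(logits)[-top_k:]        # indices of the top-k experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                 # softmax over the selected experts
    out = np.zeros_like(x)
    for w, i in zip(weights, top):
        W_in, W_out = experts[i]
        h = np.maximum(W_in @ x, 0.0)        # expert MLP (ReLU for brevity)
        out += w * (W_out @ h)
    return out

# Tiny demo with random weights (illustrative sizes only).
d_model, d_ff, n_exp = 8, 32, 4
rng = np.random.default_rng(0)
gate_w = rng.standard_normal((n_exp, d_model))
experts = [(rng.standard_normal((d_ff, d_model)),
            rng.standard_normal((d_model, d_ff))) for _ in range(n_exp)]
y = moe_forward(rng.standard_normal(d_model), gate_w, experts)
```

Only `top_k` of the `n_exp` experts run per token, so parameter count grows with expert size while per-token compute stays roughly fixed.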


Replies

CamperBob2 last Sunday at 5:34 PM

As far as I'm aware, they all are. There are only five important foundation models in play -- Gemini, GPT, X.ai, Claude, and Deepseek. (edit: forgot Claude)

Everything from China is downstream of DeepSeek, which some have argued is basically a protégé of ChatGPT.

krackers last Sunday at 8:00 PM

It was notably trained with the Muon optimizer, for what it's worth, but I don't know how much can be attributed to that alone.
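For context, Muon's core idea is to orthogonalize the momentum-smoothed gradient of each weight matrix before applying it, using a Newton-Schulz iteration. A minimal numpy sketch of that step; the quintic coefficients follow the commonly cited reference implementation, while `muon_step` and its hyperparameters are illustrative:

```python
import numpy as np

def newton_schulz(G, steps=5):
    """Approximately orthogonalize G (push its singular values toward 1)
    using a quintic Newton-Schulz iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315   # coefficients from the reference impl
    X = G / (np.linalg.norm(G) + 1e-7)  # Frobenius-normalize so spectral norm <= 1
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T                          # iterate on the short-fat orientation
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(param, grad, buf, lr=0.02, beta=0.95):
    """One illustrative Muon update: momentum, then orthogonalized step."""
    buf[:] = beta * buf + grad
    return param - lr * newton_schulz(buf)

# Demo: orthogonalizing a random matrix (illustrative shape).
G = np.random.default_rng(0).standard_normal((32, 16))
O = newton_schulz(G)
```

Because the iteration acts on each singular value independently, a few steps are enough to squeeze them into a band around 1, which is what makes the update direction roughly orthogonal.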