jampekka • yesterday at 9:39 PM • 3 replies

The benchmarks are a bit of a disaster? It's at about DeepSeek V3.2 level, but with about 50% more parameters. It loses handily to GLM-5.1, which is also smaller, and even worse to the similarly sized Kimi K2.6.

Replies

sailingparrot • yesterday at 9:50 PM

Yes and no. Yes from a user PoV, I don't really see a great reason to use this other than for enterprises that care about using a model not trained on copyrighted data (not sure what the market really is for this anymore, feels like this concern has been forgotten by most customers).

From a strategic PoV for MS, all the models you cited are distilling GPT/Claude/Gemini and wouldn't be anywhere near as good as they are without this distillation, which in turn means you are dependent on OAI/Anthropic/G first shipping a good model to generate data for your training. This MAI model is trained from scratch with no synthetic data or distillation. So in terms of benchmarks it's obviously much harder to get a strong score, and thus not a disaster if they can keep on improving.

usef- • yesterday at 9:54 PM

They claim to not be training to the benchmarks at all. It'll be interesting to see how it stacks up in actual use.

nojito • yesterday at 11:10 PM

No distillation. Comparing it to DeepSeek or GLM doesn't make much sense.
