The benchmarks are a bit of a disaster? It's at about DeepSeek V3.2 level, but with about 50% more parameters. It loses handily to GLM-5.1, which is also smaller, and even worse to the similarly sized Kimi K2.6.
They claim to not be training to the benchmarks at all. It'll be interesting to see how it stacks up in actual use.
No distillation. Comparing it to DeepSeek or GLM doesn't make much sense.
Yes and no. Yes from a user PoV: I don't really see a great reason to use this, other than for enterprises that care about using a model not trained on copyrighted data (not sure what the market for this really is anymore; it feels like most customers have forgotten this concern).
From a strategic PoV for MS, all the models you cited are distilling GPT/Claude/Gemini and wouldn't be anywhere near as good as they are without this distillation, which in turn means you are dependent on OAI/Anthropic/G first shipping a good model to generate data for your training. This MAI model is trained from scratch with no synthetic data or distillation. So in terms of benchmarks it's obviously much harder to get strong scores, and thus not a disaster if they can keep on improving.