Hacker News

camelmel yesterday at 7:40 PM | 4 replies

Huh, according to that model card this is a 137B total parameter model.

Performance doesn't seem that good:

- MAI-Code-1-Flash (137B-A5B) = 51% on SWE-bench Pro

- Qwen3.6-35B-A3B = 49.5% on SWE-bench Pro (https://huggingface.co/Qwen/Qwen3.6-35B-A3B)

They benchmark against Claude Haiku, but Haiku is not good; it's worse than tiny open models you can run locally or via API at 10% of the cost.


Replies

giancarlostoro yesterday at 7:49 PM

The takeaway is that this is a smaller model that competes with Haiku. I would hope they come out with a "Sonnet"-competing model, then an Opus one. I have been wondering why Microsoft is kind of "sleeping" on offering its own models on Copilot; maybe it was part of their deal with OpenAI? Not sure.

kristjansson yesterday at 8:15 PM

> 137B-A5B

Yeah, not a 5B param model as the earlier title implied!

wetpaws yesterday at 7:54 PM

[dead]