camelmel • yesterday at 7:40 PM • 4 replies

Huh, according to that model card this is a 137B total parameter model.

Performance doesn't seem that good:

- MAI-Code-1-Flash (137B-A5B) = 51% on SWE-bench pro

- Qwen3.6-35B-A3B = 49.5% on SWE-bench pro (https://huggingface.co/Qwen/Qwen3.6-35B-A3B)

They benchmark against Claude Haiku, but Haiku is not good: it's worse than tiny open models you can run locally or via API at 10% of the cost.

Replies

giancarlostoro • yesterday at 7:49 PM

The takeaway is that this is a smaller model that competes with Haiku. I would hope they come out with a "Sonnet"-competing model, then Opus. I have been wondering why Microsoft is kind of "sleeping" on offering models they themselves have made on Copilot; maybe it was part of their deal with OpenAI? Not sure.

kristjansson • yesterday at 8:15 PM

> 137B-A5B

Yeah, not a 5B param model as the earlier title implied!
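
For anyone unfamiliar with the tag: in mixture-of-experts naming, the first number is usually the total parameter count and the `A` number is the parameters active per token. Here's a minimal sketch of reading it, assuming that convention holds for both models quoted upthread. The `parse_moe_tag` helper is hypothetical and only for illustration.

```python
import re


def parse_moe_tag(tag: str) -> tuple[float, float]:
    """Split a tag like '137B-A5B' into (total, active) parameters, in billions.

    Assumes the common '<total>B-A<active>B' naming convention; this is an
    illustrative helper, not something taken from either model card.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)?)B-A(\d+(?:\.\d+)?)B", tag)
    if match is None:
        raise ValueError(f"unrecognized tag: {tag!r}")
    return float(match.group(1)), float(match.group(2))


# The two tags quoted in the parent comment.
for tag in ("137B-A5B", "35B-A3B"):
    total, active = parse_moe_tag(tag)
    share = active / total  # fraction of the weights used per token
    print(f"{tag}: {total:g}B total, {active:g}B active ({share:.1%} of weights)")
```

So the "5B" describes per-token compute, while memory footprint still scales with the full 137B.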

wetpaws • yesterday at 7:54 PM

[dead]
