logoalt Hacker News

syntaxingyesterday at 10:48 PM0 repliesview on HN

Devstral is purely nonthinking too it’s very possible it uses less models (I don’t know how DS 3.2 nonthinking compares). It’s interesting because Qwen pretty much proved hybrid models work worse than fully separate models.