
NitpickLawyer • yesterday at 8:45 PM • 1 reply

> Given that devstral is much smaller, I can not imagine it will be more expensive

Devstral 2 is 123B dense. DeepSeek is 37B active. It will be slower and more expensive to run inference on this than on dsv3, especially considering that dsv3.2 has some goodies that make inference at higher context more efficient than their previous gen.
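To put rough numbers on the dense vs. active comparison, here is a back-of-the-envelope sketch. It assumes the common rule of thumb of about 2 FLOPs per active parameter per generated token, and it ignores attention cost, memory bandwidth, and batching, so treat the result as an order-of-magnitude estimate and not a benchmark. The function name and constant are made up for illustration; the parameter counts are the ones quoted above.

```python
# Assumption (rule of thumb, not a measurement): a forward pass costs roughly
# 2 FLOPs per *active* parameter per generated token.
FLOPS_PER_ACTIVE_PARAM = 2


def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass compute per token for a given active parameter count."""
    return FLOPS_PER_ACTIVE_PARAM * active_params


# Parameter counts as quoted in the comment above.
devstral_2 = flops_per_token(123e9)  # dense: every parameter is active
deepseek = flops_per_token(37e9)     # MoE: only the active parameters count

ratio = devstral_2 / deepseek
print(f"Devstral 2 needs about {ratio:.1f}x the compute per token")
```

Under these assumptions the dense model does roughly 3.3x the work per token. Total cost also depends on how many tokens each model generates, which this sketch does not cover.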

Replies

syntaxing • yesterday at 10:48 PM

Devstral is purely nonthinking too, so it's very possible it uses fewer tokens (I don't know how DS 3.2 nonthinking compares). It's interesting because Qwen pretty much proved hybrid models work worse than fully separate models.
