My biggest issue with Devstral, and even with Mistral’s biggest model, is that they’re dangerous unless closely directed and reviewed, and I mean CLOSELY. Unfortunately, Mistral models will believe and do anything.
See: https://petergpt.github.io/bullshit-benchmark/viewer/index.v...
Look at some of the test results. It’s horrifying.
FWIW, personally I prefer this. When I tried Qwen3.6 and asked it a few questions, it did respond, but it was ADAMANT that I should do something else when I really wanted an answer to the question I asked. It felt like searching for something and finding a Stack Overflow question that matches your search, where the most upvoted answer is about using or doing something else, when what you want is a specific answer to that specific question.
Meanwhile Devstral Small 2 just answers the damn question.
I don't want to have to convince my computer to do what I want it to do; I want it to do what I ask.