As always, rooting for these guys — model and national diversity is great. This looks like a solid foundation to build on; hopefully 3.6/3.7 will dial in more gains. Judging from the computer-use benchmarks, their vision pipeline could maybe use some improvement, but that's just speculation.
The different results on some benchmarks vibe as if this is truly an independently trained model, not just exfiltrated frontier logs, which I think is also really important - having a genuinely different set of weights and a different architecture in a given model seems like a benefit on its own when viewed from a global systems architecture perspective.
The problem with this model is that DeepSeek v4 Flash runs quite well quantized to 2-bit (see https://github.com/antirez/llama.cpp-deepseek-v4-flash), at 30 t/s generation and 400 t/s prefill on an M3 Ultra (and not too much slower on a 128GB MacBook Pro M3 Max). It works as a good coding agent with opencode/pi, tool calling is very reliable, and so forth. All this at a speed that a 120B dense model can never achieve. So it has to compete not just with models that fit in the same footprint at 4-bit, but with an 86GB GGUF file of DeepSeek v4 Flash, and that is not very easy to win against in practical terms for local inference.
Note: I have more uncommitted speed improvements in my tree that I'll push soon, the current tree could be a little bit slower but not much, still super usable.
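If anyone wants to try a similar local-agent setup, here's a minimal sketch of driving a quantized GGUF from Python; it uses the stock llama-cpp-python bindings rather than the fork linked above, and the file name and settings are placeholders, not my exact setup:

    # Rough sketch: driving a 2-bit quantized GGUF locally via llama-cpp-python.
    # The model path is a placeholder; the fork linked above may expose extra
    # options that plain llama-cpp-python does not.
    from llama_cpp import Llama

    llm = Llama(
        model_path="deepseek-v4-flash-Q2_K.gguf",  # placeholder file name
        n_ctx=32768,       # context window; raise it if you have the RAM
        n_gpu_layers=-1,   # offload everything to Metal / GPU
    )

    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write a C function that reverses a linked list."}],
        max_tokens=512,
    )
    print(out["choices"][0]["message"]["content"])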
I don't understand one thing about Mistral, of which I'm a fan, being in Europe: they opened the open-weights MoE show with Mixtral. Why are they now releasing dense models of significant size? This way you don't compete in any credible space, neither local inference nor remote inference, since the model is far from SOTA and not cheap to serve. So why are they training such big dense models? Dense models have a place in the few tens of billions of parameters, as Qwen 3.6 27B shows, but if you go 5 times that, it is no longer a fit, unless your capabilities crush anything else requiring the same VRAM, which is not the case.
For its size, that's really good! Though I bet it being a dense model probably helps a lot; if it were MoE at that size, I bet the benchmark performance would drop quite a bit (which consequently would also mean I'd at least be able to run it with decent tokens/second on the bunch of Nvidia L4 cards available to me, which presently are only okay with MoE models).
It's cool that they added comparisons to their own Mistral Small 4 119B A7B, which kind of shows that! They could also have included comparisons to something like Qwen Coder Next 80B A3B (or maybe the newer Qwen 3.6 35B A3B, or the 27B dense one), maybe DeepSeek V4 Flash 284B A13B, or the older GPT-OSS 120B A5B to illustrate that difference and show where their model sits even better; it would probably give a more positive picture than just comparing themselves against a bunch of bigger models!
Come to think of it, alongside throwing some money at DeepSeek not just Anthropic, I probably should get a Mistral subscription as well sometime, to see how they perform on various tasks - cause they seem pretty cost effective and it's nice to support at least some EU orgs: https://mistral.ai/pricing
Compared to all other hosted LLMs that I have tested, Mistral seems to be the only one with rather strict CSP headers. When you ask them to create a website with some JavaScript library, it will not preview it, even though Le Chat offers a canvas mode.
Sometimes when a new release comes around from any provider, I just want to test it a bit on the web, without paying and without using an agent harness.
Why are they like this ;_;
Edit: Christ on a bike it's bad at drawing SVGs https://chat.mistral.ai/chat/23214adb-5530-4af9-bb47-90f5219...
Mistral continuing to ship credible models is good for the market. Buyers need more than a two-company choice if they want pricing and deployment leverage.
I'm using mistral-medium-2508 for some text transformation operations. It's giving me better results than mistral-large for my use cases. Looking forward to testing this new model, although I'm not sure it's really meant to replace the previous medium model, since it's a lot more expensive and presented more as a coding / agentic model (mistral-medium-2508 was priced $0.4/$2 per 1M tokens, mistral-medium-3.5 is $1.5/$7.5).
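For reference, my calls are nothing fancy, roughly this against the plain REST endpoint (the prompt and temperature here are just illustrative, and a MISTRAL_API_KEY in the environment is assumed):

    # Rough sketch of a text-transformation call against the Mistral chat API.
    # Assumes MISTRAL_API_KEY is set; prompt and temperature are illustrative only.
    import os
    import requests

    resp = requests.post(
        "https://api.mistral.ai/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        json={
            "model": "mistral-medium-2508",
            "temperature": 0.3,
            "messages": [
                {"role": "system", "content": "Rewrite the user's text in plain English, preserving meaning."},
                {"role": "user", "content": "The aforementioned party shall heretofore be known as the lessee."},
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])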
It's okay, nothing exceptional, but any news from non US and non Chinese models is still good news.
The Vibe CLI is really bad on Windows. Sure, they don't officially support it, so I can't blame them, but it's an FYI for anyone wanting to try it: it can't get find-and-replace right.
It's funny that 128B is now considered Medium. I remember back in the day when 355M parameters was considered medium with GPT-2.
Ouch. Maybe they have a captive buying market to insulate them from actual market forces or ???
This Mistral release really reminds you of the gap between the frontier labs and everyone else.
Pre-agent, there wasn't always an obvious difference between models. Various models had their charms. Nowadays, I don't want to entertain anything less than the frontier models. The difference in capability is enormous and choosing anything less has a real cost in terms of productivity.
I've been a big fan of the smaller labs like Mistral and especially Cohere but it's been a while since I've been excited by a release by either company.
That said, I'm using Mistral's Voxtral realtime daily – it's great.
I can't figure out if this is available in the official Mistral API or not.
Their model listing API returns this:
    {
      "id": "mistral-medium-2508",
      "object": "model",
      "created": 1777479384,
      "owned_by": "mistralai",
      "capabilities": {
        "completion_chat": true,
        "function_calling": true,
        "reasoning": false,
        "completion_fim": false,
        "fine_tuning": true,
        "vision": true,
        "ocr": false,
        "classification": false,
        "moderation": false,
        "audio": false,
        "audio_transcription": false,
        "audio_transcription_realtime": false,
        "audio_speech": false
      },
      "name": "mistral-medium-2508",
      "description": "Update on Mistral Medium 3 with improved capabilities.",
      "max_context_length": 131072,
      "aliases": [
        "mistral-medium-latest",
        "mistral-medium",
        "mistral-vibe-cli-with-tools"
      ],
      "deprecation": null,
      "deprecation_replacement_model": null,
      "default_model_temperature": 0.3,
      "type": "base"
    },
So that has the alias "mistral-medium-latest", but the official ID is "mistral-medium-2508", which suggests it's the model they released in August 2025. But... that 1777479384 timestamp decodes to Wednesday, April 29, 2026 at 04:16:24 PM UTC.
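(If anyone wants to double-check, this is all it takes against the official listing endpoint; the only assumption is a MISTRAL_API_KEY in the environment.)

    # Quick check: list models from the official Mistral API and decode the
    # "created" timestamp of the mistral-medium entries.
    import os
    from datetime import datetime, timezone

    import requests

    resp = requests.get(
        "https://api.mistral.ai/v1/models",
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        timeout=30,
    )
    resp.raise_for_status()

    for model in resp.json()["data"]:
        if model["id"].startswith("mistral-medium"):
            created = datetime.fromtimestamp(model["created"], tz=timezone.utc)
            print(model["id"], model.get("aliases", []), created)

    # For the value above: datetime.fromtimestamp(1777479384, tz=timezone.utc)
    # -> 2026-04-29 16:16:24+00:00, i.e. later than the "2508" in the ID suggests.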
So is that the new Mistral Medium?
This is a very interesting strategy that might pay off. This model is a very good option for enterprise self-hosting. I would argue a lot of companies are VRAM constrained rather than compute constrained. You could fit 4-5 running instances on one H100 cluster, where you can only fit 1-2 instances of Kimi K2 or GLM 5.
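(Back-of-envelope, assuming a single 8xH100 node and FP8 weights; the ~1T parameter count for the Kimi K2 / GLM 5 class is my guess, but the ratio is what matters.)

    # Rough count of how many copies of each model fit on one 8xH100 (80GB) node,
    # reserving ~25% of the footprint per instance for KV cache and activations.
    NODE_VRAM_GB = 8 * 80  # 640 GB

    def instances_per_node(params_billion: float, bytes_per_param: float = 1.0, overhead: float = 1.25) -> int:
        weights_gb = params_billion * bytes_per_param  # FP8: ~1 byte per parameter
        return int(NODE_VRAM_GB // (weights_gb * overhead))

    print(instances_per_node(128))                         # 128B dense at FP8: ~4 instances
    print(instances_per_node(1000, bytes_per_param=0.5))   # ~1T-param MoE at ~4 bit: ~1 instance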
Given what Vibe already did in the previous versions with codestral-v2, that's great news. Keep up the good work! I don't want to depend on the world's two hungry superpowers.
I like the idea of Mistral, but the last time I evaluated Mistral Vibe it was really nice for $15/month but not as effective as Gemini Plus with AntiGravity and gemini-cli. I am currently running Gemini Ultra on a 3 month 'special deal' and AntiGravity with Opus 4.7 tokens is pretty much fantastic.
That said, when I stop spending money on Gemini Ultra, I will give Mistral Vibe another 1-month test.
I like the entire business model and vibe of Mistral so much more than OpenAI/Anthropic/Google but I also have stuff to get done. I am curious if Mistral Vibe for $15/month is a stable business model (i.e., can they make a profit).
I use Mistral Le Chat quite a bit.
One thing in particular I was disappointed in was its bad explanations when I asked about French grammar. It made multiple mistakes where the other models got it right, even Qwen 3.6 27B!
Anyway, I'm hoping they catch up some more.
I'm rooting for Mistral. It seems they are making a big bet that smaller models will win over larger ones, and I can see it happening. I was running some simple chat and tool-calling benchmarks for small models and Mistral Small 4 performed well for its price ($.15/$.60). Seeing this today got me excited, and the benchmarks seem solid compared to much larger models, but it's priced higher than Haiku, 5.4 mini, and all the Chinese models it's comparing itself to. It's not even winning those benches either, just being competitive with them, which is great given those models are 5x+ the size, but they are also half the price. Hard to be excited about that.
With most OSS releases being MoEs, and modern GPUs optimized for MoEs, can somebody with knowledge of the topic explain or speculate why Mistral might have opted for a dense model?
A 1000B model, can we call it 1KB model?
Looks at the first graph. It's SWE-Bench Verified, a benchmark OpenAI stopped using two months ago due to contamination.
Doesn't look too promising. Is there any reason to consider Mistral other than that it's not US-based?
I want to believe it's gonna be good, but after trying GPT-5.5 even the most advanced Chinese models seem depressing.
TLDR: Mistral Medium 3.5, text-only, 128B dense model, 256k context window, modified MIT license. Model is ~140G ...
https://huggingface.co/mistralai/Mistral-Medium-3.5-128B
They more or less claim this exceeds Claude Sonnet 3.5 on most things, but is worse than Sonnet 3.6, and exceeds all other open models.
Oh and they have a cloud service that will code your apps "in the cloud". But, yeah, at this point, so does my cat.
And, yes, unsloth is on it: https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF (but the 4-bit quant is 75G)
Oh, they are still a thing?! Completely forgot about Mistral. I am assuming they are still burning through investor money.
I'm not sure what people are on in the comments. It doesn't beat the other models, but it sure competes despite its size.
GLM 5.1 is an excellent model, but even at Q4 you're looking at ~400GB. Kimi K2.5 is really good too, and at Q4 quantization you're looking at almost ~600GB.
This model? You can run it at Q4 with 70GB of VRAM. That's approaching consumer-level territory (you can get a Mac Studio with 128GB of RAM for ~3500 USD).
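(Back-of-envelope, if you want to sanity-check those numbers: weight size is roughly parameter count times bits-per-weight over 8; the parameter counts for the big MoE models below are my guesses, not published figures.)

    # Rough rule of thumb: GGUF weight size ~ params * bits_per_weight / 8.
    # Parameter counts here are approximations, not official numbers.
    def approx_weight_gb(params_billion: float, bits_per_weight: float) -> float:
        return params_billion * bits_per_weight / 8  # 1B params at 8 bpw ~ 1 GB

    for name, params_b in [("Mistral Medium 3.5", 128), ("GLM 5.1 (guess)", 750), ("Kimi K2.5 (guess)", 1100)]:
        print(f"{name}: ~{approx_weight_gb(params_b, 4.5):.0f} GB at ~4.5 bpw (Q4-ish)")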
For the Claude-pilled people, I don't know if you only run Opus but when I was on the Pro plan Sonnet was already extremely capable. This beats the latest Sonnet while running locally, without anyone charging you extra for having HERMES.md in your repo, or locking you out of your account on a whim.
Mistral has never been competitive at the frontier, but maybe that is not what we need from them. Having Pareto models that get you 80% of the frontier at 20% of the cost/size sounds really good to me.