Hacker News

Mistral 3 family of models released

554 points | by pember | today at 3:01 PM | 171 comments

Comments

barrell | today at 4:04 PM

I use large language models in http://phrasing.app to format retrieved data in a consistent, skimmable manner. I switched to mistral-3-medium-0525 a few months back after struggling to get gpt-5 to stop producing gibberish. It's been insanely fast, cheap, and reliable, and it follows formatting instructions to the letter. I was (and still am) super super impressed. Even if it doesn't hold up in benchmarks, it has outperformed in practice.

I'm not sure how these new models compare to the biggest and baddest models, but if price, speed, and reliability are a concern for your use cases, I cannot recommend Mistral enough.

Very excited to try out these new models! To be fair, mistral-3-medium-0525 still occasionally produces gibberish in ~0.1% of my use cases (vs gpt-5's 15% failure rate). Will report back if that goes up or down with these new models.
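A minimal sketch of how one might measure that failure rate: validate each response against the expected output format and count misses. The "term :: definition" contract below is hypothetical, not phrasing.app's actual format.

```python
import re

# Hypothetical output contract: every non-empty line must look like
# "term :: definition". Anything else counts as a malformed response.
LINE_RE = re.compile(r"^\S.*\s::\s\S.*$")

def is_well_formed(response: str) -> bool:
    """True if every non-empty line of the response matches the contract."""
    lines = [ln for ln in response.splitlines() if ln.strip()]
    return bool(lines) and all(LINE_RE.match(ln) for ln in lines)

def failure_rate(responses: list[str]) -> float:
    """Fraction of responses that fail validation (retry candidates)."""
    bad = sum(not is_well_formed(r) for r in responses)
    return bad / len(responses)
```

Running this over a sample of stored completions gives a number directly comparable to the ~0.1% vs 15% figures quoted above.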

msp26 | today at 4:33 PM

The new large model uses DeepseekV2 architecture. 0 mention on the page lol.

It's a good thing that open source models use the best arch available. K2 does the same but at least mentions "Kimi K2 was designed to further scale up Moonlight, which employs an architecture similar to DeepSeek-V3".

---

vllm/model_executor/models/mistral_large_3.py

```python
from vllm.model_executor.models.deepseek_v2 import DeepseekV3ForCausalLM


class MistralLarge3ForCausalLM(DeepseekV3ForCausalLM):
    pass
```

"Science has always thrived on openness and shared discovery." btw

Okay I'll stop being snarky now and try the 14B model at home. Vision is good additional functionality on Large.

simonw | today at 5:41 PM

The 3B vision model runs in the browser (after a 3GB model download). There's a very cool demo of that here: https://huggingface.co/spaces/mistralai/Ministral_3B_WebGPU

Pelicans are OK but not earth-shattering: https://simonwillison.net/2025/Dec/2/introducing-mistral-3/

mythz | today at 4:05 PM

Europe's bright star has been quiet for a while. Great to see them back, and good to see them return to the open-source light with Apache 2.0 licenses; they're too far behind the SOTA pack for exclusive, proprietary models to work in their favor.

Mistral had the best small models on consumer GPUs for a while, hopefully Ministral 14B lives up to their benchmarks.

timpera | today at 3:11 PM

Extremely cool! I just wish they would also include comparisons to SOTA models from OpenAI, Google, and Anthropic in the press release, so it's easier to know how it fares in the grand scheme of things.

yvoschaap | today at 3:28 PM

Upvoting for Europe's best efforts.

mrinterweb | today at 7:51 PM

I don't like being this guy, but I think Deepseek 3.2 stole all the thunder yesterday. Notice that these comparisons are to Deepseek 3.1. Deepseek 3.2 is a big step up over 3.1, if benchmarks are to be believed. Just unfortunate timing of release. https://api-docs.deepseek.com/news/news251201

simgt | today at 3:27 PM

I still don't understand what the incentive is for releasing genuinely good model weights. What makes sense however is OpenAI releasing a somewhat generic model like gpt-oss that games the benchmarks just for PR. Or some Chinese companies doing the same to cut the ground from under the feet of American big tech. Are we really hopeful we'll still get decent open weights models in the future?

nullbio | today at 4:58 PM

Anyone else find that despite Gemini performing best on benches, it's actually still far worse than ChatGPT and Claude? It seems to hallucinate nonsense far more frequently than any of the others. Feels like Google just bench maxes all day every day. As for Mistral, hopefully OSS can eat all of their lunch soon enough.

arnaudsm | today at 4:00 PM

Geometric mean of MMMLU + GPQA-Diamond + SimpleQA + LiveCodeBench:

- Gemini 3.0 Pro: 84.8

- DeepSeek 3.2: 83.6

- GPT-5.1: 69.2

- Claude Opus 4.5: 67.4

- Kimi-K2 (1.2T): 42.0

- Mistral Large 3 (675B): 41.9

- DeepSeek 3.1 (670B): 39.7

The 14B, 8B & 3B models are SOTA though, and do not have Chinese censorship like Qwen3.
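For reference, the geometric mean of n scores is the n-th root of their product. A quick sketch (the per-benchmark inputs below are illustrative, since the comment only lists the aggregated results):

```python
import math

def geometric_mean(scores: list[float]) -> float:
    """n-th root of the product, computed via logs for numerical stability."""
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

# Illustrative only: four hypothetical per-benchmark scores for one model.
aggregate = geometric_mean([90.0, 80.0, 85.0, 88.0])
```

Because a single near-zero benchmark drags the geometric mean down hard, it penalizes models with one weak area more than an arithmetic mean would.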

lalassu | today at 4:12 PM

It's sad that they only compare to open-weight models. I feel most users don't care much whether a model is open or not; the value proposition is the quality of the generation for their use case.

I guess it says a bit about the state of European AI.

hnuser123456 | today at 3:32 PM

Looks like their own HF link is broken or the collection hasn't been made public yet. The 14B instruct model is here:

https://huggingface.co/mistralai/Ministral-3-14B-Instruct-25...

The unsloth quants are here:

https://huggingface.co/unsloth/Ministral-3-14B-Instruct-2512...

trvz | today at 5:00 PM

Sad to see they've apparently fully given up on releasing their models via torrent magnet URLs shared on Twitter; those will stay around long after Hugging Face is dead.

tootyskooty | today at 4:52 PM

Since no one has mentioned it yet: note that the benchmarks for Large are for the base model, not for the instruct model available in the API.

The most likely reason is that the instruct model underperforms the open competition (even among non-reasoners like Kimi K2).

esafak | today at 4:04 PM

Well done to France's Mistral team for closing the gap. If the benchmarks are to be believed, this is a viable model, especially at the edge.

andhuman | today at 3:32 PM

This is big. The first really big open weights model that understands images.

tmaly | today at 8:13 PM

I see several 3.x versions on Openrouter.ai, any idea which of those are the new models?

Tiberium | today at 3:40 PM

A bit interesting that they used Deepseek 3's architecture for their Large model :)

domoritz | today at 6:04 PM

Ugh, the bar charts do not start at 0, which makes it impossible to compare across model sizes. That's a pretty basic chart-design principle. I hope they can fix it. At least give me consistent y-axis scales!

Aissen | today at 8:02 PM

Anyone succeed in running it with vLLM?

jasonjmcghee | today at 4:19 PM

I wish they showed how these models compare to larger/better ones and what the gap is, rather than only the models they beat.

Like, how does 14B compare to Qwen3-30B-A3B?

(Which I think is a lot of people's go-to, or its instruct/coding variant, from what I've seen in local model circles.)

RYJOX | today at 6:03 PM

I find that there are too many paid subscription models at the moment without enough legitimate progress to warrant the money spent. Recently cancelled GPT.

another_twist | today at 4:20 PM

I am not sure why Meta paid $13B+ to hire some kid instead of just hiring back or acquiring these folks. They'll easily catch up.

tucnak | today at 3:46 PM

If the claims about multilingual and pretraining performance are accurate, this is huge! This may be the best-in-class multilingual release since the more recent Gemmas, which used to be unmatched there. I know Americans don't care much about the rest of the world, but we're still using our native tongues, thank you very much; there is a huge issue with, e.g., Ukrainian (as opposed to Russian) being underrepresented in many open-weight and weight-available models. Gemma used to be a notable exception; I wonder if that's still the case. On a different note: I wonder why the 14B model's TriviaQA score lags so far behind Gemma 12B's; that one is not a formatting-heavy benchmark.

codybontecou | today at 3:21 PM

Do all of these models, regardless of parameters, support tool use and structured output?
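Mistral's API follows the common OpenAI-style chat completions shape, so a structured-output request is typically just a `response_format` field on the payload. A sketch of what such a request body might look like; the model name is illustrative, and per-model support for JSON mode and tool use is worth checking in the docs:

```python
import json

# Hypothetical request payload for an OpenAI-compatible chat endpoint.
payload = {
    "model": "ministral-3-14b-instruct",  # illustrative model name
    "messages": [
        {"role": "user", "content": "List three French cities as JSON."}
    ],
    # Ask the server to constrain the completion to valid JSON.
    "response_format": {"type": "json_object"},
}

body = json.dumps(payload)
```

Tool use rides on the same payload shape via a `tools` array of function schemas, where supported.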

dmezzetti | today at 5:28 PM

Looking forward to trying them out. Great to see they are Apache 2.0...always good to have easy-to-understand licensing.

GaggiX | today at 3:44 PM

The small dense models seem particularly good for their sizes; I can't wait to test them out.

ThrowawayTestr | today at 5:33 PM

Awesome! Can't wait till someone abliterates them.


s_dev | today at 4:18 PM

I was subscribing to these guys purely to support the EU tech scene, so I was on Pro for about two years while using ChatGPT and Claude.

Went to actually use it and got a message saying that I had missed a payment 8 months previously and thus wasn't allowed to use Pro, despite having paid for Pro for the 8 months since. The lady I contacted in support simply told me to pay the outstanding balance. You would think a missed payment would relate only to the month that was missed, not to all subsequent months.

Utterly ridiculous that one missed payment can justify not providing a service that was otherwise paid for in full.

Basically, if you find yourself in this situation, you're better off deleting the account and re-signing up under a different email.

We really need to get our shit together in the EU on this sort of stuff. I was a paying customer purely out of sympathy, but that sympathy dried up pretty quickly with hostile customer service.

RomanPushkin | today at 5:30 PM

Mistral presented DeepSeek 3.2