
constantcrying · today at 3:29 PM

The lack of the comparison (which absolutely was done) tells you exactly what you need to know.


Replies

bildung · today at 4:48 PM

I think people from the US often aren't aware of how many EU companies simply won't risk handing their data to the providers you have in mind: OpenAI, Anthropic, and Google. Those are simply not an option at all.

The company I work for, for example, a mid-sized tech business, is currently investigating its local hosting options for LLMs. Mistral will certainly be an option there, alongside the Qwen family and DeepSeek.
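
(For concreteness, "local hosting" in practice usually means running an OpenAI-compatible inference server such as vLLM or Ollama on your own hardware, so prompts never leave the network. A minimal sketch, assuming an Ollama server on localhost serving a Mistral model; the endpoint URL and model tag are placeholders for whatever you actually deploy:)

    # Minimal sketch: querying a locally hosted open-weights model through
    # an OpenAI-compatible endpoint (vLLM and Ollama both expose one).
    # The base_url and model tag below are assumptions, not a recommendation.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:11434/v1",  # local server; data stays on-prem
        api_key="unused",                      # local servers typically ignore the key
    )

    resp = client.chat.completions.create(
        model="mistral",  # whichever local model tag you pulled
        messages=[{"role": "user", "content": "Hello from an on-prem deployment."}],
    )
    print(resp.choices[0].message.content)

The point is less the specific server than the interface: anything speaking the OpenAI wire format slots into existing tooling without sending data to a US provider.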

Mistral is positioning themselves for that market, not the one you have in mind. Comparing their models with Claude etc. would mean associating themselves with the data leeches, which they probably want to avoid.

popinman322 · today at 4:10 PM

They're comparing against open-weights models that are roughly a month behind the frontier. There's likely an implicit open-weights political stance here.

There are also plenty of reasons not to use proprietary US models for comparison: the major US models haven't been living up to their benchmarks; their releases rarely include training or architectural details; they're not terribly cost-effective; they often fail to compare against non-US models; and the performance delta between model releases has plateaued.

A decent number of users on r/LocalLlama have reported switching back from Opus 4.5 to Sonnet 4.5 because Opus's real-world performance was worse. From my vantage point, trust in OpenAI, Anthropic, and Google seems to be waning, and this missing comparison is another symptom.

tarruda · today at 4:00 PM

Here's what I understood from the blog post:

- Mistral Large 3 is comparable to the previous DeepSeek release.

- The Ministral 3 LLMs are comparable to older open-weights LLMs of similar size.

crimsoneer · today at 3:36 PM

If someone is using these models, they probably can't or won't use the existing SOTA models, so I'm not sure how useful those comparisons would actually be. "Here is a benchmark that makes us look bad, against a model you can't use, on a task you won't be undertaking" isn't actually helpful (and definitely doesn't belong in a press release).
