Hacker News

Mistral AI Releases Forge

437 points by pember, yesterday at 9:04 PM | 84 comments

Comments

kioleanu, today at 8:04 AM

I like Mistral, it hits the exact sweet spot between cost and my data staying in the EU, without a significant drop in quality, but man are their model naming conventions confusing af. They mention they have a model called Devstral 2, which is neither Codestral nor Devstral. I want to use it, but the API only lists devstral-2512, devstral-latest, devstral-medium-latest, devstral-medium-2507, devstral-small, devstral-small-2507.

I think devstral-latest should be it, no? So I wrote to support and got an answer 12 hours later saying oh no, Devstral 2 is definitely called Devstral 2, followed by a page of instructions on how to set it up in IntelliJ... generated with AI. The screens it refers to don't exist and never did.
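One way to cut through the naming confusion is to ask the API itself which model ids your key can use. A small sketch, assuming Mistral's public `/v1/models` endpoint with Bearer auth (as documented for their platform); the parsing is factored out so it works on any response of that shape:

```python
# List the model ids your key can actually see via Mistral's /v1/models endpoint.
# Requires MISTRAL_API_KEY in the environment; parsing is a separate, offline function.

import json
import os
import urllib.request

def model_ids(models_json: dict, needle: str = "") -> list[str]:
    """Extract model ids from a /v1/models-style response, optionally filtered."""
    return [m["id"] for m in models_json["data"] if needle in m["id"]]

if __name__ == "__main__" and os.environ.get("MISTRAL_API_KEY"):
    req = urllib.request.Request(
        "https://api.mistral.ai/v1/models",
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        print("\n".join(model_ids(json.load(resp), "devstral")))
```

Whatever "Devstral 2" is marketed as, only the ids this call returns are addressable through the API.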

ogou, today at 5:19 AM

Don't sleep on Mistral. Highly underrated as a general-purpose LLM service. Cheaper, too. Their emphasis on bespoke modelling over generalized megaliths will pay off. There are all kinds of specialized datasets and restricted-access stores that can benefit from their approach, especially in the highly regulated EU.

Not everyone is obsessed with code generation. There is a whole world out there.

upghost, today at 3:07 AM

> Pre-training allows organizations to build domain-aware models by learning from large internal datasets.

> Post-training methods allow teams to refine model behavior for specific tasks and environments.

How do you suppose this works? They say "pretraining" but I'm certain that the amount of clean data available in proper dataset format is not nearly enough to make a "foundation model". Do you suppose what they are calling "pretraining" is actually SFT and then "post-training" is ... more SFT?

There's no way they mean "start from scratch". Maybe they do something like generate a huge amount of synthetic data seeded from company data using one of their SOTA models -- which is basically equivalent to low-resolution distillation, I would imagine. Hmm.
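The "synthetic data seeded from company data" guess above can be sketched as a pipeline: chunk internal documents, have a strong teacher model write instruction/response pairs about each chunk, and collect the results as an SFT dataset. This is only an illustration of that idea, not Mistral's method; the teacher call is stubbed out where a real API call would go:

```python
# Sketch: turn raw internal docs into SFT records via a (stubbed) teacher model.
import json
from typing import Callable

def chunk(text: str, size: int = 400) -> list[str]:
    """Naive fixed-size character chunking; real pipelines split on structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def make_sft_pairs(docs: list[str], teacher: Callable[[str], list[dict]]) -> list[dict]:
    """Turn internal docs into {'prompt', 'completion'} records for SFT."""
    pairs = []
    for doc in docs:
        for c in chunk(doc):
            pairs.extend(teacher(c))
    return pairs

def stub_teacher(chunk_text: str) -> list[dict]:
    """Stand-in for a SOTA model call that would author Q/A pairs about the chunk."""
    return [{"prompt": f"Summarize: {chunk_text[:60]}...",
             "completion": chunk_text[:120]}]

if __name__ == "__main__":
    dataset = make_sft_pairs(["Internal runbook: restart the billing service..."], stub_teacher)
    print(json.dumps(dataset[0])[:80])
```

Seen this way, "pretraining on your data" and "distillation through synthetic data" really do blur together, as the parent suspects.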

mark_l_watson, yesterday at 11:58 PM

I am rooting for Mistral and their different approach: not really competing on the largest and most advanced models, but instead doing custom engineering for customers and generally serving the needs of EU customers.

thecopy, today at 9:21 AM

Looks interesting. But how do you explore, test, or use it? The product page (https://mistral.ai/products/forge) doesn't contain anything useful either. Just "Contact us".

Disappointing.

roxolotl, yesterday at 11:36 PM

Mistral has been releasing some cool stuff. Definitely behind on frontier models, but they are working a different angle. I was just talking at work about how hard model training is for a small company, so we'd probably never do it. But with tools like this, and the new Unsloth release, training feels more in reach.

jbverschoor, today at 7:39 AM

ASML and ESA as clients means something. I don't expect to see the first name on anyone else's logo list.

ryeguy_24, today at 1:52 AM

How many proprietary use cases truly need pre-training, or even fine-tuning, as opposed to a RAG approach? And at what point does it make sense to pre-train or fine-tune? Curious.
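For many proprietary use cases, the RAG side of that trade-off is just "fetch the right passage at query time, change no weights". A toy sketch of that baseline, with bag-of-words cosine similarity standing in for a real embedding model (the documents and query are made up for illustration):

```python
# Minimal retrieval baseline: rank documents by similarity to the query,
# then stuff the top hit into the prompt. No training involved.
import math
from collections import Counter

def bow(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

docs = [
    "Refund policy: refunds are issued within 14 days of purchase.",
    "Office hours: the Paris office is open 9 to 5.",
]
print(retrieve("how long do refunds take", docs)[0])
```

The rough rule of thumb: if the knowledge is lookup-shaped (facts, policies, documents), retrieval like this usually suffices; fine-tuning earns its keep when you need new behavior, style, or skills the base model lacks.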

burgerquizz, today at 9:33 AM

Can I use Mistral to read my source code and learn from it, so I don't need to inject the whole doc and burn tokens every single time?

dmix, today at 1:14 AM

This is definitely the smart path for making $$ in AI. I noticed MongoDB is also going into this market with https://www.voyageai.com/ targeting business RAG applications and offering consulting for company-specific models.

dash2, today at 6:30 AM

I think it’s interesting what this approach suggests about who will profit from AI. I’m sceptical that having huge numbers of GPUs is a moat. After all, real humans – even geniuses – are trained on much much less data than the whole Internet. But proprietary and specialised data could very well be a moat. It’s hard to train a scientist/lawyer/analyst without reading a lot of science/law/finance. Companies’ proprietary data might encode a great deal of irreplaceable knowledge. Seems as if Mistral is taking this bet.

Aldipower, today at 8:27 AM

I cannot keep up with their products, model names, and releases. What is what for? Their marketing texts do not make sense to me. Is there a nice overview somewhere?

I am a simple stupid Le Chat user with a small mind and the Tredict MCP Server connected to it (to Le Chat, not my mind), which works ok-ish. :-)

csunoser, yesterday at 11:47 PM

Huh. I initially thought this was just another fine-tuning endpoint. But apparently they are partnering with customers on the pretraining side as well. And RL too? Jeez, RL environments are really hard to get right. Best wishes, I guess.

krinne, today at 8:45 AM

I wasn't able to find a way to access this -- is it something accessible only to enterprises?

Would love to take it for a spin, if that is even possible.

zby, today at 6:40 AM

I am pretty sure that the solution to continuous learning is external storage. There is a lot of talk about context engineering, but I have not seen anyone treat context as the main bottleneck and build a system around that. It would show that even "context engineering" is slightly the wrong term, because context does not enter the LLM in some mysterious way: it goes through the prompt, and the whole pattern of passing chat history back and forth is not the most efficient use of a limited prompt.
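The external-storage idea above can be sketched concretely: keep facts outside the model, and inject only the few relevant ones into each prompt instead of replaying the full chat history. This is an illustration of the pattern, not any particular product; keyword-overlap retrieval stands in for embeddings:

```python
# Sketch: an external memory store that feeds only relevant facts into the prompt.
class MemoryStore:
    def __init__(self) -> None:
        self.items: list[str] = []

    def remember(self, fact: str) -> None:
        self.items.append(fact)

    def recall(self, query: str, k: int = 2) -> list[str]:
        """Rank stored facts by naive keyword overlap with the query."""
        q = set(query.lower().split())
        scored = sorted(self.items,
                        key=lambda f: len(q & set(f.lower().split())),
                        reverse=True)
        return scored[:k]

def build_prompt(store: MemoryStore, user_msg: str) -> str:
    """The prompt carries retrieved memories, not the whole conversation."""
    memories = "\n".join(store.recall(user_msg))
    return f"Relevant memories:\n{memories}\n\nUser: {user_msg}"

store = MemoryStore()
store.remember("The user prefers answers in French.")
store.remember("The user's project is a billing service in Go.")
print(build_prompt(store, "refactor the billing service"))
```

The prompt stays near-constant in size no matter how much the store grows, which is exactly the bottleneck-first framing the comment is asking for.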

speedgoose, today at 6:37 AM

I was enthusiastic, but it's "contact us" pricing for now. I was expecting a classic cloud LLM forge with public pricing.

whatever1, today at 6:42 AM

I thought that for pretraining to work and reasoning to emerge you need internet-scale data. How can Forge achieve that with just internal company data (unless said company is AT&T or something)?

rorylawless, today at 1:12 AM

The fine-tuning endpoint is deprecated according to the API docs. Is this the replacement?

https://docs.mistral.ai/api/endpoint/deprecated/fine-tuning

andai, today at 1:19 AM

They mention pretraining too, which surprises me. I thought that was prohibitively expensive?

It's feasible for small models, but I thought small models were not reliable for factual information?
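A back-of-envelope check on "prohibitively expensive": the common compute estimate is roughly 6·N·D training FLOPs for N parameters and D tokens. The figures below (GPU throughput, model and data sizes) are illustrative assumptions, not anyone's actual numbers:

```python
# Rough pretraining cost arithmetic using the ~6 * params * tokens FLOPs estimate.
def train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

def gpu_hours(flops: float, flops_per_sec: float = 4e14) -> float:
    """Assume ~400 TFLOP/s sustained per accelerator (rough, hardware-dependent)."""
    return flops / flops_per_sec / 3600

small = train_flops(1e9, 20e9)     # 1B params, 20B tokens
large = train_flops(70e9, 1.4e12)  # 70B params, 1.4T tokens

print(f"1B model:  {gpu_hours(small):,.0f} GPU-hours")
print(f"70B model: {gpu_hours(large):,.0f} GPU-hours")
```

Under these assumptions a small domain model is on the order of tens to hundreds of GPU-hours, while a frontier-sized run is hundreds of thousands, which is why "pretraining as a service for your data" can pencil out at small scale.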

hermit_dev, today at 3:04 AM

The future of AI is specialization, not just racing toward universal knowledge as fast as we can at the expense of everything and everyone along the way. I appreciate and applaud this approach. I am looking into a similar product myself. Good stuff.

aavci, today at 2:27 AM

How does this compare to fine tuning?

supernes, today at 5:40 AM

> Code agents are becoming the primary users of developer tools, so we built Forge for them first, not

... for humans.

bsjshshsb, today at 1:09 AM

Is training or FT > context? Anyone have experience?

Is it possible to retrain daily or hourly as info changes?

wei03288, today at 6:02 AM

The interesting positioning here is the pretraining partnership angle, not just the fine-tuning endpoint. Most model providers compete on "best foundation model"; Mistral is betting on "best model for your data", which is a fundamentally different value proposition and sidesteps the frontier race entirely.

The RL component is the part worth watching. Custom reward models trained on domain-specific preferences can get significantly better results than generic RLHF on narrow tasks, but they require the customer to have enough labeled preference data to bootstrap the reward model. That's a higher bar than fine-tuning, but also a higher moat for Mistral once it's working.

The business model makes sense too: pretraining partnerships lock in much longer relationships than inference API contracts.
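The reward-model bar mentioned above is essentially the Bradley-Terry objective: given preference pairs (chosen, rejected), train a scorer r so that sigmoid(r(chosen) − r(rejected)) approaches 1. A toy sketch with a linear reward over made-up feature vectors; a real reward model would be an LLM head trained on text:

```python
# Sketch: Bradley-Terry reward modeling on toy preference pairs.
import math
import random

random.seed(0)

def reward(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))

def train(pairs: list[tuple[list[float], list[float]]],
          dim: int, lr: float = 0.1, steps: int = 200) -> list[float]:
    """SGD on -log sigmoid(r(chosen) - r(rejected))."""
    w = [0.0] * dim
    for _ in range(steps):
        chosen, rejected = random.choice(pairs)
        p = 1 / (1 + math.exp(-(reward(w, chosen) - reward(w, rejected))))
        g = p - 1  # gradient scale of the pairwise logistic loss
        w = [wi - lr * g * (c - r) for wi, c, r in zip(w, chosen, rejected)]
    return w

# Toy preferences: feature 0 ("cites internal policy") is what raters like.
pairs = [([1.0, 0.2], [0.0, 0.9]), ([1.0, 0.5], [0.1, 0.4])]
w = train(pairs, dim=2)
print(f"learned weights: {[round(x, 2) for x in w]}")
```

The "higher bar" in the comment is that `pairs` here must come from real human preference labels on domain outputs, and you need far more of them than this toy suggests before the learned reward generalizes.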
