I gave Codex with GPT-5.5 High this prompt:
Identify bugs in [datrs/varinteger](https://github.com/datrs/varinteger) . Do NOT look at the GitHub issues, just inspect the source
It also found the bug that Leanstral 1.5 found and the authors highlighted. I think this bug wasn't especially tricky; it's just a case of too few eyeballs on this repo.Congrats on the release regardless! Excited for the direction Lean + automated AI proofs are headed.
Disclosure: I work at OpenAI.
Given that they directly compare to GPT-5.5 in their documentation. This comes off as puppy kicking to me. They state it is not SOTA, even IN its domain!
Honestly: Think twice before dragging your firm into what you say.
Disclaimer: I speak for myself. Not any firm I am associated with.
Leanstral 1.5 has 6B active parameters. How many parameters does GPT-5.5 have?
GPT-5.5 is what, a trillion+ parameter model? I think the insight here is that you can do this with a tiny model.
the mechanism is whats interesting, rather than whether it could do it.
this sounds like a great tool to add to the toolbelt, as part of the "how do we handle all the code output from LLMs" problem
> It also found the bug that Leanstral 1.5 found and the authors highlighted
This is a little bit like someone pointing the moon and you look at the finger.
The formal proof domain goes way beyond just finding bugs.
It has tons of usages in term of functional safety, protocol validation, cryptography, etc...
The fact Mistral tackle this kind of problem is both smart and not so surprising.
Smart because it is niche enough that they do not front face the big competitors (yet).
No so surprising because the French labs have a well known and long time expertise with formal proof tools (Coq and all its Ocaml associated tools). It has been historically mainly pushed by the aerospace and train industries (Airbus, Dassault, Alsthom).