Looks at first graph. It's SWE-Bench Verified, a benchmark OpenAI stopped using two months ago due to contamination.
Doesn't look too promising. Is there any reason to consider Mistral other than that it's not US-based?
If it's not US and it's within a few percent of SOTA, that might be good enough for a lot of people (e.g. Europeans).
Price and speed.
They did not stop using it due to contamination. They said it's flawed and indirectly said Anthropic's results were impossible. It's very possible they are sore losers.