Hacker News

InputName, yesterday at 3:35 PM

Looks at first graph. It's SWE-Bench Verified, a benchmark OpenAI stopped using two months ago due to contamination.

Doesn't look too promising. Is there any reason to consider Mistral other than that it's not US?


Replies

2ndorderthought, yesterday at 5:30 PM

They did not stop using it due to contamination. They said it's flawed and indirectly said Anthropic's results were impossible. It's very possible they are sore losers.

tpurves, yesterday at 3:43 PM

If it's not US and it's within a few percent of SOTA, that might be good enough for a lot of people (e.g. Europeans).

amunozo, yesterday at 3:49 PM

Price and speed.