Hacker News

jampekka, today at 4:42 PM

1491 vs 1418 Elo (a 73-point gap) means the stronger model wins about 60% of the time.
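
For anyone who wants to check the arithmetic, here is a minimal sketch, assuming the leaderboard uses the standard Elo logistic curve (base 10, 400-point scale). The function name `elo_expected_score` is just illustrative, and strictly speaking the result is an expected score, which equals the win rate only if ties are ignored or counted as half a win.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo model.

    Assumes the usual base-10 logistic curve with a 400-point scale;
    a leaderboard may use a different variant.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings quoted in the comment above.
p_stronger = elo_expected_score(1491, 1418)
p_weaker = elo_expected_score(1418, 1491)

# The two expected scores sum to 1, so the weaker model gets the remainder.
print(f"stronger: {p_stronger:.2f}, weaker: {p_weaker:.2f}")
```

Under those assumptions the stronger model's expected score comes out at roughly 0.60, and the weaker model's at roughly 0.40.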


Replies

supermatt, today at 4:50 PM

Probably naive questions:

Does that also mean that Gemini-3 (the top-ranked model) loses to Mistral 3 40% of the time?

Does that make Gemini 1.5x better, or Mistral two-thirds as good as Gemini, or can we not quantify the difference like that?
