_boffin_ • today at 5:33 PM • 1 reply

What was the main focus when training this model? Besides the ELO score, it looks like the models (31B / 26B-A4) are underperforming on some of the typical benchmarks by a wide margin. Do you believe there's an issue with the tests, or that the results are misleading (such as comparative models benchmaxxing)?

Thank you for the release.

Replies

BoorishBears • today at 6:32 PM

Benchmarks are a pox on LLMs.

You can use this model for about 5 seconds and realize its reasoning is in a league well above any Qwen model, but instead people assume benchmarks that are openly getting used for training are still relevant.
