Hacker News

BoorishBears · today at 6:32 PM

Benchmarks are a pox on LLMs.

You can use this model for about 5 seconds and realize its reasoning is in a league well above any Qwen model. Instead, people assume that benchmarks openly being used as training data are still relevant.


Replies

j45 · today at 6:58 PM

You definitely have to try each model on your own use case. Many models can be trained to perform better on these tests, but that might not transfer to what you actually need them for.