Benchmarks are a pox on LLMs. You can use this model for about 5 seconds and realize its reasoning...

BoorishBears • today at 6:32 PM • 1 reply • view on HN

Benchmarks are a pox on LLMs.

You can use this model for about 5 seconds and realize its reasoning is in a league well above any Qwen model, but instead people assume benchmarks that are openly getting used for training are still relevant.

Replies

j45 • today at 6:58 PM

You definitely have to try each model on your own use case. Many models can be trained to perform better on these tests, but that might not transfer to what you actually need.
