xmcqdpt2 • yesterday at 6:26 AM

In part because model performance is benchmarked using tests that reward giving a partly correct answer over refusing to answer. If you make a model that doesn't go for part marks, it will do poorly on all the benchmarks and no one will be interested in it.

https://arxiv.org/abs/2509.04664
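
To make the incentive concrete, here is a rough sketch of the arithmetic. It is a toy model, not taken from the linked paper: the function names and the `wrong_penalty` parameter are made up for illustration, and it assumes a question is worth 1 point if right and 0 if the model refuses.

```python
def expected_score(p_correct: float, abstain: bool, wrong_penalty: float = 0.0) -> float:
    """Expected score on one question under a hypothetical grading rule.

    Right answer: +1. Wrong answer: -wrong_penalty. Refusal: 0.
    """
    if abstain:
        return 0.0
    return p_correct - (1.0 - p_correct) * wrong_penalty


def break_even_confidence(wrong_penalty: float) -> float:
    """Confidence at which guessing and refusing have the same expected score."""
    return wrong_penalty / (1.0 + wrong_penalty)


if __name__ == "__main__":
    p = 0.25  # assumed chance that a guess is right

    # No penalty for wrong answers: guessing always beats refusing.
    print(expected_score(p, abstain=False))  # 0.25
    print(expected_score(p, abstain=True))   # 0.0

    # Wrong answers cost 1 point: a 25% guess is now worse than refusing.
    print(expected_score(p, abstain=False, wrong_penalty=1.0))  # -0.5
    print(break_even_confidence(1.0))                           # 0.5
```

With no penalty for wrong answers, any nonzero chance of being right makes guessing the better strategy, so a model that refuses when unsure loses points on every such question. Only when wrong answers cost something does refusing start to pay off below the break-even confidence.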
