Hacker News

xmcqdpt2, yesterday at 6:26 AM

In part because model performance is benchmarked with tests that reward giving a partly correct answer over refusing to answer. If you build a model that doesn't go for part marks, it will do poorly on all the benchmarks and no one will be interested in it.

https://arxiv.org/abs/2509.04664
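
To make the incentive concrete, here is a rough sketch of the arithmetic. It is my own toy model, not anything taken from the linked paper: the function name, the 25% confidence figure, and the penalty values are all hypothetical. It compares the expected score of answering versus abstaining on a single question, first under plain right-or-wrong grading and then under grading that subtracts points for wrong answers.

```python
def expected_score(p_correct, wrong_penalty=0.0, abstain=False):
    """Expected score on one question (hypothetical grading scheme).

    p_correct:     model's chance of being right if it answers
    wrong_penalty: points subtracted for a wrong answer (0 = plain accuracy)
    abstain:       if True, the model refuses and scores 0
    """
    if abstain:
        return 0.0
    # A correct answer earns 1 point; a wrong one costs wrong_penalty points.
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


# Assumed example: the model is only 25% sure of its answer.
p = 0.25

# Plain accuracy grading: answering always scores at least as well as refusing.
guess_plain = expected_score(p)
refuse_plain = expected_score(p, abstain=True)

# Grading with a 1-point penalty for wrong answers: refusing now scores better.
guess_penalized = expected_score(p, wrong_penalty=1.0)
refuse_penalized = expected_score(p, wrong_penalty=1.0, abstain=True)

print("plain accuracy:  guess =", guess_plain, " refuse =", refuse_plain)
print("with penalty:    guess =", guess_penalized, " refuse =", refuse_penalized)
```

Under the plain scheme a low-confidence guess still has positive expected value, so a model tuned for the leaderboard should never refuse. Only once wrong answers cost something does abstaining become the better move.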