In part because model performance is benchmarked with tests that reward partly correct answers over refusing to answer. If you build a model that doesn't go for part marks, it will score poorly on those benchmarks and few people will be interested in it.
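To make the incentive concrete, here is a rough sketch under a simplified grading scheme: 1 point for a correct answer, 0 for refusing, and an optional penalty for a wrong answer. The function name, the penalty parameter, and the 10% confidence figure are all made up for illustration and are not taken from any real benchmark.

```python
def expected_score(p_correct, abstain, wrong_penalty=0.0):
    """Expected score on one question under a hypothetical grader.

    p_correct: the model's chance of being right if it answers.
    abstain: True if the model refuses to answer (always scores 0).
    wrong_penalty: points deducted for a wrong answer (assumed 0 for
        accuracy-only grading).
    """
    if abstain:
        return 0.0
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


# Accuracy-only grading: a guess at 10% confidence still beats refusing.
guess = expected_score(0.1, abstain=False)
refuse = expected_score(0.1, abstain=True)
print(guess > refuse)

# With a penalty for wrong answers, the same guess has negative expected value.
penalized_guess = expected_score(0.1, abstain=False, wrong_penalty=1.0)
print(penalized_guess < refuse)
```

Under the no-penalty scheme, answering never has a lower expected score than refusing, so a model tuned to maximize that score has no reason to say "I don't know".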