"Boring but right" generally means the prediction is already priced into our current understanding of the world, though. Anyone can reliably predict "the sun will rise tomorrow", but I'm not giving them high marks for that.
Perhaps a new category: 'highest-risk guess that's right the most often'. Those are the high-impact predictions.
Something like correctness^2 x (rank of novel information content)?
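To make that formula concrete, here is a rough sketch of how it might rank forecasters. Everything here is an assumption for illustration: the `Forecaster` fields, the novelty rank scale (1 = least novel, higher = more surprising), and the sample numbers are all made up, not measured.

```python
from dataclasses import dataclass


@dataclass
class Forecaster:
    name: str
    correctness: float  # hypothetical: fraction of predictions that came true, in [0, 1]
    novelty_rank: int   # hypothetical: 1 = least novel, higher = more surprising


def impact_score(f: Forecaster) -> float:
    # correctness^2 x novelty rank, as floated in the comment above.
    # Squaring correctness penalizes unreliable forecasters more heavily.
    return f.correctness ** 2 * f.novelty_rank


# Invented sample data, purely illustrative.
forecasters = [
    Forecaster("sunrise", correctness=1.0, novelty_rank=1),   # always right, says nothing new
    Forecaster("bold", correctness=0.6, novelty_rank=5),      # often right, fairly surprising
    Forecaster("wild", correctness=0.1, novelty_rank=10),     # very surprising, rarely right
]

ranked = sorted(forecasters, key=impact_score, reverse=True)
for f in ranked:
    print(f.name, round(impact_score(f), 2))
```

With these made-up numbers the "sun will rise" forecaster lands below the bold-but-usually-right one and above the wild guesser, which is roughly the ordering the comment seems to be after. A different novelty scale would change the ranking.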
I'm giving them higher marks than the people who say it won't.
LLMs have seen huge improvements over the last 3 years. Are you going to make the bet that they will continue to make similarly huge improvements, taking them well past human ability, or do you think they'll plateau?
The former is the boring, linear prediction.