The people recruited weren’t experts. I can imagine it’s straightforward to find humans (such as those that play many video games) that can score >100% on this benchmark.
So, if you look at the way the scoring works, 100% is the max. For each task, you get full credit if you solve in a number of steps less than or equal to the baseline. If you solve it with more steps, you get points off. But each task is scored independently, and you can't "make up" for solving one slowly by solving another quickly.
Like suppose there were only two tasks, each with a baseline score of solving in 100 steps. You come along and you solve one in only 50 steps, and the other in 200 steps. You might hope that since you solved one twice as quickly as the baseline, but the other twice as slowly, those would balance out and you'd get full credit. Instead, your scores are 1.0 for the first task, and 0.25 (scoring is quadratic) for the second task, and your total benchmark score is a mere 0.625.
So, if you look at the way the scoring works, 100% is the max. For each task, you get full credit if you solve in a number of steps less than or equal to the baseline. If you solve it with more steps, you get points off. But each task is scored independently, and you can't "make up" for solving one slowly by solving another quickly.
Like suppose there were only two tasks, each with a baseline score of solving in 100 steps. You come along and you solve one in only 50 steps, and the other in 200 steps. You might hope that since you solved one twice as quickly as the baseline, but the other twice as slowly, those would balance out and you'd get full credit. Instead, your scores are 1.0 for the first task, and 0.25 (scoring is quadratic) for the second task, and your total benchmark score is a mere 0.625.