Like other ARC-AGI challenges it was never needed to reach 100% to get human-level. The benchmark score is stretched so that the benchmark takes more time to be saturated, that's it.
The current SotA models are still very far from your hypothetical “average human” with a score of 3%. So the benchmark is indeed useful to help the field progress (which is the entire point of ARC-AGI benchmarks).