I think it can't be improved because it's measuring the wrong thing. A junior engineer becomes a senior when they stop being told what code to write and start solving business needs. That's why the highest-paid engineers often aren't the ones who would do best on LeetCode, or on SWE bench pro verified.
Maybe AGI is possible, and we'll have software-defined human intelligence that's completely autonomous, but that's not coming in the next slightly better RL-trained LLM, and if it existed it likely wouldn't be under our control anyway.