The author posted new results using the API (apparently the original run was through Codex), and 5.5 moves to the top: https://x.com/VictorTaelin/status/2047818978664268071
Still doesn't explain why Codex 5.4 is better than Codex 5.5.