The point of this test is to check whether an AI system can figure out the game. That isn't what happened here. A human figured out the game, wrote exactly how it works into their prompts, and THEN put the AI on the problem. This is 100% cheating and imo quite stupid.
The harness would be fine if the agent had coded it itself in a controlled environment while observing the game.
Not sure if the specific rules of this prize allow that, but I would accept it.