Hacker News

boxed, today at 5:54 AM

The point of this test is to check if an AI system can figure out the game. This isn't what happened here. A human figured out the game, wrote in their prompts exactly how the game works and THEN put the AI on the problem. This is 100% cheating and imo quite stupid.


Replies

ithkuil, today at 9:56 AM

The harness would be fine if the agent coded its own harness in a controlled environment while observing the game.

Not sure if the specific rules of this prize allow that, but I would accept it.