Hacker News

zahlman yesterday at 3:58 PM

> This is why the video of Claude solving level 1 at the top was actually (dramatic musical cue) staged, and only possible via a move-for-move tutorial that Claude nicely rationalized post hoc.

One of the things this arc of history has taught me is that post-hoc rationalization is depressingly easy. Especially if it doesn't have to make sense, but even passing basic logical checks isn't too difficult. Ripping the rationalization apart often requires identifying novel, non-obvious logical checks.

I thought I had learned that time and time again from human politics, but AI somehow made it even clearer than I thought possible. Perhaps simply because of knowing that a machine is doing it.

Edit: after watching the video more carefully:

> This forms WALL IS WIN horizontally. But I need "FLAG IS WIN" instead. Let me check if walls now have the WIN property. If they do, I just need to touch a wall to win. Let me try moving to a wall:

There's something extremely uncanny-valley about this. A human player absolutely would accidentally win like this, and have similar reasoning (not expressed so formally) about how the win was achieved after the fact. (Winning depends on the walls having WIN and also not having STOP; many players get stuck on later levels, even after having supposedly learned the lesson of this one, by trying to make something WIN and walk onto it while it is still STOP.)

But the WIN block was not originally in line with the WALL IS text, so a human player would never accidentally form the rule, but would only do it with the expectation of being able to win that way. Especially since there was already an obvious, clear path to FLAG — a level like this has no Sokoban puzzle element to it; it's purely about learning that the walls only block the player because they are STOP.

Nor would (from my experience watching streamers at least) a human spontaneously notice that the rule "WALL IS WIN" had been formed and treat that as a cue to reconsider the entire strategy. The natural human response to unintentionally forming a useful rule is to keep pushing in the same direction.

On the other hand, an actually dedicated AI system (in the way that AlphaGo was dedicated to Go) could, I'm sure, figure out a game like Baba Is You pretty easily. It would lack the human instinct to treat the walls as if they were implicitly always STOP; so it would never struggle with overriding it.
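The WIN/STOP interaction described above can be sketched in a few lines. This is only a toy model, not the game's actual engine: the `Rules` class and `try_step_onto` function are hypothetical names made up for illustration, and it assumes STOP is checked before WIN when the player tries to move onto a tile.

```python
from dataclasses import dataclass, field


@dataclass
class Rules:
    """Toy rule table (hypothetical, not the real game's data model)."""
    # Maps a noun such as "WALL" to the set of properties applied to it.
    props: dict = field(default_factory=dict)

    def add(self, noun: str, prop: str) -> None:
        # Record a rule of the form "<noun> IS <prop>".
        self.props.setdefault(noun, set()).add(prop)

    def has(self, noun: str, prop: str) -> bool:
        return prop in self.props.get(noun, set())


def try_step_onto(rules: Rules, noun: str) -> str:
    """Outcome of the player trying to walk onto a tile holding `noun`.

    Assumption: STOP is resolved first, so an object that is both
    STOP and WIN can never be entered and therefore never wins.
    """
    if rules.has(noun, "STOP"):
        return "blocked"
    if rules.has(noun, "WIN"):
        return "win"
    return "moved"


# WALL IS STOP and WALL IS WIN together: the player cannot step on.
both = Rules()
both.add("WALL", "STOP")
both.add("WALL", "WIN")
print(try_step_onto(both, "WALL"))

# WALL IS WIN alone (STOP rule broken): walking onto a wall wins.
win_only = Rules()
win_only.add("WALL", "WIN")
print(try_step_onto(win_only, "WALL"))
```

Under that assumption, the first case is the trap players fall into on later levels (making something WIN while it is still STOP), and the second is the situation in the video once WALL IS STOP has been broken.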


Replies

deadbabe yesterday at 4:34 PM

A simple feed-forward neural network with sufficient training can solve levels way better than Claude. Why is Claude being used at all?
