For a game like Anchorhead, which is famous in its niche, shouldn’t Claude already know it well enough to just solve it right away? I would expect its training data to contain multiple discussions and walkthroughs of the game.
I would think so. I'd be far more interested in a comparison of LLMs (no internet search allowed) playing against IF games released in the past month.
Yeah, I do not find performances like this very impressive.
Honestly I am curious how it would do if it did have a walkthrough.
It's very likely the model didn't stop to question whether the game it was playing was one it already knew, and just assumed the game was a puzzle created for it.
I expect it's somewhere in the training data, but it's very unlikely to be salient. A few text files here and there in the ocean of the Internet is nothing. If Claude had memorized the walkthrough, it would have performed better.