Hacker News

ThouTo2C | yesterday at 9:45 AM | 2 replies

There are numerous guides available for every level of Baba Is You. I think it's likely that any modern LLM has them in its training data. That severely limits its value as a test of complex problem-solving ability.

Still, it's interesting to see the challenges posed by dynamic rules (like "Key is Stop") that change where you are able to move, and so on.


Replies

ethan_smith | yesterday at 1:08 PM

The dynamic rule changes are precisely what make this a valuable benchmark despite the available guides. Each rule modification creates a novel state space that requires reasoning about the consequences of those changes, not just memorizing solution paths.
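A minimal sketch of that point, assuming a toy model rather than the game's actual engine: rules are read only from horizontal rows of word tiles as NOUN IS PROPERTY triples, the vocabulary is a small hypothetical subset, and any object whose noun has the STOP property blocks movement. The names `parse_rules` and `blocks_movement` are made up for illustration.

```python
from typing import Dict, List, Optional, Set

# Toy vocabulary (an assumption, not the game's full word list)
NOUNS = {"BABA", "KEY", "WALL", "FLAG"}
PROPERTIES = {"YOU", "STOP", "PUSH", "WIN"}

Row = List[Optional[str]]  # one row of word tiles; None marks an empty cell


def parse_rules(rows: List[Row]) -> Dict[str, Set[str]]:
    """Collect horizontal NOUN IS PROPERTY sentences (vertical ones are ignored here)."""
    rules: Dict[str, Set[str]] = {}
    for row in rows:
        # Slide a window of three adjacent cells along the row
        for noun, verb, prop in zip(row, row[1:], row[2:]):
            if noun in NOUNS and verb == "IS" and prop in PROPERTIES:
                rules.setdefault(noun, set()).add(prop)
    return rules


def blocks_movement(obj: str, rules: Dict[str, Set[str]]) -> bool:
    """In this toy model, an object blocks movement iff its noun is STOP."""
    return "STOP" in rules.get(obj, set())


intact = [["KEY", "IS", "STOP"]]
broken = [["KEY", None, "IS", "STOP"]]  # a gap now separates KEY from IS

print(blocks_movement("KEY", parse_rules(intact)))  # True
print(blocks_movement("KEY", parse_rules(broken)))  # False
```

With the sentence intact the key is an obstacle; with one tile out of place, the same board has a different set of reachable cells, which is the kind of consequence a solver has to reason about rather than look up.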

klohto | yesterday at 9:50 AM

Read the article first maybe