There are numerous guides available for every level of Baba Is You. I think it's likely that any modern LLM has them in its training data. That severely undermines its value as a test of complex problem-solving ability.
Still, it's interesting to see the challenges posed by dynamic rules (like "Key is Stop") that change where you are able to move, among other things.
Read the article first, maybe.
The dynamic rule changes are precisely what make this a valuable benchmark despite available guides. Each rule modification creates a novel state-space that requires reasoning about the consequences of those changes, not just memorizing solution paths.
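
To make that concrete, here is a rough toy sketch of how a single rule like "Key is Stop" changes which cells are reachable. This is not the game's actual engine: the grid layout, the `reachable` function, and the representation of rules as `(noun, property)` pairs are all assumptions made up for illustration.

```python
from collections import deque

def reachable(width, height, objects, rules, start):
    """Return the cells reachable from start on a width x height grid.

    Hypothetical model: `objects` maps (x, y) -> noun, and `rules` is a
    set of (noun, property) pairs. Any object whose noun currently has
    the "stop" property is treated as impassable.
    """
    blocked = {pos for pos, noun in objects.items() if (noun, "stop") in rules}
    seen = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            in_bounds = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if in_bounds and nxt not in blocked and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# A 3x1 corridor with a key sitting in the middle cell.
objects = {(1, 0): "key"}
without_rule = reachable(3, 1, objects, set(), (0, 0))
with_rule = reachable(3, 1, objects, {("key", "stop")}, (0, 0))
```

With no rules active, all three cells are reachable; once "Key is Stop" is active, the player is confined to the starting cell. The same board yields a different search space depending on the rule set, so a memorized path stops applying as soon as a rule changes.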