I would be way more interested in it playing niche community levels, because I suspect a huge reason it's able to solve these levels is that it was trained on a million Baba Is You walkthroughs. Same with people using Pokemon as a way to test LLMs: it really just depends on how well it knows the game.
Two corrections, as written in the post: Claude, at least, is not able to solve the standard levels at all, and community levels are definitely in scope.