I once made an “RC plays Baba Is You” that controlled the game over a single shared browser that was streaming video and controls back to the game. Was quite fun!
But I am fairly sure all of the Baba Is You solutions are present in the training data for modern LLMs, so it won’t make for a good eval.
> But I am fairly sure all of the Baba Is You solutions are present in the training data for modern LLMs, so it won’t make for a good eval.
Claude 4 cannot solve any Baba Is You level (except level 0, which is solved by pressing right 8 times), so for now it's at least a nice low bar to shoot for...