
meffmadd, today at 11:13 AM

I found LLMs to be surprisingly good at puzzle games like Baba Is You: https://meffmadd.github.io/samplesurium/posts/baba_is_agent/


Replies

stared, today at 11:27 AM

Nice!

I remember "Baba is Eval" (https://fi-le.net/baba/), released 11 months ago, back when Claude Opus 4 was the strongest model. Back then, I was surprised by how poorly it did, even on the first level.

I am happy to see another approach, and indeed one with much stronger results.
