I remember "Baba is Eval" (https://fi-le.net/baba/), released 11 months ago, back when Claude Opus 4 was the strongest model. Back then, I was surprised how poor was it even at the first level.
I am happy to see an another approach - and indeed, with much stronger results.
Nice!
I remember "Baba is Eval" (https://fi-le.net/baba/), released 11 months ago, back when Claude Opus 4 was the strongest model. Back then, I was surprised how poor was it even at the first level.
I am happy to see an another approach - and indeed, with much stronger results.