Exactly. As far as I'm concerned, the benchmark is useless. It's way too easy and rewarding to train on it.
Y'all are way too skeptical. No matter what cool thing AI does, you'll make up an excuse for how they must somehow be cheating.
I mean, if you want to make your own benchmark, simply don't make it public and don't run it often. If your salamander on skis or whatever gets better with time, it likely has nothing to do with being benchmaxxed.
It's just an in-joke. He doesn't intend it as a serious benchmark anymore. I think it's funny.