You know they are benchmaxxing when they end up writing their coding harness in TypeScript npm slop
Their models can't help them build it with something better?
That's the only benchmark people need: whether or not their model can raise the bar for their own product
And so far it's looking pretty sad