This outperforms Gemini 3 Pro Image (Nano Banana Pro) on Text-to-Image Arena and Image Edit Arena. I'm surprised they didn't mention this leaderboard in the blog post.
I like this benchmark because it's based on user votes, so overfitting is not as easy (after all, if users prefer your result, you've won).
The arena concept doesn’t work for image models due to watermarks.
The scores are really, really close; that might be why.