The matrix required for a fair comparison is getting too complicated, since you have to compare chat/thinking/pro against an array of Anthropic and Google models.
But they publish all the same numbers, so you can make the full comparison yourself, if you want to.