Interesting benchmark.
I can't help but notice that they're benchmarking Opus 4.6 (Anthropic's latest and greatest model) against GPT-5.2 (which is three generations behind OpenAI's latest coding models: GPT-5.2-Codex, GPT-5.3-Codex and the latest GPT-5.4).
As far as I know, OpenAI did not release 5.3 Codex in their API. You can only use it through the Codex CLI or app.