It's not reasonable to compare results from two different tool sets, especially when both are being guided by humans.
The only reasonable comparison would be between completely automated results from each technology - that would be useful.
For example - creating a 'pre-baked script' and running it on both to compare the output.
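Roughly what I have in mind, as a minimal sketch only: the same fixed list of prompts goes to every tool with no human in the loop, and you look at where the outputs diverge. The names `run_fixed_script` and `differing_prompts` are made up for illustration, and each tool is assumed to be wrapped in a plain Python callable that takes a prompt and returns text. How you would actually invoke Codex or Claude is deliberately left out.

```python
from typing import Callable, Dict, List

# Hypothetical type: a tool wrapped as "prompt in, text out".
Tool = Callable[[str], str]


def run_fixed_script(tools: Dict[str, Tool], prompts: List[str]) -> Dict[str, List[str]]:
    """Send the same prompts, in the same order, to every tool."""
    return {name: [run(prompt) for prompt in prompts] for name, run in tools.items()}


def differing_prompts(results: Dict[str, List[str]], prompts: List[str]) -> List[str]:
    """Return the prompts for which the tools did not all give identical output."""
    differing = []
    for i, prompt in enumerate(prompts):
        # Collect the distinct outputs produced for this prompt.
        outputs = {outputs_for_tool[i] for outputs_for_tool in results.values()}
        if len(outputs) > 1:
            differing.append(prompt)
    return differing
```

Exact string equality is a crude measure for generated code or prose, so in practice you would probably swap it for something like running a shared test suite against each output. The point is only that the script, not a person, drives both tools.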
Codex and Claude are obviously very different, though it's hard to characterize how those differences might apply exactly to a given problem.
Two 'very different power saws' will ultimately build the same home.