Get the actual prompt and have Claude Code / Codex try it out via curl / python requests. The full prompt will yield debugging information. You have to set a few parameters to make sure you get the full gpt-5 performance. e.g. if your reasoning budget too low, you get gpt-4 grade performance.
IMHO you should just write your own harness so you have full visibility into it, but if you're just using vanilla OpenClaw you have the source code as well so should be straightforward.
Codex is fully open source…
> IMHO you should just write your own harness
Can you point to some online resources to achieve this? I'm not very sure where I'd begin with.