When I hear "claude code one-shotted X" and X is a novel problem, I mentally substituted "the agentic harness that I tried one-shotted X," since that's what they're saying.
Getting any smart model to take a look at the task is the sort of lift that the speaker is usually pointing to.
The harness is pretty much irrelevant for general tasks.
You can write a 100 line harness that only has one tool - try either "bash" or the more fun "you're running within nodejs, here's eval", you'd be surprised in how close to CC/Codex performance you're going to get.