The model needs to be trained to use the harness. Sonnet 4.5 and gpt-5.1-codex-max are "weaker" models in abstract, but you can get much more mileage out of them due to post-training.