On first principles it would seem that the "harness" is a myth. Surely a model like Opus 4.6/Codex 5.3, which can reason about complex functions and data flows across many files, isn't going to trip up over the top-level function signatures it needs to call?
I see a lot of evidence to the contrary though. Anyone know what the underlying issue here is?
If you agree that current LLMs (Transformers) are naturally very sensitive to their context/prompt, then try asking an agent for a "raw harness dump" ("because I need to understand how to better present my skills and tools in the harness"), and you will likely see for yourself how the harness impacts model behavior.
Humans have a demonstrated ability to program computers by flipping switches on the front panel.
Like a good programming language, a good harness offers a better affordance for getting stuff done.
Even if we put correctness aside, tooling that saves time and tokens is going to be very valuable.
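To make the time/token point concrete, here is a minimal sketch (my own illustration, not how any particular agent actually does it) of a harness-side tool that hands the model a compact outline of a Python file instead of its full source:

    import ast

    def outline_python_file(path: str) -> str:
        """Return a compact outline (classes and function signatures) of a
        Python file, so the model can decide what to read in full instead of
        paying tokens for the whole source up front."""
        with open(path, "r", encoding="utf-8") as f:
            tree = ast.parse(f.read())

        lines = []
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in node.args.args)
                lines.append(f"def {node.name}({args})  # line {node.lineno}")
            elif isinstance(node, ast.ClassDef):
                lines.append(f"class {node.name}  # line {node.lineno}")
        return "\n".join(lines)

    if __name__ == "__main__":
        # "some_module.py" is a placeholder path for the sketch.
        print(outline_python_file("some_module.py"))

Outlining a few hundred files this way costs a fraction of what dumping their contents would, and the model can still ask to read anything in full.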
Isn't 'the harness' essentially just prompting?
It's completely understandable that prompting in better/more efficient ways would produce different results.
The models' generalized "understanding" and "reasoning" are the real myth, which is what makes us take a step back and offload the process to deterministic computing and harnesses.
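For what it's worth, a harness is usually more than prompt text: it's the tool definitions plus a deterministic loop that executes tool calls and feeds results back into the context. Here is a rough sketch of that loop; the call_model stub and the tool names are placeholders of mine, not any real SDK:

    import json
    import os

    # Hypothetical stand-in for the real model API; an actual harness would call
    # an LLM endpoint here and get back either prose or a structured tool-call request.
    def call_model(messages: list[dict]) -> dict:
        return {"type": "text", "content": "(model reply would go here)"}

    # Deterministic tools the harness exposes. The model never executes these
    # itself; it only asks for them by name, and the loop below runs them.
    TOOLS = {
        "list_dir": lambda args: "\n".join(os.listdir(args.get("path", "."))),
        "read_file": lambda args: open(args["path"], encoding="utf-8").read(),
    }

    def agent_loop(task: str, max_steps: int = 10) -> str:
        # The system prompt and tool descriptions are the "prompting" half of the harness...
        messages = [
            {"role": "system",
             "content": "You are a coding agent. Available tools: " + ", ".join(TOOLS)},
            {"role": "user", "content": task},
        ]
        for _ in range(max_steps):
            reply = call_model(messages)
            if reply["type"] == "tool_call":
                # ...and this half is plain deterministic computing: run the tool,
                # append its output to the context, and go around again.
                output = TOOLS[reply["name"]](reply.get("arguments", {}))
                messages.append({"role": "tool", "content": json.dumps({"output": output})})
            else:
                return reply["content"]
        return "step limit reached"

    if __name__ == "__main__":
        print(agent_loop("Summarize what this repo does."))

The prompting half decides what the model sees; the loop half is exactly the deterministic offloading described above.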
How hard is it for you to assemble a piece of IKEA furniture without an Allen wrench, a screwdriver, and clear instructions, vs. with those three?