Curious what you mean by "agent harness" here... are you distinguishing between true autonomous agents (model decides next step) vs workflows that use LLMs at specific nodes? I've found the latter dramatically more reliable for anything beyond prototyping, which makes me wonder if the "model improvement" is partly better prompting and scaffolding.
An agent harness is what enables the user to seamlessly interact with both a model and tool calls. Claude Code is an agent harness.
┌────────────────────────────┐
│ User │
└──────────────┬─────────────┘
│
▼
┌────────────────────────────┐
│ Agent Harness │
│ (software interface) │
└──────┬──────────────┬──────┘
│ │
▼ ▼
┌────────────┐ ┌────────────┐
│ Models │ │ Tools │
└────────────┘ └────────────┘
Here's an example of a harness with less code: https://github.com/badlogic/pi-mono/blob/fdcd9ab783104285764...
Hi, author here. I mean the piece of code that calls the model and executes the tool calls. My colleague Philip calls it “9 lines of code”: https://sketch.dev/blog/agent-loop
We have built two of them now, and clearly the state of the art here can be improved. But it is hard to push too much on this while the models keep improving.