"The real value in the AI ecosystem isn’t the model or the harness — it’s the integration of both working seamlessly together."
Wut? The value in the ecosystem is the model. Harnesses are simple. Great models work nearly identically in every harness
There is a reason why Copilot + Opus 4.6 is shit, while Claude Code + Opus 4.6 produces excellent results.
The harness matters A LOT.
The model is the engine, the harness is the driver and chassis. Even the best top of the line engine in a shitty car driven by a bad driver won't win any races.
Harnesses are simple (kind of? Some certainly aren't, but I'd agree that they can be simple) but they deliver a ton of value. They have a significant ROI.
I agree that good models have more value because a harness can't magically make a bad model good, but there's a lot that would be inordinately difficult without a proper harness.
Keeping models on rails is still important, if not essential. Great models might behave similarly in the same harness, but I suppose the value prop is that they wouldn't behave as well on the same task without a good harness.
The model defines the ceiling, but the harness determines how much of that ceiling you actually reach.
It is not everyone’s experience that models work the same in every harness.
I wouldn’t say harnesses are simple. They do a lot of things that we aren’t thinking of. I learned that a good harness is as valuable as the model. But obviously the model is what carries the whole thing.
I tried to build my own harness once. The amount of work required is incredible: from how external memory is managed per session to the techniques for saving on the context window. For example, you do not want the LLM to read in whole files; instead you give it the capability to read chunks from offsets, but then you have to decide what should stay in context and what should be pruned.
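A minimal sketch of what such a chunked-read tool might look like. The name `read_chunk`, the line-based paging, and the 200-line cap are my own assumptions for illustration, not taken from any particular harness:

```python
# Hypothetical "read_chunk" tool: instead of dumping a whole file into the
# context window, expose a tool that returns a bounded window of lines and
# enough metadata for the model to request the next window.
import os
import tempfile

MAX_CHUNK_LINES = 200  # assumed cap on lines returned per call


def read_chunk(path: str, offset: int = 0, limit: int = MAX_CHUNK_LINES) -> dict:
    """Return up to `limit` lines starting at line `offset` (0-based)."""
    limit = min(limit, MAX_CHUNK_LINES)
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    window = lines[offset:offset + limit]
    return {
        "path": path,
        "offset": offset,
        "returned": len(window),
        "total_lines": len(lines),
        "eof": offset + len(window) >= len(lines),  # model knows when to stop paging
        "content": "".join(window),
    }


# Demo: page through a 500-line file in two tool calls.
demo = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
demo.write("".join(f"line {i}\n" for i in range(500)))
demo.close()
first = read_chunk(demo.name, offset=0)
last = read_chunk(demo.name, offset=400)
os.unlink(demo.name)
```

The metadata (`total_lines`, `eof`) matters as much as the content: it lets the model plan further reads without the harness having to guess, and the harness can later prune old `content` fields from the transcript while keeping the cheap metadata.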
After that you have to start designing the think-plan-generate-evaluate pipeline. A learning moment for me here was to split out the evaluation step, because the same LLM that did the work should not evaluate itself; it introduces a bias. Then you realize you need subagents too, and start wondering how their context will be handled (maybe return a summarized version to the main LLM?).
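The generate/evaluate split above can be sketched as a loop where the generator and evaluator are separate callables, so they can be backed by different models (or at least fresh contexts). Everything here is illustrative: `call_model`-style stubs stand in for real API calls, and the function names are mine:

```python
# Sketch of a generate -> evaluate loop where the evaluator is a *different*
# model or context from the generator, to avoid self-evaluation bias.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    approved: bool
    feedback: str


def generate_then_evaluate(
    task: str,
    generator: Callable[[str], str],
    evaluator: Callable[[str, str], Review],
    max_rounds: int = 3,
) -> str:
    draft = generator(task)
    for _ in range(max_rounds):
        review = evaluator(task, draft)
        if review.approved:
            return draft
        # Feed the critique back to the generator; the evaluator never
        # sees its own feedback as something to "fix", only fresh drafts.
        draft = generator(f"{task}\n\nReviewer feedback:\n{review.feedback}")
    return draft  # give up after max_rounds and return the best attempt


# Demo with stub callables standing in for two different models.
drafts = iter(["draft v1", "draft v2"])
result = generate_then_evaluate(
    "write a sort function",
    generator=lambda prompt: next(drafts),
    evaluator=lambda task, draft: Review("v2" in draft, "handle empty input"),
)
```

The same shape works for subagents: the subagent runs in its own context, and only the final `draft` (or a summary of it) is returned to the main agent's transcript.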
And then you have to start thinking about integration with MCP servers and how the LLM should invoke things like tools, prompts, and resources from each MCP. I learned that LLMs, especially the smaller ones, tend to hiccup and return malformed JSON.
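One common mitigation is best-effort repair before re-prompting: smaller models often wrap the JSON in prose or code fences, or emit trailing commas. A rough sketch of that recovery layer, with heuristics of my own choosing (this is one plausible approach, not how any specific harness does it):

```python
# Best-effort extraction of a JSON tool call from a model reply.
# Try strict parsing first, then a few cheap repairs; only if all of
# them fail does the harness re-prompt the model with the parse error.
import json
import re


def parse_tool_call(raw: str):
    """Return the parsed dict, or None if nothing recoverable."""
    candidates = [raw]
    # Repair 1: the JSON may be wrapped in a ``` or ```json code fence.
    fence = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fence:
        candidates.append(fence.group(1))
    # Repair 2: grab the outermost {...} and ignore surrounding prose.
    brace = re.search(r"\{.*\}", raw, re.DOTALL)
    if brace:
        candidates.append(brace.group(0))
    for text in candidates:
        # Repair 3: strip trailing commas before } or ].
        cleaned = re.sub(r",\s*([}\]])", r"\1", text)
        try:
            obj = json.loads(cleaned)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            continue
    return None  # caller should re-prompt the model with the error


# Demo: a clean call, a fenced call with a trailing comma, and prose.
ok = parse_tool_call('{"tool": "read_chunk", "args": {"offset": 0}}')
fenced = parse_tool_call('Sure!\n```json\n{"tool": "read_chunk",}\n```')
nothing = parse_tool_call("I could not decide on a tool to use.")
```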
At some point I started wondering about just throwing everything out and looking at PydanticAI or LangChain or LangGraph or Microsoft AutoGen to handle everything between the LLM and the MCP servers. It's quite difficult to make something like this work well, especially for long-horizon tasks.