I'm a neophyte. What makes a harness special or all that unique from another? I've had a reasonable experience with Zed and local models, but could be persuaded to put something else in the mix if there is a measurable benefit to be had.
Simple example: a while back LLMs would trip over questions like "how many Rs are in strawberry". Now, the system prompts have a line like "when a user asks for a count, actually count the value by calling a tool if needed". The LLMs cannot get smarter in this regard, next token predictors will hallucinate here.
A harness is that covering every blind spot or sub-optimal but probable output people have hit in the wild, and a lot of problems just have better solutions if you say "break problem A into subproblem B and subproblem C, then solve".
Simple example: a while back LLMs would trip over questions like "how many Rs are in strawberry". Now, the system prompts have a line like "when a user asks for a count, actually count the value by calling a tool if needed". The LLMs cannot get smarter in this regard, next token predictors will hallucinate here.
A harness is that covering every blind spot or sub-optimal but probable output people have hit in the wild, and a lot of problems just have better solutions if you say "break problem A into subproblem B and subproblem C, then solve".