> I also don’t think Claude Code is the right harness for this deep and broad scanning work. We find that it struggles to maintain clarity when considering many different bugs at the same time.
The skill we use splits the work into multiple passes to divide and conquer, both along the task dimension and across files, for exactly that reason. It also loops (Ralph-style) until it converges, and it maintains a task queue and a work log to stay on track. We are growing it over time, though these days that growth is mostly per-repo customization; the bones hold up well across repos.
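To make the shape concrete, here is a minimal sketch of that loop: fan the scan out over (task dimension × file group) work units, log each pass, and repeat until a full pass surfaces nothing new. All names here (`run_scan`, `scan_pass`, etc.) are illustrative, not the skill's actual internals.

```python
from collections import deque

def run_scan(dimensions, file_groups, scan_pass, max_passes=10):
    """Repeatedly scan every (dimension, files) unit until convergence.

    scan_pass(dimension, files, known) -> set of finding ids
    (a hypothetical callback standing in for one agent pass).
    """
    queue = deque((d, fs) for d in dimensions for fs in file_groups)
    known, work_log = set(), []
    for pass_no in range(max_passes):
        new_this_pass = set()
        for _ in range(len(queue)):
            dim, files = queue.popleft()
            # Only count findings we have not seen in earlier passes.
            new_this_pass |= scan_pass(dim, files, known) - known
            queue.append((dim, files))  # re-queue the unit for the next pass
        known |= new_this_pass
        work_log.append((pass_no, len(new_this_pass)))
        if not new_this_pass:  # converged: a full pass found nothing new
            break
    return known, work_log
```

The queue keeps each pass focused on one narrow unit at a time, which is the point: the model never has to hold every candidate bug in context at once.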
I would only trust frontier-grade harnesses to run this kind of skill, and for that reason I treat every harness × prompt combination as guilty until proven innocent.
My point isn't that our one-page skill eliminates the need for your startup. It's that this is a normal workflow for more serious AI-augmented coders, so you are picking a blatantly known-bad starting point for that audience. That makes it unclear what value your tool adds, and it calls into question why you refuse to measure yourself in a post about measurement.