Yeah, I saw Claude Code doing lots of grep/find and was curious whether that approach might miss something in the log lines, and whether loading a small portion of the interesting log lines into the context could help. I often find that just looking at ERROR/WARN lines isn't enough: some of them aren't actually errors, and some of the log lines that get skipped have something worth looking into.
And I just wanted to try MCP tooling tbh, hehe. Took me 2 days to create this.
From our experience running this, we're seeing patterns like these:
- Opus agent wakes up when we detect an incident (e.g. CI broke on main)
- It looks at the big picture (e.g. which job broke) and makes a plan to investigate
- It dispatches narrowly focused tasks to Haiku sub agents (e.g. "extract the failing log patterns from commit XXX on job YYY ...")
- Sub agents use the equivalent of "tail", "grep", etc. (using SQL) on a very narrow subset of logs (as directed by Opus) and return only the relevant data (so they can flag an INFO line as the actual problem)
- The parent Opus agent correlates the findings across sub agents and can decide to spawn more sub agents to continue the investigation
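To make the "tail/grep via SQL" step a bit more concrete, here's a rough sketch of what a sub agent's queries could look like. This is not our actual tooling: the `logs` table, its columns, the sample rows and the `tail`/`grep` helpers are all made up for illustration, and I'm using an in-memory SQLite database just so it runs on its own.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs ("
    "id INTEGER PRIMARY KEY, commit_sha TEXT, job TEXT, level TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO logs (commit_sha, job, level, message) VALUES (?, ?, ?, ?)",
    [
        ("abc123", "build", "INFO", "Resolving dependencies"),
        ("abc123", "build", "INFO", "Build finished"),
        ("abc123", "test", "INFO", "Starting test suite"),
        ("abc123", "test", "INFO", "Retrying connection to db (attempt 5 of 5)"),
        ("abc123", "test", "WARN", "Deprecated flag --foo used"),
        ("abc123", "test", "ERROR", "Process exited with code 1"),
        ("def456", "test", "INFO", "Retrying connection to db (attempt 1 of 5)"),
    ],
)


def tail(conn, commit_sha, job, n):
    """Last n lines of one job on one commit, in original order (like `tail -n`)."""
    cur = conn.execute(
        "SELECT level, message FROM ("
        "  SELECT id, level, message FROM logs"
        "  WHERE commit_sha = ? AND job = ?"
        "  ORDER BY id DESC LIMIT ?"
        ") ORDER BY id",
        (commit_sha, job, n),
    )
    return cur.fetchall()


def grep(conn, commit_sha, job, pattern):
    """Lines of one job on one commit containing `pattern` (like `grep -i`).

    SQLite's LIKE is case-insensitive for ASCII by default. Note that `%` and
    `_` inside `pattern` act as wildcards in this simple version.
    """
    cur = conn.execute(
        "SELECT level, message FROM logs"
        " WHERE commit_sha = ? AND job = ? AND message LIKE ?"
        " ORDER BY id",
        (commit_sha, job, f"%{pattern}%"),
    )
    return cur.fetchall()


# The parent agent's task pins the commit and job, so each query stays narrow.
last_lines = tail(conn, "abc123", "test", 2)
retries = grep(conn, "abc123", "test", "retrying")
print(last_lines)
print(retries)
```

In this toy data, filtering on level alone would only surface the deprecation warning and the generic exit code. The `grep` query scoped to the same commit and job also returns the INFO retry line, which is the kind of thing a sub agent can hand back to the parent as the likely cause.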
It's no different than what I would do as a human, really. If there are terabytes of logs, I'm not going to read all of them: I'll make a plan, open a bunch of tabs and surface interesting bits.