Hacker News

Show HN: OpenSwarm – Multi‑Agent Claude CLI Orchestrator for Linear/GitHub

29 points · by unohee · today at 2:19 AM · 19 comments

I built OpenSwarm because I wanted an autonomous “AI dev team” that can actually plug into my real workflow instead of running toy tasks. OpenSwarm orchestrates multiple Claude Code CLI instances as agents to work on real Linear issues. It:

• pulls issues from Linear and runs a Worker/Reviewer/Test/Documenter pipeline
• uses LanceDB + multilingual-e5 embeddings for long‑term memory and context reuse
• builds a simple code knowledge graph for impact analysis
• exposes everything through a Discord bot (status, dispatch, scheduling, logs)
• can auto‑iterate on existing PRs and monitor long‑running jobs

Right now it’s powering my own solo dev workflow (trading infra, LLM tools, other projects). It’s still early, so there are rough edges and a lot of TODOs around safety, scaling, and better task decomposition.

I’d love feedback on:

• what feels missing for this to be useful to other teams
• failure modes you’d be worried about in autonomous code agents
• ideas for better memory/knowledge graph use in real‑world repos

Repo: https://github.com/Intrect-io/OpenSwarm

Happy to answer questions and hear brutal feedback.


Comments

MidasTools · today at 10:06 AM

Running a solo dev business on top of multi-agent Claude Code workflows (OpenClaw stack) -- the cascading context drift problem is real and the state partitioning approach in this thread is the right instinct.

The failure mode that bit us hardest: agents sharing a single context window where early tool outputs pollute later reasoning. Fixed it by treating each agent turn as append-only -- worker writes output to a structured log, reviewer reads only that log (not the raw conversation history). Isolated drift. Night and day.

The confidence score idea is underutilized. We log tool call outcomes as: {action, result, confidence: 0-3}. The reviewer agent pattern-matches on low-confidence streaks before they compound into something unfixable.
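The append-only log plus low-confidence-streak idea above can be sketched in a few lines. This is a hypothetical illustration, not code from OpenSwarm or the commenter's stack; the file name, 0-3 scale, and streak threshold are all assumptions.

```python
import json

LOG_PATH = "agent_turns.jsonl"  # assumed name for the structured log
LOW_CONFIDENCE = 1              # "low" on the commenter's 0-3 scale
STREAK_LIMIT = 3                # flag after this many low calls in a row

def append_turn(action, result, confidence, path=LOG_PATH):
    """Worker appends a structured record; history is never rewritten."""
    with open(path, "a") as f:
        f.write(json.dumps({"action": action,
                            "result": result,
                            "confidence": confidence}) + "\n")

def low_confidence_streak(entries, limit=STREAK_LIMIT, floor=LOW_CONFIDENCE):
    """Reviewer reads only the log (not the raw conversation) and
    flags a run of low-confidence tool calls before it compounds."""
    streak = 0
    for e in entries:
        streak = streak + 1 if e["confidence"] <= floor else 0
        if streak >= limit:
            return True
    return False
```

The point of the isolation is that the reviewer's input is exactly these records, so early tool-output noise in the worker's conversation never reaches it.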

On the multi-model review question from another commenter: different models catch different failure types. Claude catches logical inconsistencies; a smaller/faster model catches format errors and incomplete outputs. Cheap pre-check before the expensive reviewer saves a lot of token burn.

What's your retry strategy when the reviewer blocks -- exponential backoff on the same worker context, or fresh context each retry? We do fresh context after 2 failures.
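The fresh-context-after-2-failures policy the commenter describes (with backoff while the same context is reused) might look like this. A minimal sketch under assumed names; the threshold and delays are illustrative only.

```python
def retry_plan(failure_count, max_same_context=2, base_delay=1.0):
    """Return (context_mode, delay_seconds) for the next retry.

    Exponential backoff while the worker context is reused; once
    failures reach the threshold, start a fresh context so accumulated
    drift cannot poison the retry, and reset the delay.
    """
    if failure_count < max_same_context:
        return "same", base_delay * (2 ** failure_count)
    return "fresh", base_delay
```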

das-bikash-dev · today at 10:56 AM

the context isolation approach is smart — cascading drift between agents is a real problem. i run 10 microservices with claude code and solved a similar issue by maintaining curated reference docs that agents read on-demand per task area instead of loading everything. the model escalation on failure (haiku → sonnet) is a nice touch too. do you find the lancedb memory layer actually helps with repeated similar tasks, or is it more useful for the code knowledge graph side?

jamiecode · today at 7:53 AM

The reviewer/worker pattern gets tricky when they share state. The pattern I've found that works: each agent owns a separate state partition, and they communicate through a shared message queue (even a simple append-only JSONL file works). Worker writes output + confidence score. Reviewer reads, adds a decision record, worker reads that before retrying.

The key thing to get right: make the retry idempotent. If worker retries the same task, it should produce the same side effects as a fresh run, not double them. This is harder than it sounds when agents are calling real APIs or writing files.
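One way to get the idempotent retry the comment calls for: key every side effect by a deterministic task id and have the worker check the append-only queue before emitting. A hypothetical sketch (file name, record shape, and field names are all assumptions, not anything from OpenSwarm):

```python
import json
import os

QUEUE_PATH = "queue.jsonl"  # assumed shared message-queue file

def read_queue(path=QUEUE_PATH):
    """All agents read the same append-only JSONL queue."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

def already_done(task_id, records):
    """A retry first checks whether this exact effect was recorded."""
    return any(r.get("task_id") == task_id and r.get("kind") == "worker_output"
               for r in records)

def worker_emit(task_id, output, confidence, path=QUEUE_PATH):
    """Idempotent emit: if a prior run already wrote this task's output,
    the retry is a no-op instead of a duplicate side effect."""
    if already_done(task_id, read_queue(path)):
        return False
    with open(path, "a") as f:
        f.write(json.dumps({"kind": "worker_output",
                            "task_id": task_id,
                            "output": output,
                            "confidence": confidence}) + "\n")
    return True
```

The same dedup check has to wrap any real API call or file write the worker performs, which is the hard part the comment alludes to.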

How does OpenSwarm handle the case where worker keeps failing reviewer? Is there a max retry count, and if so, what happens to the Linear issue?

csto12 · today at 4:21 AM

Is there a new agent orchestrator posted every day? Is this the new JS framework?

vladgur · today at 9:52 AM

have you considered having different models (e.g. codex) do the reviews? i wonder if it presents an opportunity to catch more issues than the same model

mihneadevries · today at 4:41 AM

the reviewer/worker pipeline is honestly the part I'm most curious about. like how do you handle disagreements between agents: does the reviewer just block and the worker retries, or is there a loop with a hard cutoff?

the failure mode I'd worry about most is cascading context drift, where each agent in the chain slightly misunderstands the task and by the time you get to the test agent it's validating the wrong thing entirely. fwiw I think the LanceDB memory is the right call for this kind of setup, keeping shared context grounded is probably what prevents most of those drift issues.
