
tmerrlast Wednesday at 6:42 AM40 repliesview on HN

This is interesting to hear, but I don't understand how this workflow actually works.

I don't need 10 parallel agents making 50-100 PRs a week, I need 1 agent that successfully solves the most important problem.

I don't understand how you can generate requirements quickly enough to have 10 parallel agents chewing away at meaningful work. I don't understand how you can have any meaningful supervising role over 10 things at once given the limits of human working memory.

It's like someone is claiming they unlocked ultimate productivity by washing dishes, in parallel with doing laundry, and cleaning their house.

Likely I am missing something. This is just my gut reaction as someone who has definitely not mastered using agents. Would love to hear from anyone that has a similar workflow where there is high parallelism.


Replies

crystal_revengelast Wednesday at 8:53 AM

My initial response to reading this post was "wow, I think I'd rather just write the code".

I also remain a bit skeptical because, if all of this really worked (and I mean over a long time and scaling to meet a range of business requirements), even if it's not how I personally want to write code, shouldn't we be seeing a ton of 1 person startups?

I see Bay area startups pushing 996 and requiring living in the Bay area because of the importance of working in an office to reduce communication hurdles. But if I can really 10x my current productivity, I can get the power of a seed series startup with even less communication overhead (I could also get by with much less capital). Imagine being able to hire 10 reliable junior-mid engineers who unquestionably followed your instruction and didn't need to sleep. This is what I keep being told we have for $200/month. Forget not needing engineers, why do we need angel investors or even early stage VC? A single smart engineer should be able, if all the claims I'm hearing are true, to easily accomplish in months what used to take years.

But I keep seeing products shipped at the same speed but with a $200 per month per user overhead. Honestly I would love to be wrong on this because that would be incredibly cool. But unfortunately I'm not seeing it yet.

show 22 replies
stingraycharleslast Wednesday at 7:51 AM

I hope self-promotion isn't frowned upon, but I've been spending the past months figuring out a workflow [1] that helps tackle the "more complicated problems" and ensure long-term maintainability of projects when done purely through Claude Code.

Effectively, I try to:

- Not allow the LLM to make any implicit decisions, but instead have it confirm with the user;

- Ensure code is written in such a way that it's easy to understand for LLMs;

- Capture all "invisible knowledge" around decisions and architecture that's difficult to infer from code alone.

It's based entirely on Claude Code sub-agents + skills. The skills almost all invoke a Python script that guides the agents through workflows.
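For illustration, a minimal sketch of what one of those workflow-guiding scripts could look like (the step list and state file are my own placeholders, not taken from the linked repo): the script feeds the agent exactly one instruction at a time and forces the decision-confirmation step before any code is written.

```python
# Hypothetical workflow-guiding script a Claude Code skill could invoke.
# Step names and the state file are illustrative placeholders.
import json
from pathlib import Path

STEPS = [
    "Restate the requirement and list every implicit decision you would otherwise make.",
    "Ask the user to confirm or override each decision before writing any code.",
    "Record the confirmed decisions in docs/decisions/ so they survive the session.",
    "Only then produce an implementation plan, one file at a time.",
]

STATE = Path(".workflow_state.json")

def load_state() -> dict:
    return json.loads(STATE.read_text()) if STATE.exists() else {"step": 0}

def main() -> None:
    state = load_state()
    step = state["step"]
    if step >= len(STEPS):
        print("Workflow complete. All decisions are recorded; proceed to execution.")
        return
    # Print the single next instruction for the agent, then advance the pointer.
    print(f"STEP {step + 1}/{len(STEPS)}: {STEPS[step]}")
    STATE.write_text(json.dumps({"step": step + 1}))

if __name__ == "__main__":
    main()
```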

It's not a fast workflow: it frequently takes more than 1 hour just for the planning phase. Execution is significantly faster, as (typically) most issues have been discovered during the planning phase already (otherwise it would be considered a bug and I'd improve the workflow based on that).

I'm under the impression that the Claude Code creator's post is also intended to raise awareness of certain Claude Code features, such as hand-offs to the cloud and back. Their workflow only works for small features. It reads a bit like someone took a "best practices" guide and turned it into a Twitter post: nice, but not nearly detailed enough for an actual workflow.

[1] https://github.com/solatis/claude-config/

show 3 replies
dansolast Wednesday at 6:56 AM

Yes, thank you! I find I get more than enough done (and more than enough code to review) by prompting the agent step by step. I want to see what kind of projects are getting done with multiple async autonomous agents. I was hoping to find YouTube videos of someone setting up a project for multiple agents so I could see the cadence of the human stepping in and giving directions.

thomasfromcdnjslast Wednesday at 8:08 AM

I run 3-5 on distinct projects often (20x plan). I quite enjoy the context switching and always have. I have a vanilla setup too; I don't use plugins/skills/commands, though sometimes I enable an MCP server for different things, and I definitely list out CLI tools in my claude.md files. I keep a Google doc open where I list out all the projects I'm working on and write notes as I'm jumping through the Claude tabs; I also start drafting more complex prompts in the Google doc. I've been using Turborepo a lot so I don't have to context switch the architecture in my head (but the projects still use multiple types of DevOps setups).

Often these days I vibe code a feedback loop for each project, a way for it to validate itself as OP said. This adds time to how long Claude takes to complete, giving me time to switch context to another active project.
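As a rough illustration of such a feedback loop (the specific check commands below are placeholders and would differ per project), a small script the agent can run after each change, where a non-zero exit code signals the work isn't done yet:

```python
# Minimal sketch of a per-project validation loop; the commands are
# placeholders (pytest/mypy/ruff here) and would differ per project.
import subprocess
import sys

CHECKS = [
    ("unit tests", ["pytest", "-q"]),
    ("type check", ["mypy", "src"]),
    ("lint", ["ruff", "check", "src"]),
]

def main() -> int:
    failed = []
    for name, cmd in CHECKS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        status = "PASS" if result.returncode == 0 else "FAIL"
        print(f"[{status}] {name}")
        if result.returncode != 0:
            failed.append((name, result.stdout + result.stderr))
    for name, output in failed:
        print(f"\n--- {name} output ---\n{output}")
    # Non-zero exit tells the agent its change is not done yet.
    return 1 if failed else 0

if __name__ == "__main__":
    sys.exit(main())
```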

I also use light mode which might help others... jks

exitblast Wednesday at 7:31 AM

Multiple instances of agents are an equivalent to tabs in other applications - primarily holders of state, rather than means for extreme parallelism.

show 2 replies
HarHarVeryFunnylast Wednesday at 4:06 PM

I suppose he may have a list of feature requests and bug reports to work on, but it does seem a bit odd from a human perspective to want to work on 5 or more things literally in parallel, unless they are all so simple that there is no cognitive load and context switching required to mentally juggle them.

Washing dishes in parallel with laundry and cleaning is of course easily possible, but precisely because there is no cognitive load involved. When the washing machine stops you can interrupt what you are doing to load clothes into the drier, then go back to cleaning/whatever. Software development for anything non-trivial obviously has a much higher task-switching overhead. Optimal flow for a purely human developer is to "load context" at the beginning of the day, then remain in flow-state without interruptions.

The cynical part of me also can't help but wonder if Cherny/Anthropic aren't just advocating token-maxxing!

show 1 reply
poncolast Wednesday at 7:49 AM

I agree. I'm imagining a large software team with hundreds of tickets "ready to be worked on" might support this workflow - but even then, surely you're going to start running into unnecessary conflicts.

The max Claude instances I've run is 2 because beyond that, I'm - as you say - unable to actually determine the next best course during the processing time. I could spend the entire day planning / designing prompts, and perhaps that will be the most efficient software development practice in the future. And/or perhaps it is a sign I'm doing insufficient design up front.

Haaargiolast Wednesday at 9:58 AM

I would do the same thing if I could justify paying $200 per month for my hobby. But even with that, you will run into throttling / API / resource limits.

But AI agents need time. They need a little bit of time for reading the source code, proposing the change, making the change, running the verification loop, creating the git commit, etc. It can be a minute, it can be 10, and potentially a lot longer too.

So if your code base is big enough that you can work on different topics, you just do that:

- Fix this small bug in the UI when xy happens
- Add a new field to this form
- Cleanup the README with content x
- ...

I'm an architect at work and have done product management on the side, as it's a very technical project. I have very little problem coming up with things to fix, enhance, clean up, etc. I have hard limits on my headcount.

I could easily do a handful of things in parallel and keep that in my head. Working memory might be limited, but working memory means something different than following 10 topics, especially if a few topics in between just take time with the whole feedback loop.

But regarding your example of house cleaning: I have ADHD, and I sometimes work like this. Working on something, waiting for a build, and cleaning something in parallel.

What you are missing is practical experience with agents: taking the time and energy to set something up for yourself, and perhaps access, too?

We only got access to Claude Code at work at the end of last year.

ex-aws-dudelast Wednesday at 6:51 AM

Yeah I don’t understand these posts recently with people running 10 at once

Can someone give an example of what each of them would be doing?

Are they just really slow, is that the problem?

show 4 replies
victorbjorklundlast Wednesday at 7:12 AM

Depends on the project you are working on. Solo on a web app? You probably have 100s of small things to fix. Some more padding there, add a small new feature here, etc.

raduculast Wednesday at 7:07 AM

> don't need 10 parallel agents making 50-100 PRs a week

I don't like to be mean, but a few weeks ago the guy bragged about Claude helping him do +50k loc and -48k loc (netting 2k loc). I thought he was joking, because I know plenty of programmers who do exactly that without AI; they just commit 10 huge JSON test files or re-format code.

I almost never open a PR without a thorough cleanup whereas some people seem to love opening huge PRs.

gherkinnnlast Wednesday at 7:04 AM

LLM agents can be a bit like slot machines. The more the merrier.

And at least two generate continuous shitposts for their company's Slack.

That said, having one write code and a clean context review it is helpful.

giancarlostorolast Wednesday at 7:33 AM

I use Beads, which makes it easier to grasp since it's "tickets" for the agent. I tell it what I want, it creates a bead (or "ticket"), and then I ask it to do research, brain dump on it, and even ask me clarifying questions, and it updates the tasks. By the end, once I have a few tasks with essentially well-defined prompts, I tell Claude to run x tasks in parallel. Sometimes I dump a bunch of different tasks and ask it to research them all in parallel, and it fills them in, and I review. When it's all over, I test the code, look at the code, and mention any follow-ups.

I guess it comes down to, how much do you trust the agent? If you don't trust it fully you want to inspect everything, which you still can, but you can choose to do it after it runs wild instead of every second it works.

carefulfungilast Wednesday at 3:11 PM

My impression is that people who are exploring coordinated multi-agent-coding systems are working towards replacing full teams, not augmenting individuals. "Meaningful supervising role" becomes "automated quality and process control"; "generate requirements quickly" -> we already do this for large human software teams.

If that's the goal, then we shouldn't interpret the current experiment as the destination.

headcanonlast Wednesday at 4:10 PM

Potentially, a lot of that isn't just code generation, it *is* requirements gathering, design iteration, analysis, debugging, etc.

I've been using CC for non-programming tasks and it's been pretty successful so far, at least for personal projects (bordering on the edge of non-trivial). For instance, I'll get a 'designer' agent coming up with a spec, and a 'design-critic' to challenge the design and make the original agent defend its choices. They can ask open questions after each round and I'll provide human feedback. After a few rounds of this, we whittle it down to a decent spec and try it out after handing it off to a coding agent.
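A rough sketch of how those rounds could be wired together, assuming the claude CLI's non-interactive `-p` (print) mode; the prompts, agent names, and round count are illustrative, not a prescribed setup:

```python
# Sketch of alternating designer/critic rounds with human feedback in between.
# Assumes the `claude -p` print mode; prompts here are illustrative only.
import subprocess

def ask(prompt: str) -> str:
    # One-shot call to Claude; each call starts with a fresh context.
    return subprocess.run(["claude", "-p", prompt], capture_output=True,
                          text=True, check=True).stdout

def refine_spec(initial_requirements: str, rounds: int = 3) -> str:
    spec = ask(f"You are a designer. Draft a spec for:\n{initial_requirements}")
    for _ in range(rounds):
        critique = ask("You are a design critic. Challenge this spec and list "
                       f"open questions for a human:\n{spec}")
        print("Open questions / critique:\n", critique)
        feedback = input("Human feedback (blank to accept as-is): ")
        spec = ask(f"Revise the spec.\nSpec:\n{spec}\nCritique:\n{critique}\n"
                   f"Human feedback:\n{feedback}")
    return spec
```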

Another example from work: I fired off some code analysis to an agent with the goal of creating integration tests, and then ran a set of spec reviewers in parallel to check its work before creating the actual tickets.

My point is there are a lot of steps involved in the whole product development process, and it isn't just "ship production code". And we can reduce the ambiguity/hallucinations/sycophancy by creating validation/checkpoints (either tests, 'critic' agents to challenge designs/spec, or human QA/validation when appropriate).

The end game of this approach is you have dozens or hundreds of agents running via some kind of orchestrator churning through a backlog that is combination human + AI generated, and the system posts questions to the human user(s) to gather feedback. The human spends most of the time doing high-level design/validation and answering open questions.

You definitely incur some cognitive debt and risk it doing something you don't want, but that's part of the fun for me (assuming it doesn't kill my AI bill).

quijoteunivlast Wednesday at 6:48 AM

This is it! “I don't need 10 parallel agents making 50-100 PRs a week, I need 1 agent that successfully solves the most important problem.”

show 2 replies
babylast Wednesday at 7:05 AM

I usually have 4-5, but it's because they are working on different parts of the codebase, or some I will use as read only to brainstorm

david_shilast Wednesday at 11:07 AM

> It's like someone is claiming they unlocked ultimate productivity by washing dishes, in parallel with doing laundry, and cleaning their house.

In this case you have to take a leap of faith and assume that Claude or Codex will get each task done correctly enough that your house won't burn down.

xnxlast Wednesday at 9:43 PM

Agree. People are stuck applying the "agent" = "employee" analogy and think they are more productive by having a team/company of agents. Unless you've perfectly spec'ed and detailed multiple projects up front, the speed of a single agent shouldn't be the bottleneck.

show 1 reply
CraigJPerrylast Wednesday at 6:56 AM

>> I need 1 agent that successfully solves the most important problem

In most of these kinds of posts, that's still you. I don't believe I've come across a pro-faster-keyboard post yet that claims AGI. Despite the name, LLMs have no agency; it's still all on you.

Once you've defined the next most important problem, you have a smaller problem - translate those requirements into code which accurately meets them. That's the bit where these models can successfully take over. I think of them as a faster keyboard and i've not seen a reason to change my mind yet despite using them heavily.

show 1 reply
CuriouslyClast Wednesday at 2:03 PM

The problem isn't generating requirements, it's validating work. Spec-driven development and voice chat with ticket/chat context is pretty fast, but the validation loop is still mostly manual. When I'm building, I can orchestrate multiple swarms no problem; however, any time I have to drop in to validate stuff, my throughput drops and I can only drive 1-2 agents at a time.

aforwardslashlast Wednesday at 1:10 PM

It depends on the specifics of the tasks; I routinely work on 3-5 projects at once (sometimes completely different stuff), and a tool like Claude Code fits great in my workflow.

Also, the feedback doesn't have to be immediate: sometimes I have sessions that run over a week because of casual iterations. In my case it's quite common to do this to test concepts, micro-benchmarking, and library design.

bob1029last Wednesday at 9:22 AM

If you're trying to solve one very hard problem, parallelism is not the answer. Recursion is.

Recursion can give you an exponential reduction in error as you descend into the call stack. It's not guaranteed in the context of an LLM but there are ways to strongly encourage some contraction in error at each step. As long as you are, on average, working with a slightly smaller version of the problem each time you recurse, you still get exponential scaling.
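A hand-wavy sketch of that shape, with a hypothetical `run_agent` callable standing in for whatever LLM call you use; the "SOLVED:" convention and the decomposition prompt are illustrative, not a specific tool:

```python
# Sketch of recursive problem decomposition: each level hands down strictly
# smaller, better-specified subproblems, bounded by a max depth.
def solve(problem: str, run_agent, depth: int = 0, max_depth: int = 5) -> str:
    if depth == max_depth:
        return run_agent(f"Solve directly, no decomposition:\n{problem}")
    plan = run_agent(
        "Either solve this directly (prefix your answer with SOLVED:), or "
        "split it into 2-4 subproblems that are each strictly smaller and "
        f"more precisely specified, one per line:\n{problem}"
    )
    if plan.startswith("SOLVED:"):
        return plan
    subproblems = [line for line in plan.splitlines() if line.strip()]
    solutions = [solve(sub, run_agent, depth + 1, max_depth) for sub in subproblems]
    # Recombination is itself a smaller, more constrained problem than the original.
    return run_agent("Combine these partial solutions into one answer:\n"
                     + "\n".join(solutions))
```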

eaurougelast Wednesday at 10:31 AM

> It's like someone is claiming they unlocked ultimate productivity by washing dishes, in parallel with doing laundry, and cleaning their house.

But we do this routinely with machines. Not saying I don't get your point re 100 PRs a week, just that it's a strange metaphor given the similarities.

port3000last Wednesday at 10:46 AM

50-100 PRs a week but they still can't fix the 'flickering' bug

someguyiguesslast Wednesday at 10:20 PM

I see you haven’t tried BMAD-METHOD or spec-kit yet.

bitfilpedyesterday at 7:10 AM

The only way to achieve that level of parallelism is by not knowing what you are doing or the problem space you are working in to begin with, and just throwing multiple ill-defined queries at agents until something "works". It's sort of a modern infinite monkey theorem, if you will.

William_BBlast Wednesday at 10:56 AM

This is just the creator of Claude Code overselling Claude Code

MattGaiserlast Wednesday at 7:09 AM

> I need 1 agent that successfully solves the most important problem.

If you only have that one problem, that is a reasonable criticism, but you may have 10 different problems and want to focus on the important one while the smaller stuff is AIed away.

> I don't understand how you can generate requirements quickly enough to have 10 parallel agents chewing away at meaningful work.

I am generally happy with the assumptions it makes when given few requirements. In a lot of cases I just need a feature, and the specifics are fairly open or very obvious given the context.

For example, I am adding MFA options to one project. As I already have MFA for another portal on it, I just told Claude to add MFA options for all users. Single sentence with no details. The result seems perfectly serviceable, if in need of some CSS changes.

show 1 reply
stoneforgerlast Wednesday at 6:49 AM

The captive audience is not you, it's people salivating at the train of thought where they can 100x productivity of whatever and push those features that will get paying customers, so they can get bought by private equity and ride off into the sunset. This whole thing is existential dread on a global scale, driven by sociopaths, and everyone is just unable to not bend over.

show 1 reply
csomarlast Wednesday at 7:03 AM

It's all smoke, really. Claude Code is an unreliable piece of software and yet one of the better ones in LLM coding (https://github.com/anthropics/claude-code/issues). That, and I highly suspect it's mostly engineers who are working on it rather than LLMs. Google itself, with all its resources and engineers, can't come up with a half-decent CLI for coding.

Reminder: the guy works on Claude, and Anthropic is over-hyping LLMs. That's like a jewelry dealer's assistant telling you how gold chains helped his romantic life.

show 1 reply
drbojinglelast Wednesday at 3:06 PM

Prototyping.

danguslast Wednesday at 3:40 PM

Let’s not forget the massive bias in the author: for all we know this post is a thinly veiled marketing pitch for “how to use the most tokens from your AI provider and ramp up your bill.”

This isn’t about being the most productive or having the best workflow, it’s about maximizing how much Claude is a part of your workflow.

gedylast Wednesday at 2:17 PM

> This is interesting to hear, but I don't understand how this workflow actually works

The cynic in me says it's a marketing pitch to sell "see, this is way cheaper than 10 devs!". The "agent" thing leans heavily into bean-counter CTO/CIO marketing.

AtlasBarfedlast Wednesday at 7:55 AM

Claude is absolutely plastering Facebook with this bullshit.

Every PR Claude makes needs to be reviewed. Every single one. So great! You have 10 instances of Claude doing things. Great! You're still going to need to do 10 reviews.

show 2 replies
BiteCode_devlast Wednesday at 10:20 AM

At the beginning of the project, the runs are fast, but as the project gets bigger, the runs are slower:

- there are bigger contexts

- the test suite is much longer and slower

- you need to split worktrees, resources (like db, ports), and sometimes containers to work in isolation

So having 10 workers means they will run for a long time, which gives plenty of time to write good specs.

You need good specs, so the LLM produces good tests, so it can write good code to match those tests.

Having a very strong spec + test suite + quality gates (linter, type checkers, etc.) is the only way to get good results from an LLM as the project becomes more complex.

Unlike a human, it's not very good at isolating complexity by itself, nor at stopping and asking questions in the face of ambiguity. So the guardrails are the only thing that keeps it on track.

And running a lot of guardrails takes time.

E.g.: yesterday I had a big migration to do from HTMX to viewjs. I asked the LLM to produce screenshots of each state, and then do the migration in steps, in a way that kept the screenshots 90% identical.

This way I knew it would not break the design.

But it's very slow to run e2e tests + screenshot comparison every time you make a modification. Still faster than a human, but it gives plenty of time to talk to another LLM.
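For concreteness, a minimal sketch of that kind of screenshot gate using Pillow; the directory layout and the 90% threshold are assumptions, not a specific tool:

```python
# Compare before/after screenshots and fail the gate if any page changed
# more than the allowed fraction of pixels.
from pathlib import Path
from PIL import Image, ImageChops

def similarity(before: Path, after: Path) -> float:
    a = Image.open(before).convert("RGB")
    b = Image.open(after).convert("RGB")
    if a.size != b.size:
        return 0.0
    diff = ImageChops.difference(a, b)
    # Fraction of pixels that are identical in all three channels.
    unchanged = sum(1 for px in diff.getdata() if px == (0, 0, 0))
    return unchanged / (a.size[0] * a.size[1])

def check_all(before_dir: str = "screenshots/before",
              after_dir: str = "screenshots/after",
              threshold: float = 0.90) -> bool:
    ok = True
    for before in Path(before_dir).glob("*.png"):
        after = Path(after_dir) / before.name
        score = similarity(before, after) if after.exists() else 0.0
        print(f"{before.name}: {score:.0%}")
        ok &= score >= threshold
    return ok
```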

Plus you can assign them very different tasks:

- One works on adding a new feature

- One improves the design

- One refactors part of the code (something you should do regularly; LLMs produce tech debt quickly)

- One adds more tests to your test suite

- One is deploying on a new server

- One is analyzing the logs of your dev/test/prod server and telling you what's up

- One is cooking up a new logo for you and generating x versions at different resolutions.

Etc.

It's basically a small team at your disposal.

vidarhlast Wednesday at 8:26 AM

> I don't understand how you can generate requirements quicky enough to have 10 parallel agents chewing away at meaningful work.

You use agents to expand the requirements as well, either in plan mode (as OP does) or with a custom scaffold (rules in CLAUDE.md about how to handle requirements; personally I prefer giving Claude the latitude to start when Claude is ready rather than wait for my go-ahead)

> I don't understand how you can have any meaningful supervising role over 10 things at once given the limits of human working memory.

[this got long: TL;DR: This is what works for me: Stop worrying about individual steps; use sub-agents and slash-commands to encapsulate units of work to make Claude run longer; use permissions to allow as much as you dare (and/or run it in a VM) to allow Claude to run longer; give Claude tools to verify its work (linters, test suites, sub-agents double-checking the work against the spec) and make it use them; don't sit and wait and read individual parts of the conversation - it will only infuriate you to see Claude make stupid mistakes, but if well scaffolded it will fix them before it returns the code to you, so stop reading, breathe, and let it work; only verify when Claude has worked for a long time and checked its own work -- that way you review far less code and far more complete and coherent changes]

You don't. You wait until each agent is done, and you review the PRs. To make this kind of thing work well you need agents and slash-commands, like OP does - sub-agents in particular help prevent "context anxiety" in the top-level agent: Claude Code appears to have knowledge of its context use, and will be prone to stopping before context runs out; sub-agents use their own context, and the top-level agent only uses context to manage the input to and output from them, so the more that is farmed out to sub-agents, the longer Claude Code is willing to run. When I got up this morning, Claude Code had run all night and produced about 110k words of output.

This also requires extensive permissions to use safe tools without asking (what OP does), or --dangerously-skip-permissions (I usually do this; you might want to put this in a container/VM as it will happily do things like "killall -9 python" or similar without "thinking through" consequences - I've had it kill the terminal it itself ran in before), or it'll stop far too quickly.

You'll also want to explicitly tell it to do things in parallel when possible. E.g. if you want to use it as a "smarter linter" (DO NOT rely on it as the only linter, use a regular one too, but using Claude to apply more complex rules that require some reasoning works great), you can ask it to "run the linter agent in parallel on all typescript files", for example, and it will tend to spawn multiple sub-agents running in parallel, and metaphorically twiddle its thumbs waiting for them to finish (it's fun seeing it get "bored" and decide to do other things in the meantime, or get impatient and check on progress obsessively).

You'll also want to make Claude use sub-agents to review, verify, test its work, with instructions to repeat until all the verification sub-agents give its changes a PASS (see 12/ and 13/ in the thread) - there is no reason for you to waste your time reviewing code that Claude itself can tell isn't ready.

[E.g. concrete example: "Vanilla" Claude "loves" using instance_variable_get() in Ruby when facing a class that is missing an accessor for an instance variable. Whether you know Ruby or not, that should stand out like a sore thumb - it's a horrifically gross code smell, as it's basically bypassing encapsulation entirely. But you shouldn't have to worry about that - if you write Ruby with Claude, you'd want a rule in CLAUDE.md telling it how to address missing accessors, a sub-agent, and possibly a hook, making sure that Claude is told to fix it immediately if it ever uses it.]
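A rough sketch of such a hook, under the assumption that Claude Code passes the tool call as JSON on stdin and treats exit code 2 as "block and feed stderr back to the model"; the field names here are guesses and should be checked against the real hook schema:

```python
# Hypothetical post-edit hook: reject any edit that introduces
# instance_variable_get, and tell Claude what to do instead.
import json
import sys

def main() -> None:
    payload = json.load(sys.stdin)
    tool_input = payload.get("tool_input", {})
    # Flatten whatever the edit/write tool sent so we can scan the new text.
    new_text = " ".join(str(v) for v in tool_input.values())
    if "instance_variable_get" in new_text:
        print("instance_variable_get bypasses encapsulation; add a proper "
              "accessor to the class instead.", file=sys.stderr)
        sys.exit(2)  # assumed to mean: block the change, feed stderr to Claude
    sys.exit(0)

if __name__ == "__main__":
    main()
```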

Farming it off to sub-agents both makes it willing to work longer, especially on "boring" tasks, and avoids the problem that it'll look at past work and decide it already "knows" this code is ready and start skipping steps.

The key thing is to stop obsessing over every step Claude takes, and treat that as a developer experimenting with something they're not clear on how to do yet. If you let it work, and its instructions are good, and it has ways of checking its work, it will figure out its first attempts are broken, fix them, and leave you with output that takes far less of your time to review.

When Claude tells you it's done with a change, if you spot egregious problems, fix your CLAUDE.md, fix your planning steps, fix your agents.

None of the above will absolve you of reviewing code, and you will need to kick things back and have it fix them, and sometimes that will be tedious. But Claude is good enough that the problems you have it fix should be complex ones, not simple code smells or logic errors, and 9 out of 10 times they should signal that your scaffold is lacking important detail about your project, or that your spec is incomplete at a functional/acceptance-criteria level (not low-level detail).

elmigrantolast Wednesday at 7:38 AM

[flagged]

ukprogrammerlast Wednesday at 8:02 AM

skill issue

same way a lesser engineer might say they cannot do X or Y