> By using React, you embrace building applications with a pattern of reactivity and modularity, which people now accept to be a standard requirement, but this was not always obvious to early web developers.
This is quite a whopper. For one thing, the web started off reactive. It did take a while for a lot of people to figure out how to bring that to client-side rendering in a reasonably decent way (though, I'm sorry, IMO that doesn't actually include React). Second, "modularity" has been a thing for quite some time before the web existed. (If you want to get down to it, separating and organizing your processes in information systems predates computers.)
There’s both “no multi-agent system” and “multi-agent system,” depending on how you look at it. In reality, you’re always hitting the same /chat/completions API, which itself has no awareness of any agents. Any notion of an agent comes purely from the context and instructions you provide.
Separating agents has a clear advantage. For example, suppose you have a coding agent with a set of rules for safely editing code. Then you also have a code search task, which requires a completely different set of rules. If you try to combine 50 rules for code editing with 50 rules for code searching, the AI can easily get confused.
It’s much more effective to delegate the search task to a search agent and the coding task to a code agent. Think of it this way: when you need to switch how you approach a problem, it helps to switch to a different “agent”, a different mindset with rules tailored for that specific task.
Do I need to think differently about this problem? If yes, you need a different agent!
So yes, conceptually, using separate agents for separate tasks is the better approach.
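Concretely, "switching agents" here is nothing more than switching the system prompt (and tools) you put in front of the same completions endpoint. A minimal Python sketch, where the model name and the rule text are placeholders rather than anything from the article:

    # "Separate agents" are just separate system prompts and tool sets over
    # the same chat completions endpoint. Model name and rule text below are
    # placeholders.
    from openai import OpenAI

    client = OpenAI()

    AGENT_PROFILES = {
        "code_edit": "You edit code. Follow the safe-editing rules: ...",
        "code_search": "You search a codebase. Follow the search rules: ...",
    }

    def run_agent(task_type: str, task: str) -> str:
        """Dispatch a task to the 'agent' (prompt profile) suited to it."""
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[
                {"role": "system", "content": AGENT_PROFILES[task_type]},
                {"role": "user", "content": task},
            ],
        )
        return resp.choices[0].message.content

    # e.g. run_agent("code_search", "Find where the SMS formatter is defined")

Both calls hit the same API; only the curated context differs.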
We're in the context engineering stone age. You the engineer shouldn't be trying to curate context, you should be building context optimization/curation engines. You shouldn't be passing agents context like messages, they should share a single knowledge store with the parent, and the context optimizer should just optimally pack their context for the task description.
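A rough sketch of what such a curation engine could look like, assuming a shared chunk store and a naive keyword-overlap scorer (both purely illustrative):

    # Agents share one knowledge store; a packer selects what fits the task
    # within a token budget. Scoring here is naive keyword overlap, purely
    # illustrative.
    from dataclasses import dataclass

    @dataclass
    class Chunk:
        text: str
        tokens: int  # pre-computed token count

    def pack_context(task: str, store: list[Chunk], budget: int) -> str:
        task_words = set(task.lower().split())

        def score(chunk: Chunk) -> float:
            overlap = len(task_words & set(chunk.text.lower().split()))
            return overlap / max(chunk.tokens, 1)

        used, packed = 0, []
        for chunk in sorted(store, key=score, reverse=True):
            if used + chunk.tokens <= budget:
                packed.append(chunk.text)
                used += chunk.tokens
        return "\n\n".join(packed)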
In the context compression approach, why aren't the agents labeled as subagents instead? The compressed context is basically a "subtask".
This is my main issue with all these agentic frameworks - they always conveniently forget that there is nothing "individual" about the thing they label "an agent" and draw a box around.
Such "on demand" agents, spawned directly from previous LLM output, are never in any way substantially different from dynamic context compression/filtering.
I think the only sensible framework is to think in terms of tools, with clear interfaces, and a single "agent" (a single linear interaction chain) using those tools towards a goal. Such tools could be LLM-based or not. Forcing a distinction between a "function tool" and an "agent that does something" doesn't make sense.
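A sketch of that framing, with names that are illustrative rather than any framework's API; the single agent's loop only ever sees the Tool interface, whether or not an LLM sits behind it:

    # "Everything is a tool": the agent loop only sees this interface,
    # whether the implementation is plain code or an LLM call underneath.
    import subprocess
    from typing import Protocol

    class Tool(Protocol):
        name: str
        description: str
        def run(self, input: str) -> str: ...

    class GrepTool:
        name = "grep"
        description = "Search the repo for a pattern"
        def run(self, input: str) -> str:
            # Ordinary code, no LLM anywhere.
            out = subprocess.run(["grep", "-rn", input, "."],
                                 capture_output=True, text=True)
            return out.stdout

    class SummarizeTool:
        name = "summarize"
        description = "Summarize a long document"
        def run(self, input: str) -> str:
            # Internally an LLM call; the calling agent neither knows nor cares.
            from openai import OpenAI
            r = OpenAI().chat.completions.create(
                model="gpt-4o-mini",  # placeholder
                messages=[{"role": "user", "content": f"Summarize:\n{input}"}])
            return r.choices[0].message.content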
Is it concerning to anyone else that the "Simple & Reliable" and "Reliable on Longer Tasks" diagrams look kind of like the much maligned waterfall design process?
> As of June 2025, Claude Code is an example of an agent that spawns subtasks. However, it never does work in parallel with the subtask agent, and the subtask agent is usually only tasked with answering a question, not writing any code.
Has this changed since June? Because I’ve been experimenting over the last month with Claude Code subagents that work in parallel and agents that write code (doing both simultaneously is inadvisable for obvious reasons, at least without workspace separation).
The 'dilution effect' is real - even with plenty of context space left, agents start losing track of their original goals around the 50k token mark. It's not just about fitting information in, it's about maintaining coherent reasoning chains. Single agents with good prompt engineering often outperform elaborate multi-agent orchestrations.
It's not "do not build multi-agents", it's "do not build parallel multi-agents".
This is very similar to the conclusion I have been coming to over the past 6 months. Agents are like really unreliable employees that you have to supervise and correct so often that it's a waste of time to delegate to them. The approach I'm trying to develop for myself is much more human-centric. For now I just directly supervise all actions done by an AI, but I would like to move to something like this: https://github.com/langchain-ai/agent-inbox where I, as the human, am the conductor of the work agents do, and they check in with me for further instructions or corrections.
Whom to believe? Devin or Claude? - https://www.anthropic.com/engineering/multi-agent-research-s...
Restricting output from a subagent (not allowing arbitrary strings anywhere in the output) seems like a way to minimize the risks of prompt injection attacks, though? Sometimes you only need to get a boolean or an enum back from a query.
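A minimal sketch of that pattern, assuming the OpenAI Python client (the model name is a placeholder): the subagent reads the untrusted text, but the only thing that can flow back to the parent is a whitelisted literal.

    from openai import OpenAI

    client = OpenAI()
    ALLOWED = {"yes", "no", "unclear"}

    def ask_subagent_bool(question: str, untrusted_text: str) -> str:
        r = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder
            messages=[
                {"role": "system",
                 "content": "Answer with exactly one word: yes, no, or unclear."},
                {"role": "user", "content": f"{question}\n\n---\n{untrusted_text}"},
            ])
        answer = (r.choices[0].message.content or "").strip().lower()
        # Anything outside the whitelist -- including injected instructions --
        # is collapsed to "unclear" before it reaches the parent agent.
        return answer if answer in ALLOWED else "unclear"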
I think the example of “two agents make a bird and background with two different visual styles, even given original context” would be better represented as “two agents make a bird and background with very different physics behaviors”.
Because I don’t think the AI particularly cares what it looks like, and visual differences seem reconcilable. Whereas there are harder things to combine in the land of programming. Sync/async, message passing, etc.
This resonates heavily with our experience. We ended up using one agent + actively managed context, with the smartness baked into how we manage that context for that one agent, rather than attempting to manage expectations/context across a team of agents.
Anecdotally Devin has been one of the worst coding agents I tried, to the point where I didn’t even bother asking for my unused credits to be refunded. That was 2 months ago, so things may have changed.
"I designed a bad system so all system of these class must be bad"
They're really handing out .ai domains to anyone these days.
What software are you using to create these beautiful diagrams in the article?
> Principle 1: Share context, and share full agent traces, not just individual messages
I was playing around with this task: give a prompt to a low-end model, get the response, and then get a higher-end model to evaluate the quality of the response.
And one thing I've noticed is that while the higher-end model sometimes detects when the low-end model is misinterpreting the prompt (e.g. it blatantly didn't understand some aspect of it and just hallucinated), it still often allows itself to be controlled by the low-end model's framing... e.g. if the low-end model takes a negative attitude to an ambiguous text, the high-end model will propose moderating the negativity... but what it doesn't realise is that, given the prompt without the low-end model's response, it might not have adopted that negative attitude at all.
So one idea I had... a tool which enables the LLM to get its own "first impression" of a text... so it can give itself the prompt, and see how it would react to it without the framing of the other model's response, and then use that as additional input into its evaluation...
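A hedged sketch of that idea, assuming the OpenAI Python client (model names are placeholders): the evaluator answers the prompt cold first, then grades with its own first impression alongside the low-end model's response.

    from openai import OpenAI

    client = OpenAI()

    def llm(system: str, user: str, model: str = "gpt-4o") -> str:
        r = client.chat.completions.create(model=model, messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user}])
        return r.choices[0].message.content

    def evaluate(prompt: str, low_end_answer: str) -> str:
        # Form a "first impression" before seeing the low-end model's framing.
        first_impression = llm("Answer the prompt as you normally would.", prompt)
        return llm(
            "You are grading another model's answer. Use your own unbiased "
            "first impression (provided) as a reference point.",
            f"Prompt:\n{prompt}\n\n"
            f"Your first impression:\n{first_impression}\n\n"
            f"Answer to grade:\n{low_end_answer}")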
So this is an important point this post doesn't seem to understand: sometimes less is more, and leaving stuff out of the context is more useful than putting it in.
> It turns out subagent 1 actually mistook your subtask and started building a background that looks like Super Mario Bros. Subagent 2 built you a bird, but it doesn’t look like a game asset and it moves nothing like the one in Flappy Bird. Now the final agent is left with the undesirable task of combining these two miscommunications
It seems to me there is another way to handle this... allow the final agent to go back to the subagent and say "hey, you did the wrong thing, this is what you did wrong, please try again"... maybe with a few iterations it will get it right... at some point, you need to limit the iterations to stop an endless loop, and either the final agent does what it can with a flawed response, or escalate to a human for manual intervention (even the human intervention can be a long-running tool...)
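A minimal sketch of that loop, where review() and redo() are hypothetical stand-ins for your own agent calls:

    # Review-and-retry with a hard cap before escalating to a human.
    # review() returns None when the result looks right, otherwise a critique.
    MAX_ITERATIONS = 3

    def run_with_review(task: str, review, redo) -> str:
        result = redo(task, feedback=None)
        for _ in range(MAX_ITERATIONS):
            feedback = review(task, result)
            if feedback is None:
                return result
            result = redo(task, feedback=feedback)  # retry with the critique
        raise RuntimeError("Escalate to a human: subagent kept missing the mark")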
How is this fundamentally any different than Erlang/Elixir concepts of supervisors controlling their child processes? It seems like the AI industry keeps re-discovering several basic techniques that have been around since the 80s.
I'm not surprised—most AI "engineers" are not really good software engineers; they're often "vibe engineers" who don't read academic papers on the subject and keep re-inventing the wheel.
If someone asked me why I think there's an AI bubble, I'd point exactly to this situation.
"It is now 2025 and React (and its descendants) dominates the way developers build sites and apps." Is there any research which tells react is dominating or most of the internet is not vanilla HTML but react?
Don't listen to anyone who tells you how to build an agent. This stuff has never existed before in the history of the world, and literally everyone is just figuring it out as we go. Work from the simplest basic building blocks possible and do what works for your use case. Eventually things will be figured out, and you can worry about "best practices" then. But it's all just conjecture right now.
If you don't use the phrase "structured generation" or "constrained generation" in your discussion of how to build Agents, you're doing it wrong.
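For what it's worth, even without engine-level grammar constraints you can approximate this at the application layer: force JSON mode and validate against a schema, rejecting anything that doesn't parse. A sketch assuming the OpenAI Python client and Pydantic (model name and schema are placeholders):

    from typing import Literal

    from openai import OpenAI
    from pydantic import BaseModel

    class Triage(BaseModel):
        severity: Literal["low", "medium", "high"]
        needs_human: bool

    client = OpenAI()

    def triage(report: str) -> Triage:
        r = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder
            response_format={"type": "json_object"},  # JSON mode
            messages=[
                {"role": "system",
                 "content": "Return JSON with keys 'severity' "
                            "(low|medium|high) and 'needs_human' (bool)."},
                {"role": "user", "content": report},
            ])
        # Pydantic raises if the output doesn't match the schema, so malformed
        # or off-script generations never propagate downstream.
        return Triage.model_validate_json(r.choices[0].message.content)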
Don't hide your content from people using NoScript, how about that for starters...
And oh great, another Peter Thiel company boosted to the top of HN, really?
> "Cognition AI, Inc. (also known as Cognition Labs), doing business as Cognition, is an artificial intelligence (AI) company headquartered in San Francisco in the US State of California. The company developed Devin AI, an AI software developer...Originally, the company was focused on cryptocurrency, before moving to AI as it became a trend in Silicon Valley following the release of ChatGPT... With regards to fundraising, the company was backed by Peter Thiel's Founders Fund which provided $21 million of funding to it in early 2024, valuing the company at $350 million.[2] In April 2024, Founders Fund led a $175 million investment into Cognition valuing the company at $2 billion making it a Unicorn."
The bubble's gonna pop, and you'll have so much egg on your face. This stuff is just compilers with extra compute and who got rich off compilers? VC people...
Why is this article at the top of HN? It's nothing breaking/new/interesting/astonishing.
The principles are super basic; you pick them up the first time you build an agent.
The real problem is getting reliability. If you have reliability and clearly defined inputs and outputs, you can easily go parallel.
This seems like a bad 5th-class homework assignment.
I'm building a simple agent accessible over SMS for a family member. One of their use cases is finding recipes. A problem I ran into was that doing a web search for recipes would pull tons of web pages into the context, effectively clobbering the system prompt that told the agent to format responses in a manner suited for SMS. I solved this by creating a recipe tool that uses a sub-agent to do the web search and return the most promising recipe to the main agent. When the main agent uses this tool instead of performing the web search itself, it is successfully able to follow the system prompt's directions to format and trim the recipe for SMS. Using this sub-agent to prevent information from entering the context dramatically improved the quality of responses. More context is not always better!
I bring this up because this article discusses context management mostly in terms of context windows having a maximum size. I think that context management is far more than that. I'm still new to this building agents thing, but my experience suggests that context problems start cropping up well before the context window fills up.
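For anyone curious, the shape of that recipe tool is roughly this (web_search() is a hypothetical stand-in for whatever search tool you use, and the model name is a placeholder); the sub-agent sees the noisy pages, the main agent only ever sees the one distilled recipe:

    from openai import OpenAI

    client = OpenAI()

    def llm(system: str, user: str) -> str:
        r = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user}])
        return r.choices[0].message.content

    def recipe_tool(query: str, web_search) -> str:
        pages = web_search(query)  # big, noisy, stays inside the sub-agent
        best = llm("Pick the single most promising recipe and return only its "
                   "title, ingredients, and steps.", "\n\n".join(pages))
        return best  # the only text that reaches the main agent's context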