Hacker News

Show HN: Agent framework that generates its own topology and evolves at runtime

91 points · by vincentjiang · yesterday at 7:39 PM · 29 comments

Hi HN,

I'm Vincent from Aden. We spent four years building ERP automation for construction (PO/invoice reconciliation). We had real enterprise customers but hit a technical wall: chatbots aren't built for real work. Accountants don't want to chat; they want the ledger reconciled while they sleep. They want services, not tools.

Existing agent frameworks (LangChain, AutoGPT) failed for us in production: brittle, prone to looping, and unable to handle messy data. General Computer Use (GCU) frameworks were even worse. My reflections:

1. The "Toy App" Ceiling & the GCU Trap

Most frameworks assume synchronous sessions: if the tab closes, state is lost. You can't fit two weeks of asynchronous business state into an ephemeral chat session.

The GCU hype (agents "looking" at screens) is skeuomorphic. It’s slow (screenshots), expensive (tokens), and fragile (UI changes = crash). It mimics human constraints rather than leveraging machine speed. Real automation should be headless.

2. Inversion of Control: OODA > DAGs

Traditional DAGs are deterministic: if a step fails, the program crashes. In the AI era, the Goal is the law, not the Code. We use an OODA loop to manage stochastic behavior:

- Observe: Exceptions are observations (FileNotFound = new state), not crashes.

- Orient: Adjust strategy based on Memory and Traits.

- Decide: Generate new code at runtime.

- Act: Execute.

The topology shouldn't be hardcoded; it should emerge from the task's entropy.
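As a sketch of the loop above (all names here are illustrative, not Hive's actual API), the key move is that a raised exception is appended to the observation history and handed back to the planner instead of terminating the run:

```python
# Hypothetical OODA-style agent loop: exceptions become observations that
# feed the next planning step instead of crashing the run.
def ooda_loop(goal, plan_step, execute, max_iterations=10):
    observations = []                            # accumulated "Observe" history
    for _ in range(max_iterations):
        action = plan_step(goal, observations)   # Orient + Decide
        if action is None:                       # planner judges the goal met
            return observations
        try:
            result = execute(action)             # Act
            observations.append(("ok", action, result))
        except Exception as exc:                 # Observe: failure is data
            observations.append(("error", action, repr(exc)))
    raise RuntimeError("goal not reached within iteration budget")
```

In this shape the "topology" really is emergent: the sequence of actions is whatever `plan_step` decides given the history so far, not a graph fixed at authoring time.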

3. Reliability: The "Synthetic" SLA

You can't guarantee one inference ($k=1$) is correct, but you can guarantee a System of Inference ($k=n$) converges on correctness. Reliability becomes a function of compute budget. By wrapping an 80%-accurate model in a "Best-of-3" verification loop, we mathematically force the error rate down, trading latency and tokens for certainty.
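The arithmetic behind that claim can be checked directly. A quick sketch, under the assumption that the k runs fail independently (which shared prompts and priors can undermine in practice):

```python
from math import comb

def majority_accuracy(p, k):
    """Probability that a majority of k independent runs, each correct
    with probability p, lands on the right answer (k odd)."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

# An 80%-accurate model under best-of-3 majority voting:
print(round(majority_accuracy(0.8, 3), 3))  # 0.896, i.e. error falls from 20% to ~10.4%
```

More samples keep pushing the error down (k=5 gives ~94.2%), but only as fast as the runs stay uncorrelated.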

4. Biology & Psychology in Code

"Hard logic" can't solve "soft problems." We map cognition to architectural primitives:

- Homeostasis: solving "perseveration" (infinite loops) via a "stress" metric. If an action fails 3x, "neuroplasticity" drops, forcing a strategy shift.

- Traits: personality as a constraint. "High Conscientiousness" increases verification; "High Risk" executes DROP TABLE without asking.
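Stripped of the biological metaphor, the homeostasis mechanism can be as small as a per-action failure counter. A hedged sketch (the class name, methods, and threshold here are mine, not Hive's API):

```python
# Hypothetical "homeostasis" guard: repeated failures of the same action
# raise a stress counter; past a threshold the agent must abandon the
# current strategy (the "neuroplasticity drop").
class Homeostasis:
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = {}            # action name -> consecutive failure count

    def record(self, action, succeeded):
        if succeeded:
            self.failures[action] = 0         # success relieves stress
        else:
            self.failures[action] = self.failures.get(action, 0) + 1

    def must_shift_strategy(self, action):
        return self.failures.get(action, 0) >= self.threshold
```

The point of the design is that the loop-breaking rule lives outside the model: no matter what the LLM keeps proposing, the counter forces a different branch after the third identical failure.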

For the industry, we need engineers interested in the intersection of biology, psychology, and distributed systems to help us move beyond brittle scripts. It'd be great to have you roast my code and share feedback.

Repo: https://github.com/adenhq/hive


Comments

kkukshtel · today at 5:25 AM

The comments on this post that congratulate/engage with OP all seem to be from hn accounts created in the past three months that have only ever commented on this post, so it seems like there is some astro-turfing going on here.

Gagan_Dev · today at 7:08 AM

Interesting direction. I agree that most agent frameworks hit a “toy app ceiling” because they conflate conversational state with long-lived system state. Once you move into real business workflows (ERP, reconciliation, async pipelines), the problem stops being prompt orchestration and becomes distributed state management under uncertainty.

The OODA framing is compelling, especially treating exceptions as observations rather than terminal states. That said, I’m curious how you’re handling:

1. State persistence across long-running tasks — is memory append-only, event-sourced, or periodically compacted?

2. Convergence guarantees in your "system of inference" model — how do you prevent correlated failure across k runs?

3. Cost ceilings — at what point does reliability-through-redundancy become economically infeasible compared to hybrid symbolic validation?

I also like the rejection of GCU-style UI automation. Headless, API-first execution seems structurally superior for reliability and latency.

The biology-inspired control mechanisms (stress / neuroplasticity analogs) are intriguing — especially if they’re implemented as adaptive search constraints rather than metaphorical wrappers. Would be interested to understand how measurable those dynamics are versus heuristic.

Overall, pushing agents toward durable, autonomous services instead of chat wrappers is the right direction. Curious to see how Hive handles multi-agent coordination and resource contention at scale.

Emar7 · today at 9:01 AM

Contributed the BigQuery MCP tool (PR #3350) - lets agents query data warehouses with read-only SQL, cost tracking, and safety guardrails. Also just submitted a fix for runtime storage path validation (#4466).

The OODA framing resonates - treating exceptions as observations rather than crashes is exactly how the self-healing should work. The stress/neuroplasticity concept for preventing infinite loops is clever.

One thing I'd love to see explored more: structured audit logging for credential access. With enterprise sources (Vault/AWS/Azure) on the roadmap, compliance tracking becomes essential.

JBheemeswar · today at 7:58 AM

I’ve been exploring Hive recently and what stands out is the move from prompt orchestration to persistent, stateful execution. For real ERP-style workflows, that shift makes sense.

Treating exceptions as observations instead of terminal failures is a strong architectural reframing. It turns brittleness into a feedback signal rather than a crash condition.

A few production questions come to mind:

1) In the k-of-n inference model, how do you prevent correlated failure? If runs share similar prompts and priors, independence may be weaker than expected.

2) How is memory managed over long-lived tasks? Is it append-only, periodically compacted, or pruned strategically? State entropy can grow quickly in ERP contexts.

3) How do you bound reflection loops to prevent runaway cost? Are there hard ceilings or confidence-based stopping criteria?

I strongly agree with the rejection of UI-bound GCU approaches. Headless, API-first automation feels structurally more reliable.

The real test, in my view, is whether stochastic autonomy can be wrapped in deterministic guardrails — especially under strict cost and latency constraints.

Curious to see how Hive evolves as these trade-offs become more formalized.

AIorNot · today at 9:34 AM

WTH is this? Why is this even allowed on HN

This company is a fraud. Please remove this scam-company hype from HN.

Their "AI agent" website is just LLM slop and marketing hype!

They tried to hire folks in India to hype their repo and fake growth for some apparently crap AI "agent" platform: https://www.reddit.com/r/developersIndia/s/a1fQC5j0FM

https://news.ycombinator.com/item?id=46764091

CuriouslyC · today at 12:57 AM

Failures of workflows signal assumption violations that ultimately should percolate up to humans. Also, static DAGs are more amenable to human understanding than dynamic task decomposition. Robustness in production is good, though, if you can bound agent behavior.

Best of 3 (or more) tournaments are a good strategy. You can also use them for RL via GRPO if you're running an open weight model.

zerebos · today at 7:16 AM

Oh hey aren't you the folks that grabbed all the stargazers of an open source project and their emails and sent out unsolicited ads?

vincentjiang · yesterday at 7:43 PM

To expand on the "Self-Healing" architecture mentioned in point #2:

The hardest mental shift for us was treating Exceptions as Observations. In a standard Python script, a FileNotFoundError is a crash. In Hive, we catch that stack trace, serialize it, and feed it back into the Context Window as a new prompt: "I tried to read the file and failed with this error. Why? And what is the alternative?"

The agent then enters a Reflection Step (e.g., "I might be in the wrong directory, let me run ls first"), generates new code, and retries.

We found this loop alone solved about 70% of the "brittleness" issues we faced in our ERP production environment. The trade-off, of course, is latency and token cost.
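A minimal sketch of that catch-serialize-reprompt loop (here `generate_code` is a hypothetical stand-in for the LLM call; Hive's real implementation surely differs):

```python
import traceback

def run_with_reflection(generate_code, max_retries=3):
    """On failure, serialize the stack trace and hand it back to the
    code generator as context for the next attempt."""
    feedback = None
    for attempt in range(max_retries):
        code = generate_code(feedback)           # Reflect + Decide
        try:
            namespace = {}
            exec(code, namespace)                # Act
            return namespace.get("result")
        except Exception:
            # Observe: the failure becomes a new prompt, not a crash.
            feedback = (f"Attempt {attempt + 1} failed:\n"
                        f"{traceback.format_exc()}\n"
                        "Why might this have happened, and what is an alternative?")
    raise RuntimeError("reflection budget exhausted")
```

The `max_retries` ceiling is what keeps the latency/token trade-off bounded: without it, a genuinely unrecoverable error would loop forever.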

I'm curious how others are handling non-deterministic failures in long-running agent pipelines. Are you using simple retries, voting ensembles, or human-in-the-loop?

It'd be great to hear your thoughts.

avoidaccess · today at 7:28 AM

This looks so cool, and much more non-coder friendly than hardcoded workflows; that's exactly what most builders need.

mubarakar95 · today at 5:02 AM

It forces you to write code that is "strategy-aware" rather than just "procedural." It’s a massive shift from standard DAGs where one failure kills the whole run. Really interesting to see how the community reacts to this "stochastic" approach to automation.

israrkhan0 · today at 8:38 AM

I am a frontend engineer with hands-on experience in React, JavaScript, Tailwind CSS, HTML, CSS, and API integration.

khimaros · today at 5:47 AM

i have been working on something similar, trying to build the leanest agent loop that can be self-modifying. ended up building it as a plugin within OpenCode with the core pulled out into python hooks that the agent can modify at runtime (with automatic validation of existing behavior). this allows it to create new tools for itself, customize its system prompt preambles, and of course manage its own traits. it also contains a heartbeat hook. it all runs in an incus VM for isolation and provides a webui and attachable TUI thanks to OpenCode.

mhitza · today at 12:33 AM

3. What, or who, is the judge of correctness (accuracy), regardless of how many solutions run in parallel? If I optimize for max accuracy, how close can I get to 100% mathematically, and how much would that cost?

Multicomp · yesterday at 11:13 PM

I am of course unqualified to provide useful commentary on it, but I find this concept to be new and interesting, so I will be watching this page carefully.

My use case is less about hooking this up as some sort of business-workflow ClawdBot alternative, and more about seeing whether this can be an eventually consistent engine that lets me update state over various documents across the time dimension.

Could I use it to simulate some tabletop characters and their locations over time?

That would perhaps let me skip some bookkeeping: seeing where a given NPC would be on a given day after so many days pass between game sessions. That would let me advance the game world without having to step through it manually per character.

foota · yesterday at 11:14 PM

I was sort of thinking about a similar idea recently. What if you wrote something like a webserver that was given "goals" for a backend, told agents what the application was supposed to be, told them to use the backend to meet those goals, and then had them generate feedback based on their experience?

Then have an agent collate the feedback, combined with telemetry from the server, and iterate on the code to fix it up.

In theory you could have the backend write itself and design new features based on what agents try to do with it.

I sort of got the idea from a comparison with JITs, you could have stubbed out methods in the server that would do nothing until the "JIT" agent writes the code.

omhome16 · today at 2:09 AM

Strongly agree on the 'Toy App' ceiling with current DAG-based frameworks. I've been wrestling with LangGraph for similar reasons—once the happy path breaks, the graph essentially halts or loops indefinitely because the error handling is too rigid.

The concept of mapping 'exceptions as observations' rather than failures is the right mental shift for production.

Question on the 'Homeostasis' metric: Does the agent persist this 'stress' state across sessions? i.e., if an agent fails a specific invoice type 5 times on Monday, does it start Tuesday with a higher verification threshold (or 'High Conscientiousness') for that specific task type? Or is it reset per run?

Starred the repo, excited to dig into the OODA implementation.

fwip · today at 3:06 AM

Yet more LLM word vomit. If you can't be bothered to describe your new project in your own words, it's not worth posting about.

Fayek_Quazi · today at 5:54 AM

Hive looks like a promising framework for AI agents. I recently contributed a docs PR and found the onboarding experience improving quickly. Excited to see where this goes.

spankalee · today at 5:50 AM

> The topology shouldn't be hardcoded; it should emerge from the task's entropy

What does this even mean?

show 1 reply
Biswabijaya · yesterday at 11:46 PM

Great work team.

kittbuilds · today at 5:15 AM

[dead]

Agent_Builder · today at 4:48 AM

[dead]

ichistudio · today at 3:58 AM

[dead]

chaojixinren · today at 3:25 AM

[dead]

andrew-saintway · today at 12:32 AM

[flagged]

Sri_Madhav · today at 5:37 AM

[flagged]

woldan · today at 3:53 AM

[flagged]

matchaonmuffins · today at 3:09 AM

[flagged]

abhishekgoyal19 · today at 4:33 AM

[flagged]

nishant_b555 · today at 5:38 AM

[flagged]

mapace22 · today at 3:55 AM

[flagged]

Anujsharma002 · today at 6:28 AM

[flagged]