Hacker News

Building an internal agent: Code-driven vs. LLM-driven workflows

51 points | by pavel_lishin, yesterday at 6:34 PM | 22 comments

Comments

WhiteNoiz3 · today at 1:23 AM

I'm struggling to understand why an LLM even needs to be involved in this at all. Can't you write a script that takes the last 10 Slack messages, checks the GitHub status for any URLs, and adds an emoji? It could be a standalone script or a Slack bot, and it would work far more reliably and cost nothing in LLM calls. IMO it seems far more efficient to have an LLM write a repeatable workflow once than to call an LLM every time.
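
For what it's worth, a minimal sketch of that kind of script, assuming slack_sdk on the Slack side, the public GitHub REST pulls endpoint for status, and tokens plus a channel ID in environment variables (all names here are placeholders, not anything from the article):

```python
import os
import re

import requests
from slack_sdk import WebClient

slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
PR_LINK = re.compile(r"github\.com/([\w.-]+)/([\w.-]+)/pull/(\d+)")

def pr_status(owner: str, repo: str, number: str) -> str:
    """Return 'merged', 'closed', or 'open' for a pull request."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}",
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        timeout=10,
    )
    resp.raise_for_status()
    pr = resp.json()
    return "merged" if pr["merged"] else pr["state"]

def annotate_recent_messages(channel_id: str) -> None:
    # Look at the last 10 messages, find PR links, and react based on status.
    history = slack.conversations_history(channel=channel_id, limit=10)
    for message in history["messages"]:
        for owner, repo, number in PR_LINK.findall(message.get("text", "")):
            emoji = {"merged": "tada", "closed": "x", "open": "hourglass"}[
                pr_status(owner, repo, number)
            ]
            slack.reactions_add(
                channel=channel_id, name=emoji, timestamp=message["ts"]
            )

if __name__ == "__main__":
    annotate_recent_messages(os.environ["SLACK_CHANNEL_ID"])
```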

valdair3d · today at 1:18 AM

The "code vs LLM" framing is a bit misleading - the real question is where to draw the boundary. We've been building agents that interact with web services and the pattern that works is: LLM for understanding intent and handling unexpected states, deterministic code for everything else.

The key insight from production: LLMs excel at the "what should I do next given this unexpected state" decisions, but they're terrible at the mechanical execution. An agent that encounters a CAPTCHA, an OAuth redirect, or an anti-bot challenge needs judgment to adapt. But once it knows what to do, you want deterministic execution.
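
As a toy illustration of that boundary (my sketch, not anything from the article): deterministic handlers cover every state you planned for, and the model is only consulted when the agent lands somewhere unexpected. `llm_choose_action` stands in for whatever model call you use.

```python
from typing import Callable

def fill_login_form(state: dict) -> None:
    print("deterministic: submit credentials")

def extract_pr_status(state: dict) -> None:
    print("deterministic: read PR status from the page")

def back_off_and_retry(state: dict) -> None:
    print("deterministic: sleep and retry")

# Every state we planned for maps to plain code with no model call.
EXPECTED: dict[str, Callable[[dict], None]] = {
    "login_form": fill_login_form,
    "pr_page": extract_pr_status,
    "rate_limited": back_off_and_retry,
}

def llm_choose_action(state: dict, allowed: list[str]) -> str:
    # Placeholder for a real model call: given the unexpected state, return
    # the name of one allowed deterministic handler to run next.
    raise NotImplementedError

def step(state: dict) -> None:
    handler = EXPECTED.get(state["kind"])
    if handler is not None:
        handler(state)  # mechanical execution: no LLM involved
    else:
        # CAPTCHA, OAuth redirect, anti-bot challenge, ...: judgment call.
        choice = llm_choose_action(state, allowed=list(EXPECTED))
        EXPECTED[choice](state)
```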

The evals discussion is critical. We found that unit-test style evals don't capture the real failure modes - agents fail at composition, not individual steps. Testing "does it correctly identify a PR link" misses "does it correctly handle the 47th message in a channel where someone pasted a broken link in a code block". Trajectory-level evals against real edge cases matter more than step-level correctness.

galaxyLogic · yesterday at 10:19 PM

What I'm struggling with is that when you ask AI to do something, its answer is always nondeterministically different, more or less.

If I start out with a "spec" that tells the AI what I want, it can create working software for me. Seems great. But say some weeks, months, or even years later I realize I need to change my spec a bit. I would like to give the new spec to the AI and have it produce an improved version of "my" software. But there seems to be no way to evaluate how (how much, where, in what way) the solution has changed or improved because of the changed/improved spec. Because the AI's outputs are nondeterministic, the new solution might be totally different from the previous one. So AI would not seem to support "iterative development" in this sense, does it?

My question then really is: why can't there be an LLM that always gives the exact same output for the exact same input? I could then still explore multiple answers by changing my input incrementally. It just seems to me that a small change in inputs/specs should only produce a small change in outputs. Does any current LLM support this way of working?
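
Partly, though it's a weaker guarantee than you'd want. Temperature 0 removes sampling randomness, and some APIs (e.g. OpenAI's) accept a seed for best-effort reproducibility, but the provider can still change the model or backend underneath you, and a small spec change can still produce an arbitrarily large output change. A minimal example with the OpenAI Python SDK (model name and prompt are just placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0,  # greedy decoding: no sampling randomness
    seed=1234,      # best-effort determinism; not a hard guarantee
    messages=[
        {"role": "user", "content": "Generate the module described by this spec: ..."}
    ],
)
print(response.choices[0].message.content)
```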

jaynate · yesterday at 8:38 PM

It’s sort of difficult to understand why this is even a question: LLM-based / judgment-dependent workflows vs. script-based / deterministic workflows.

In mapping out the problems that need to be solved with internal workflows, it’s wise to clarify upfront where probabilistic judgment is helpful or required and where it isn’t. If the process is fixed and requires determinism, why not just write scripts (code-gen’ed, of course)?

David · yesterday at 8:20 PM

> We still start all workflows using the LLM, which works for many cases. When we do rewrite, Claude Code can almost always rewrite the prompt into the code workflow in one-shot.

Why always start with an LLM to solve problems? Using an LLM adds a judgment call, and (at least for now) those judgment calls are not reliable. For something like the motivating example in this article, "is this PR approved", it seems straightforward to get the deterministic right answer from the GitHub API without muddying the waters with an LLM.
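
For that specific check, the GitHub reviews endpoint gives a deterministic answer; something like this sketch (repo and token names are placeholders, and a stricter version would also look for later CHANGES_REQUESTED reviews from the same reviewer):

```python
import os

import requests

def pr_is_approved(owner: str, repo: str, number: int) -> bool:
    """True if at least one review on the pull request is in the APPROVED state."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}/reviews",
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        timeout=10,
    )
    resp.raise_for_status()
    return any(review["state"] == "APPROVED" for review in resp.json())

print(pr_is_approved("acme", "app", 123))
```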

Edmond · yesterday at 7:58 PM

There is a third option, letting AI write workflow code:

https://youtu.be/zzkSC26fPPE

You get the benefit of AI CodeGen along with the determinism of conventional logic.

dmarwicke · yesterday at 8:49 PM

hit this with support ticket filtering. llm kept missing weird edge cases. wrote some janky regex instead, works fine

mayop100 · yesterday at 8:51 PM

This is the basic idea we built Tasklet.ai on. LLMs are great at problem solving, but less great on cost and reliability. They are, however, great at writing code that is cheap and reliable!

So we gave the Tasklet agent a filesystem, a shell, a code runtime, a general-purpose triggering system, etc., so that it could build the automation system it needed.

retinaros · yesterday at 9:31 PM

It's just a form of structured output. You still need an env to run the code, secure it, maintain it, upgrade it. It's some work. Easier to build a rule-based workflow for simple stuff like this.