Hacker News

Agent_Builder · today at 6:48 AM

We ran into similar reliability issues while building GTWY. What surprised us was that most failures weren’t about model quality, but about agents being allowed to run too long without clear boundaries.

What helped was treating agents less like “always-on brains” and more like short-lived executors. Each step had an explicit goal, explicit inputs, and a defined end. Once the step finished, the agent stopped and context was rebuilt deliberately.
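A minimal sketch of that shape, with hypothetical names (`Step`, `run_step`) and a stub in place of a real LLM call: each step carries an explicit goal and inputs, gets a hard turn budget, and the context dict is rebuilt from scratch per step rather than carried over.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One short-lived agent invocation: explicit goal, inputs, and end."""
    goal: str
    inputs: dict
    max_turns: int = 3  # hard boundary: the step cannot run indefinitely

def run_step(step: Step, model) -> dict:
    """Run one bounded step, then stop. `model` is any callable
    context -> reply dict; a real LLM call would go here."""
    context = {"goal": step.goal, **step.inputs}  # context rebuilt deliberately
    for _ in range(step.max_turns):
        reply = model(context)
        if reply.get("done"):       # explicit, declared end of the step...
            return reply["result"]
        context["last"] = reply     # ...otherwise feed back exactly one turn
    raise RuntimeError(f"step {step.goal!r} exceeded its turn budget")

# Stub model so the sketch runs standalone: declares done on the first turn.
def stub_model(ctx):
    return {"done": True, "result": {"summary": f"did: {ctx['goal']}"}}

out = run_step(Step(goal="extract invoice total", inputs={"doc": "..."}), stub_model)
```

The point of the turn budget is that a misbehaving step fails loudly instead of wandering; the orchestrator decides what, if anything, from `out` enters the next step's context.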

Harnesses like this feel important because they shift the problem from “make the model smarter” to “make the system more predictable.” In our experience, reliability came more from reducing degrees of freedom than from adding intelligence.


Replies

brap · today at 9:10 AM

This seems to be where it’s at right now: we can’t seem to make the models significantly more intelligent, so we “inject” our own intelligence into the system, in the form of good old-fashioned code.

My philosophy is to make the LLMs do as little work as possible: only small, simple steps. Anything that can reasonably be done in code (orchestration, tool calls, etc.) should be done in code. Basically, any time you find yourself instructing an LLM to follow a certain recipe, break it down into multiple agents and do what you can in code.
