Hacker News

roughly, today at 8:15 PM (2 replies)

One of the things that makes it very difficult to have reasonable conversations about what you can do with LLMs is that the effort-to-outcome curve is basically exponential - with almost no effort, you can get 70% of the way there. This looks amazing, and so people (mostly executives) look at it and think, “this changes everything!”

The problem is the remaining 30% - the next 10-20% starts to require things like multi-agent judge setups, external memory, and context management, and that gets you to something that’s probably working but that you sure shouldn’t ship to production. As for the last 10% - I’ve seen agentic workflows with hundreds of different agents, multiple models, and fantastically complex evaluation frameworks to try to push error rates below the ~10% mark. By a certain point, the infrastructure and LLM calls are running into several hundred dollars per run, and you’re still not getting guaranteed reliable output.

If you know what you’re doing and you know where to fit the LLMs (they’re genuinely the best system we’ve ever devised for interpreting and categorizing unstructured human input), they can be immensely useful, but they sing a siren song of simplicity that will lure you to your doom if you believe it.


Replies

zephyrthenoble, today at 9:31 PM

Yes, it's essentially the Pareto principle [0]. The LLM community has mistaken the 80% for difficult, complicated work, when it was essentially boilerplate. Allegedly LLMs have saved us from that drudgery, but I personally have found that (without the complicated setups you mention) the "80% done" project that gets one-shotted is in reality more like 50% done, because it is built on an unstable foundation, and that final 20% involves a lot of complicated reworking of the code. There's still plenty of value, but I think it is less than proponents would want you to believe.

Anecdotally, I have found that even if you type out paragraph after paragraph describing everything you need the agent to take care of, it eventually feels like you could have written a lot of the code yourself with the help of a good IDE by the time you can finally send your prompt off.

- [0] https://en.wikipedia.org/wiki/Pareto_principle

morkalork, today at 9:29 PM

Just to get a frame of reference, how many people were involved, and over how much time, in building a workflow with hundreds of agents?
