Hacker News

amangsingh today at 11:10 AM · 10 replies

A 500k line codebase for an agent CLI proves one thing: making a probabilistic LLM behave deterministically is a massive state-management nightmare. Right now, they're great for prompting simple sites/platforms but they break at large enterprise repos.

If you don't have a rigid, external state machine governing the workflow, you have to brute-force reliability. That codebase bloat is likely 90% defensive programming: frustration regexes, context sanitizers, tool-retry loops, and state rollbacks just to stop the agent from drifting or silently breaking things.
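The retry-plus-rollback pattern the comment describes could be sketched roughly like this. All names here (runToolWithRetry, snapshotState, AgentState) are invented for illustration, not taken from any real agent codebase:

```typescript
// Hypothetical sketch of a defensive tool-retry loop with state rollback.
// The idea: snapshot mutable state before each attempt, and restore it
// if the tool fails, so a flaky LLM tool call can't leave things half-broken.

type ToolResult = { ok: boolean; output: string };

interface AgentState {
  files: Map<string, string>; // path -> contents
}

function snapshotState(state: AgentState): AgentState {
  return { files: new Map(state.files) };
}

async function runToolWithRetry(
  state: AgentState,
  tool: (s: AgentState) => Promise<ToolResult>,
  maxRetries = 3,
): Promise<ToolResult> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const snapshot = snapshotState(state); // save state before the tool mutates it
    try {
      const result = await tool(state);
      if (result.ok) return result;
    } catch {
      // treat a thrown error the same as a failed result: fall through to rollback
    }
    // Roll back any partial mutations before the next attempt.
    state.files = snapshot.files;
  }
  return { ok: false, output: `tool failed after ${maxRetries} attempts` };
}
```

Multiply this by every tool, every failure mode, and every drift scenario, and the line count adds up quickly.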

The visual map is great, but from an architectural perspective, we're still herding cats with massive code volume instead of actually governing the agents at the system level.


Replies

ttcbj today at 1:20 PM

I find it really strange that there is so much negative commentary on the _code_, but so little commentary on the core architecture.

My takeaway from looking at the tool list is that they got the fundamental architecture right - try to create a very simple and general set of tools on the client-side (e.g. read file, output rich text, etc) so that the server can innovate rapidly without revving the client (and also so that if, say, the source code leaks, none of the secret sauce does).

Overall, when I see this I think they are focused on the right issues, and their tool list looks pretty simple, elegant, and general. I picture the server team constantly thinking: we have these client-side tools/APIs, how can we use them optimally? How can we get more out of them? That is where the secret sauce lives.
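One way to picture that client/server split is a thin dispatch layer on the client exposing only generic primitives. This is a sketch under assumptions: the tool names and shapes here are invented, not the actual Claude Code API:

```typescript
// Illustrative sketch of the "thin client, smart server" architecture the
// comment describes. The client exposes only simple, general primitives,
// so the server can change its prompts, planning, and orchestration at any
// time without shipping a new client binary.

interface ToolCall {
  name: string; // e.g. "read_file", "write_output" (hypothetical names)
  args: Record<string, string>;
}

// Simple, general client-side tools: no product logic lives here.
const clientTools: Record<string, (args: Record<string, string>) => string> = {
  read_file: (args) => `<contents of ${args.path}>`, // stub: would read from disk
  write_output: () => "ok",                          // stub: would render rich text
};

// The server sends ToolCall messages; the client just dispatches them.
function dispatch(call: ToolCall): string {
  const tool = clientTools[call.name];
  if (!tool) return `unknown tool: ${call.name}`;
  return tool(call.args);
}
```

All the interesting behavior then lives server-side in how those calls are sequenced, which is also why a leaked client reveals little of the secret sauce.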

sunir today at 12:04 PM

It’s not surprising. There has been quite a bit of industrial research into how to manage mere apes to behave deterministically within huge software control systems, and they are an unruly bunch, I assure you.

nicoburns today at 1:05 PM

Kinda depends how much of it is vibe coded. If they haven't been careful, it could easily be 5x larger than it needs to be just because the LLM felt like it.

comboy today at 12:23 PM

It's hard to tell how much this says about the difficulty of harnessing an LLM versus the difficulty of maintaining a clean, unbloated codebase when coding with AI.

whycombagator today at 1:22 PM

> Right now, they're great for prompting simple sites/platforms but they break at large enterprise repos

Can you expand on this?

My experience is that they require excessive steering, but they do not “break”.

bogdanoff_2 today at 12:11 PM

What do you mean by "actually governing the agents at the system level", and how is it different from "herding cats"?

ramesh31 today at 12:40 PM

>A 500k line codebase for an agent CLI proves one thing: making a probabilistic LLM behave deterministically is a massive state-management nightmare. Right now, they're great for prompting simple sites/platforms but they break at large enterprise repos.

Is that the case? I'm pretty sure Claude Code is one of the most massively successful pieces of software made in the last decade. I don't know how that proves your point. Will this codebase become unmanageable eventually? Maybe, but literally every agent harness out there is just copying their lead at this point.

p-e-w today at 12:04 PM

> A 500k line codebase for an agent CLI proves one thing: making a probabilistic LLM behave deterministically is a massive state-management nightmare.

Considering what the entire system ends up being capable of, 500k lines is about 0.001% of what I would have expected something like that to require 10 years ago.

You can combine that with all the training and inference code, and at the end of the day, a system that literally writes code ends up being smaller than the LibreOffice codebase.

It boggles the mind, really.
