Hacker News

rudedogg · yesterday at 9:29 PM

This is kind of where I'm at.

I don't think any of this is certain, though. I think it's 50/50 whether Anthropic (or whoever) figures out how to turn them into more than a boilerplate generator.

The imprecision of LLMs is real, and a serious problem. And I think a lot of the engineering improvements (little s-curve gains or whatever) have introduced more and more sources of that imprecision. Every step or improvement has some randomness/lossiness attached to it.

Context too small?:

- No worries, we'll compact (information loss)

- No problem, we'll fire off a bunch of sub-agents, each with its own little context window and a small task, to combat this. (You're trusting the coordinator to decompose the work perfectly, and cutting each sub-agent off from the whole picture.)

All of this is causing bugs/issues?:

- No worries, we'll have a review agent scan over the changes. (The review agent has the same issues, though: no full context, etc.; see the sketch below.)
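To make the lossiness concrete, here's a toy sketch of that pipeline. Everything in it is hypothetical: `compact`, `run_subagent`, and `coordinator` are made-up names, and the keep-the-most-recent compaction is deliberately naive, not any real framework's API. The point is just that every layer throws information away:

```python
# Toy sketch only: all names are hypothetical, not a real agent framework's
# API. It just shows where information is lost at each layer.

def compact(history: list[str], budget: int) -> list[str]:
    """Naive compaction: keep the most recent messages that fit the budget
    and collapse everything older into a one-line stub. Whatever detail
    the stub drops is the information loss."""
    kept: list[str] = []
    used = 0
    for msg in reversed(history):  # newest first
        if used + len(msg) > budget:
            break
        kept.append(msg)
        used += len(msg)
    dropped = len(history) - len(kept)
    stub = [f"[summary of {dropped} earlier messages]"] if dropped else []
    return stub + list(reversed(kept))  # back to chronological order

def run_subagent(task: str, context: list[str]) -> str:
    """Stand-in for a sub-agent call: it only ever sees the slice of
    context the coordinator chose to hand it."""
    return f"{task}: done, based on {len(context)} context messages"

def coordinator(tasks: list[str], history: list[str], budget: int) -> list[str]:
    """Each delegation compounds the loss: lossy compaction first,
    then a sub-agent that never sees the whole picture."""
    results = []
    for task in tasks:
        ctx = compact(history, budget)            # loss 1: compaction
        results.append(run_subagent(task, ctx))   # loss 2: partial view
    return results

if __name__ == "__main__":
    history = [f"msg {i}: a design decision" for i in range(50)]
    for line in coordinator(["refactor auth", "review diff"], history, budget=100):
        print(line)
```

Real systems compact more cleverly (usually by having the model summarize its own history), but the structural point stands: whatever the summary drops is gone for every downstream step, including the reviewer.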

Right now I think it's a fair opinion to say LLMs are poison and I don't want them to touch my codebase, because they produce more output than I can handle, and the mistakes they make are subtle enough that I can't reliably catch them.

It's also fair to say that you don't care, and your work tolerates enough bugs/imprecision that you accept the risks. I do think there's a bit of an experience divide here, where more experienced people have been down the path of a codebase degrading until it's just too much to salvage, so I think that's part of why you see so much pushback. Others have worked in different environments, or on projects of smaller scale, where they haven't been bitten by that before. But it's very easy to get to that place with SOTA LLMs today.

There's also the whole cost component to this. I think I disagree with the author about the value provided today. If costs were 5x what they are now, it would be a hard call for me whether they're worth it. For prototypes, yes. But for serious work, where I need things to work right and be reasonably bug-free, I don't know if the value works out.

I think everyone is right that we don't have the right architecture, and we're trying to fix layers of slop/imprecision by slapping on more layers of slop. Some of these issues/limitations seem fundamental, and I don't know if little gains are going to change things much. But I'm really not sure, and I don't trust anyone working on the problem enough to tell me what the answer is. I guess we'll see in the next 6-12 months.


Replies

simonw · yesterday at 10:10 PM

> I do think there's a bit of an experience divide here, where more experienced people have been down the path of a codebase degrading until it's just too much to salvage, so I think that's part of why you see so much pushback.

When I look back over my career to date there are so many examples of nightmare degraded codebases that I would love to have hit with a bunch of coding agents.

I remember the pain of upgrading a poorly-tested codebase from Python 2 to Python 3: months of work that only happened because one brave engineer ran it as a skunkworks project.

One of my favorite things about working with coding agents is that my tolerance for poorly tested, badly structured code has gone way down. I used to have to take on technical debt because I couldn't schedule the time to pay it down. Now I can use agents to eliminate that almost as soon as I spot it.

akomtu · yesterday at 10:21 PM

An LLM is like a chef who cooks amazing meals in no time, but whose meals often contain small pieces of broken glass.