Hacker News

benjiro, yesterday at 5:45 PM

> AI is pretty bad at Python and Go as well.

It is great at Golang IF the tasks are one-shot. LLMs seem to degrade a lot when they are forced to work on existing code bases (even their own). This seems to be more an issue of context sizes growing out of control way too fast, which is what degrades LLMs the most.

So far Opus 4.5 has been the one LLM that keeps mostly coding in a, how to say, predictable way even with an existing code base. It requires scaffolding and being very clear with your coding requests. But it is not like the older models, which go off script way too much or rewrite code in their own style.

For me, Opus 4.5 has reached that sweet spot of actual productivity, rather than just playing around with LLMs and undoing mistakes.

The problem with LLMs is often a mix of things: LLM issues, people giving different requests, context overload, different models doing better with different languages, the amount of data the model needs to alter, etc. This makes the results vary a lot from one person to another, and makes them harder to quantify.

Even a difference in the task can explain why a person glorifies an LLM one day and a few weeks later complains it was nerfed, when it was not. It is just people doing different work / different prompts and ...


Replies

OhSoHumble, yesterday at 11:20 PM

> So far Opus 4.5 has been the one LLM that keeps mostly coding in a, how to say, predictable way even with an existing code base.

I find this to be true only if you have very explicit rules in CLAUDE.md, and even then it still messes up.

I have "you will use the shared code <here>" twice in my CLAUDE.md because otherwise it constantly writes duplicate code.

Something else that is annoying: when it moves some code elsewhere with the intent to slightly modify it, I've seen it delete the code, reimplement it from scratch, and then modify that to do what was specified. This completely breaks tests. I then have to say "look at this earlier commit - you've caused a complete regression."
