"When the software is being written by agents as much as by humans, the familiar-language argum...

somat • today at 6:44 AM • 5 replies

"When the software is being written by agents as much as by humans, the familiar-language argument is the weakest it has ever been - an LLM does not care whether your codebase is Java or Clojure. It cares about the token efficiency of the code, the structural regularity of the data, the stability of the language's semantics across releases."

Isn't familiarity with the language even more of a factor with an LLM? The language they do best with is the one with the largest corpus in the training set.

Replies

dgb23 • today at 7:24 AM

Familiarity matters to some degree. But there are diminishing returns I think.

Stability, consistency and simplicity are much more important than this notion of familiarity (there's lots of code to train on) as long as the corpus is sufficiently large. Another important one is how clear and accessible libraries, especially standard libraries, are.

Take Zig for example. Very explicit and clear language, easy access to the std lib. For a young language it is consistent in its style. An agent can write reasonable Zig code and debug issues from tests. However, it is still unstable and APIs change, so LLMs get regularly confused.

Languages and ecosystems that are more mature and take stability very seriously, like Go or Clojure, don't have the problem of "LLM hallucinates APIs" nearly as much.

The thing with Clojure is also that it's a very expressive and very dynamic language. You can hook up an agent to the REPL and it can very quickly validate or explore things. With most other languages it needs to change a file (which takes multiple, more complex operations), then write an explicit test, then run that test to get the same result as "defn this function and run some invocations".


ehnto • today at 6:53 AM

And they're very sensitive to new releases, which often makes them difficult to work with after a major release of a framework, for example. They trip up on minor stuff like new functions, changes in signatures, etc.

A stable mature framework then is the best case scenario. New frameworks or rapidly changing frameworks will be difficult, wasting lots of tokens on discovery and corrections.

phoehne • today at 1:59 PM

I spent about two hours last night trying to get a consistent and accurate answer out of Claude regarding a set of graphics APIs. I then went the old-fashioned way, only to find that most of the articles, outside of a couple of sources, were also incorrect API slop. I can't override methods that don't exist and never have existed in an API, but that's what the clankers have latched on to.

Just before that, at work, I found a bug in an AI-driven refactor of code. For some reason, both the original refactor and the AI-driven autocomplete kept trying to send the wrong parameters to a function. It was determined to get it wrong, even after I manually fixed it. [Edit - I should also mention the AI-driven code review agent tried to do the same thing. The clankers are consistent.]

This is why familiar language matters. Because at some point, you'll have bugs that the AI can't fix. And by the way, I use LLM tools at work and have a set of skills that improve my productivity, if not my QoL. But I still need to be able to dive into the language, the build tools, and fix things.

bilekas • today at 8:28 AM

Yes, I'd agree from the perspective of the model that one cohesive, well-established language would be more reliable. The nightmare scenario is an enterprise suite with a hodgepodge of every language known to man, all mangled together because the frontier model at the time decided Haskell would be the most efficient when compiled for WebAssembly, and some poor intern has to fix a bug that should cost 100x less than rerunning the LLM to patch.

lelanthran • today at 1:09 PM

> The language they do best with is the one with the largest corpus in the training set.

Up to a point, I guess? There must be a point of diminishing returns based on the expressiveness of the language.

I mean, a language that has 8 different ways to declare + initialise composite variables needs to have a much larger training corpus than a language that has only 2 or 3 different ways.

The more expressive a language, the more different suitable patterns would be required, which results in a larger corpus being needed.
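
To make the "many ways to initialise a composite" point concrete, here is a rough sketch in Python. The choice of Python and of a two-key dict is purely an assumption for illustration; the comment above does not name a language. It lists six spellings that all produce the same value, which is the kind of surface variety a model has to see enough examples of.

```python
# Illustration only: six equivalent spellings of the same composite value.
# The keys, values, and the choice of a dict are hypothetical examples.
keys = ["x", "y"]
vals = [1, 2]

variants = [
    {"x": 1, "y": 2},                    # literal
    dict(x=1, y=2),                      # keyword arguments
    dict([("x", 1), ("y", 2)]),          # list of (key, value) pairs
    dict(zip(keys, vals)),               # zip of separate key and value lists
    {k: v for k, v in zip(keys, vals)},  # dict comprehension
    {**{"x": 1}, "y": 2},                # unpacking an existing dict
]

# Every spelling yields an equal dict, so they are interchangeable in meaning.
all_equal = all(v == variants[0] for v in variants)
print(len(variants), all_equal)
```

A language with only two or three of these forms would presumably need fewer examples to cover the same ground, which is the diminishing-returns argument in a nutshell.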
