Hacker News

palata · yesterday at 11:45 PM · 8 replies

I am not convinced.

- Natural languages are ambiguous. That's the reason why we created programming languages. So the documentation around the code is generally ambiguous as well. Worse: it's not being executed, so it can get out of date (sometimes in subtle ways).

- LLMs are trained on tons of source code, which is arguably a smaller space than natural languages. My experience is that LLMs are really good at, e.g., translating code between two programming languages. But translating my prompts to code is not working as well, because my prompts are in natural languages, and hence ambiguous.

- I wonder if it is a question of "natural languages vs programming languages" or "bad code vs good code". I could totally imagine that documenting bad code helps the LLMs (and the humans) understand the intent, while documenting good code actually adds ambiguity.

What I learned is that we write code for humans to read. Good code is code that clearly expresses the intent. If there is a need to comment the code all over the place, to me it means that the code is maybe not as good as it should be :-).

Of course, there is an argument to be made that code quality is getting worse every year, and that there is therefore a growing need for documentation around it, because it's getting hard to understand what the hell the author wanted to do.


Replies

pdntspa · today at 6:20 AM

> because my prompts are in natural languages, and hence ambiguous.

Legalese developed specifically because natural language was too ambiguous. A similar level of specificity in prompting works wonders.

One of the issues with specifying directions to the computer in code is that you are very narrowly describing how something can be done. But sometimes I don't know the best 'how'; I just know what I want done. With natural-language prompting, the AI can tap into its training knowledge and come up with better ways of doing things. It still needs lots of steering (usually), but a lot of the time you end up with a superior result.

bottd · today at 12:00 AM

> If there is a need to comment the code all over the place, to me it means that the code is maybe not as good as it should be :-)

If good code were enough on its own, we would read the source instead of the documentation. I believe good documentation is part of good software. The prose of a literate source is aimed at documentation, not at line-level comments about implementation.

baq · today at 6:58 AM

Docs and code work together as mutually error-correcting codes. You can't have the benefits of error detection and correction without redundant information.
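One concrete way to get that redundancy while also keeping the prose from silently going stale is to make part of the documentation executable. Below is a minimal sketch using Python's standard-library `doctest` module; the `median` function is a hypothetical example chosen for illustration, not something from this thread.

```python
import doctest


def median(values):
    """Return the median of a non-empty list of numbers.

    The examples below are both documentation and a test: if the code
    and the prose drift apart, running doctest reports the mismatch.

    >>> median([3, 1, 2])
    2
    >>> median([4, 1, 3, 2])
    2.5
    """
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        # Odd length: the middle element is the median.
        return ordered[mid]
    # Even length: average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


if __name__ == "__main__":
    # Run every example found in this module's docstrings; prints nothing
    # when all of them pass.
    doctest.testmod()
```

The docstring and the implementation are the two redundant copies here: change one without the other and `doctest.testmod()` reports a failure instead of letting the docs lie quietly.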

hosh · yesterday at 11:54 PM

I don't have my LLMs generate literate programs. I do ask them to talk about tradeoffs.

I keep full examples of code that is heavily commented and explained, including links to any relevant schemas or docs. I have gotten good results when I ask an LLM to use one of them as a template, telling it that not everything in there needs to be used, and it cuts down on hallucinations quite a bit.

k32k · today at 1:42 AM

"But translating my prompts to code is not working as well, because my prompts are in natural languages, and hence ambiguous."

Not only that, but there's something very annoying and deeply dissatisfying about typing a bunch of text into a thing when you have no control over how it's producing an output, and the output can't be reproduced even when the input is identical.

Agreed: natural language is very ambiguous, and becoming more ambiguous by the day (what exactly does 'vibe' mean?).

People spoke in a particular way, say, 60 years ago, one that left very little room for interpretation of what they meant. The same cannot be said today.

awesome_dude · today at 1:29 AM

> Natural languages are ambiguous. That's the reason why we created programming languages. So the documentation around the code is generally ambiguous as well. Worse: it's not being executed, so it can get out of date (sometimes in subtle ways).

I loathe this take.

I have rocked up to codebases where there were specific rules banning comments because of this attitude.

Yes, comments can lie, and yes, there are no guards ensuring they stay in lockstep with the code they document, but not having them is a thousand times worse. I can always see WHAT the code is doing; that's never the problem. The problem is WHY it was done this way.

I put in comments like "This code runs in O(n) because only a handful of items are ever going to be searched; update it when there are enough items to justify an O(log2 n) search."

That tells future developers that the author (me) KNOWS it's not the most efficient code possible, but that it IS once you take into account things the person reading it doesn't know.
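In practice, a comment like that might sit next to code such as the following. This is a minimal Python sketch: the feature-flag scenario, the "fewer than 20 items" threshold, and the names `find_linear` and `find_sorted` are all hypothetical illustrations; `bisect_left` is the standard-library binary search on a sorted list.

```python
from bisect import bisect_left


# WHY (hypothetical scenario): only a handful of feature flags are ever
# registered (assumed to stay under ~20), so a linear scan is simpler and
# just as fast in practice. Revisit if the list grows large enough to
# justify an O(log n) search such as find_sorted below.
def find_linear(items, target):
    """Return the index of target in items, or -1. O(n), no ordering needed."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1


def find_sorted(sorted_items, target):
    """Return the index of target in sorted_items, or -1. O(log n).

    Requires sorted_items to be kept sorted, which is exactly the extra
    cost the linear version avoids.
    """
    i = bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1


flags = ["beta_ui", "dark_mode", "new_checkout"]
print(find_linear(flags, "dark_mode"))  # 1
print(find_sorted(flags, "dark_mode"))  # 1
```

The code alone shows WHAT happens (a linear scan); only the comment records WHY that was a deliberate choice and when it should change.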

Edit: Tribal knowledge is the worst type of knowledge. It's assumed that everyone knows it and will pass it along when new people onboard, but the reality (for me) has always been that the people doing the onboarding had only fragments of it, or incorrect assumptions about what had been conveyed to them, and, just like the children's game of "telephone", passing the knowledge along always ends in disaster.

casey2 · today at 5:48 AM

Programming languages are natural and ambiguous too. What does READ mean? You have to look it up to see the types. The power comes from the fact that it's auditable, but you don't need to audit it every time you want to write some code. You think you write good code? Try to prove it after the compiler gets through with it.

Natural languages are richer in ideas. It may be harder to get working code from a purely natural-language description than from other code, but you don't gain much from just translating code. One is limited only by your imagination; the other already exists, and you could just call it as a routine.

You only have a SENSE for good code because it's a natural language with conventions and shared meaning. If the goal of programming is to learn to communicate better as humans, then we should be fighting ambiguity, not running from it. A hundred years from now, nobody is going to understand that your conventions were actually "good code".