Hacker News

01100011 yesterday at 8:04 AM, 11 replies

People are sticking up for LLMs here and that's cool.

I wonder, what if you did the opposite? Take a project of moderate complexity and convert it from code back to natural language using your favorite LLM. Does it provide you with a reasonable description of the behavior and requirements encoded in the source code, without losing so much detail that you could no longer recreate the program? Do you find the resulting natural language description is easier to reason about?

I think there's a reason most of the vibe-coded applications we see people demonstrate are rather simple. There is a level of complexity and precision that is hard to manage. Sure, you can define it in plain English, but is the resulting description extensible, understandable, or more descriptive than a precise language? I think there is a reason why legalese is not plain English, and it goes beyond mere gatekeeping.


Replies

drpixie yesterday at 10:37 AM

> Do you find the resulting natural language description is easier to reason about?

An example from a different field: aviation weather forecasts and notices are published in a strongly abbreviated and codified form. For example, the weather at Sydney, Australia now is:

  METAR YSSY 031000Z 08005KT CAVOK 22/13 Q1012 RMK RF00.0/000.0

It's almost universal that new pilots ask "why isn't this in words?". And, indeed, most flight planning apps will convert the code to prose.

But professional pilots (and ATC, etc.) universally prefer the coded format. It is compact (one line instead of a whole paragraph), the format is well defined (I know exactly where to look for the one piece I need), and it's unambiguous.

Same for maths and coding - once you reach a certain level of expertise, the complexity and redundancy of natural language is a greater cost than benefit. This seems to apply to all fields of expertise.

fluidcruft yesterday at 11:25 AM

I'm not so sure it's about precision rather than working memory. My presumption is that people struggle to understand sufficiently large prose versions for the same reason an LLM would struggle with them: limited working memory. The time needed to reload info from prose is significant. People reading long texts will start highlighting, taking notes, and inventing shorthand forms in their notes. Compact forms and abstractions help reduce the demands on working memory and information search. So I'm not sure it's about language precision.

eightysixfour yesterday at 2:52 PM

Language can carry tremendous amounts of context. For example:

> I want a modern navigation app for driving which lets me select intersections that I never want to be routed through.

That sentence is low complexity but encodes a massive amount of information. You are probably thinking of a million implementation details that you need to get from that sentence to an actual working app, but the opportunity is there, the possibility is there, that it is enough information to get to a working application that solves my need.

And just as importantly, if that is enough to get it built, then “can I get that in cornflower blue instead” is easy and the user can iterate from there.

Affric yesterday at 8:26 AM

Sure, but we build (leaky) abstractions, and this even happens in legal texts.

Asking an LLM to build a graphical app in assembly from an ISA and a driver for the display would give you nothing.

But with a mountain of abstractions then it can probably do it.

This is not to defend LLMs so much as to say that, given the right abstractions (reusable components), I do think they will get you a lot closer.

jimmydddd yesterday at 2:13 PM

> I think there is a reason why legalese is not plain English

This is true. Part of the precision of legalese is that the meanings of some terms have already been more precisely defined by the courts.

1vuio0pswjnm7 yesterday at 6:08 PM

"Sure, you can define it in plain English, but is the resulting description extensible, understandable, or more descriptive than a precise language? I think there is a reason why legalese is not plain English, and it goes beyond mere gatekeeping."

Is this suggesting the reason for legalese is to make documents more "extensible, understandable, or descriptive" than if written in plain English?

What is this reason for using legalese that the parent thinks "goes beyond gatekeeping"?

Plain English can be every bit as precise as legalese.

It is also unclear that legalese exists for the purpose of gatekeeping. For example, it may be an artifact that survives based on familiarity and laziness.

Law students are taught to write in plain English.

https://www.law.columbia.edu/sites/default/files/2021-07/pla...

In some situations, e.g., drafting SEC filings, use of plain English is required by law.

https://www.law.cornell.edu/cfr/text/17/240.13a-20

jsight yesterday at 9:02 PM

I've thought about this quite a bit. I think a tool like that would be really useful. I can imagine asking questions like "I think this big codebase exposes a rest interface for receiving some sort of credit check object. Can you find it and show me a sequence diagram for how it is implemented?"

The challenge is that the codebase is likely much larger than what would fit into a single context window. IMO, the LLM really needs to be taught to consume the project incrementally and build up a sort of "mental model" of it to really make this useful. I suspect that a combination of tool usage and RL could produce an incredibly useful tool for this.

soulofmischief yesterday at 10:41 AM

What you're describing is decontextualization. A sufficiently powerful transformer would theoretically be able to recontextualize a sufficiently descriptive natural language specification. Likewise, the same or an equivalently powerful transformer should be able to fully capture the logic of a complicated program. We just don't have sufficiently powerful transformers yet.

I don't see why a complete description of the program's design philosophy as well as complete descriptions of each system and module and interface wouldn't be enough. We already produce code according to project specification and logically fill in the gaps by using context.

vonneumannstan yesterday at 4:08 PM

I think you can basically make the same argument for programming directly in machine code since programming languages are already abstractions.

nsonha yesterday at 11:39 AM

Isn't that just Copilot "explain", one of the earliest Copilot capabilities? It's definitely helpful for understanding new codebases at a high level.

> there is a reason why legalese is not plain English, and it goes beyond mere gatekeeping.

Unfortunately, legal texts are not in any kind of formal language either.
