Hacker News

Provenance Is the New Version Control

46 points by gpi today at 3:26 AM | 62 comments

Comments

zerof1l today at 10:31 AM

I don't see how this is an AI-specific issue or an issue at all. We solved it already. It's called software development best practices.

> A diff can show what changed in the artifact, but it cannot explain which requirement demanded the change, which constraint shaped it, or which tradeoff caused one structure to be chosen over another.

That's not true... diffs would be traceable to commits and PRs, which in turn are traceable to the tickets. And then there would be tests. With all that, it would be trivial to understand the whys.
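To illustrate (a throwaway repo; the JIRA-123 ID and the file name are invented for the example):

```shell
# A changed line traces to a commit, whose message carries the ticket ID.
set -e
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name demo
git config user.email demo@example.com

echo 'STRIP_WHITESPACE = True' > settings.py
git add settings.py
git commit -q -m "JIRA-123: reject whitespace in email input"

# From the artifact back toward the why:
git blame -L 1,1 settings.py   # the commit that introduced the line
git log -1 --format=%s         # its message names the ticket
```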

You need both the business requirements and the code. One can't replace the other. If you attempt to describe technical requirements precisely, you'll inevitably end up writing the code, or at the very least, pseudocode.

As for regenerating the deleted code out of business requirements alone, that won't work cleanly most of the time. Because there are technical constraints and technical debt.

gritzko today at 6:09 AM

LLMs can implement red-black trees with impressive speed, quality, and even some level of determinism. Here I buy the argument. But once we take something that is not already on GitHub in a thousand different flavors, it becomes an adventure. Like a real adventure.

So what did you say about version control?

Atomic_Torrfisk today at 4:02 PM

Sounds like hot air, Wolfram style: making an intellectual, smart-sounding argument out of something that is really simple. Version control is version control, a hammer is a hammer. What style you choose depends on the situation; right now git is king because it works and we all understand it... enough.

The lossy aspect mentioned in the article just sounds like you forgot to write comments or a README. Simple fix.

RHSeeger today at 5:41 AM

I'm a bit confused by this because a given set of inputs can produce a different output, and different behaviors, each time it is run through the AI.

> By regenerable, I mean: if you delete a component, you can recreate it from stored intent (requirements, constraints, and decisions) with the same behavior and integration guarantees.

That statement just isn't true. And, as such, you need to keep track of the end result... _what_ was generated. The why is also important, but not sufficient.

Also, and unrelated, the "reject whitespace" part bothered me. It's perfectly acceptable to have whitespace in an email address.
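To make that concrete, here is the naive rule in two lines. RFC 5321 allows spaces inside a quoted local part, so rejecting all whitespace rejects legal addresses (a toy check, not a real validator):

```python
def naive_valid(addr: str) -> bool:
    # The kind of "reject any whitespace" rule the comment objects to.
    return "@" in addr and " " not in addr

# RFC 5321 permits spaces inside a *quoted* local part:
legal = '"john smith"@example.com'
print(naive_valid(legal))            # False: the naive rule rejects a legal address
print(naive_valid("a@example.com"))  # True
```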

hnlmorg today at 7:14 AM

Code still matters in the world of LLMs because they’re not deterministic and different LLMs produce different output too. So you cannot pin specification to application output in the way the article implies.

What the author actually wants is ADRs: https://github.com/joelparkerhenderson/architecture-decision...

That’s a way of being able to version control requirements.

beej71 today at 3:29 PM

TFA> By regenerable, I mean: if you delete a component, you can recreate it from stored intent (requirements, constraints, and decisions) with the same behavior and integration guarantees.

The only way to do this is with a mathematically precise and unambiguous stored intent, isn't it? And then aren't we just talking about source code?

viraptor today at 5:36 AM

I'm not sure if this actually needs a new system. Git commits have the message, arbitrary trailers, and note objects. If this kind of source control is useful, I'm sure it could be prototyped on top of git first.

rapjr9 today at 11:59 AM

There may be a more subtle issue here. When a specification is interpreted by an LLM, that is different from it being interpreted by a person. From the LLM you get a kind of average of how a lot of people wrote that "kind" of code. From the person you get a specific interpretation of the spec into code that fits the task. Different people can have different interpretations, but that is not the same as the random variations LLMs produce.

To get the same kind of fine tuning a person can do while coding (for example, realizing the spec needs to change), you need a very precise spec to start with, one that includes a lot of assumptions that are not included in current specs but which are expected from people.

I see further complications when the spec changes, say you now want to port the same spec to generate code on new computer architectures. So now specs need architecture-dependent specifications? Some backwards compatibility needs to be maintained too; if the LLM regenerates ALL of the code each time, the testing requirements balloon.

layer8 today at 2:27 PM

That’s pretty similar to Architecture Decision Records: https://adr.github.io/

klodolph today at 5:59 AM

> Once an AI can reliably regenerate an implementation from specification…

I’m sorry but it feels like I got hit in the head when I read this, it’s so bad. For decades, people have been dreaming of making software where you can just write the specification and don’t have to actually get your hands dirty with implementation.

1. AI doesn’t solve that problem.

2. If it did, then the specification would be the code.

Diffs of pure code never really represented decisions and reasoning of humans very well in the first place. We always had human programmers who would check code in that just did stuff without really explaining what the code was supposed to do, what properties it was supposed to have, why the author chose to write it that way, etc.

AI doesn’t change that. It just introduces new systems which can, like humans, write unexplained, shitty code. Your review process is supposed to catch this. You just need more review now, compared to previously.

You capture decisions and specifications in the comments, test cases, documentation, etc. Yeah, it can be a bit messy because your specifications aren’t captured nice and neat as the only thing in your code base. But this is because that futuristic, Star Trek dream of just giving the computer broad, high-level directives is still a dream. The AI does not reliably reimplement specifications, so we check in the output.

The compiler does reliably regenerate functionally identical assembly, which is why we don't check in the assembly output of compilers. Compilers are getting higher and higher level, and we're getting a broader range of compiler tools to work with, but AI is just a different category of tool and we work with it differently.

michalsustr today at 7:53 AM

What I think the author is hoping for is some inspectable graph of the whys that can be a basis for further automation/analysis. That's interesting, but the line to actual code then becomes blurry. For instance, what about self-consistency across time? If this were just text, it would fall out of sync (like all doc text does). If it's code, then maybe you just had the wrong abstractions the whole time?

The way we solve the why/what separation (at minfx.ai) is by having a top-level PLAN.md document for why the commit was built, as well as regenerating README.md files on the paths to every touched file in the commit. Admittedly, this still leans more into the "what" rather than "why". I will need to think about this more, hmm.

This helps us to keep it well-documented and LLM-token efficient at the same time. What also helps is Rust forces you into a reasonable code structure with its pub/private modules, so things are naturally more encapsulated, which helps the documentation as well.

alphabetag675 today at 6:53 AM

If you could regenerate code from some other artifact in a deterministic manner, then congrats, you have developed a compiler and a high-level language.

rtpg today at 7:38 AM

While in some sense it's interesting to store the prompts people might use, I feel like that might only accentuate the "tweak prompts over and over and pray for the result you want" style of workflow that I see so many people around me working in.

People need to remember how good it feels to do precise work when the time comes!

jayd16 today at 5:27 AM

What if I told you a specification can also be measured (and source controlled) in lines?

forty today at 7:38 AM

If your git history gives you the "what" and not the "why", you are doing it wrong. We can already see what is done in the commit diff. We can only guess why you did it if you don't explain in the message.

elzbardico today at 7:21 AM

I am exhausted of this ThoughtWorks style of writing. I can smell it from a mile away.

pu_pe today at 9:23 AM

So the concept is that requirements and rationale will be more permanent and important than code, because code can be regenerated very cheaply?

I think commenters here identified many of the issues we would face with it today, but thinking of a future where LLMs are indeed writing virtually all code and very fast, ideas like these are interesting. Our current tooling (version control, testing, etc.) will certainly need to adapt if this future comes to pass.

mmoustafa today at 6:59 AM

I wrote an article on this exact issue (albeit more simpleminded) and I suggested a rudimentary way of tracking provenance in today's agents with "reasoning traces" on the objects they modify.

Would love people's thoughts on this: https://0xmmo.notion.site/Preventing-agent-doom-loops-with-p...

ricksunny today at 7:09 AM

“the code itself becomes an artifact of synthesis, not the locus of intent”

would not be unfamiliar to mechanical engineers who work with CAD. The ‘Histories’ (successive line-by-line drawing operations: align to a spline of such-and-such dimensions, put a bevel here, put a hole there) in many CAD tools are known to be a reflection of design intent more so than the final 3D model the operations ultimately produce.

Animats today at 7:43 AM

This is going to be hard to fix.

If you use an LLM and agents to regenerate code, a minor change in the "specification" may result in huge changes to the code. Even if it's just due to forcing regeneration. OK, got that.

But there may be no "specification", just an ongoing discussion with an agentic system. "We don't write code any more, we just yell at the agents." Even if the entire sequence of events has been captured, it might not be very useful. It's like having a transcript of a design meeting.

There's a real question as to what the static reference of the design should be. Or what it should look like. This is going to be difficult.

materialpoint today at 7:48 AM

Who's gonna tell the author that Git doesn't do diffs, but snapshots?

Deltas are just an implementation detail, and thinking of Git as diffing is specifically shunned in introductions to Git versioning.

PeterStuer today at 7:54 AM

This reads very academic with not much real world alignment.

akoboldfrying today at 5:56 AM

Yes, in theory you can represent every development state as a node in a DAG labelled with "natural language instructions" to be appended to the LLM context, hash each of the nodes, and have each node additionally point to an (also hashed) filesystem state that represents the outcome of running an agent with those instructions on the (outcome code + LLM context)s of all its parents (combined in some unambiguous way for nodes with multiple in-edges).
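The hashing half of that scheme is easy to sketch; here is a minimal content-addressing of such nodes (the names and example instructions are mine, not the article's):

```python
import hashlib

def node_hash(instructions: str, parent_hashes: list[str]) -> str:
    """Content address of a development state: the natural-language
    instructions plus the addresses of all parent states."""
    h = hashlib.sha256()
    h.update(instructions.encode("utf-8"))
    for p in sorted(parent_hashes):  # sorting = unambiguous multi-parent combination
        h.update(bytes.fromhex(p))
    return h.hexdigest()

root = node_hash("Scaffold the email-validation module", [])
child = node_hash("Accept quoted local parts", [root])

# Identical intent graph -> identical address; changing any ancestor
# changes every descendant, much like git's own object model.
assert child == node_hash("Accept quoted local parts", [root])
assert child != node_hash("Accept quoted local parts", [node_hash("other", [])])
```

The addressing is the easy half; deterministically mapping a node to a filesystem state is where the generator's non-determinism bites.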

The only practical obstacle is:

> Non-deterministic generators may produce different code from identical intent graphs.

This would not be an obstacle if you restrict to using a single version of a local LLM, turn off all nondeterminism and record the initial seed. But for now, the kinds of frontier LLMs that are useful as coding agents run on Someone Else's box, meaning they can produce different outcomes each time you run them -- and even if they promise not to change them, I can see no way to verify this promise.

d--b today at 7:34 AM

I found it quite insightful.

Looking at individual line changes produced by AI is definitely difficult. And going one step higher than line-based version control makes sense.

We're not really there yet though, as the generated code currently still needs a lot of human checks.

Side thoughts: this requires the code to be modularized really well. It makes me think that when designing a system, you could imagine a world where multiple agents discuss changes. Each agent would be responsible for a sub system (component, service, module, function), and they would chat about the format of the api that works best for all agents, etc. It would be like SmallTalk at the agent level.

atoav today at 6:57 AM

So what they want is to essentially write a spec with business rules, implementation details, and such, and version control that instead of the actual source code?

Not sure what stops you from doing that just right now.

sebaschi today at 7:36 AM

This style of writing is insufferable (to me). The idea is also not as deep as it may seem based on the language used. I also don’t think it’s strictly valid, i.e. that version control somehow needs to be adjusted for AI.

hekkle today at 5:40 AM

TL;DR, the author claims that you should record the reasons for change, rather than the code changes themselves...

CONGRATULATIONS: you have just 'invented' documentation, specifically a CHANGE_LOG.
