The most general and unambiguous way to represent a diff is to just include the contents of the two files. It's more data, but that's rarely an issue these days.
So instead of `diff a b | patch c`, where the data through the pipe needs to be in some interchange format, you'd run `apply a b c` and the apply command can use whatever internal representation it likes.
Diffs also aren't great for human reading. A color-coded side-by-side view is better, and for that you also want to start from the two files.
There's really no need to ever transmit a diff and deal with all the format vagaries when you can just send the two files.
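As a rough sketch of that workflow: once the receiving end has both files, it can render whatever view it likes locally, with no interchange format in between. Here, a plain unified diff via Python's stdlib `difflib` (the file contents are made up for illustration):

```python
import difflib

# Two versions of a file, as lists of lines (hypothetical contents).
a = "one\ntwo\nthree\n".splitlines(keepends=True)
b = "one\n2\nthree\nfour\n".splitlines(keepends=True)

# The viewer computes the diff itself from the two files; it could
# just as well build a color-coded side-by-side view instead.
print("".join(difflib.unified_diff(a, b, fromfile="a", tofile="b")))
```

The same two inputs could feed `difflib.HtmlDiff` for a side-by-side HTML table, which is the point: the representation is the viewer's choice, not the sender's.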
> It's more data, but that's rarely an issue these days.
I do find it a bit annoying when a program gets updated and you have to download the whole 130 GB again.
But especially given how quickly and easily you can compress two almost-identical files, I think your approach has a lot going for it. It may even be possible to get clever and send over just a hash of the original file, plus a version of the new file that has been compressed with the original file as a prefix (but without actually sending the compressed data for that prefix).
It is hard to manipulate pairs of files without a special container. For example, you want to attach changes to an e-mail, and the changes cover 10 files + 1 removed file + 2 added files. Will you pack them into a tar/zip with two folders `old` and `new` inside, or what? Looks like a pre-VCS-era solution, when we did manual "version control" by copying `project` to `project-19950112-final-for-sure` :)
> There's really no need to ever transmit a diff and deal with all the format vagaries when you can just send the two files.
Well, it depends on what you're doing, and in 2025 diffs are more relevant than ever.
Asking an LLM to output a diff for an edit can save you a staggering number of tokens and cut the latency of its response by 5-10x. I've done it your way, then a custom diff way, and then added a standard diff one, and even back then with GPT 3.5 there was a huge difference, let alone now with way larger models.
There are a lot of diffs in the training data, so telling the model to create a standard diff is usually no different in accuracy from asking it to create the whole file (depending on the task), but it saves you all the output tokens and reduces the compute/time required to infer them.
Updating code running in a sandbox on a third machine over the wire, where speed matters? You want a diff. I did it your way first for ease, but knowing how much data and compute I was wasting, switching to a diff was a low-hanging optimisation, and it worked wonders. Yeah, for most use cases it would be overkill, but for this one milliseconds were important.
If you have file A on both ends and a diff from A to A2, constructing A2 is easy and saves you sending everything the two versions have in common.
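A minimal sketch of that reconstruction, using `difflib.SequenceMatcher` opcodes as the "diff" (an illustrative internal representation, not the unified diff format; the file contents are made up):

```python
import difflib

a = ["line1\n", "line2\n", "line3\n"]
a2 = ["line1\n", "line2 changed\n", "line3\n", "line4\n"]

# A compact "diff": the opcodes, carrying only the new text for the
# changed regions -- unchanged lines are referenced by index into A.
sm = difflib.SequenceMatcher(a=a, b=a2)
diff = [(tag, i1, i2, a2[j1:j2]) for tag, i1, i2, j1, j2 in sm.get_opcodes()]

# Reconstruct A2 from A + diff, without ever transmitting A2 itself.
rebuilt = []
for tag, i1, i2, new_lines in diff:
    if tag == "equal":
        rebuilt.extend(a[i1:i2])  # copy unchanged lines from A
    else:
        rebuilt.extend(new_lines)  # replace/insert carry the new text

assert rebuilt == a2
```

Only the changed lines ride in the diff; everything the versions share is looked up in the receiver's copy of A.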
Merging is the only real issue, due to conflicts, and that is where a human or LLM has to come in. Just letting file A2 overwrite the original would be easy, but a conflict occurs precisely in the cases where you might not want that to happen.
TL;DR: diff good.
A diff between two files isn’t unique, meaning there can be better or worse diffs between the same two versions of a file, depending on the file format and possibly the purpose of the diff. Similarly, there can be different strategies for applying a diff as a patch.
Having a diff format allows decoupling the implementation of diff creation from the implementation of diff application, turning a potential n*m problem into an n+m problem.