logoalt Hacker News

thecupisbluelast Wednesday at 10:14 AM0 repliesview on HN

>There's really no need to ever transmit a diff and deal with all the format vagaries when you can just send the two files.

Well, depends what are you doing, and in 2025, they are more relevant than ever.

Asking an LLM to output a diff for an edit can save you a staggering amount of tokens and cut the latency of it's response by 5-10x. I've done it your way, a custom diff way and then added a standard diff one, and even back then with GPT 3.5 there was a huge difference, let alone now with way larger models.

There is a lot of diff's in the dataset so telling it to create a standard diff is usually no different than asking it to create a whole file in terms of accuracy (depending on the task), but saves you all the output tokens and reduces the amount of compute/time required to infer all those tokens.

Updating a code running in a sandbox on a 3rd machine over the wire and speed is relevant? You want a diff. I did it your way first for ease, but knowing how much data and compute I was wasting on that, it was a low hanging optimisation to use a diff, and it worked wonders. Yeah, for most usecases it would be an overkill, but for this usecase miliseconds were important.

If you have file A and A2 and a diff AxA2, constructing file A2 is easy and saves you all the A-diff data.

Merging them is the only potential issue due to conflicts, and that is where a human or LLM has to come in and that is where just having a file A2 to overwrite the original one would be easy, but conflict occurs only in cases where you might not want that to happen.

TLDR; diff good.