
HelloNurse last Wednesday at 7:13 AM

A staggering amount of unnecessary and counterproductive scope creep in just 4 items:

    A single diff can’t represent a list of commits

    There’s no standard way to represent binary patches

    Diffs don’t know about text encodings (which is more of a problem than you might think)

    Diffs don’t have any standard format for arbitrary metadata, so everyone implements it their own way.

Of these, only a notation for binary patches would be a reasonable generalization of diff files. Everything else is the internal data structure or protocol of some specific revision control system, only exchanged between its clients and servers and backups.
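
For context on the encoding item quoted above, here is a minimal sketch (not taken from the thread) of how a byte-level diff behaves when only the encoding of a file changes. It uses Python's standard `difflib.diff_bytes`; the file name `menu.txt` and the sample text are made up for illustration.

```python
import difflib

# The same text, stored once as Latin-1 and once as UTF-8.
# The file name and contents are hypothetical.
text = "café\n"
old_bytes = text.encode("latin-1")  # b'caf\xe9\n'
new_bytes = text.encode("utf-8")    # b'caf\xc3\xa9\n'

# diff_bytes compares raw bytes and has no notion of encoding.
diff = list(
    difflib.diff_bytes(
        difflib.unified_diff,
        [old_bytes],
        [new_bytes],
        b"a/menu.txt",
        b"b/menu.txt",
    )
)

for line in diff:
    print(line)

# The decoded text is identical, yet the diff reports a changed line.
same_text = old_bytes.decode("latin-1") == new_bytes.decode("utf-8")
print("same text after decoding:", same_text)
```

Nothing in the resulting unified diff says which encoding either side uses, so a tool reading it has to guess or get that information from somewhere else.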

Replies

chipx86 last Wednesday at 7:23 AM

We build a code review product that interfaces with over a dozen SCMs. In about 20 years of writing diff parsers, we've run into all kinds of problems and limitations in SCM-generated diff files (which we have to process) that we would never have expected to need to think about. This all comes from the pain points and lessons learned in that work, and it has been a huge help in solving these problems for us.

Hopefully these aren't problems end users will ever need to worry about, but they are problems that tools need to worry about and work around, especially for SCMs that don't have a diff format of their own, have one that is missing data (in some, not all changes can be represented, e.g. deleted files), or don't include enough information for another tool to identify the file in a repository.

tankenmate last Wednesday at 7:22 AM

Not so. Obviously it is less common these days, but I still use patch(1) and friends enough to run into problems from time to time. This is especially true when you have devs on different platforms (don't even get me started on filename mangling / case-folding issues).
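
As a rough illustration of the case-folding issue mentioned above, the sketch below checks a list of paths (for example, the files touched by a patch) for names that would collide on a case-insensitive filesystem. The function name `find_case_collisions` and the sample paths are hypothetical, and `str.casefold()` is only an approximation of what any particular filesystem does.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Group distinct paths that become identical after case folding."""
    groups = defaultdict(set)
    for path in paths:
        # casefold() approximates case-insensitive comparison;
        # real filesystems may apply different rules.
        groups[path.casefold()].add(path)
    # Only groups with more than one distinct spelling are collisions.
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical file list from a patch created on a case-sensitive system.
patched_files = ["src/Makefile", "src/makefile", "README"]
collisions = find_case_collisions(patched_files)
print(collisions)
```

Two files that are distinct on one platform can map to a single file on another, which is one way applying the same patch can succeed for one developer and fail for the next.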
