April 07, 2012


Listed below are links to weblogs that reference The Tyranny of the Diff:



I think this is one case where tools like Review Board (http://www.reviewboard.org/) and FileMerge win over simple diff viewers like git diff and github's diff viewer. Both RB/FM focus more on the before and after than just the diff format style changes. RB/FM visualize the changes with the before on one side, and after on the other. That of course isn't a unique strategy to those tools, but they're the 2 I commonly use.

This is one of the reasons I find Review Board to be a better changeset viewer than github (I hope this doesn't start a flame war). The diff file format is a common language that most developers speak, but it was written for a data processor (computer), not a visual processor (humans).

Solution wise, in git, there's some handy functionality for using tools like FileMerge to view a diff. I commonly use git mergetool to handle complicated merge situations. When you use that, it goes file-by-file through conflicts, opening up the file in FileMerge.

I just did a little googling, and next time you're looking at a diff, instead of doing git diff, try git difftool. Just like git mergetool, it'll go file-by-file through the changeset, opening it up in FileMerge (or whatever your git is configured to use). I haven't used difftool before just now, but it looks like it'll be useful for more complicated diffs.

Kent Beck

The next revolution in version control will be storing and replaying refactorings. That way, if you and I rename different variables in the same expression, it isn't a conflict.
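To make the idea concrete, here is a minimal sketch of replaying recorded refactorings instead of merging text. The Rename class and the replay function are hypothetical names invented for this illustration, not part of any existing version control tool, and the regex-based rename is a stand-in for a real, scope-aware refactoring engine.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rename:
    """A recorded refactoring: rename one identifier to another (hypothetical)."""
    old: str
    new: str

    def apply(self, source: str) -> str:
        # Replace whole-word occurrences only, so renaming "x" leaves "max" alone.
        return re.sub(rf"\b{re.escape(self.old)}\b", self.new, source)


def replay(source: str, operations: list[Rename]) -> str:
    """Apply recorded refactorings, in order, to a base version of the code."""
    for op in operations:
        source = op.apply(source)
    return source


base = "total = price * qty"
mine = [Rename("price", "unit_price")]
yours = [Rename("qty", "quantity")]

# Both edits touch the same line, which a line-oriented merge would flag as a
# conflict. Replaying the operations gives the same result in either order.
merged = replay(base, mine + yours)
assert merged == replay(base, yours + mine)
print(merged)
```

The two renames commute because they touch different identifiers. Two renames of the same identifier would still need a human decision.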


Tools have their special cases... diff is useful for localized changes, but when the change is a radical rewrite, there are tools which beautifully show chunk changes and also can modify/swap code: gvimdiff and meld are the best in this regard because of their visual colorized features. Create a command in your VCS to use their power of inspection and modification.

Rudolf O.

What you want is your diff to recognize the semantic context of the code, to realize that it's more than just lines of text.


Kent and Rudolf are correct. There isn't enough semantic content in a line oriented diff. Actually there is a lack of semantic content in much of what we do as programmers: single line commit messages and chatter on IRC, for instance.

Refactorings used to be hard and occurred less often; now they are easy and our diffs are less valuable. If an organization puts value on the diff, then it could be a drag on the rate of refactoring. Reviewing diffs was the quality gate before unit testing became prevalent.

Not only do we need to record operations against the code, but we need to have operations be transactional against the codebase in terms of our intent, the operations against the code and finally the unit tests which ensure the validity of that code modifying transaction.

To be able to view those refactorings, I could see a tool that measures mutation, coupling, interface surface area, etc. It could give a qualitative/quantitative score based on a variety of metrics. This system could ensure that a codebase continues to have a cohesiveness of mind; that it doesn't turn into a camel farm.

Patrick Corcoran

diff is just a tool. Like sandpaper. Great for what it's great at, and not to be held in contempt for results of applications elsewise.


Agreed, while it might be nice to have a pretty looking diff, it's really a secondary artifact. The new state of the code is more important.

For those times when you're interested in the change itself, and a unified diff doesn't represent it well, looking at the before and after in another comparison tool is better. For exploring all the files touched in a Git commit, I recommend git-diffall (https://github.com/thenigan/git-diffall). If you're a Vim user, it plays well with DirDiff (http://www.vim.org/scripts/script.php?script_id=102). If you prefer Emacs, I recommend dircmp-mode (https://github.com/matthewlmcclure/dircmp-mode). If you're on Windows or a Linux desktop, Beyond Compare (http://www.scootersoftware.com/moreinfo.php) is a fantastic tool.

Beyond Compare in particular lets you tell it a line on each side of the comparison that corresponds to the other side, in case you can do better than what the automatic hunking chose.

Mark Mahoney

I have been working on a version control tool that doesn't use a snapshot based approach to storing changes- it stores every keystroke, delete, copy and paste, and file operation in a database. This information can be played back (there are many filters to limit the amount that gets played back).

The most important aspect of this tool is that a developer can tell a story about the changes. The stories can be shared among team members to spread knowledge about how the system was developed.
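A rough sketch of the event-log idea, for readers who want to picture it: the Insert and Delete event types, the playback function, and its upto parameter are all assumptions made up for this illustration and say nothing about how the tool described above is actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    """Hypothetical event: text typed or pasted at a position."""
    position: int
    text: str


@dataclass(frozen=True)
class Delete:
    """Hypothetical event: a run of characters removed at a position."""
    position: int
    length: int


def apply_event(buffer: str, event) -> str:
    """Return the buffer contents after one recorded edit."""
    if isinstance(event, Insert):
        return buffer[:event.position] + event.text + buffer[event.position:]
    if isinstance(event, Delete):
        return buffer[:event.position] + buffer[event.position + event.length:]
    raise TypeError(f"unknown event: {event!r}")


def playback(events, upto=None):
    """Rebuild the buffer from the first `upto` events (all events if None)."""
    buffer = ""
    for event in events[:upto]:
        buffer = apply_event(buffer, event)
    return buffer


log = [
    Insert(0, "def area(w, h):"),
    Insert(15, "\n    return w + h"),
    Delete(29, 1),    # remove the mistaken "+"
    Insert(29, "*"),  # replace it with "*"
]

print(playback(log, upto=2))  # the file as it stood before the fix
print(playback(log))          # the file after every recorded edit
```

Because the log keeps the intermediate states, a reviewer can stop the playback at any event rather than seeing only the final snapshot.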



It is in general a very hard problem to get tools (which cannot think) to show semantic relations.

In your particular case, however, there might be something that works most of the time: --patience. If there are identical lines in the original and in the new version, patience diff will go out of its way to make sure that they are marked unchanged (unless that is not possible due to rearranging).

As a consequence, if traditional diff shows things like inserted test code as changing existing tests, patience diff tends to show a better picture of what was intended.
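For the curious, here is a small sketch of the anchoring step that patience diff is usually described as using: take the lines that occur exactly once in both versions, then keep the longest subset that appears in the same order on both sides. This is an illustration only, not git's implementation; the function name unique_common_anchors is made up here, and a full patience diff would go on to diff the gaps between the anchors.

```python
from bisect import bisect_left
from collections import Counter


def unique_common_anchors(old, new):
    """Return (old_index, new_index) pairs for lines that occur exactly once
    in each version, keeping only the longest order-preserving subset."""
    old_counts, new_counts = Counter(old), Counter(new)
    new_index = {line: j for j, line in enumerate(new) if new_counts[line] == 1}
    # Candidate pairs, already sorted by position in the old version.
    pairs = [(i, new_index[line]) for i, line in enumerate(old)
             if old_counts[line] == 1 and line in new_index]

    # Longest increasing subsequence over new-side positions.
    tails = []        # tails[k]: index into pairs ending the best run of length k + 1
    tail_values = []  # new-side position of each tail, kept sorted for bisect
    previous = [None] * len(pairs)
    for p, (_, j) in enumerate(pairs):
        k = bisect_left(tail_values, j)
        previous[p] = tails[k - 1] if k > 0 else None
        if k == len(tails):
            tails.append(p)
            tail_values.append(j)
        else:
            tails[k] = p
            tail_values[k] = j

    # Walk the back-pointers from the end of the longest run.
    anchors = []
    p = tails[-1] if tails else None
    while p is not None:
        anchors.append(pairs[p])
        p = previous[p]
    return anchors[::-1]


old = ["def test_a():", "    assert a()", "",
       "def test_b():", "    assert b()"]
new = ["def test_a():", "    assert a()", "",
       "def test_new():", "    assert new()", "",
       "def test_b():", "    assert b()"]

print(unique_common_anchors(old, new))
```

In this example both existing tests are anchored as unchanged, so the new test can only show up as a pure insertion between them. The blank line is ignored as an anchor because it is not unique in the new version.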


I think what you are looking for is a diff of the syntax tree as opposed to the code itself. I know it sounds almost the same but semantically it is quite different; one concentrates more on syntactic details and the other concentrates more on flow.
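As a small illustration of comparing syntax trees rather than text, the sketch below uses Python's standard ast module on Python source. The helper name same_syntax_tree is made up for this example, and it only answers "same tree or not"; a real tree diff would also have to report which nodes changed.

```python
import ast


def same_syntax_tree(before: str, after: str) -> bool:
    """True when two Python sources parse to the same syntax tree.

    ast.dump leaves out line and column numbers by default, so layout-only
    edits do not register as a difference.
    """
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


before = "def total(items):\n    return sum(i.price for i in items)\n"
reformatted = (
    "def total( items ):\n"
    "\n"
    "    return sum(\n"
    "        i.price for i in items\n"
    "    )\n"
)
changed = "def total(items):\n    return sum(i.cost for i in items)\n"

# A line diff reports every line of `reformatted` as changed; the tree does not.
print(same_syntax_tree(before, reformatted))
# A one-word edit that changes which attribute is read does change the tree.
print(same_syntax_tree(before, changed))
```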

Mike Burrows (@asplake)

Takes me back to the mid/late 80's...

In my first job after graduating, developers had to mark up code changes by hand on a paper printout (aka "computer listing" - remember those?). You soon learned to make changes that impacted the smallest number of lines.

Optimising for the diffs like this left code that got worse with each edit. After a while I realised this and I defied our system - I just struck through page after page of listing and wrote "rewritten". It felt good.


I don't look at the diff, really. It's good to have around if someone's been too sparse with the commit messages, but looking at which lines have changed is useless.

I care about if object A is now doing something different, or if something is now using pattern B. I care if the code does something new or different, and that is usually stated in the commit message.


seanjensengrey: "Reviewing diffs was the quality gate before unit testing became prevalent."

My thoughts exactly. The standard measure of behaviour is the unit test, and unit tests should be used for that purpose. Diffs measure something else, something far closer to the prose of the code than to the input and output that defines its purpose.

Personally I entered software development at the same time that unit tests became widespread. I have never worked in an organization where diffs have been considered terribly important.

Enjoyed the article, BTW.


I'm reminded of a conversation I had with Simon Peyton Jones some years ago about the lack (then) of good development environments for Haskell. "Just use EMACS" he semi-joked. But, I replied, the days are long gone when practitioners think about programs as large complicated bodies of text for which the premier tool is clearly a large complicated text editor.

I'm reminded also of a more recent conversation with Nat Pryce where we discussed the odd fact that OO programmers believe in computation by sending messages to objects and (in the few OO languages we have: Smalltalk and a couple of others) also do programming by sending messages to objects. Whereas functional programmers believe in computation by evaluating expressions in environments but do programming by making destructive updates to character arrays.

Looking at line-by-line diffs seems like a notion very much from the programs-are-text world. We should have grown out of that by now.


I was pondering something similar today when committing a set of changes to some files. It struck me that the use of diffs to view the lines of code that were changed was really in alignment to the thinking of a piece of code as a linear body of work.

I found myself writing a large comment in the checkin notes, and thinking that if I wrote a similar amount of text in the code itself I'd be ashamed of the necessity (and would probably have decomposed the code into appropriately named methods). Then it struck me, if comments are often a violation of DRY because they tell how something is done instead of appropriately sized methods, then the same goes for these operations. Line / diff level operations in source control are at the wrong level too. They are the 'assembly language' of source code operations to exercise a well used meme.

P.S. found this via http://news.ycombinator.org/item?id=3813793 where there is more conversation ongoing.

P.P.S. Michael, Thanks for 'Working Effectively with Legacy Code'. It helped me land my current job with an excellent company.

ambert ho

That's how RubyMine and IntelliJ do conflict resolution, is it not?

For TextMate and other dev tools it would be great to do side by side comparisons, of course :)

Jeff Foster

As Rudolf and Kent say, semantic diff is definitely the way forward. I want my changes to be viewable as a stream of semantic changes (rename variable, extract method).

I think we can approach that with a bit more rigour about how features are developed.

Adding a new feature almost invariably requires the creation of an inflection point in the code (e.g. a new virtual method to change behaviour, a condition somewhere in a call chain). The process of creating that inflection point is a series of refactorings. Rather than jumping straight into shotgun surgery, perform a series of micro-refactorings to get to this point. Making a point of clearly separating feature development into refactoring and then feature development really helps with the diffs.

If you use a DVCS then you have much more freedom to make a commit (in that you can commit locally without affecting others). I tend to try and commit after every change of the code that still leaves my code compilable. What I really want is to hook it up to R# or IntelliJ such that each time I make a refactoring the code is automatically committed locally with a suitable message (e.g. Extract Method XYZ from ABC).

Hopefully by the end of all this, I have two sets of commits to review and they are nicely broken out. If I trust the refactoring tool then I can just accept that group of changes in bulk. By the time I get to review the feature, that hopefully diffs well because I've spent the time creating the inflection point.

Matt Wynne

Maybe the commit was too big? Would the diffs have looked better if you'd committed each baby step of the refactoring separately?

George Dinwiddie

When I was doing heavy java work with Eclipse, I found the Review Changes diff very handy. It showed the diffs in context, and allowed easy movement through them. I almost always reviewed the diffs coming in during update in addition to all of the changes I was about to commit.
