How do you know that you have the right architecture for your application? First of all, we have to define what "right" is, and there are many opinions.
One worthwhile goal for an application is to have classes that are reasonably independent. If we have some functional change to make, we should be able to go to a single class and make the change. If we have to modify several classes, it could be because we are introducing a large feature, or it could be because of the way our application is decomposed. Adding a field to a model class might require changes in a view, and that's normal. It's a side effect of separating representation and presentation.
In many cases, though, we end up making changes in several places because we have faulty abstractions. If many changes to one class necessitate changes to another, that's an important piece of information. It could be an indication of the code smell that Martin Fowler named 'Shotgun Surgery' in his book 'Refactoring: Improving the Design of Existing Code' [Addison-Wesley 1999]. 'Shotgun Surgery' is essentially the smell you have when you find that adding features requires you to make changes spread across wide areas of the code base.
The sad thing about 'Shotgun Surgery' is that we're pretty much left to rely on our day-to-day experience to detect it. We might get the sense that we are touching too many areas of our code, but that is only a vague feeling. Do we really know which classes are tied together in our problem space? Do we know which other classes we are likely to touch when we touch the one we are working on? It turns out that our source code repositories hold all of the information we need to figure this out.
Representing Code Change
Over the past few months, I've settled on an intermediate form for the data I gather from source code repositories. It makes change analysis, and many other forms of analysis, rather easy. The central data structure is the method change event. Each method change event has the following fields:
- type (method add, change, or delete)
- method name (fully qualified with class/modules)
- method body length
- file name
- sha1
- commit date
- committer
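The field list above maps naturally onto a Ruby Struct. The following is a minimal sketch, not the author's class: the member names, the "Class#method" / "Class.method" naming convention for qualified method names, and the two derived helpers (class_name and day, which the pipeline later in this post calls) are all assumptions made for illustration. The sample values in the usage line are made up.

```ruby
require 'date'

# Hypothetical method change event built from the listed fields.
MethodChangeEvent = Struct.new(
  :type,          # :add, :change, or :delete
  :method_name,   # fully qualified, e.g. "Billing::Invoice#total" (assumed format)
  :body_length,   # length of the method body
  :file_name,
  :sha1,
  :commit_date,   # a Date, DateTime, or Time
  :committer
) do
  # Class/module part of the qualified name: text before the first '#' or '.'.
  def class_name
    method_name.split(/[#.]/).first
  end

  # Calendar day of the commit, used to group events.
  def day
    commit_date.to_date
  end
end

e = MethodChangeEvent.new(:change, "Billing::Invoice#total", 12, "billing/invoice.rb",
                          "3f2a9c1", DateTime.new(2011, 3, 4, 15, 30), "alice")
e.class_name  # => "Billing::Invoice"
```

Deriving class_name and day on demand, rather than storing them, keeps the event a faithful copy of what was read from the repository.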
With this information, I can track the changes of individual methods across their entire lifetimes. I can also do higher-level analysis of qualities associated with files and classes.
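Tracking a method across its lifetime amounts to grouping events by qualified method name and ordering each group by commit date. The sketch below assumes a trimmed-down event Struct with only the fields it needs; the Event name, the method_lifetimes helper, and the sample data are hypothetical.

```ruby
require 'date'

# Trimmed-down event: only the fields this sketch uses (an assumption).
Event = Struct.new(:type, :method_name, :body_length, :commit_date)

# Map each method name to its events in commit order.
def method_lifetimes(events)
  events.group_by(&:method_name)
        .transform_values { |evs| evs.sort_by(&:commit_date) }
end

events = [
  Event.new(:change, "Invoice#total", 14, Date.new(2011, 2, 1)),
  Event.new(:add,    "Invoice#total", 8,  Date.new(2011, 1, 5)),
  Event.new(:change, "Invoice#total", 11, Date.new(2011, 3, 9))
]

history = method_lifetimes(events)["Invoice#total"]
history.map(&:type)         # => [:add, :change, :change]
history.map(&:body_length)  # => [8, 14, 11]
```

Once events are grouped this way, per-method series such as body length over time fall out directly, and grouping on a class or file name instead gives the higher-level views.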
Approach
When methods from two or more classes are changed on the same day by the same committer, we can say that the classes are somewhat correlated in time. When this happens often, their correlation is high. For our purposes, it's probably sufficient to compute the pairwise correlation of classes.
Given that we have a set of events representing the state of every method each time it was changed in our project's history, we should be able to find all of the classes that have been touched by each committer on each day. We can then take all pairs of those classes and throw them into a set. When we do this for each committer and day, and total the number of times each pair occurs across all of those groupings, we can see the relative frequencies of the pairs. When we sort them, we'll see which classes change most often with each other.
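The steps above can be written out in plain Ruby without any extensions to Array. This is a sketch under stated assumptions: Event is a hypothetical stand-in with only class_name, committer, and day, and class_pair_frequencies is an illustrative name. Sorting each day's class list before taking combinations makes every pair order-independent, so [A, B] and [B, A] land on the same counter.

```ruby
require 'date'

# Minimal stand-in for a method change event (field names are assumptions).
Event = Struct.new(:class_name, :committer, :day)

# Count how often each pair of classes is changed by the same committer on
# the same day; return [pair, count] entries sorted from least to most frequent.
def class_pair_frequencies(events)
  counts = Hash.new(0)
  events.group_by { |e| [e.day, e.committer] }.each_value do |bucket|
    classes = bucket.map(&:class_name).uniq.sort  # sorted, so pairs are canonical
    classes.combination(2) { |pair| counts[pair] += 1 }
  end
  counts.sort_by { |_pair, n| n }
end

d1 = Date.new(2011, 5, 2)
d2 = Date.new(2011, 5, 3)
events = [
  Event.new("Invoice",     "alice", d1),
  Event.new("InvoiceView", "alice", d1),
  Event.new("Customer",    "alice", d1),
  Event.new("Invoice",     "bob",   d2),
  Event.new("InvoiceView", "bob",   d2)
]

class_pair_frequencies(events).last  # => [["Invoice", "InvoiceView"], 2]
```

A Hash with a default of 0 does the totalling in one pass, which avoids building the intermediate list of all pairs.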
Fortunately, this sort of analysis is rather brief in Ruby. Here is the whole computation, given a few extensions to Array and to my method change event class:
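# Class pairs changed by the same committer on the same day, least to most frequent
events.group_by { |e| [e.day, e.committer] }.values
  .map { |es| es.map(&:class_name).uniq.combination(2).to_a }
  .flatten(1)
  .norm_pairs
  .freq_by { |pair| pair }
  .sort_by { |p| p[1] }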
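The pipeline depends on two Array extensions, norm_pairs and freq_by, whose definitions aren't shown. The versions below are hypothetical reconstructions inferred from how the pipeline uses them: norm_pairs puts each pair in a canonical order, and freq_by returns [key, count] entries. The Event Struct and the sample events are likewise assumptions. The sample is chosen so that the two committers touch the same classes in opposite orders, which shows why normalization is needed.

```ruby
require 'date'

# Hypothetical reconstructions; the author's own extensions are not shown.
class Array
  # Sort each pair so that [A, B] and [B, A] count as the same pair.
  def norm_pairs
    map(&:sort)
  end

  # Group elements by the block's result and return [key, count] entries.
  def freq_by(&block)
    group_by(&block).map { |key, members| [key, members.size] }
  end
end

# Minimal event exposing the accessors the pipeline calls (an assumption).
Event = Struct.new(:class_name, :committer, :day)

day = Date.new(2011, 5, 2)
events = [
  Event.new("InvoiceView", "alice", day),
  Event.new("Invoice",     "alice", day),
  Event.new("Invoice",     "bob",   day),
  Event.new("InvoiceView", "bob",   day)
]

pairs = events.group_by { |e| [e.day, e.committer] }.values
              .map { |es| es.map(&:class_name).uniq.combination(2).to_a }
              .flatten(1).norm_pairs.freq_by { |e| e }.sort_by { |p| p[1] }
pairs  # => [[["Invoice", "InvoiceView"], 2]]
```

Without norm_pairs, the same two classes would show up as two separate pairs with a count of 1 each.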
When you examine these sorts of frequencies, they typically have this sort of shape:
Once you have a sorted list of the highest-frequency class pairs, you can march through them and see if there are any surprises. Temporal correlation of change across domain objects is particularly informative. Altering the correlation range from a day to a week can be useful too.
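One way to widen the correlation range from a day to a week is to make the time part of the grouping key configurable. This is a sketch, assuming ISO calendar weeks (Date#cwyear and Date#cweek, so a week spanning New Year stays in one bucket). The window symbols :day and :week, the helper names, and the sample data are hypothetical.

```ruby
require 'date'

Event = Struct.new(:class_name, :committer, :commit_date)

# Time bucket for an event under the chosen window (names are hypothetical).
def window_key(date, window)
  case window
  when :day  then date
  when :week then [date.cwyear, date.cweek]  # ISO year and week number
  else raise ArgumentError, "unknown window: #{window}"
  end
end

# Count class pairs changed by the same committer within the same window.
def pair_counts(events, window: :day)
  counts = Hash.new(0)
  events.group_by { |e| [window_key(e.commit_date, window), e.committer] }.each_value do |bucket|
    bucket.map(&:class_name).uniq.sort.combination(2) { |pair| counts[pair] += 1 }
  end
  counts
end

events = [
  Event.new("Invoice",     "alice", Date.new(2011, 5, 2)),  # a Monday
  Event.new("InvoiceView", "alice", Date.new(2011, 5, 4))   # Wednesday of the same week
]

pair_counts(events, window: :day).empty?                        # => true
pair_counts(events, window: :week)[["Invoice", "InvoiceView"]]  # => 1
```

A wider window picks up coupling that a one-day window misses, such as a change finished over several days, at the cost of pairing some unrelated changes by the same committer.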