One thing that we know for certain is that codebases grow with time. We add code to them continually. When we think about this, it's easy to become bleak. Each of the classes in a codebase is something that we need to understand. At a certain point a team can no longer "keep the system in their heads" and they have to do a bit of spelunking to figure out what they have and where they should put their changes. Changes may end up in the wrong areas, simply because we are not sure of the right place. We start to introduce errors because we are not aware of other relevant code.
Given that these are the forces that we live with, do we have anything on our side? Anything that helps us scale up in a nice way?
One thing that helps is class closure: when our classes focus on a single responsibility, we often end up in a situation where the chance of changing them is extremely low.
The other day, I started taking two measurements from some git repositories that I had on hand. I measured the number of classes added each month to each codebase, and I measured the number of classes closed each month.
Here is a graph for the codebase of RSpec from about 2005 to 2010. The blue line is cumulative class additions over the months, and the green line is cumulative class closures. The increment on the X axis is months.
To understand this graph it helps to understand how I calculate class closure. A class is closed on the date at which no further modifications happen to it from that date to the present. This means that the closer we come to the present, the less accurate our sense of closure is. After all, we could modify any of those classes tomorrow. But, even with this oddness, there are some interesting things to see.
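As a rough illustration of that definition, here is a minimal Python sketch. It is not the script used to produce these graphs. It assumes the modification dates for each class have already been extracted from version control into a dictionary; the `history` data and the class names in it are made up for the example. A class counts as added in the month of its first modification and closed in the month of its last.

```python
from collections import Counter
from datetime import date


def month_key(d):
    """Reduce a date to a sortable (year, month) pair."""
    return (d.year, d.month)


def added_and_closed(history):
    """Count classes added and closed per month.

    history maps a class name to the dates on which it was modified.
    The first date is when the class was added; the last date is when
    it was closed (no modifications from then to the present).
    """
    added = Counter()
    closed = Counter()
    for dates in history.values():
        added[month_key(min(dates))] += 1
        closed[month_key(max(dates))] += 1
    return added, closed


def cumulative(counts, months):
    """Running total of counts over an ordered list of months."""
    total = 0
    totals = []
    for m in months:
        total += counts[m]  # a Counter returns 0 for months with no events
        totals.append(total)
    return totals


# Hypothetical modification history, purely for illustration.
history = {
    "Parser": [date(2005, 1, 10), date(2005, 3, 2)],
    "Runner": [date(2005, 2, 5)],
    "Reporter": [date(2005, 2, 20), date(2005, 2, 25)],
}

added, closed = added_and_closed(history)
months = sorted(set(added) | set(closed))
cumulative_added = cumulative(added, months)    # the "blue line"
cumulative_closed = cumulative(closed, months)  # the "green line"
```

Note that `months` here only contains months in which something was added or closed; a real chart would also need the quiet months in between.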
The distance between the blue and green lines indicates the number of classes in the "active set": the number of classes that have been added and (at that date) will still be changed at some point before the present. It appears that RSpec has had a very small active set over its lifetime. In fact, the maximum active set over its lifetime is about 201. That is pretty good for a project with 1034 classes. This seems to indicate that the developers were writing code in a way which tended to bring classes to closure quickly.
Another observation: it appears that there was a massive change in direction in the codebase around the middle of its timeline. A large number of new classes were added, and the active set became rather large for a while.
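The active set can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original analysis: each class is represented by a hypothetical `(added_month, closed_month)` pair of integer month indices, and the sample `lifetimes` are invented. A class is active in month `m` if it has been added by `m` and is closed only after `m`, which is the same as subtracting cumulative closures from cumulative additions.

```python
def active_set_sizes(lifetimes, months):
    """Size of the active set for each month.

    lifetimes is a list of (added_month, closed_month) pairs, where the
    months are integer indices and closed_month >= added_month.
    """
    sizes = []
    for m in months:
        # Active: already added, and still to be modified after month m.
        active = sum(1 for added, closed in lifetimes if added <= m < closed)
        sizes.append(active)
    return sizes


# Hypothetical class lifetimes, purely for illustration.
lifetimes = [(0, 2), (1, 1), (1, 3), (2, 5)]
sizes = active_set_sizes(lifetimes, range(6))
peak = max(sizes)  # the maximum active set over the project's lifetime
```

A class that is added and closed in the same month, like `(1, 1)` above, never enters the active set at this resolution.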
The total picture for RSpec is much nicer than some other projects I've looked at. Here is a chart for a project that brought classes to closure much less quickly:
The project contains close to 5000 classes, and at one point the number of active classes was 2121.