Every once in a while, I get upset about something that happens in software development. Sometimes I simmer, sometimes I smolder, but eventually, I do pipe up about it. The thing that's been bothering me for a while is the fact that agile really threw the baby out with the bathwater about 10 years ago. At that point in time, we had many people with deep design knowledge in the industry. Hundreds of books had been written about the mechanics of object-oriented design and structured design. Although this knowledge was not pervasive, you had the sense that if you walked into a room of people who cared about software development, they'd be familiar with many of the design principles, patterns, and concepts that were being thrown around then.
I'm not sure that has changed. People who care still dig deep and learn. The thing that did happen, though, is a de-emphasis of software design over the last ten years. I think there are a few reasons. One was maturity. There was a lot of material out there 10 years ago, and it was excellent guidance. Maybe we'd just discovered the foundations of design and we've all moved on to other things. Another was a well-deserved re-emphasis on the people part of our work. Nowadays, the hot topic is how best to get people to work with each other to deliver software. The last reason to mention is TDD/BDD. With Test-Driven Development and Behavior-Driven Development, we moved much of our design discussion into the realm of micro-process: you take these little steps and good design emerges. That tends to be true, but no one can deny that it requires guidance, and you can only really guide the evolution of your code when you have good design chops.
So, where are we now? I think we're at the cusp of something very interesting in the software design space, and it involves looking at things a bit differently. Let me start with a question. The last time you saw a blog or an article about design, did it just give you a single snippet of code outlining an idea? Chances are, the answer is yes. An article about refactoring might give you a before and after shot. But, do you ever see anything like this in articles about software design?
This graph shows the history of a file in a project in terms of complexity. You can see that the complexity of the file has grown over time, but there are dips, and those dips likely indicate complexity-reducing refactorings that have happened over the life of the project.
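Generating that kind of trend takes surprisingly little tooling. Here's a rough Ruby sketch; note that the branch-keyword count below is just a crude stand-in for a real complexity metric (cyclomatic complexity, a flog score, what have you), and the git invocations assume you run the script inside a repository:

```ruby
# Crude complexity proxy: count branching keywords in Ruby source.
# A stand-in for a real metric (cyclomatic complexity, flog, etc.).
BRANCH_KEYWORDS = /\b(if|unless|while|until|case|when|rescue|elsif|for)\b/

def crude_complexity(source)
  source.scan(BRANCH_KEYWORDS).size
end

# Walk a file's history oldest-first, measuring each revision.
# Must be run inside a git repository; 'path' is the file to trace.
def complexity_history(path)
  shas = `git log --reverse --format=%H -- #{path}`.split
  shas.map { |sha| [sha[0, 7], crude_complexity(`git show #{sha}:#{path}`)] }
end

# Usage, inside a repo:
#   complexity_history('lib/foo.rb').each { |sha, c| puts "#{sha} #{c}" }
```

Plot the second column over time and the dips show up: the complexity-reducing refactorings in the file's past.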
Okay, how about this diagram? It shows the number of commits for each file in a project, in order of increasing number of commits. As you can see, there are many files that have only a few commits and a few that have an incredible number.
Remember the Open/Closed Principle? The notion that in a good design, we have many abstractions that don't change very often. Well, guess what? We can figure out whether it really happens on a project, and how often. All it takes is a little SCM mining.
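Here's roughly what I mean by a little mining, sketched in Ruby. The tallying assumes the plain `git log --name-only` output format (one changed-file path per line, blank lines between commits); treat it as illustrative rather than robust parsing:

```ruby
# Tally per-file commit counts from `git log --name-only --format=` output,
# which emits one changed-file path per line, with blanks between commits.
def commit_counts(log_output)
  counts = Hash.new(0)
  log_output.each_line do |line|
    path = line.strip
    counts[path] += 1 unless path.empty?
  end
  counts
end

# Usage, inside a repo:
#   counts = commit_counts(`git log --name-only --format=`)
#   counts.sort_by { |_, n| n }.each { |path, n| puts "#{n}\t#{path}" }
```

The files with a handful of commits are your closed abstractions; the ones at the other end of the list are where the design conversation should start.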
Some of you reading this might be thinking "Oh, no! Metrics." I understand. Metrics do have a well-deserved bad rap. There is nothing worse than managing by a code metric. It throws away too much information, and there are always unintended consequences. This isn't really about that, though. It's about taking the data that we have at hand in our development work and really using it. If we are making a decision about whether to refactor a piece of code, we should be able to see its churn and complexity trends, and tie them back to events that happened over time and the actual features that triggered the work. Right now, it seems that we often look at our decisions through the pinhole of the present, ignoring what we can learn from our code's past.
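To make that concrete: once you have per-file churn and complexity numbers (from mining like the above, or from whatever tool you trust), the decision support can be as simple as flagging the files that sit high on both axes. A hypothetical sketch; the cutoffs are whatever thresholds make sense for your project:

```ruby
# Given per-file churn (commit counts) and complexity scores, flag the
# files in the high-churn, high-complexity quadrant: the prime
# candidates for a refactoring conversation.
def refactoring_candidates(churn, complexity, churn_cutoff:, complexity_cutoff:)
  churn.keys.select do |path|
    churn[path] >= churn_cutoff && complexity.fetch(path, 0) >= complexity_cutoff
  end
end
```

That quadrant is essentially what a churn-vs-complexity plot highlights: the files you change constantly and that hurt every time you do.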
At the time I'm writing this, there are many people who have been playing with these ideas for a while. Steve Yegge and many others at Google have been doing quite a bit of code mining work. Keith Braithwaite has been exploring how TDD affects the complexity of code. Joshua Kerievsky of Industrial Logic has, more than anyone else, pushed the direction of these ideas. His dream is to have IDEs with historical information, information mined from production logs, and quite a few other things at our fingertips. Jason Gorman, Tim Ottinger, and Greg Wilson share an interest in gaining a deeper empirical understanding of software development through this kind of work, and Chad Fowler, Corey Haines, and I are brainstorming ideas and actively developing tools to facilitate work in this area. Currently, we're working on a Ruby gem called 'Turbulence' which plots churn and complexity for Ruby projects. Brian Foote of 'Big Ball of Mud' fame looks on and makes sage observations about the shape of software.
Interesting times.
If you change the last plot to log-log, it will probably be a straight line, demonstrating power law behavior.
I think this sort of analysis is very interesting, but it also risks becoming what physicists might call "phenomenological", meaning it looks at the shape of a phenomenon without necessarily getting at the root source of the behavior.
I also feel that Agile has taken us too far from understanding the role of design thinking. TDD can lead to locally optimal designs, not globally optimal. I'm hopeful that the new emphasis on functional programming will rekindle interest in thinking about fundamental design questions and thereby drive us towards globally optimal designs. I don't anticipate a resurgence of OOD thinking. IMHO, the idea of reproducing your domain model in code was fundamentally misguided, as it was not really the "simplest thing that could possibly work".
Posted by: Dean Wampler | March 02, 2011 at 08:39 AM
What are your thoughts around the need for design chops as more and more patterns are already implemented for us in languages / frameworks? While I would say that those of us with >10 years experience know proper patterns and designs, I would also say that newer developers don't worry about it as much - they just see annotations or rails-gen that creates solid models to start from, without the developer needing to know why.
Thoughts?
Posted by: Joel Tosi | March 02, 2011 at 08:53 AM
@Dean - Yeah, will have to check out the log-log thing. I think that there's still value if it stays phenomenological. We may never know exactly why particular things happen - there are so many variables. But, I think that if we become versed in the patterns of our own code bases, we might be able to make better nudges toward particular goals.
I think that, too often, this sort of analysis has been mired in the desire to find the ultimate truths of software development, rather than just recognizing it as a fuzzy information source.
Posted by: Michael Feathers | March 02, 2011 at 08:57 AM
@Joel - I think that is a serious problem. My sense is that software development is successful economically because it is a way of delaying costs, but if your project lives long enough the costs catch up with you.
Frameworks are great, but the pattern seems to be that people just drop their code into framework defined bins rather than creating their own. It really is a trap. Many teams I've visited would be far better off if they'd done more factoring in the code they've written around the framework.
Posted by: Michael Feathers | March 02, 2011 at 09:05 AM
@Dean As Isaac Asimov said, the most exciting phrase in science is not "Eureka!" but "that's odd...". These power law distributions are certainly odd, because they violate all advice about how to design but they keep emerging in all sorts of strange places (I've been studying them since Hayden Smith showed them on a poster at OOPSLA). Once we know they are there, we can begin to study why and what we can do to raise the slope of the power law line so there are relatively more simple things and the complexity (or whatever) of the most complex is much lower.
Posted by: Kent Beck | March 02, 2011 at 09:23 AM
@Kent One thing I really want to understand is the hubbishness of certain classes: whether it is really just a function of human attention.
Posted by: Michael Feathers | March 02, 2011 at 09:33 AM
Many teams [...] would be far better off if they'd done more factoring in the code they've written around the framework.
Can't agree more, facing this problem every day.
@Joel Newer developers will probably agree too in a few years.
Posted by: Olivierdemeijer | March 02, 2011 at 12:00 PM
Didn't you get the memo? The agile community got taken over by hippies.
The world of software design is found elsewhere...
Posted by: Keith | March 02, 2011 at 12:57 PM
Algorithms are the new design. You don't really need design anymore when you can just talk about big-O notation. Computational thinking has overtaken craft.
Posted by: Gregory | March 02, 2011 at 03:39 PM
"Remember the Open/Closed Principle? The notion that in a good design, we have many abstractions that don't change very often."
Maybe I missed your point, but isn't the fundamental notion of iterative/evolutionary development processes (e.g. TDD) that you can change your designs as your understanding of the problem changes? I'll admit that I'm not sure I've ever *really* understood the O/C Principle, but your message seems to be saying that if I change my mind about my design, I may possibly violate the OCP. (Not that I'm too worried about that.)
Posted by: Hans | March 03, 2011 at 08:01 AM
@Hans I think it's best to look at the Open/Closed Principle as an emergent effect of making good design decisions. Sure, in evolutionary development you can and should change your design, but if you've developed very focused abstractions, chances are you won't have to as often as you would if you had mangled several ideas together in one class, for example.
Posted by: Michael Feathers | March 03, 2011 at 10:19 AM
Interesting times indeed. I take particular interest because when I worked at Devver, particularly on the "Caliper" project, we were trying hard to add a sense of project-historical context to developers' daily work. Maybe we were just a little ahead of our time.
Posted by: Avdi Grimm | March 03, 2011 at 12:46 PM
@Avdi - Great to hear from you. When I was in Boulder last week, Chad Fowler pointed Corey and me to Caliper's site. We looked and thought it would've been great had it survived.
Posted by: Michael Feathers | March 03, 2011 at 01:11 PM
You should also mention Keith Braithwaite's 'measure' which found a power law between the cumulative count of method complexity (or something like that) *for code that was associated with tests*. During one of his talks, I realised that this might be because some of us put a lot of effort into making our code readable, which brings it closer to the sort of word distribution associated with natural language. There's an old write-up at http://www.m3p.co.uk/blog/2008/03/14/its-really-about-language/
Posted by: Steve Freeman | March 03, 2011 at 02:09 PM
@Steve Yes. I've had churn on my mind so much, I haven't been thinking about the complexity much.
Posted by: Michael Feathers | March 08, 2011 at 08:04 AM
Great post. I'm also working on code mining (as part of my day job so I don't know if I'll be able to post much about it). I think it's ok to mention that we're also seeing power law behavior on commits in the projects we're looking at. Nice to see other people are working in the same area.
Posted by: Lawrence_white_ | April 02, 2011 at 06:02 AM