I have Google’s blog search set to give me notifications about unit testing. In an average week, I read dozens of blogs and mailing list discussions about the topic. Occasionally I read something new, but there’s a lot of repetition out there, and the same arguments crop up often. Of all of them, though, there is one argument about unit testing that really bugs me, because it rests on a flawed theory about testing and quality. Sadly, it’s an argument that I fell for a long time ago, and I’d like to lay it to rest. Hopefully, this blog will help, but I have to relate a little history first.
Back in the very early 2000s, I had a conversation with Steve Freeman at a conference. We were talking about Test-Driven Development and Steve had the strong feeling that most of the people who were practicing TDD at the time were doing it wrong - they'd missed something.
Steve was and is part of a close-knit community in London who have been practicing XP and TDD from the very beginning. Among the fruits of their labor was the entire notion of mock objects. Steve Freeman and Tim MacKinnon wrote the paper that introduced the idea to the broader community. The rest is history. There are mock object frameworks out there for nearly every language in common use.
Mock objects, however, are part of a larger, relatively unpublicized approach to TDD. The story I heard was that it all started with John Nolan, the CTO of a startup named Connextra. He gave his developers a challenge: write OO code with no getters. Whenever possible, tell another object to do something rather than asking it for data. In the process of doing this, they noticed that their code became supple and easy to change. They also noticed that the fake objects they were writing were highly repetitive, so they came up with the idea of a mocking framework that would let them set expectations on objects – mock objects.
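To make “no getters” concrete, here is a minimal sketch of the style. The names (Account, AuditTrail) are hypothetical – mine, not Connextra’s – but they show the shape of “tell, don’t ask”:

```java
// Ask style: pull the balance out of the object and decide out here.
// This forces getters onto Account and scatters the logic:
//
//   if (account.getBalance() >= amount) {
//       account.setBalance(account.getBalance() - amount);
//   }

// Tell style: no getters. Account owns the decision and simply tells
// its collaborator what happened.
interface AuditTrail {
    void recordWithdrawal(int amount);
}

class Account {
    private int balance;

    Account(int openingBalance) {
        balance = openingBalance;
    }

    void withdraw(int amount, AuditTrail audit) {
        if (amount > balance) {
            throw new IllegalArgumentException("insufficient funds");
        }
        balance -= amount;
        audit.recordWithdrawal(amount); // tell, don't ask
    }
}
```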
When Steve told me about this approach, I thought it sounded okay, but there was one thing that I couldn’t wrap my head around – Steve, and Tim, and the people who had been on that team were using mocks extensively. In fact, they used mocks whenever they could. This was a bit different from the way that I was practicing TDD. What I did, in general, was use tests to drive the design of a class, extracting new classes from it as it became bulky. Some tests would cover just one class, but others would cover several classes working together.
The problem that I saw with the mock object approach was that it only tested individual classes, not their interactions. Sure, the tests I wrote were nominally unit tests, but I liked the fact that they occasionally tested real interactions between a class and its immediate collaborators. Yes, I liked isolation, but I felt that this little tiptoe into integration-level testing gave my tests a bit more power. But there was one problem. The team at Connextra that was using mocks extensively was reporting extremely low defect rates, and I just wasn’t sure how they were achieving them. After all, it didn’t seem like there was any integration testing going on, so their application should have been rife with integration errors. Or should it have? Let’s examine our reasoning.
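To show the contrast I was worried about, here is a hand-rolled mock in the spirit of those early frameworks, continuing the hypothetical Account sketch above. Notice what this test checks: not a real collaborator, not end-to-end state, only the interaction itself.

```java
// A hand-rolled mock for the hypothetical AuditTrail above. The test
// never uses a real collaborator; it sets an expectation and verifies
// only the interaction: did Account *tell* its audit trail?
class MockAuditTrail implements AuditTrail {
    int recordedAmount = -1;

    public void recordWithdrawal(int amount) {
        recordedAmount = amount;
    }
}

class AccountTest {
    public static void main(String[] args) {
        MockAuditTrail audit = new MockAuditTrail();
        Account account = new Account(100);

        account.withdraw(40, audit);

        if (audit.recordedAmount != 40) {
            throw new AssertionError("expected a recorded withdrawal of 40");
        }
        System.out.println("ok: Account told its collaborator");
    }
}
```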
One very common theory about unit testing is that quality comes from removing the errors that your tests catch. Superficially, this makes sense. Tests can pass or fail and when they fail we learn that we have a problem and we correct it. If you subscribe to this theory, you expect to find fewer integration errors when you do integration testing and fewer “unit” errors when you do unit testing. It’s a nice theory, but it’s wrong. The best way to see this is to compare unit testing to another way of improving quality – one that has a very dramatic measurable effect.
Back in the 1980s, there was a movement to use something called Clean Room Software Development. The notion behind Clean Room was that you could increase quality by increasing the rigor of development. In Clean Room, you had to write a logical predicate for every little piece of your code, and you had to demonstrate, during a review, that your code did no more and no less than the predicate described. It was a very serious approach, and it was a bit more radical than what I just described: another tenet of Clean Room was that there was to be no unit testing. None. Zilch. Once your code was reviewed, it was assumed to be correct. The only testing that was done was stochastic testing at the functional level.
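I don’t have Clean Room source material at hand, but roughly, a predicate for a small piece of code might read like the comment below. The notation is a loose sketch of mine, not the formal notation the method actually used:

```java
class Totals {
    /*
     * Predicate for the code below -- a loose sketch, not Clean Room's
     * formal notation:
     *
     *   requires: prices != null
     *   ensures:  result == prices[0] + prices[1] + ... + prices[prices.length - 1]
     *             and prices is left unchanged
     *
     * In review, you would argue that the loop maintains
     * "total == sum of prices[0..i-1]" and conclude that the code does
     * no more and no less than the predicate describes.
     */
    static int total(int[] prices) {
        int total = 0;
        for (int i = 0; i < prices.length; i++) {
            total += prices[i];
        }
        return total;
    }
}
```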
Amazingly, Clean Room worked. Clean Room teams demonstrated very high quality numbers. When I read about it, I was stunned, but then I came across a passage in a book about the process. The author said that many programmers wrote their predicates after writing a section of code, but that experienced programmers often wrote the predicates first. Gee, that sounds familiar, doesn’t it? In TDD, we write the test first, and the test is, essentially, a specification of the behavior of the code we are about to write.
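The parallel is easy to see in code. Here is a JUnit 4 sketch, written before the implementation, that specifies the behavior of the hypothetical Totals.total() above:

```java
import org.junit.Test;
import static org.junit.Assert.assertEquals;

// Written before the implementation, this test plays the same role as a
// Clean Room predicate: it specifies what Totals.total() must do.
public class TotalTest {
    @Test
    public void sumsAllPrices() {
        assertEquals(60, Totals.total(new int[] {10, 20, 30}));
    }

    @Test
    public void sumOfNothingIsZero() {
        assertEquals(0, Totals.total(new int[] {}));
    }
}
```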
In the software industry, we’ve been chasing quality for years. The interesting thing is that a number of things work. Design by Contract works. Test-Driven Development works. So do Clean Room, code inspections, and the use of higher-level languages.
All of these techniques have been shown to increase quality. And, if we look closely we can see why: all of them force us to reflect on our code.
That’s the magic, and it’s also why unit testing works. When you write unit tests, TDD-style or after your development, you scrutinize, you think, and often you prevent problems without even encountering a test failure.
Now, as you’re reading this, you might think that I'm saying that we can get away with doing nothing as long as we sit back in our chairs, rest our chins on our hands, and think about our code. I don’t think so. I think that approach may work for short periods for some people, but software development is a long-haul activity. We need practices which help us achieve continuous discipline and a continuous state of reflection. Clean Room and TDD are two practices which, despite their radical differences, force us to think with absolute precision about what we are doing.
I have no doubt that a team could do well with, say, Clean Room, but personally, I like the fact that the tests we end up with using TDD give us additional leverage: they make it easier to change our code and know that it still works, without having to re-reason through the entire code base. If you have to change your code often, that re-reasoning is not the best use of your time, especially when you can write tests that record and embody your reasoning and run them at will. With those tests in place, you free yourself up to reason about things you haven’t reasoned about before, rather than repeating yourself endlessly.
But, enough about TDD.
My point is that we can't look at testing mechanistically. Unit testing does not improve quality just by catching errors at the unit level. And, integration testing does not improve quality just by catching errors at the integration level. The truth is more subtle than that. Quality is a function of thought and reflection - precise thought and reflection. That’s the magic. Techniques which reinforce that discipline invariably increase quality.