I have Google’s blog search set to give me notifications about unit testing. In an average week, I read dozens of blogs and mailing list discussions on the topic. Occasionally I read something new, but there’s a lot of repetition out there; the same arguments crop up often. Of all of them, though, there is one argument about unit testing that really bugs me, because it rests on a flawed theory about testing and quality. Sadly, it’s an argument that I fell for a long time ago, and I’d like to lay it to rest. Hopefully, this post will help, but I have to relate a little history first.
Back in the very early 2000s, I had a conversation with Steve Freeman at a conference. We were talking about Test-Driven Development and Steve had the strong feeling that most of the people who were practicing TDD at the time were doing it wrong - they'd missed something.
Steve was and is part of a close-knit community in London who have been practicing XP and TDD from the very beginning. Among the fruits of their labor was the entire notion of mock objects. Steve Freeman and Tim MacKinnon wrote the paper that introduced the idea to the broader community. The rest is history. There are mock object frameworks out there for nearly every language in common use.
Mock objects, however, are part of a larger, relatively unpublicized approach to TDD. The story I heard was that it all started with John Nolan, the CTO of a startup named Connextra. Nolan gave his developers a challenge: write OO code with no getters. Whenever possible, tell another object to do something rather than ask. In the process of doing this, they noticed that their code became supple and easy to change. They also noticed that the fake objects they were writing were highly repetitive, so they came up with the idea of a framework that would let them set expectations on objects - mock objects.
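To make the "no getters" idea concrete, here is a small sketch of my own - the names are hypothetical, not Connextra code. Instead of asking an object for its state and making the decision outside it, you hand the object the work:

    // Tell, don't ask: the invoice tells the account to settle it.
    // The amount never leaks out through a getter; the invoice
    // passes it along itself.
    class Invoice {
        private final long amountInCents;
        Invoice(long amountInCents) { this.amountInCents = amountInCents; }

        void settleAgainst(Account account) {
            account.withdraw(amountInCents);
        }
    }

    class Account {
        private long balanceInCents;
        Account(long balanceInCents) { this.balanceInCents = balanceInCents; }

        void withdraw(long amountInCents) {
            // The decision lives with the data it needs; callers
            // never see the balance.
            if (amountInCents <= balanceInCents) {
                balanceInCents -= amountInCents;
            }
        }
    }

Calling code never writes 'if (account.getBalance() >= invoice.getAmount()) ...'; it just says 'invoice.settleAgainst(account)', and the decision-making migrates to where the data lives.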
When Steve told me about this approach, I thought it sounded okay, but there was one thing that I couldn’t wrap my head around: Steve, Tim, and the people who had been on that team used mocks extensively. In fact, they used mocks whenever they could. This was a bit different from the way that I was practicing TDD. What I did, in general, was use tests to drive a class, and then extract new classes from the class I was designing as it became bulky. Some tests would cover just one class, but others would cover several classes working together.
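By contrast, in the style Steve described, a test sets up a mock for each collaborator and states the interaction it expects. Here is a rough sketch of mine - the names are made up, and the mock is hand-rolled rather than written with a framework:

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    // The collaborator we want to mock out.
    interface AuditLog {
        void record(String entry);
    }

    // A hand-rolled mock: it remembers what was asked of it so the
    // test can check the conversation afterward.
    class MockAuditLog implements AuditLog {
        String lastEntry;
        public void record(String entry) { lastEntry = entry; }
    }

    class Transfer {
        private final AuditLog log;
        Transfer(AuditLog log) { this.log = log; }

        void execute() {
            // ... do the transfer, then tell the log about it ...
            log.record("transfer executed");
        }
    }

    public class TransferTest {
        @Test
        public void tellsTheAuditLogAboutTheTransfer() {
            MockAuditLog log = new MockAuditLog();
            new Transfer(log).execute();
            assertEquals("transfer executed", log.lastEntry);
        }
    }

The only thing the test checks is the conversation between Transfer and its collaborator; nothing beyond that one class is exercised.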
The problem that I saw with the mock object approach was that it only tested individual classes, not their interactions. Sure, the tests I wrote were nominally unit tests, but I liked the fact that they occasionally tested real interactions between a class and its immediate collaborators. Yes, I liked isolation, but I felt that this little tiptoe into integration-level testing gave my tests a bit more power. There was only one problem: the team at Connextra, which was using mocks extensively, was reporting extremely low defect rates, and I just wasn’t sure how they were achieving them. After all, it didn’t seem like there was any integration testing going on; their application should have been rife with integration errors. Or should it have? Let’s examine our reasoning.
One very common theory about unit testing is that quality comes from removing the errors that your tests catch. Superficially, this makes sense. Tests can pass or fail and when they fail we learn that we have a problem and we correct it. If you subscribe to this theory, you expect to find fewer integration errors when you do integration testing and fewer “unit” errors when you do unit testing. It’s a nice theory, but it’s wrong. The best way to see this is to compare unit testing to another way of improving quality – one that has a very dramatic measurable effect.
Back in the 1980s, there was a movement to use something called Clean Room Software Development. The notion behind Clean Room was that you could increase quality by increasing the rigor of development. In Clean Room, you had to write a logical predicate for every little piece of your code and you had to demonstrate, during a review, that your code did no more or less than the predicate described. It was a very serious approach, and it was a bit more radical than what I just described: another tenet of Clean Room was that there was to be no unit testing. None. Zilch. When you wrote your code it was assumed correct after it was reviewed. The only testing that was done was stochastic testing at the functional level.
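To give a feel for it, here is the sort of predicate I mean - a made-up illustration of mine, not an example from the Clean Room literature. You state the intended function of a piece of code as a precise input/output relation, and in review you argue that the code satisfies exactly that relation:

    class Summation {
        // Intended function (the predicate): for any n >= 0,
        //     sumTo(n) == 0 + 1 + ... + n
        // In a Clean Room review, you argue that the body computes
        // exactly this relation - no more and no less - instead of
        // unit testing it.
        static long sumTo(long n) {
            assert n >= 0 : "precondition: n >= 0";
            return n * (n + 1) / 2;   // closed form of 0 + 1 + ... + n
        }
    }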
Amazingly, Clean Room worked. Clean Room teams demonstrated very high quality numbers. When I read about it, I was stunned, but then I came across a passage in a book about the process. The author said that many programmers wrote their predicates after writing a section of code, but that experienced programmers often wrote the predicates first. Gee, that sounds familiar, doesn’t it? In TDD, we write the test first, and the test is, essentially, a specification of the behavior of the code we are about to write.
In the software industry, we’ve been chasing quality for years. The interesting thing is that a number of things actually work. Design by Contract works. Test-Driven Development works. So do Clean Room, code inspections, and the use of higher-level languages.
All of these techniques have been shown to increase quality. And, if we look closely we can see why: all of them force us to reflect on our code.
That’s the magic, and it’s also why unit testing works. When you write unit tests, whether TDD-style or after development, you scrutinize, you think, and often you prevent problems without ever encountering a test failure.
Now, as you’re reading this, you might think that I'm saying that we can get away with doing nothing as long as we sit back in our chairs, rest our chins on our hands, and think about our code. I don’t think so. I think that approach may work for short periods for some people, but software development is a long-haul activity. We need practices which help us achieve continuous discipline and a continuous state of reflection. Clean Room and TDD are two practices which, despite their radical differences, force us to think with absolute precision about what we are doing.
I have no doubt that a team could do well with, say, Clean Room, but personally I like the fact that the tests we end up with when using TDD give us additional leverage: they make it easier to change our code and know that it still works, without having to re-reason through the entire code base. If you have to change your code often, re-reasoning through it each time is not the best use of your time, especially when you can write tests that record and embody that reasoning and run them at will. With those tests in place, you free yourself up to reason about things that you haven’t reasoned about before, rather than repeating yourself endlessly.
But, enough about TDD.
My point is that we can't look at testing mechanistically. Unit testing does not improve quality just by catching errors at the unit level. And, integration testing does not improve quality just by catching errors at the integration level. The truth is more subtle than that. Quality is a function of thought and reflection - precise thought and reflection. That’s the magic. Techniques which reinforce that discipline invariably increase quality.
Very impressive! I've translated the article into Japanese; it is available at
http://www.hyuki.com/yukiwiki/wiki.cgi?FlawedTheoryBehindUnitTesting
Please let me know if this causes you any inconvenience.
Thank you for your good writing.
Posted by: MORITA Hajime | June 15, 2008 at 09:13 AM
Really good post and I definitely agree with the sentiment.
Having said that, whilst different testing styles might each result in quality improvements, they can also result in tests that affect future development in very different ways.
For example, if I write the little integration tests (which I believe you might have gone off since 2000), then I get tests that are more immune to low-level refactoring, but they do tend to be more complex and slightly slower. Whether they provide good documentation depends on what view of the system the reader wants, which is hard to judge. It's these sorts of trade-offs that I struggle with, and that leave me constantly questioning my approaches to testing.
Posted by: Colin Jack | June 15, 2008 at 11:13 AM
Nice post. Your post has been linked to in my post about the same topic: http://moffdub.wordpress.com/2008/06/16/the-getter-setter-debate/
Posted by: MoffDub | June 17, 2008 at 07:53 PM
You say that quality is a function of precise thought and reflection, so any technique that reinforces this discipline increases quality.
In his afterword to Kent Beck's 'Test-Driven Development: By Example', Martin Fowler has an idea which, to my mind, may suggest that TDD is better at reinforcing this discipline than, say, Clean Room or writing tests after coding.
Here is the gist of Fowler's idea (I have turned excerpts of his afterword into bullet points; a small code sketch of the two modes follows the list):
* Programming is hard.
* It sometimes feels like trying to keep several balls in the air at once: any lapse of concentration and everything comes tumbling down.
* TDD helps reduce this feeling, and results in rapid unhurriedness (really fast progress despite feeling unhurried).
* This is because working in a TDD development style gives you the sense of keeping just one ball in the air at once, so you can concentrate on that ball properly and do a really good job with it.
* When you are trying to add some new functionality, you are not worried about what really makes a good design for this piece of function, you are just trying to get a test to pass as easily as possible.
* When you switch to refactoring mode, you are not worried about adding some new function, you are just worried about getting the right design.
* With both of these activities, you are just focused on one thing at a time, and as a result you can concentrate better on that one thing.
* Adding features test-first and refactoring are two monological flavours of programming.
* A large part of making activities systematic is identifying core tasks and allowing us to concentrate on only one at a time.
* An assembly line is a mind-numbing example of this - mind-numbing because you always do the one thing.
* Perhaps what test-driven development suggests is a way of breaking apart the act of programming into elemental modes, but avoiding the monotony by switching rapidly between those modes.
* The combination of monological modes and switching gives you the benefits of focus and lowers the stress on the brain without the monotony of the assembly line.
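To make those two modes concrete, here is a tiny sketch (mine, not Fowler's):

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class AddTest {
        // Mode 1: get the test to pass as easily as possible -- even
        // 'return 5;' would do. Only one ball is in the air.
        @Test
        public void addsTwoNumbers() {
            assertEquals(5, add(2, 3));
        }

        // Mode 2: with the bar green, switch hats and get the design
        // right, generalizing the crude first version into the real thing.
        int add(int a, int b) {
            return a + b;
        }
    }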
Posted by: Philip Schwarz | June 18, 2008 at 04:01 PM
Wordled version ;-)
http://wordle.net/gallery/01304/Feathers:_The_Flawed_Theory_Behind_Unit_Testing
Posted by: Mike Bria | June 19, 2008 at 04:38 PM
Hi Michael,
I learned not to think in implementation details when creating tests during the TDD process (as described in TDD by Example). This is very useful, because rather than thinking in implementation details, I think about how my class is supposed to act. E.g., calculateTax in this scenario should return 15, no matter how that is implemented.
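In JUnit it might look something like this (just a sketch; the names are made up):

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    // The test pins down the visible behavior - the value returned in
    // this scenario - not how the calculator arrives at it internally.
    public class TaxCalculatorTest {
        @Test
        public void taxForThisScenarioIsFifteen() {
            TaxCalculator calculator = new TaxCalculator();
            assertEquals(15.0, calculator.calculateTax(100.0, 0.15), 0.001);
        }
    }

    class TaxCalculator {
        double calculateTax(double amount, double rate) {
            return amount * rate;   // any implementation returning 15 here passes
        }
    }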
This gives me freedom to refactor in order to make the code cleaner, with no fear of breaking my tests.
This also helps me achieve tests as documentation, so users of my class can see examples of how to use it. As a user, I want to know the visible behavior of the class, not how it works internally. A lot of expectations that have nothing to do with real use of my class make that harder.
Interaction-based tests are harder to read, and it's harder to see at a glance what they are supposed to test.
They are also more fragile under refactoring.
IMO...
Posted by: Ricard Mayerhofer | June 20, 2008 at 03:43 PM
In the '90s, I wrote the comment for a method before implementing it. Combined with Design by Contract, it was my way of thinking through the design issues before coding.
I still think in DbC terms, but now I capture that same design information in my tests.
Posted by: Dean Wampler | June 27, 2008 at 05:03 AM
Someone recently said that an effective unit/micro/isolation test suite is the single most important artifact on a software project. More important than the production code, even, since without the test suite, the production code will always slip into irreparable decay. If the production code is the house, the unit test suite is the foundation. Build it well.
I think that integration test suites are important, since they tend to catch real interactions between real components, using closer-to-real data.
But while integration test suites may often be cheaper to write (they are certainly cheaper to *learn* to write), they are not cheaper to maintain than good unit test suites. They are less likely to catch all the defects, less likely to catch them early enough, and they localize failure less well.
For least cost per defect, in the long run, and for least total cost of ownership, per test suite, I don't think you can beat really, really fast isolation tests, complete with requisite mocking, and little tricks like in-memory databases.
That said, I always end up with a contextual, locally-adapted mix of true unit tests (using your definition, Mike), and integration tests. And they are managed differently for CI purposes (fast tests are always run more frequently than slow ones -- so we separate the fast ones out, and the build state is always contingent on their passing).
So what about early on for a newly agile team?
Early on in a team's adoption of unit testing best practices, do I care more about true isolation/mocking and fast tests, or about good semantic coverage and getting people test-infected? An early win?
I care more about the latter. Early on, I just want them to learn that automated testing is possible, worthwhile, and full of delightful unexpected benefits and joys.
Posted by: Patrick Welsh | June 28, 2008 at 11:56 AM
Great article Michael, thanks for sharing! I ran across another post that talks about the possibility that TDD is becoming a Cargo Cult.
http://www.mtelligent.com/journal/2008/6/26/the-cargo-cult-of-test-driven-development.html
Between your article and this one, I ended up writing something as it made me reflect on our practices as developers and how we can fall into the "trap" of forgetting our original purpose for picking up TDD as a tool.
http://blog.saadware.com/2008/07/02/purposeful-software-development/
Thanks again!
Scott
Posted by: Scott Saad | July 03, 2008 at 07:20 AM
BlogReader: One example that I remember off the top of my head re: CleanRoom was in the NASA software process improvement report, as referenced heavily by Steve McConnell in Code Complete.
http://www.sei.cmu.edu/publications/documents/94.reports/94.tr.022.html
Posted by: Michael DiBernardo | July 11, 2008 at 07:21 AM
Summarizing...
What TDD is:
practices which help us achieve continuous discipline and a continuous state of reflection.
Why TDD works:
Quality is a function of thought and reflection - precise thought and reflection. That’s the magic.
Posted by: Brian Henderson | July 27, 2008 at 11:59 PM
[speaking from the grave]
I am proud!
Posted by: David Hume | September 14, 2008 at 07:56 PM
I like that you mentioned the Old Ways :-)
Kids today don't believe that anyone was delivering code of any quality or value before Extreme Programming Explained was written.
I think guided inspections (test/example-driven, of course) are due for a comeback.
Posted by: Jason Gorman | March 01, 2009 at 10:13 AM
This is a nice article and makes good points - essentially, that the whole of TDD is more than the sum of its parts. Another point I should like to make is that designing code which will run smoothly in a test environment as well as in production is itself a design constraint, and it should lead to more portable, reusable code with better encapsulation properties.
Posted by: George Brooke | March 05, 2009 at 01:51 AM
Nice read. Totally agree with your last point.
" Quality is a function of thought and reflection - precise thought and reflection. That’s the magic. "
Posted by: Ajay george | November 26, 2010 at 05:42 AM
yer an idiot
Posted by: usuck | April 07, 2011 at 05:34 PM
Great post. This makes me think of literate programming but without the overhead of all the muck left around your code.
Posted by: Jim Gay | March 05, 2012 at 01:56 PM
This ties in perfectly with Greg Young's talk on testing: http://skillsmatter.com/podcast/design-architecture/talk-from-greg-young
and his article http://codebetter.com/gregyoung/2008/02/13/mocks-are-a-code-smell/
Posted by: Monkeyonahill | March 05, 2012 at 02:28 PM
@JimBullock There is still a huge number of folks who think that TDD is primarily about testing rather than about design. Alas, even though many TDD exponents recognize the primacy of the design aspect (as I discuss here: http://drdobbs.com/229218691), TDD evangelists often present it primarily as a testing mechanism. I think that's what Mike F. is getting at in this piece, although he never comes out and says that this is the "flawed theory."
Posted by: Andrew Binstock | March 05, 2012 at 06:22 PM
Michael, As per Mark's comment above, examples of OO code without getters would be of interest. Do you know of any that are publicly available?
(Even though, as your post suggested, precise thought probably trumps any particular technique -- even removing getters. Would still be interesting to see a good example.)
Posted by: John Rusk | March 06, 2012 at 11:15 PM
I never understood how unit tests don't get you to integration - we have integrating objects that need unit tests, don't we?
I find the value of thorough unit testing is actually in refactoring and extension. Both typically involve redefinition of requirements, and the unit test repository is an efficient way of determining when I've broken a contract somewhere.
Posted by: Brian Balke | March 13, 2012 at 09:59 AM
We need more end-to-end testing, with the ability to do PIXEL DIFF testing!!! We need to test that the actual rendering of pages is as expected.
Posted by: Gregory Nicholas | March 30, 2012 at 11:42 AM
In the '90s, I had a similar experience with Design by Contract, applied to C and C++ code. We didn't have any fancy tools to support it, just asserts or similar constructs. I also wrote careful documentation for each function I was about to write - again, a process of reflection before doing. I eventually replaced both practices with tests, using TDD.
For the functional code I write today, I find myself going back to up-front thinking and ex post facto test writing. By using Big Words like ex post facto, I'm showing how smart I am...
Posted by: Dean Wampler | September 28, 2012 at 06:45 AM