Dead code is my nemesis. Okay, that may overstate it a bit, but not by much. I spend a lot of time with teams looking at ways to refactor complicated code. Often they have the sense that a lot of the code isn't necessary, that it is special-case code that was put in for one customer or another years ago. The business has moved on, yet the code remains, stymying us with its inscrutability.
Recently, I've been suggesting that teams put probes into their code to discover whether particular bits of code are ever executed. The probes are simple logging objects which register on startup and are then called when control flows to particular points in the program. At the end of a run, you know whether code was executed at those points. At the end of a month's worth of production runs, you know whether that code was executed at all.
Of course, this doesn't tell you whether the code is technically dead, but at least you have a good indication of where to look. If you tie this data to tracing that tells you which features are being exercised, you may be able to retire particular features.
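To make the probe idea concrete, here is a minimal sketch in C, under stated assumptions. The post does not show an implementation, so every name here (`Probe`, `probe_register`, `probe_hit`, `probe_hits`, `probe_report`, `MAX_PROBES`) is hypothetical, and the fixed-size table with plain counters is just one simple way to do it. It is not thread-safe as written.

```c
#include <stdio.h>

/* Hypothetical probe: a named counter placed at a point of interest. */
typedef struct {
    const char *name;    /* label for the code location */
    unsigned long hits;  /* how many times control reached it */
} Probe;

#define MAX_PROBES 64    /* assumed capacity, chosen arbitrarily */

static Probe probes[MAX_PROBES];
static int probe_count = 0;

/* Register a probe at startup; returns its id, or -1 if the table is full. */
int probe_register(const char *name)
{
    if (probe_count >= MAX_PROBES)
        return -1;
    probes[probe_count].name = name;
    probes[probe_count].hits = 0;
    return probe_count++;
}

/* Call this at the point in the program you want to observe. */
void probe_hit(int id)
{
    if (id >= 0 && id < probe_count)
        probes[id].hits++;
}

/* Number of times a probe has fired so far (0 for an unknown id). */
unsigned long probe_hits(int id)
{
    return (id >= 0 && id < probe_count) ? probes[id].hits : 0;
}

/* At the end of a run, print every probe and return how many never fired. */
int probe_report(FILE *out)
{
    int never = 0;
    for (int i = 0; i < probe_count; i++) {
        fprintf(out, "%-30s %lu%s\n", probes[i].name, probes[i].hits,
                probes[i].hits == 0 ? "  <- never executed" : "");
        if (probes[i].hits == 0)
            never++;
    }
    return never;
}
```

In use, you would call `probe_register("some_label")` once during startup for each suspect code path, call `probe_hit(id)` inside that path, and call `probe_report(stderr)` (or write to a log) at shutdown. Probes that still show zero hits after a month of production runs are the places to look first.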
One thing I've been wondering about, though, is what it would be like if we took this process further. What if we started running coverage reports in production? On its face it seems like an odd idea. Typically teams run coverage on their tests to get a (loose) sense of how well they are testing. Running coverage in production is rarely considered because it is often expensive. Imagine, though, what we could learn if it became a common practice. For every line of code, you'd have a sense of how often it is run and whether it is ever run. You'd get a new sense of the risk of modifying a particular bit of code and a sense of what that piece of code's value is in the application.
In the industry, we've incurred runtime overhead at various times for far less useful reasons. It's something to think about.
I've thought about this more than once, but somehow it never seems to reach the top of my to-do list. For my line of work, this would actually be simple - we run a big website, so any piece of code is running on multiple servers anyway. Just add a coverage server, put it on low weight in the load balancer, and wait for the data to come out...
Overhead and cost would be minimal, benefits would be big - especially when you compare coverage from integration/acceptance tests (typically high maintenance) to production coverage. What does it say if these tests show coverage on code that never gets triggered in production? ;-)
Posted by: Cees de Groot | December 14, 2010 at 12:49 AM
This depends a lot on which tool you use, but for example the EMMA coverage tool for Java provides excellent results at a cost of some 20% runtime overhead, which might very well be acceptable on your production system (at least part of the time).
http://emma.sourceforge.net/
Posted by: Martin Probst | December 14, 2010 at 02:27 AM
Are you thinking of something like the Usage Data Collector for Eclipse (http://www.eclipse.org/epp/usagedata/), or really of coverage at a fine-grained code level?
Posted by: Daniel Wurst | December 14, 2010 at 02:36 AM
20% overhead for coverage seems high. In C, you can get branch coverage by rewriting
if (f()) ...
as
if (temp = (f() != 0), Big_array[Branch_number + temp]++, temp) ...
... which I believe comes out as much less than 20% on average C code (but it's been > 20 years, so I don't remember the numbers).
Posted by: Marick | December 14, 2010 at 05:54 AM
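The branch-counting rewrite in the comment above can be tried out with a small sketch like the following. This is only an illustration under assumptions, not the commenter's actual tool: the `COV` macro, the `cov_temp` variable, the `classify` function, and the layout of two counter slots per branch (false outcome first, then true) are all hypothetical choices. The shared temporary makes it unsuitable for multithreaded code as written.

```c
#include <stdio.h>

#define NUM_BRANCHES 2   /* assumed: number of instrumented branches */

/* Two counters per branch: slot 2*id counts false, slot 2*id+1 counts true. */
static unsigned long Big_array[2 * NUM_BRANCHES];
static int cov_temp;     /* holds the normalized outcome (0 or 1) */

/* Evaluate cond once, bump the counter for its outcome, yield the outcome. */
#define COV(id, cond) \
    (cov_temp = ((cond) != 0), Big_array[2 * (id) + cov_temp]++, cov_temp)

/* Example function with two instrumented branches. */
int classify(int x)
{
    if (COV(0, x < 0))
        return -1;
    if (COV(1, x == 0))
        return 0;
    return 1;
}

/* Print the false/true counts for every instrumented branch. */
void cov_report(FILE *out)
{
    for (int id = 0; id < NUM_BRANCHES; id++)
        fprintf(out, "branch %d: false=%lu true=%lu\n",
                id, Big_array[2 * id], Big_array[2 * id + 1]);
}
```

The per-branch cost is one assignment, one increment, and the original test, which is why this style of instrumentation tends to be cheap. A branch whose true or false counter stays at zero was never taken in that direction.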
ReSharper has a feature that turns dead code grey. I'm surprised at how much code ends up being grey. But it's always easy to delete, which is my first step: remove clutter.
Posted by: Llewellyn falco | December 14, 2010 at 08:45 AM
Most of the time, I wouldn't say the greatest obstacle is performance. Granted, you need to save the data somewhere, but there are a lot of tricks you can use to make it faster. The main problem, I'd think, is that in addition to the initial construction work, all the optimization will require development time and further maintenance. One reason this is not a common practice might be that management doesn't understand the benefits, which in itself might be a problem of the development team not being able to communicate them properly.
Posted by: Heikki Naski | December 22, 2010 at 11:36 AM
Why not integrate it into your editor!? Imagine an IDE that shows you not just the line number for each line of code, but also its live coverage count!
Posted by: Jon Jagger | January 19, 2014 at 02:14 PM