The other day, I was reviewing test execution times across several projects. The results were not surprising. A small number of tests consumed a lot of the execution time, but past a certain point the execution times tapered away to nothing over a very long range. They didn't look like traditional "long tail" distributions. Yes, the tails were long, but they were very flat. Here's a graph of one project. The data that you are looking at is a list of test execution times sorted in descending order:
This test set takes 19 minutes to run on a cheapo machine and the total number of tests is 741.
Actually, we can't see anything in that graph because of the largest test execution. It clocks in at 300 seconds. I think I can smell a whiff of framework and database, with a side of grand finale testing. Let's lop that one off and look at the remainder of the data.
It's sort of interesting to see this data, but what does it mean for us practically? Well, when we are aware of these timings, they can help us make decisions.
Suppose we want to speed up our tests. There are a couple of ways that we can go. One is to attempt global optimizations. If we could, say, reduce the execution time of each of our tests by 50%, we would have a 50% gain. Here is a graph of original test execution times (in blue) plotted against the same times reduced by 50%:
Another strategy that we can use is to concentrate on speeding up the slowest tests. To get a sense of how much we could gain in an ideal world, we can figure out the smallest number of tests we could delete to get a 50% gain in execution speed. In this case, the answer is: pick 5.5% of the tests in decreasing order of execution time.
Because we can't just go around deleting tests (as tempting as it is), it makes more sense to pick a more realistic target, say a 50% reduction in execution time for the slowest 10% of tests, and follow the line backward.
This graph shows the execution times once we've cut 50% of the execution time off the slowest 10% of tests. This might appear to be overly optimistic, but with tests it's often the case that you can get dramatic improvements. This is especially true if people are touching the database unnecessarily.
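To try the same what-if on your own numbers, something like the following sketch could work. The helper name, its parameters, and the sample timings are assumptions for illustration only. It halves the slowest 10% of tests, leaves the rest alone, and reports how much total time is saved.

```python
def speed_up_slowest(times, percent=10, factor=0.5):
    """Scale the slowest `percent` of tests by `factor`; leave the rest unchanged."""
    ordered = sorted(times, reverse=True)
    cutoff = len(ordered) * percent // 100  # integer math avoids float rounding
    return [t * factor for t in ordered[:cutoff]] + ordered[cutoff:]


# Hypothetical timings in seconds, not the project's real data.
sample_times = [300.0, 120.0, 60.0, 30.0] + [1.0] * 96

before = sum(sample_times)
after = sum(speed_up_slowest(sample_times))
print(f"before: {before:.0f}s, after: {after:.0f}s, "
      f"saved: {100 * (before - after) / before:.1f}%")
```

The returned list is no longer in descending order once the slow tests shrink, so pass it through `sorted(..., reverse=True)` before plotting it.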
If we re-sort the data, it looks like this:
The terrible thing is that distributions like these can lead you to be optimistic in the short term and pessimistic in the long term. They inspire optimism because it looks like there is a lot of low hanging fruit. The pessimism comes after you've plucked it. You are left with a long tail of timings you can speed up but with ever decreasing return on investment. In fact, the tail is often longer because long running tests are often sped up by breaking them into several tests. This isn't always the case, though. Quite a bit of test optimization involves deleting unnecessary execution paths.
If you are thinking that this scenario plays out in more than just test execution times, you may be right. I have a feeling that this sort of distribution shows up in other areas of software. We bias toward something like short tests and execution time correlates pretty well with it, but there are times when we miss and those outliers cause trouble. Take the example of class size. We can tackle the worst god classes in our systems, but we will often end up increasing the number of classes and leaving less low hanging fruit for later.
One reassuring thing for this particular project is that there really are very few slow tests. In fact, 78% of total test execution time is spent by only 20% of the tests. The 50% payoff point is at 5.5% of the tests. 5.5% of 741 tests is about 40 tests. Speeding those up could be well worth the effort.
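If you want to check figures like these for your own suite, a rough sketch follows. The function name and the sample timings are hypothetical. It computes the share of total run time consumed by the slowest N percent of tests.

```python
def share_of_time(times, percent):
    """Fraction of total time spent in the slowest `percent` of tests."""
    ordered = sorted(times, reverse=True)
    count = len(ordered) * percent // 100
    total = sum(ordered)
    return sum(ordered[:count]) / total if total else 0.0


# Hypothetical timings in seconds, not the project's real data.
sample_times = [300.0, 120.0, 60.0, 30.0] + [1.0] * 96

for percent in (5, 10, 20):
    share = share_of_time(sample_times, percent)
    print(f"slowest {percent}% of tests: {100 * share:.1f}% of total time")
```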
I don't think that I can definitively say that speeding up your slowest tests gives you better returns than trying global fixes like better hardware and parallelization, but I think there are some implications here. You should measure your own test execution times and see what you can get by optimizing slow tests. Remember, though, that eventually you'll have picked all of your low hanging fruit. Global optimizations are the path forward at that point.