It's often hard to understand large classes. Sometimes we can't see the forest for the trees.
One tool that I often use is something I call a class feature diagram. These diagrams are easy to create: draw a box for every field and a bubble for every method in your class, then draw an arrow from each method to the fields and methods it uses.
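If you would rather keep a diagram as data than as a drawing, a minimal sketch might look like the following. The class name `FeatureDiagram` and its methods are hypothetical, invented here for illustration; the diagram is simply stored as a map from each method to the features it uses.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not from any library): a feature diagram stored as
// "method -> the fields and methods it uses".
class FeatureDiagram {
    private final Map<String, List<String>> uses = new LinkedHashMap<>();

    // Record one method (a bubble) and the features its arrows point at.
    void addMethod(String method, String... usedFeatures) {
        uses.put(method, List.of(usedFeatures));
    }

    // Render one "from -> to" line per arrow, in insertion order.
    String arrows() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String>> entry : uses.entrySet()) {
            for (String target : entry.getValue()) {
                out.append(entry.getKey()).append(" -> ").append(target).append('\n');
            }
        }
        return out.toString();
    }
}
```

For a method `foo` that touches two fields `a` and `b`, calling `addMethod("foo", "a", "b")` and then `arrows()` yields one line per arrow: `foo -> a` and `foo -> b`.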
Here's a class feature diagram for one class:
And, here's one for another one:
You can learn a lot from these diagrams. You can see internal coupling and internal cohesion. Often they help you understand how to move forward in the face of a large refactoring task.
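One rough way to make "internal cohesion" concrete is to ask which features are connected to each other at all. The sketch below is an assumption about how you might do that, not a standard metric: it treats every arrow as an undirected edge and groups features into connected clusters. The name `FeatureClusters` and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: groups the features of a class into clusters that
// are connected by at least one chain of "uses" arrows.
class FeatureClusters {
    // Arrows are stored in both directions so clusters ignore arrow direction.
    private final Map<String, Set<String>> neighbours = new LinkedHashMap<>();

    void addUse(String method, String feature) {
        neighbours.computeIfAbsent(method, k -> new LinkedHashSet<>()).add(feature);
        neighbours.computeIfAbsent(feature, k -> new LinkedHashSet<>()).add(method);
    }

    // Depth-first walk from every unvisited feature; each walk is one cluster.
    List<Set<String>> clusters() {
        List<Set<String>> result = new ArrayList<>();
        Set<String> seen = new LinkedHashSet<>();
        for (String start : neighbours.keySet()) {
            if (!seen.add(start)) {
                continue; // already part of an earlier cluster
            }
            Set<String> cluster = new TreeSet<>();
            Deque<String> pending = new ArrayDeque<>();
            pending.push(start);
            while (!pending.isEmpty()) {
                String node = pending.pop();
                cluster.add(node);
                for (String next : neighbours.get(node)) {
                    if (seen.add(next)) {
                        pending.push(next);
                    }
                }
            }
            result.add(cluster);
        }
        return result;
    }
}
```

A class whose features fall into a single cluster is at least loosely held together; several separate clusters suggest the class may be doing several unrelated jobs.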
For me, though, the really fun bit about feature diagrams is the way that they change under refactoring. For example, the feature diagram for this class:
class A {
    private int a, b;
    public void foo(int value) {
        a++;
        b += value * value + value;
    }
}
looks like this:
If we extract a method named bar from foo, we can end up with code that looks like this:
class A {
    private int a, b;
    public void foo(int value) {
        a++;
        b += bar(value);
    }
    private int bar(int value) {
        return value * value + value;
    }
}
However, there's another way of extracting a method that will give us this:
Here it is:
class A {
    private int a, b;
    public void foo(int value) {
        a++;
        bar(value);
    }
    private void bar(int value) {
        b += value * value + value;
    }
}
So, what's the difference? Is one any better than the other? On the one hand, the first extract method gives us a pure function, a function without side-effects. In general, pure functions are great. They are easier to reason about, but there are some benefits to the second approach. Let's take a look at that diagram again:
What we've done in this refactoring is introduce a node between two other nodes. By doing so, we've made it possible to split this class into two classes, one which contains foo and a and another which contains bar and b. The first of those classes can use the other.
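Here is a sketch of what that split might look like. The class name `B`, the field name `other`, and the two accessors are assumptions added for illustration; only `foo`, `bar`, `a`, and `b` come from the example above.

```java
// Hypothetical name: the class that now owns b and the command that mutates it.
class B {
    private int b;

    void bar(int value) {
        b += value * value + value;
    }

    // Accessor added only so the effect of bar can be observed in this sketch.
    int b() {
        return b;
    }
}

class A {
    private int a;
    private final B other = new B(); // A uses B; B knows nothing about A

    public void foo(int value) {
        a++;
        other.bar(value);
    }

    // Accessors added only for illustration.
    int a() {
        return a;
    }

    B other() {
        return other;
    }
}
```

Notice that `foo` did not have to change shape: it still calls `bar` as a command, only now on another object.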
This strategy is an application of the Dependency Inversion Principle inside classes. We had a method foo which depended on two concrete things and we ended up making it depend on something abstract (bar) and one less abstract thing. In old-school terms, we've encapsulated b within bar.
Now, you might look at this and say "Well, this is just a toy example." Yes, it is, but it points toward a very useful strategy with extract method: you can gain advantage when you extract command methods; that is, methods which return void and mutate some fields. You get the advantages in cases where you are able to start encapsulating more - when you start to be able to hide M things behind N methods where M < N. At that point, you are in a great position to do an extract class refactoring.
Sidenote: This definitely isn't the only class splitting strategy you can use with extract method. Extracting pure functions can be very useful but it also moves us away from OO and toward a more functional style of programming. The proper way to mix OO and FP seems to still be an open problem in the industry today.