I don't know where I first heard that global variables were bad. It was probably in school. The rationale that I remember is that they make it hard to reason about your code. You look at a function and unless you look closely you may miss the fact that it reads or updates something that isn't within its scope. Another function may touch those same variables, and the net effect is that you have something like an intergalactic wormhole in your program. Information jumps from one place to another and you don't have much of a clue about where it is going or where it has been.
As reasons go, this is a very good one. But, it took me a while to discover that there is a much more important reason to avoid global variables.
Years ago, I joined a company that was developing embedded systems. One of my first projects was essentially a rewrite of a library, but we had some interesting constraints. The library was going to be used by some analysis routines and we wanted to make sure that there was no way that a problem in one routine could affect another one in the same address space.
It seemed like it was an easy enough problem to solve. We were using C++, and as I thought it through I realized that the heap was a problem. When you share the same allocator, it is easy for corruption in one part of the system to affect another part. To mitigate that, I created a special Memory class that managed its own heap. Each routine would have an instance of this Memory class and no memory would be shared between them.
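To make the idea concrete, here is a minimal sketch of what a per-routine heap could look like. It is not the original Memory class: the bump-allocation strategy, the fixed capacity, and the names `allocate`, `owns`, and `used` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical sketch of a per-routine heap. Each instance owns one private
// buffer and hands out pieces of it, so routines never share an allocator.
class Memory {
public:
    explicit Memory(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity), used_(0) {}

    // Bump-allocate `size` bytes at an offset that is a multiple of `align`
    // (which must be a power of two). Returns nullptr when the arena is full.
    // Assumes the buffer itself starts at a suitably aligned address.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > capacity_ || size > capacity_ - start) {
            return nullptr;
        }
        used_ = start + size;
        return buffer_.get() + start;
    }

    // True if `p` points into this instance's private buffer.
    bool owns(const void* p) const {
        auto addr = reinterpret_cast<std::uintptr_t>(p);
        auto base = reinterpret_cast<std::uintptr_t>(buffer_.get());
        return addr >= base && addr < base + capacity_;
    }

    std::size_t used() const { return used_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;  // also makes Memory non-copyable
    std::size_t capacity_;
    std::size_t used_;
};
```

With something like this, two routines that each hold their own Memory instance draw from disjoint buffers, and a routine that exhausts its arena gets a null pointer back instead of eating into a neighbour's memory.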
Another area of concern was error handling. At the time, exceptions were new in C++ and we didn't trust them (history proves this was a good judgement). What we settled upon was a scheme where each routine would have an ErrorReporter object that would be used to record any runtime errors that occurred.
This all seems straightforward enough, but remember that what I was writing was a class library. I realized that any class in the library that was going to allocate memory or report errors would need to talk to an instance of Memory or ErrorReporter, and the singleton pattern was not an option, because a singleton would be accessible to all of the routines. After a lot of soul searching I decided to make an abstract class that held a reference to a Memory object and an ErrorReporter. All classes in the library would subclass that class and be given those two objects when they were created.
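The shape of that design can be sketched roughly as follows. This is a reconstruction under assumptions, not the original code: `LibraryObject`, `SampleBuffer`, the method names, and the simplified stand-ins for Memory and ErrorReporter are all hypothetical. It assumes C++14 for `std::make_unique`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for the per-routine heap (hypothetical).
class Memory {
public:
    void* allocate(std::size_t size) {
        blocks_.push_back(std::make_unique<unsigned char[]>(size));
        return blocks_.back().get();
    }
    std::size_t blockCount() const { return blocks_.size(); }
private:
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};

// Simplified stand-in for the per-routine error log (hypothetical).
class ErrorReporter {
public:
    void report(const std::string& message) { errors_.push_back(message); }
    const std::vector<std::string>& errors() const { return errors_; }
private:
    std::vector<std::string> errors_;
};

// Base class for library classes that allocate memory or report errors.
// The protected constructor means it can only be created through a subclass.
class LibraryObject {
public:
    virtual ~LibraryObject() = default;
protected:
    LibraryObject(Memory& mem, ErrorReporter& err) : memory_(mem), reporter_(err) {}
    Memory& memory() { return memory_; }
    ErrorReporter& reporter() { return reporter_; }
private:
    Memory& memory_;
    ErrorReporter& reporter_;
};

// A hypothetical library class that needs both resources.
class SampleBuffer : public LibraryObject {
public:
    SampleBuffer(Memory& mem, ErrorReporter& err, std::size_t count)
        : LibraryObject(mem, err), data_(nullptr), count_(0) {
        if (count == 0) {
            reporter().report("SampleBuffer: empty buffer requested");
            return;
        }
        data_ = static_cast<double*>(memory().allocate(count * sizeof(double)));
        count_ = count;
    }
    std::size_t size() const { return count_; }
private:
    double* data_;
    std::size_t count_;
};

// Usage: two routines, each with its own Memory and ErrorReporter.
void demoIsolatedRoutines() {
    Memory memA, memB;
    ErrorReporter errA, errB;
    SampleBuffer good(memA, errA, 8);  // allocates from routine A's heap only
    SampleBuffer bad(memB, errB, 0);   // error is recorded for routine B only
    assert(errA.errors().empty());
    assert(errB.errors().size() == 1);
}
```

Nothing in routine A can see routine B's heap or error log, because the only way to reach either object is through the references handed in at construction.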
This really bugged me. It seemed stupid. When I gazed into my imaginary crystal ball I saw that, going forward, every new class in the library would have to inherit from that base class. Those two objects would be passed around and held by redundant references all across the system. It felt ugly. But, a few years later, after working on another project, I went back to that system and discovered something interesting -- only about a third of the classes in the library ended up inheriting from that base class. The other classes didn't ever need to allocate memory or report errors(!).
To me, this was shocking. It's hard to think of more pervasive concerns than memory allocation and error reporting, but their use ended up being restricted to a relatively small area of the code.
The lesson I learned is that just because something is globally accessible doesn't mean that it is globally used. But, I also learned something else that was important. When I looked at the system I could very clearly see which parts of it could allocate memory and report errors, and which parts couldn't. The concerns had been neatly separated. It was sort of like the kind of shaping that happens when you use the IO monad in Haskell. IO happens neatly in a different layer than your pure logic.
The problem with global variables is that they hide design information – they are a cop out. When you avoid introducing globals you are forcing a constraint on yourself. You have to figure out where those variables really belong and who should have access to them. In the end, you have better separation of concerns.
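A small contrast shows what "hiding design information" means in practice. The sketch below is illustrative only: `g_reporter`, `safeDivideGlobal`, `safeDivide`, and `scale` are hypothetical names, and the ErrorReporter is a simplified stand-in.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for an error log (hypothetical).
class ErrorReporter {
public:
    void report(const std::string& message) { errors_.push_back(message); }
    const std::vector<std::string>& errors() const { return errors_; }
private:
    std::vector<std::string> errors_;
};

// With a global, the signature says nothing about error reporting.
// You have to read the body to learn that this function can log an error.
ErrorReporter g_reporter;

double safeDivideGlobal(double a, double b) {
    if (b == 0.0) {
        g_reporter.report("division by zero");
        return 0.0;
    }
    return a / b;
}

// Without a global, the dependency is part of the signature.
double safeDivide(double a, double b, ErrorReporter& reporter) {
    if (b == 0.0) {
        reporter.report("division by zero");
        return 0.0;
    }
    return a / b;
}

// No reporter parameter and no global in reach: this function
// has no way to report an error, and the signature tells you so.
double scale(double value, double factor) {
    return value * factor;
}

// Usage: the caller decides which reporter receives the error.
void demoExplicitDependency() {
    ErrorReporter local;
    safeDivide(1.0, 0.0, local);
    assert(local.errors().size() == 1);
}
```

In the second and third versions you can tell from the declarations alone which code can report errors and which cannot, and that is exactly the information a global erases.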
When I mention the idea of designing without global variables to people, there is one strong objection that usually comes up. Many people say that they wouldn't want to do it because they imagine that they'd end up passing extra parameters all over the place in their code. In my experience, additional parameter passing doesn't happen as often as you'd think. You alter your design to minimize it. There is one caveat, though. Getting rid of global variables in an existing system is still very hard. With a bit of reflection we can see why. When a thing is global it is accessible everywhere, and people tend to use it without discipline. The program has never been shaped by the constraint of having to provide access manually, so you are left with passing the former global variable deep down the call stack. It's the price of not having been disciplined earlier.
I encourage you to try this idea out on your next project – design without global access. Let resource use influence the shape of your program.
I like the way you experiment with weird design ideas and fall in love with them... Seriously, I think it is a good game you are playing. My closest attempt to write code in functional programming style in C++ was to return everything by value.
The question I have related to your design: Did you use STL in your code?
The previous project I worked on was also against using exceptions, and they had to re-implement all the containers. (I can see the same thing in the clang project.)
P.S.: I would love to see a little demo github project, with concrete types, constructor signatures and unit testing tricks around. ;)
Posted by: Laszlo Nagy | December 21, 2012 at 05:58 AM
Lately I realized that when you don't have singletons nor component-style injection of dependencies (setDatabase(...) :-) then you are forced to either pass things around as you mentioned, or pass "smart objects" around that contain (and hide) the dependency - encouraging a more functional style of OO.
Posted by: Bob Lauer | December 21, 2012 at 08:41 AM
Do you mean that insufficient information hiding at the lower levels (global variable use) causes bad information hiding that we don't want at the design level (inadvertent obfuscation of design)? If so, this makes a ton of sense.
Posted by: Steven J. Greenwald | December 21, 2012 at 08:56 AM
Well said.
The same is true of Service Locator:
http://blog.ploeh.dk/2010/02/03/ServiceLocatorIsAnAntiPattern.aspx
Posted by: Truewill | December 21, 2012 at 10:25 AM
Global variables are good, especially when used in embedded systems. With a global variable, you know where it is, the compiler can use a fast address mode to fetch and update it and it doesn't get messed up by stack frames.
To make them easier to use, I like to prefix the variable name with the first 3 characters of the class which references the variable. If more than one class accesses the variable, then the first 3 characters of each class which accesses the variable should be concatenated to make the variable name.
This way you don't need to prepend g_ to your variable names. Prepending g_ is not a good practice because it slows down the compiler.
Posted by: Rajiv Gupta | December 22, 2012 at 02:33 AM
Having a common superclass for something unrelated to a conceptual hierarchy feels wrong. I think dependency injection would be the best solution; I've had very good results with it for embedded systems.
Posted by: J. | December 22, 2012 at 03:35 AM
It's what you get following the GOOS way of developing, I think.
Growing Object-Oriented Software Guided by Tests,
http://www.growing-object-oriented-software.com/
Posted by: Martin Moene | December 22, 2012 at 07:48 AM
Have you looked at Gilad Bracha's Newspeak? It takes the view that module/package/whatever references in most languages are bad global variables in the same way that you were thinking about memory allocation and error reporting.
Posted by: Ben Butler-Cole | January 17, 2013 at 07:52 AM