I love Legacy Code!

This is how I started my Wildcard Talk at XP Days Hamburg 2012. It was really fun doing the kata on legacy code techniques in front of a great audience, and many asked me if I had written about these techniques somewhere. Well, now I have! (If you just want to see the video, scroll down; I’ve just added it.)

So, let’s start with the basics. What is legacy code?

Whenever I ask this question, developers tell me things like: code I inherited, code that someone else wrote and it’s bad, code that I don’t understand. (It’s almost always about what other people wrote because I will always understand my code :irony: ). Some of them know Michael Feathers’ definition that it’s code without tests. But I prefer a different, more personal definition:

Legacy Code = Fear

I don’t care about code I don’t understand as long as I don’t have to change it. And if it works perfectly, I don’t have to change it.

So the main problem of legacy code is that I must change it and I’m afraid to.

For me, working with legacy code means getting rid of the fear. I only know two ways of doing that:

  • Praying – while it might help you personally, I have no proof that it works for software development. If you have proven it works, I’m interested to know how.
  • Growing your understanding of the code.

I know three ways to understand code:

  • Reading, debugging, experimenting with code. This helps create a mental model that can be 99% right or 10% right, depending on how good you are.
  • Refactoring the code until you understand it. I have used this method a lot and I learned that you must be extra careful, especially with temporal dependencies. You can use it if you’re really desperate. Hint: it’s safer to revert to the original version once you have understood what the code does.
  • Writing automated tests that document the code behavior.

Developers who try the third method usually complain that it’s very slow. There are a few reasons for that. The most common is that they try to read the code while writing tests. My session at XP Days Hamburg showed a way to write tests without reading the code unless really necessary. Another reason is that they haven’t mastered the techniques for breaking dependencies, isolating the code they must change, and refactoring. Good news: Adi and I are working on a Legacy Code Workshop that will streamline your learning of these techniques. Legacy Code Retreats also help, so join one near you.

When writing tests that document the code, I like to start from system tests and move down to unit tests. I don’t always do that because I adapt to the context, but it’s my default way. To write system tests, I need to figure out the input and the output, pass in an input and save the output somewhere. I can then compare the actual output with the expected one. A lot of different techniques exist, depending on the technology, the architecture and the external systems involved. I try to use the technique that works fastest and changes as little production code as possible.
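To make the "save the output, then compare" idea concrete, here is a minimal Python sketch. It assumes the system writes its results to standard output; `legacy_entry_point`, `capture_output` and `check_against_saved_output` are made-up names for illustration, not part of any real code base.

```python
import contextlib
import io
from pathlib import Path


def legacy_entry_point(seed):
    """Hypothetical stand-in for the legacy system: it prints its results."""
    total = 0
    for i in range(1, 4):
        total += seed * i
        print(f"step {i}: total={total}")


def capture_output(seed):
    """Run the system with one input and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        legacy_entry_point(seed)
    return buffer.getvalue()


def check_against_saved_output(seed, master_dir):
    """The first run records the output; later runs compare against the record."""
    master_file = Path(master_dir) / f"output_{seed}.txt"
    actual = capture_output(seed)
    if not master_file.exists():
        # Nothing recorded yet: treat the current behavior as the expected one.
        master_file.write_text(actual)
        return True
    return master_file.read_text() == actual
```

Calling `check_against_saved_output` once per input records the current behavior; calling it again after a change tells me whether the output is still the same. Note that no production code had to change for this, which is the point of starting here.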

I continue going down the pyramid of tests until I get to the unit testing level. My first test calls the constructor of the class under test without checking anything. When writing these tests, exceptions are my friends because they make the dependencies and the assumptions of the production code visible. If I get an exception, I go to the line in the stack trace that shows me the error. I read that line and try to figure out what I must do for the test to pass: break a dependency, do something before calling the method, etc.
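A first test of this kind could look like the sketch below. `InvoiceProcessor` and the `INVOICE_DB_URL` environment variable are hypothetical; they only stand in for a class whose constructor hides an assumption.

```python
import os
import unittest
from unittest import mock


class InvoiceProcessor:
    """Hypothetical legacy class with an assumption hidden in its constructor."""

    def __init__(self):
        # Raises KeyError when the variable is missing.
        self.db_url = os.environ["INVOICE_DB_URL"]


class InvoiceProcessorTest(unittest.TestCase):
    def test_can_be_constructed(self):
        # Without the patch below, the test fails with KeyError and the stack
        # trace points at the os.environ line above. Reading only that line
        # is enough to see what the constructor expects.
        with mock.patch.dict(os.environ, {"INVOICE_DB_URL": "sqlite://"}):
            InvoiceProcessor()  # no assert on purpose
```

The test has no assert: getting through the constructor is the whole goal at this stage. The value `"sqlite://"` is just a placeholder I picked to satisfy the lookup.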

Once the constructor test passes, I move on and call a method. If it has parameters, I use the simplest value I can: null, empty string, 0 etc. Once this is green, I think about the assert. I use a coverage tool to show me the parts of the production code that the tests are touching and figure out what I must check: a value, a method call etc. Techniques like extract and override and adding public getters usually do the job.
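Here is a small Python sketch of these steps using extract and override. `ReminderService` and its methods are hypothetical, and I assume the call to the mail server has already been extracted into its own method, `send_email`.

```python
import unittest


class ReminderService:
    """Hypothetical legacy class."""

    def remind(self, customer_email):
        if not customer_email:
            return
        self.send_email(customer_email, "Your invoice is due")

    def send_email(self, address, body):
        # Extracted from remind(); in production this would talk to a mail server.
        raise RuntimeError("no mail server available in tests")


class TestableReminderService(ReminderService):
    """Test-only subclass that overrides the extracted method."""

    def __init__(self):
        super().__init__()
        self.sent = []

    def send_email(self, address, body):
        # Record the call instead of sending anything.
        self.sent.append((address, body))


class ReminderServiceTest(unittest.TestCase):
    def test_simplest_input(self):
        # Simplest value first, no assert yet.
        TestableReminderService().remind(None)

    def test_sends_one_email_to_the_customer(self):
        service = TestableReminderService()
        service.remind("a@example.com")
        self.assertEqual([("a@example.com", "Your invoice is due")], service.sent)
```

The first test only proves that the method can be called. The second one checks a method call, which is what the coverage report would suggest once it shows that the `send_email` line is reached only with a non-empty address.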

After a while, I have at least two test suites that check my code: the end-to-end tests and the unit tests. I managed to understand what the code does in an incremental way, reading only what I really needed to read. Once you have the mechanics right, the only thing slowing you down is the complexity of the code base.

You might wonder why I like this process. I have two reasons: it’s fun to discover what a tangled mess of code does, and it’s rewarding to transform the tangled mess into a clean, beautiful design that’s easy to change and to maintain. It takes effort and time, but that’s the way to go if you want to add features faster and to decrease your bug count.

I’m probably not done with this topic, and I would appreciate your questions and comments, so that I can continue discussing other useful techniques. Until then, I hope you’ll get rid of the fear of changing your code!

Update: Here’s the video of the session, recorded from the audience by my brother:

  • http://twitter.com/JeroBarraco Jerónimo Barraco M.

    Really interesting. I like that you focus a lot on productivity and on avoiding unnecessary work.