In a conversation last week I was asked for my recommendations on how to retrofit automated testing and build processes into an existing system. I'm not going to dissemble at all: it's hard. Moreover, I think rescuing existing code from the brink of a rewrite while still producing new features is the most challenging thing I've ever had to do - which kind of makes it fun in a way. The 20 months I spent at my previous employer were often spent trying to recover a very poor application architecture and turn it into a productive environment (we made serious strides, but the office evaporated too early).
I revisited my previous posts on dealing with legacy code, and I still feel like they largely stand on their own. What I don't have anywhere is a high-level game plan for how I would get started, so here it is with links for more content.
What is Legacy Code?
In Lessons Learned for Dealing with Legacy Code I stated my definition of Legacy Code:
In his [Working Effectively with Legacy Code] book, Michael Feathers defines legacy code as code without automated tests. I do think that's valid, but I'm going to broaden my personal definition a little bit. Legacy code is code that you're afraid of, but is too valuable or big to toss away.
To summarize, Legacy Code is code that is difficult, inefficient, or risky to change, but too important and useful to throw away or ignore. You have to deal with it. You can ignore it and keep going, but it might be a lot smarter to pay down the Technical Debt that you've accrued to remove friction in your development environment.
- Where does it hurt? - The most important thing to do is to target the specific areas of code that are causing you the most trouble or just plain inefficiency. You've only got so many people and so much time at your disposal, so every single thing you do has to add value. If a module of code doesn't need to change in the near future, leave it alone. Where are your performance problems? What areas of the code change most often? That's where you go first.
- Read everything that Michael Feathers has ever written, period. Start here. Mr. Feathers has a bust on my Mt. Rushmore of Software Development.
- Make it Build - I think the most bang for your buck is to invest in improved build automation first. A lot of the problems I've had with legacy code have come from the difficulty of getting the system environment configured correctly before I could even start. I've seen other applications nearly crash and burn just because it takes too long to migrate code between environments. Dose your legacy code with some NAnt (or Ant or Rake or Maven, etc.). Add some environment tests to your build to make troubleshooting the environment easier.
- Start with the End in Mind - You won't get to a desired endstate in one leap forward, but you still need to create and constantly hone your vision for the structure you want your legacy code to evolve into. Start an Idea Wall somewhere on your Wiki or a visible chart to record ideas from the team about possible improvements. You never know when an opportunity to make one of these changes will present itself. Be ready and have ideas queued up.
- Management Visibility - You will not be able to do very much in the dark of the night. At some point your actions to improve the existing code are going to have to be visible to management. Any large scale effort to improve the quality of existing code can only succeed with the full blessing of management. We wrestled with this problem quite a bit at my previous employer. I wrote an essay about Balancing Technical Improvements versus New Business Features that included the topic of selling technical improvements to your management as necessary precursors to new business functionality. All I can say is Good Luck. If you do run into management that just doesn't seem to care about technical quality, you might want to change your organization.
- Characterization Tests - The first automated tests you should probably write are characterization tests. These tests are generally very coarse grained and work by testing the system from the outside in. It's basically recording what the system does today as tests. You really want a good set of characterization tests as a safety net first before you start making structural changes in the code. In Lessons Learned for Dealing with Legacy Code I recounted one of my team's experiences with characterization tests with a few notes of caution. To summarize, watch the effort-to-reward ratio of your characterization tests, and try really hard to make those tests human readable to act as documentation. As far as a long term safety net for regression testing, you're still going to want granular unit tests. Big tests tell you something is wrong. Little tests should tell you exactly where something is wrong. More in a Taxonomy of Tests.
- Cut new Seams - Generally the biggest problem I've seen in retrofitting automated tests into legacy code is tight coupling. One of the best things you can do is to cut new seams into the code to allow for more isolated unit testing.
- Hippocratic Oath - Take the attitude that every time you go into the legacy code to make new changes you will not leave it any worse than it already was. When the hood goes up on an area of code, see if you can quickly slip in some refactorings to remove complexity, improve readability, or retrofit some better test coverage.
- Be Opportunistic - Sometimes the best thing to do is just to pick off low-hanging fruit. Little improvements in readability or reductions in duplication add up to big gains over time.
- Be Patient - It's going to take a while. Keep your eye on the ball.
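To make the "Cut new Seams" and "Characterization Tests" items a little more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the late fee routine, its names, and the 1.5% daily rate are assumptions, not code from any system I've worked on. The first function hard-wires the system clock, so its result changes every day and can't be pinned down. The second adds a seam by taking the clock as an optional parameter, and the tests simply record what the code returns today.

```python
import datetime


# Hypothetical legacy routine: the system clock is hard-wired, so the
# result changes from day to day and no test can pin it down.
def legacy_late_fee_cents(due_date, amount_cents):
    today = datetime.date.today()
    days_late = (today - due_date).days
    if days_late <= 0:
        return 0
    return amount_cents * 15 * days_late // 1000  # 1.5% per day late


# Same logic with a seam cut in: the clock becomes an optional parameter.
# Existing callers pass nothing and get the old behavior.
def late_fee_cents(due_date, amount_cents, today=None):
    if today is None:
        today = datetime.date.today()
    days_late = (today - due_date).days
    if days_late <= 0:
        return 0
    return amount_cents * 15 * days_late // 1000


# Characterization tests: the expected values are whatever the code
# returns today, recorded as-is, not what a spec says it should return.
def test_characterize_ten_days_late():
    due = datetime.date(2024, 1, 1)
    now = datetime.date(2024, 1, 11)
    assert late_fee_cents(due, 20000, today=now) == 3000


def test_characterize_not_yet_due():
    due = datetime.date(2024, 1, 11)
    now = datetime.date(2024, 1, 11)
    assert late_fee_cents(due, 20000, today=now) == 0
```

The default argument is the important design choice here: the seam goes in without touching a single existing call site, which keeps the change small enough to slip in while the hood is already up.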
Don't Let Code Become a Legacy
I think the best path, the most economical path, is to studiously stomp out Technical Debt in existing code as you make changes. Last year in My Programming Manifesto, Michael Lang left a longish comment I always meant to respond to:
Agile proponents will say “refactor, refactor refactor”. But I think every project reaches a tipping point where major refactoring just can not be justified in business terms, either blowing the budget or causing delivery delays that lead to missed business opportunities. At that stage to some extent you’re pretty much stuck with what you’ve got. At that point if you can’t get to the finish line with the architecture you have without major refactoring, your project, even with heroes and miracle workers, is heading for a melt down.
The rejoinder I would fire back at Michael is that the "tipping point" he refers to is largely brought on by putting off refactorings or design corrections for too long. The more quickly you recognize technical problems in your code, design, or architecture, the easier it is to fix these problems. I don't care how much UML or how many CRC cards you did upfront, you should still do reflective design as you work. Constant small refactorings improve efficiency. Waiting too long and making a refactoring expensive is inefficient. In other words, don't allow technical debt to build up, and avoid Michael's "Tipping Point." The interest rates from Technical Debt are a killer.
By the way Michael, you still do design on Agile projects -- and most of the worst architectures I've ever seen have all been the result of elaborate designs done completely upfront and then executed without deviation from said design until it was far too late.
Other stuff
Another note of interest: some code is just too nasty to recover. Full-blown rewrites are fraught with peril. Take a look at Fowler's idea of a StranglerApplication to find a way to rewrite selective portions of an existing system. Also, check out Brian Marick on Approaches to Legacy Code.
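As a rough illustration of rewriting selective portions, here is one way the idea could be sketched in Python. This is my own simplified reading, not Fowler's implementation, and every name in it (StranglerFacade, legacy_handle, new_create_invoice, the "action" key) is hypothetical. A facade sits in front of the old system, sends the operations that have been rewritten to new code, and lets everything else fall through to the legacy code untouched.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]


def legacy_handle(request: dict) -> str:
    # Stand-in for the existing system; it still handles every action
    # that has not been rewritten yet.
    return f"legacy:{request['action']}"


def new_create_invoice(request: dict) -> str:
    # Stand-in for one freshly rewritten slice of functionality.
    return "new:create_invoice"


class StranglerFacade:
    """Routes migrated actions to new code and the rest to the legacy system."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, action: str, handler: Handler) -> None:
        # Register a rewritten handler for a single action.
        self._migrated[action] = handler

    def handle(self, request: dict) -> str:
        # Fall back to the legacy handler when nothing has replaced it.
        handler = self._migrated.get(request["action"], self._legacy)
        return handler(request)


facade = StranglerFacade(legacy_handle)
facade.migrate("create_invoice", new_create_invoice)
```

With this in place, a request for "create_invoice" reaches the new code while something like "print_report" still goes to the old system, so the rewrite can proceed one action at a time.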
Alberto Savoia has an interesting series going on Characterization Tests.