Getting back on track with the "Maintainability" series of posts. I'm doing this way too late at night, so the coherence might be lacking.
Don't Repeat Yourself
Don't Repeat Yourself (DRY) is a statement exhorting developers to avoid duplication in code. Duplication isn't always the easiest thing to spot or even prevent. From the Pragmatic Programmers:
DRY says that every piece of system knowledge should have one authoritative, unambiguous representation. Every piece of knowledge in the development of something should have a single representation. A system's knowledge is far broader than just its code. It refers to database schemas, test plans, the build system, even documentation.
Duplication is an obvious problem for maintenance, but there's a secondary meaning to the DRY Principle. When I'm adding an all-new feature to a system with new classes, database mappings and tables, new screens, web services, etc., I want to make the change in the fewest steps possible, with a minimum of repetition. I want to tell the system what I want to happen, and I want to say it only once. More on that second meaning later.
Duplication Retards Change
For the upcoming (soon, knock on wood) StructureMap 2.0 release, I got in and added support for generic templated types. It was nasty. It wasn't really nasty because of Generics; it was nasty because I blundered with this innocuous-looking code:
return _pluginType.FullName;
In some spots it was useful or necessary to identify a .Net Type with a string value and early on I fell into using the full class name as a convention. I then promptly duplicated that simple Type.FullName logic over 70 times in the codebase. Flash forward 3 1/2 years to the new Generics support, and I needed a way to go from a string to a type. The obvious answer was to finally change to using assembly qualified names. It took me about 6-8 hours total to make that one little change because of the stupid amount of duplication I had introduced with the FullName logic.
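As a rough sketch of what centralizing that convention could look like: the TypeKey class below is a made-up name for illustration, not something that exists in StructureMap. The point is only that the Type-to-string rule lives in one method, so moving from FullName to the assembly qualified name would be a one-line change.

```csharp
using System;

// Hypothetical helper (not part of StructureMap): the single place that
// knows how a Type becomes a string key, and how to get back again.
public static class TypeKey
{
    // Change the naming convention here, and only here.
    public static string For(Type type)
    {
        if (type == null) throw new ArgumentNullException("type");
        return type.AssemblyQualifiedName;
    }

    // Reverse lookup. Passing true asks Type.GetType to throw
    // if the key cannot be resolved to a type.
    public static Type Resolve(string key)
    {
        return Type.GetType(key, true);
    }
}
```

Callers would then write `return TypeKey.For(_pluginType);` instead of reaching for `_pluginType.FullName` directly.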
Some other cases:
- Multiple applications, or even subsystems of the same system, reading and writing to a shared database. You almost inevitably end up with duplicated work to read, write, validate, and interpret the exact same data. Think about a column in a database that represents the status of some sort of work item. The logical entity represented by this row has different constraints and business rules depending upon what the value in that status column is. If you have more than a single piece of code that "knows" how to interpret that status value, you have duplication, and a particularly pernicious sort of duplication because it's hard to spot by looking at any one codebase. Just as a warning, coding in a data-centric manner can open the door to a great deal of harmful duplication. Ask yourself, if the database structure or status field changes, how many other pieces of code have to be changed?
- Reading and writing values from the HttpContext in ASP.Net. This little bit of code represents a great deal of potential duplication (even if you eliminate the Magic Number Antipattern): string something = (string) HttpContext.Current.Session["something"]; What if you want to change your state management strategy altogether? You'll have to change every single piece of code that dipped directly into HttpContext.
- In .Net applications, you often need to use a subclass of System.Text.Encoding when converting byte arrays to strings or vice versa. In an application I worked with there were 67 different references to the ASCIIEncoding class. Why do I distinctly remember this number, you might ask? Because we needed to localize the application to a Unicode encoding, and I found out quickly that it was going to take considerable effort to hunt down and make all the necessary changes. If the character conversion code had been centralized into some sort of helper class, that change could have been much easier.
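The last two cases can be sketched the same way. Everything named below (TextCodec, IUserState, InMemoryUserState) is hypothetical and only illustrates the shape of the fix: one class owns the choice of encoding, and one interface hides where user state is kept. In a real ASP.Net application, an implementation of IUserState would presumably wrap HttpContext.Current.Session; the in-memory version here is just a stand-in so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: every byte/string conversion goes through here,
// so switching encodings means editing one line instead of 67.
public static class TextCodec
{
    private static readonly Encoding _encoding = Encoding.UTF8;

    public static byte[] ToBytes(string text)
    {
        return _encoding.GetBytes(text);
    }

    public static string ToText(byte[] bytes)
    {
        return _encoding.GetString(bytes);
    }
}

// Hypothetical abstraction: calling code asks for state by key and never
// learns whether it lives in the session, a cookie, or somewhere else.
public interface IUserState
{
    string Get(string key);
    void Set(string key, string value);
}

// Stand-in implementation backed by a dictionary.
public class InMemoryUserState : IUserState
{
    private readonly Dictionary<string, string> _values = new Dictionary<string, string>();

    public string Get(string key)
    {
        string value;
        return _values.TryGetValue(key, out value) ? value : null;
    }

    public void Set(string key, string value)
    {
        _values[key] = value;
    }
}
```

Changing the state management strategy then means writing a new IUserState implementation rather than touching every caller.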
Stop Duplication in its Tracks
The worst case I've ever observed was a factory automation system.* The system was originally built to pull upcoming factory build jobs from an MQSeries queue, go through a series of business rules, then determine the proper routing and push the new directions to other MQSeries queues. Fine and dandy, until the day that the factory needed to start the basic process manually from a client application on the factory floor. The developers decided to recreate the business rules portion of the existing code, rule by rule, and created a new implementation of the business rules for the new client. I spent some time learning about both components, and it was very apparent that the new code was better structured, but trouble was right around the corner. It's easy to guess what happened next. Those particular business rules were volatile, only now you had to make functionally equivalent rules changes in two very different components. The system became harder to maintain and extend.
The duplication was created purposely because the team felt that the original code was just too hard to reuse: the business rules and the workflow were deeply intertwined with the code that called into MQSeries. They didn't have any test automation to catch regression bugs, and the system was hard to deploy, so modifying the existing code was quite risky. If the original code had been much more orthogonal between business rules and the communication infrastructure, they might have been able to simply write some new glue code to interact with the existing code. If the system had been backed up with a software ecosystem of effective build automation and comprehensive test automation coverage, the team would have been much better positioned to morph the existing code into a structure that would allow for reuse between both the automatic MQSeries mechanism and the newer manual client process.
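To make "orthogonal" a bit more concrete, here is a minimal sketch with invented names and an invented placeholder rule; none of it comes from the real system. The rules class knows nothing about queues or screens, and the delivery mechanism sits behind a small interface, so a queue listener and a manual client could both route jobs through the same JobRouter.

```csharp
using System;

// All names and the sample rule are hypothetical.
public class BuildJob
{
    public string ProductLine { get; set; }
    public int Quantity { get; set; }
}

// Pure business rules: no MQSeries, no UI, easy to unit test.
public class RoutingRules
{
    public string DetermineRoute(BuildJob job)
    {
        // Placeholder rule, purely for illustration.
        return job.Quantity > 100 ? "BULK-LINE" : "STANDARD-LINE";
    }
}

// The communication infrastructure hides behind this interface.
// A real implementation might push to a queue.
public interface IRouteSink
{
    void Send(string route, BuildJob job);
}

// Stand-in sink that just remembers the last route it was given.
public class RecordingSink : IRouteSink
{
    public string LastRoute { get; private set; }

    public void Send(string route, BuildJob job)
    {
        LastRoute = route;
    }
}

// The glue code: the one place where rules and transport meet.
public class JobRouter
{
    private readonly RoutingRules _rules;
    private readonly IRouteSink _sink;

    public JobRouter(RoutingRules rules, IRouteSink sink)
    {
        _rules = rules;
        _sink = sink;
    }

    public string Route(BuildJob job)
    {
        string route = _rules.DetermineRoute(job);
        _sink.Send(route, job);
        return route;
    }
}
```

With this split, supporting the manual client is a matter of new glue code around the existing RoutingRules, not a second copy of the rules.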
Part of the reason duplication creeps into code is the ease of copy/paste/modify operations to create new code. Runaway "IDE inheritance" (copy/paste/edit coding, I couldn't find a link) can lead to a system that's very difficult to maintain. Sometimes developers do the copy/paste/modify trick because the original code isn't quite what they need in the second case. It definitely requires some skill and experience, but in the "not quite what I need" case, I'd much rather a developer take a little time to refactor out the common pieces first before making the second set of changes. Refactoring is perhaps more work than copy/paste in the short term, but stamping out duplication can only help in the longer run. Refactoring is an invaluable skill that's well worth your time.
The Wormhole Anti-Pattern
Bill Caputo wrote a good description of the Wormhole Anti-Pattern that so commonly afflicts enterprise software systems. Roughly stated, I would define the wormhole as all of the stages a piece of data goes through to get from the database to the screen or service interface and back again. When the wormhole gets long and involved, your development work is going to be a struggle -- hence the "Anti-Pattern" designation.
As an almost canonical example, my first official job in software was supporting a data integration between a third party engineering application and a downstream construction application. Between the two databases, a flat file report, two rule files, and the Tibco definitions, I counted 8 different variable names and mappings for a single piece of data along the data exchange. The big problem was that I had to change that mapping pretty frequently -- and that meant following the path through all 8 steps. Needless to say, that code was very difficult to troubleshoot and modify. Of course I made all of the modifications in production to support ongoing engineering projects because there wasn't any such thing as a development environment;) If you're a thrill seeker, nothing is more exciting than coding in the production environment while it's live.
To apply the Wormhole Anti-Pattern to your architecture efforts, think about how many steps you would have to go through to get a new element on a screen persisted in the database. Or to add a new feature to your application. If the thought of jumping through a lot of Xml configuration hoops or database metadata setup or the sheer number of changes gives you pause, you may be exhibiting the Wormhole Anti-Pattern. At that point you need to start working towards eliminating or combining some of the steps to shorten your wormhole.
Just for comparison, one week we had to add some fields to a screen after it was built. Here is the wormhole we had to go through on my current project. I've had worse, but this is more than enough:
- Element on the screen
- Property on a Domain class
- Property on at least one Data Transfer Object (DTO)
- Mapping from DTO to Domain class in the client
- Repeat on the server side, but differently
- Change unit tests
- Add new field to FitNesse tests
In line with the Wormhole Anti-Pattern, you might also check out the Shotgun Surgery code smell. If you constantly make a repetitive set of changes to the same classes anytime one changes, it might be a sign that you should shorten your Wormhole by collapsing the class structure down into fewer pieces to consolidate related code into a more cohesive structure. Your goal is to enable changes to your application to be made in fewer mechanical steps.
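One way to take a step out of a wormhole like the one listed above is to replace hand-written DTO-to-Domain mapping with a naming convention. The sketch below is an assumption-laden illustration, not what my project actually does: ConventionMapper, CustomerDto, and Customer are invented names, and the mapper simply copies any public property whose name and type line up on both sides.

```csharp
using System;
using System.Reflection;

// Hypothetical convention-based mapper: same property name on both
// sides means "copy it", so a new field needs no new mapping code.
public static class ConventionMapper
{
    public static void Copy(object source, object target)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (target == null) throw new ArgumentNullException("target");

        foreach (PropertyInfo from in source.GetType().GetProperties())
        {
            // Skip write-only properties and indexers on the source.
            if (!from.CanRead || from.GetIndexParameters().Length > 0) continue;

            // Look for a writable property with the same name on the target.
            PropertyInfo to = target.GetType().GetProperty(from.Name);
            if (to == null || !to.CanWrite) continue;
            if (to.GetIndexParameters().Length > 0) continue;
            if (!to.PropertyType.IsAssignableFrom(from.PropertyType)) continue;

            to.SetValue(target, from.GetValue(source, null), null);
        }
    }
}

// Made-up DTO and Domain classes for illustration only.
public class CustomerDto
{
    public string Name { get; set; }
    public string City { get; set; }
}

public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
}
```

Adding a property to both CustomerDto and Customer would then flow through without touching the mapper, which removes one of the mechanical steps from the list. The trade-off is that a renamed property fails silently instead of at compile time, so a convention like this leans on test coverage.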
I only want to tell you this once!
Going back to the previous section on the Wormhole Anti-Pattern, the second, more proactive goal of the DRY Principle is to express changes in as few steps and places as possible. My thinking about what makes a good system architecture has changed quite a bit since my brief exposure to Ruby on Rails.
From Nico Mommaerts,
One of the selling points of Rails is that it is built with the DRY principle in mind. DRY stands for Don't Repeat Yourself, meaning that every piece of your system is described once and only once, which should make development and maintenance a lot easier since there is no need to keep multiple parts of the code in sync. Hand in hand with DRY goes 'Convention over Configuration', another one of Rails' core philosophies. Rails uses a set of code and naming conventions that when adhered to eliminates the need for configuring every single aspect of your application. Only the extraordinary stuff needs to be configured, like legacy database schemas or other resources you don't control. Using these two philosophies, DRY and 'Convention Over Configuration', Rails lets you write less code AND more features in the same time as with a typical Java or .NET application, with easier maintenance afterwards.
Even if you're never going to code in Ruby or build web applications, take a look at how Rails puts the various pieces together to eliminate repetition in code and configuration. A good design allows for minimizing the amount of repetitious information.
DRY-ing out StructureMap
Seeing how Ruby on Rails works made StructureMap feel just a little shabby in some places. Here's a specific example: one of the features in StructureMap is the ability to define configuration profiles and easily switch between them. Typically, I like to use this feature to handle environmental differences between development, testing, and production. There's a lot more to the functionality, but for now let's look at the configuration needed for a single IService today.
Look how ugly this is in general (couldn't get CopyAsHtml to format this for some reason), and the duplicated information between the Profile nodes, the PluginFamily nodes, the Plugin nodes, and the Instance nodes.
<StructureMap MementoStyle='Attribute' DefaultProfile='Development'>
  <Assembly Name="SomeAssembly"/>
  <Profile Name="Production">
    <Override Type="SomeAssembly.IService" DefaultKey="Production"/>
  </Profile>
  <Profile Name="Testing">
    <Override Type="SomeAssembly.IService" DefaultKey="Testing"/>
  </Profile>
  <Profile Name="Development">
    <Override Type="SomeAssembly.IService" DefaultKey="Development"/>
  </Profile>
  <PluginFamily Type="SomeAssembly.IService" Assembly="SomeAssembly">
    <Plugin Type="SomeAssembly.ConcreteService" Assembly="SomeAssembly" ConcreteKey="Concrete"/>
    <Instance Type="Concrete" Key="Production">
      <Property Name="host" Value="PROD-SERVER"/>
      <Property Name="port" Value="5050"/>
    </Instance>
    <Instance Type="Concrete" Key="Testing">
      <Property Name="host" Value="TEST-SERVER"/>
      <Property Name="port" Value="5050"/>
    </Instance>
    <Instance Type="Concrete" Key="Development">
      <Property Name="host" Value="localhost"/>
      <Property Name="port" Value="2000"/>
    </Instance>
  </PluginFamily>
</StructureMap>
A major part of my work for StructureMap 2.0 has been ease of use, and that has meant eliminating the duplication and mechanical steps in configuration. Below is the exact equivalent of the profile in StructureMap 2.0:
<StructureMap MementoStyle="Attribute" DefaultProfile="Development">
  <Assembly Name="SomeAssembly"/>
  <Profile Name="Production">
    <Override Type="SomeAssembly.IService">
      <Instance PluggedType="SomeAssembly.ConcreteService,SomeAssembly" host="PROD-SERVER" port="5050"/>
    </Override>
  </Profile>
  <Profile Name="Testing">
    <Override Type="SomeAssembly.IService">
      <Instance PluggedType="SomeAssembly.ConcreteService,SomeAssembly" host="TEST-SERVER" port="5050"/>
    </Override>
  </Profile>
  <Profile Name="Development">
    <Override Type="SomeAssembly.IService">
      <Instance PluggedType="SomeAssembly.ConcreteService,SomeAssembly" host="localhost" port="2000"/>
    </Override>
  </Profile>
</StructureMap>
All I really did was enable a user to put all the configuration inline in the Profile node itself. Just doing that took down the number of moving parts and centralized the semantic meaning of the profile configuration into one spot instead of being spread out throughout the Xml file. The underlying model of StructureMap is unchanged; only the configuration code got more sophisticated to streamline the user experience.
More than the Code
Anytime you talk about improving the way you create software, it's very hard to treat coding, design, process, and infrastructure as separate topics because they're all tightly intertwined. You definitely want to apply the DRY Principle to your change management. Here are a couple of examples of what I mean:
- Long-lived code branches. A temporary branch that's short-lived for production support or a risky change is one thing, but a long-lived branch essentially represents a whole new system. I've seen a couple of smaller product companies jeopardize their very existence by maintaining and extending customer-specific branches of their system. Hot fixes and newly demanded features often had to be implemented several different times on somewhat divergent versions of the same code. Long-lived branches need to be treated as a last resort. If there's any possible way to arrange your system to allow for customer-specific features and customizations while maintaining one version of the core code, your company will be far better off. Microkernel designs with IoC engines (like StructureMap) can help. Orthogonal code will help by creating plenty of seams to allow for customization. Build and test automation makes changing code much less risky.
- WSDL or XSD schemas for integration. We hit this on my current project. Our new .Net client communicates with the existing Java server platform by sending Xml messages over a stateful socket. Quite naturally, we devolved into using XSD schemas to describe the contract of the messages. Great, we use the XSD.exe tool in .Net to codegen DTO classes on one side, and JAXB to do the same on the Java side. Both codebases need to have a copy of the XSDs, and that's what we did: one copy in the .Net SVN repository and another in the Java CVS repository. Needless to say, any change in schema from either side requires the XSDs to be copied back and forth. This situation has caused us no small amount of pain from mismatches in the Xml definitions. One way or another, the XSD definitions need to be locked together automatically between .Net and Java to shut down the potential discrepancies.
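Short of a single shared copy of the schemas, one stopgap would be a check in the automated build that fails whenever the two copies drift apart. The sketch below is only an assumption about how such a check might look; SchemaSync is an invented name and the file paths would come from wherever the build checks out the two repositories. It compares the files byte for byte through a hash, so differences in line endings between the SVN and CVS checkouts would count as a mismatch.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical build-time check: do two copies of a schema file match?
public static class SchemaSync
{
    public static bool AreIdentical(string pathA, string pathB)
    {
        return Hash(pathA) == Hash(pathB);
    }

    // SHA-256 over the raw file contents, returned as Base64 text.
    private static string Hash(string path)
    {
        using (SHA256 sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return Convert.ToBase64String(sha.ComputeHash(stream));
        }
    }
}
```

A build script could call AreIdentical for each schema and break the build on a false result, which at least turns a runtime mismatch into an early, loud failure.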
The Highlander Puts it all into Perspective
Bellware thought this was an awful analogy, so I absolutely have to use it. If you're a big fan of the cult movie Highlander (and who isn't?), this will put it all into perspective. The main characters in the movie were all striving to be the last one standing to win the "Prize." As Christopher Lambert and Sean Connery intone constantly throughout the movie, "there can be only one!"
The Don't Repeat Yourself Principle is "There can be only one!" (expression of any rule or functionality that could conceivably change)
Unfortunately, I've been on, and seen, a couple of projects where the basic architecture just didn't allow for outwardly small changes to be made efficiently. A nasty case of a Wormhole, plus clumsy or inefficient build processes, can make the simple addition of an extra piece of information from persistence to user interface turn into a living hell. In Highlander, there is a scene where the bad guy, the "Kurgan," played by American character actor extraordinaire Clancy Brown, wins a sword duel and disembowels Sean Connery's character. As the Kurgan twists the sword to inflict more pain he utters the line "it hurts, doesn't it!" When a request for a small change comes across your desk and all you can think about is all of the painful and tedious work it will take to get that change done, that's what I call the "Kurgan Moment."
To wrap up, the Highlander and DRY good, Kurgan and Wormhole bad.
Apropos of nothing here, Locke is easily the best character on Lost.