Eight to Late

Sensemaking and Analytics for Organizations


Maintenance matters


Corporate developers spend the majority of their programming time doing maintenance work. My basis for this claim is two years' worth of statistics that I have been gathering at my workplace. According to these figures, my group spends about 65 percent of its programming time on maintenance (with some developers spending considerably more, depending on the applications they support). I suspect these numbers apply to most corporate IT shops and, to a somewhat smaller extent, to software houses as well. Unfortunately, maintenance work is often looked upon as being “inferior to” development, so it is worth dispelling some myths about maintenance programming. As it happens, I’ve just finished reading Robert Glass’s wonderful book, Facts and Fallacies of Software Engineering, in which he presents some interesting facts about software maintenance (among many other interesting facts). This post looks at those maintenance facts, which, I think, some readers may find surprising.

Let’s get right to it.  Fact 41 in the book reads:

Maintenance typically consumes 40 to 80 percent (average 60 percent) of software costs. Therefore, it is probably the most important life cycle phase of software.

 Surprised? Wait, there’s more: Fact 42 reads: 

Enhancement is responsible for roughly 60 percent of software maintenance costs. Error correction is roughly 17 percent. Therefore software maintenance is largely about adding new capability to old software, not fixing it.

 As a corollary to Fact 42, Glass unveils Fact 43, which simply states that:

 Maintenance is a solution, not a problem.

Developers who haven’t done any maintenance work may be surprised by these facts. Most corporate IT developers have put in considerable maintenance time, so no one in my mob was surprised when I mentioned these figures during a coffee break conversation. Based on the number quoted in the first paragraph (65 percent maintenance) and Glass’s figure (60 percent of maintenance is modification work), my colleagues spend close to 40 percent of their time enhancing existing applications. All of them reckon this number is about right, and their thinking is supported by my data.
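For readers who like to see the arithmetic spelled out, here is a minimal back-of-the-envelope sketch. The percentages are the illustrative figures quoted above (my workplace data and Glass’s Fact 42), not precise measurements:

```python
# Back-of-the-envelope check using the rough figures quoted above:
# ~65% of programming time goes to maintenance (my workplace data),
# ~60% of maintenance is enhancement work (Glass, Fact 42).
maintenance_share = 0.65   # fraction of programming time spent on maintenance
enhancement_share = 0.60   # fraction of maintenance spent on enhancements

time_spent_enhancing = maintenance_share * enhancement_share
print(f"Fraction of total time spent enhancing: {time_spent_enhancing:.0%}")
# Prints roughly 39%, i.e. "close to 40 percent"
```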

A few weeks ago, I wrote a piece entitled The legacy of legacy software, in which I pointed out that legacy code is a problem for historians and programmers alike. Both have to understand legacy code, albeit in different ways. The historian needs to understand how it developed over the years so that he can understand its history: why it is the way it is and what made it so. The programmer has a more pragmatic interest – she needs to understand how it works so that she can modify it. Now, Glass’s Fact 42 tells us that much of maintenance work is adding new functionality. New functionality implies new code, or at least substantial modification of existing code. Software is therefore a palimpsest – written once, and then overwritten again and again.

The maintenance programmer whose job it is to modify legacy code has to understand it first. Like a historian or archaeologist decoding a palimpsest, she has to sort through layers of modifications made by different people at different times for different reasons. The task is made harder by the fact that those modifications are often under-documented (if not undocumented). In Fact 44 of the book, Glass states that this effort of understanding code – an effort he calls undesign – makes up about 30 percent of the total time spent in maintenance, making it the single most significant maintenance activity.

But that’s not all. After completing “undesign”, the maintenance programmer has to design the enhancement within the context of the existing code – design under constraints, so to speak. There are at least a couple of reasons why this is hard. First, as Brooks tells us in No Silver Bullet, design itself is hard work; it is one of the essential difficulties of software engineering. Second, the original design was created with a specific understanding of the requirements. By the time modifications come around, the requirements may have changed substantially, and the new requirements may conflict with the original design. If so, the maintenance task becomes that much harder.

Ideally, existing design documentation should ease the burden on the maintenance programmer. However, it rarely does, because such documentation is typically created during the design phase and is rarely updated to reflect design changes as the product is built. As a consequence, most design documentation is hopelessly out of date by the time the original product is released into production. To quote from the book:

Common sense would tell you that the design documentation, produced as the product is being built, would be an important basis for those undesign tasks. But common sense, in this case, would be wrong. As the product is built, the as-built program veers more and more away from the original design specifications. Ongoing maintenance drives the specs and product even further apart. The fact of the matter is, design documentation is almost completely untrustworthy when it comes to maintaining a software product. The result is, almost all of that undesign work involves reading of code (which is invariably up to date) and ignoring the documentation (which commonly is not).

So, one of the main reasons maintenance work is hard is that the programmer has to expend considerable effort in decoding someone else’s code (some might argue that this is the most time-consuming part of undesign). Programmers know that it is hard to infer what a program does by reading it, so the word “code” in the previous sentence could well be taken in the sense of an obfuscated or encrypted message. As Charles Simonyi said in response to an Edge question:

 Programmers using today’s paradigm start from a problem statement, for example that a Boeing 767 requires a pilot, a copilot, and seven cabin crew with various certification requirements for each—and combine this with their knowledge of computer science and software engineering—that is how this rule can be encoded in computer language and turned into an algorithm. This act of combining is the programming process, the result of which is called the source code. Now, programming is well known to be a difficult-to-invert function, perhaps not to cryptography’s standards, but one can joke about the possibility of the airline being able to keep their proprietary scheduling rules secret by publishing the source code for the implementation since no one could figure out what the rules were—or really whether the code had to do with scheduling or spare parts inventory—by studying the source code, it can be that obscure.

  Glass offers up one final maintenance-related fact in his book (Fact 45):

 Better software engineering leads to more maintenance, not less.

Huh? How is that possible?

The answer is actually implicit in the previous facts and Simonyi’s observation: in the absence of documentation, the ease with which modifications can be made is directly related to the ease with which the code can be understood. Well-designed systems are easier to understand, and hence can be modified more quickly. So, in a given time interval, a well-designed system will have more modifications made to it than one that is not so well designed. Glass notes that this is an interesting manifestation of Fact 43: maintenance as a solution, rather than a problem.

Towards the end of the book, Glass presents the following fallacy regarding maintenance:

The way to predict future maintenance costs and to make product replacement decisions is to look at past cost data.

The reason that prediction based on past data doesn’t work is that a plot of maintenance costs vs. time has a bathtub shape. Initially, when a product has just been released, there is considerable maintenance work (error fixing and enhancements) done on it. This decreases over time until it plateaus out. This is the “stable” region, corresponding to the period when the product is being used with relatively few modifications or error fixes. Finally, towards the end of the product’s useful life, enhancements and error fixes become more expensive as technology moves on and/or the product begins to push the limits of its design. At this point costs increase again, often quite steeply. The point Glass makes is that, in general, one does not know where the product is on this bathtub curve. Hence, using past data to make predictions is fraught with risk, especially if one is near an inflection point, where the shape of the curve is changing.

So what’s the solution? Glass suggests asking customers about their expectations regarding the future of the product, rather than trying to extrapolate from past data.
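To see why extrapolation misleads, here is a toy sketch of a bathtub-shaped cost curve. The function and all its numbers are invented for illustration – they are not from Glass’s book and not real project data – but they show how a forecast based on the quiet middle years badly underestimates end-of-life costs:

```python
# Toy illustration of the "bathtub" shape of maintenance cost over time.
# The curve and numbers are made up for demonstration purposes only.

def maintenance_cost(year: float) -> float:
    """Illustrative annual maintenance cost (arbitrary units) for a product."""
    early_fixes = 80 * 0.5 ** year      # post-release fixes, decaying quickly
    steady_state = 20                   # the flat, "stable" middle of the tub
    end_of_life = 2 ** (year - 10)      # costs climbing as the design hits its limits
    return early_fixes + steady_state + end_of_life

# Extrapolating from the flat middle of the curve misses the late-life rise.
recent = [maintenance_cost(y) for y in (6, 7, 8)]
naive_forecast = sum(recent) / len(recent)   # "next year will look like the last few"
print(f"naive forecast: {naive_forecast:.1f}, actual cost in year 14: {maintenance_cost(14):.1f}")
# Prints roughly 20.9 vs 36.0 in these made-up units.
```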

Finally, Glass has this to say about replacing software:

Most companies find that retiring an existing software product is nearly impossible. To build a replacement requires a source of the requirements that match the current version of the product, and those requirements probably don’t exist anywhere. They’re not in the documentation because it wasn’t kept up to date. They’re not to be found from the original customers or users or developers because those folks are long gone…They may be discernable from reverse engineering the existing product, but that’s an error-prone and undesirable task that hardly anyone wants to tackle. To paraphrase an old saying, “Old software never dies, it just tends to fade away.”

And it’s the maintenance programmer who extends its life, often way beyond its original design and intent. So, maintenance matters because it adds complexity to the legacy of legacy software. But above all, it matters because it is a solution, not a problem.

Written by K

July 16, 2009 at 10:17 pm

The legacy of legacy software


Introduction

On a recent ramble through Google Scholar, I stumbled on a fascinating paper by Michael Mahoney entitled What Makes the History of Software Hard. History can offer interesting perspectives on the practice of a profession, and so it is with this paper. In this post I review the paper, with an emphasis on the insights it provides into the practice of software development.

Mahoney’s thesis is that,

The history of software is the history of how various communities of practitioners have put their portion of the world into the computer. That has meant translating their experience and understanding of the world into computational models, which in turn has meant creating new ways of thinking about the world computationally and devising new tools for expressing that thinking in the form of working programs….

In other words, software – particularly application software – embodies real-world practices. As a consequence,

 …the models and tools that constitute software reflect the histories of the communities that created them and cannot be understood without knowledge of those histories, which extend beyond computers and computing to encompass the full range of human activities

This, according to Mahoney, is what makes the history of software hard.

 The standard history of computing

The standard (textbook) history of computing is hardware-focused: a history of computers rather than of computing. The textbook version follows a familiar tune, starting with the abacus and working its way up via analog computers, ENIAC, mainframes, micros, PCs and so forth. Further, the standard narrative suggests that each of these was invented to satisfy a pre-existing demand, which makes their appearance seem almost inevitable. In Mahoney’s words,

 …Just as it places all earlier calculating devices on one or more lines leading toward the electronic digital computer, as if they were somehow all headed in its direction, so too it pulls together the various contexts in which the devices were built, as if they constituted a growing demand for the invention of the computer and as if its appearance was a response to that demand.

Mahoney says that this is misleading because,

…If people have been waiting for the computer to appear as the desired solution to their problems, it is not surprising that they then make use of it when it appears, or indeed that they know how to use it

Further, it

sets up a narrative of revolutionary impact, in which the computer is brought to bear on one area after another, in each case with radically transformative effect….”

The second point – revolutionary impact – is interesting because we still suffer its fallout: just about every issue of any trade journal has an article hyping the Next Big Computing Revolution. It seems that their writers are simply taking their cues from history.  Mahoney puts it very well,

One can hardly pick up a journal in computing today without encountering some sort of revolution in the making, usually proclaimed by someone with something to sell. Critical readers recognise most of it as hype based more on future promise than present performance…

The problem with revolutions, as Mahoney notes, is that they attempt to erase (or rewrite) history, ignoring the real continuities and connections between the present and the past,

Nothing is in fact unprecedented, if only because we use precedents to recognise, accommodate and shape the new…

CIOs and other decision makers, take note!

 But what about software?

The standard history of computing doesn’t say much about software,

 To the extent that the standard narrative covers software, the story follows the generations of machines, with an emphasis on systems software, beginning with programming languages and touching—in most cases, just touching—on operating systems, at least up to the appearance of time-sharing. With a nod toward Unix in the 1970s, the story moves quickly to personal computing software and the story of Microsoft, seldom probing deep enough to reveal the roots of that software in the earlier period.

As far as applications software is concerned – whether in construction, airline ticketing or retail – the only accounts that exist are those of pioneering systems such as the Sabre reservation system. Typically, these accounts focus on the system being built, excluding any context and connection to the past. There are some good “pioneer style” histories: an example is Scott Rosenberg’s book Dreaming in Code, an account of the Chandler software project. But these are the exceptions rather than the rule.

In the revolutionary model, people react to computers. In reality, though, it’s the opposite: people figure out ways to use computers in their areas of expertise. They design and implement programs to make computers do useful things. In doing so, they make choices:

 Hence, the history of computing, especially of software, should strive to preserve human agency by structuring its narratives around people facing choices and making decisions instead of around impersonal forces pushing people in a predetermined direction. Both the choices and the decisions are constrained by the limits and possibilities of the state of the art at the time, and the state of the art embodies its history to that point.

The early machines of the 1940s and 50s were almost solely dedicated to numerical computations in the mathematical and physical sciences. Thereafter, as computing became more “mainstream”, other communities of practitioners started to look at how they might use computers:

These different groups saw different possibilities in the computer, and they had different experiences as they sought to realize those possibilities, often translating those experiences into demands on the computing community, which itself was only taking shape at the time.

But these different communities have their own histories and ways of doing things – i.e. their own, unique worlds. To create software that models these worlds, the worlds have to be translated into terms the computer can “understand” and work with. This translation is the process of software design. The software models thus created embody practices that have evolved over time. Hence, the models also reflect the histories of the communities that create them.

 Models are imperfect

There is a gap between models and reality, though. As Mahoney states,

 …Programming is where enthusiasm meets reality. The enduring experience of the communities of computing has been the huge gap between what we can imagine computers doing and what we can actually make them do.

This led to the notion of a “software crisis” and to calls to reform the process of software development, which in turn gave rise to the discipline of software engineering. Many improvements resulted: better tools, more effective project management, high-level languages and so on. But all of these, as Brooks pointed out in his classic paper, addressed issues of implementation (writing code), not those of design (translating reality into computable representations). As Mahoney states,

 …putting a portion of the world into the computer means designing an operative representation of that portion of the world that captures what we take to be its essential features. This has proved, as I say, no easy task; on the contrary it has proved difficult, frustrating and in some cases disastrous.

The problem facing the software historian is that he or she has to uncover the problem context and the reality as perceived by the software designer, and thus reach an understanding of the design choices made. This is hard to do because that context is only implicit in the software artefact that the historian studies. Documentation is rarely any help here because,

 …what programs do and what the documentation says they do are not always the same thing.  Here, in a very real sense, the historian inherits the problems of software maintenance: the farther the program lies from its creators, the more difficult it is to discern its architecture and the design decisions that inform it.

There are two problems here:

  1. Software embodies a model of some aspect of reality.
  2. The only explanation of the model is the software itself.

As Mahoney puts it,

Legacy code is not just old code, but rather a continuing enactment, an operative representation, of the domain knowledge embodied in it. That may explain the difficulties software engineers have experienced in upgrading and replacing older systems.

Most software professionals will recognise the truth of this statement.

 The legacy of legacy code

The problem is that new systems promise much, but they are expensive and pose too many risks. As always, continuity must be maintained, but this is nigh on impossible because no one quite understands the legacy bequeathed by legacy code: what it does, how it does it and why it was designed the way it was. So customers play it safe, and legacy code lives on. Despite all the advances in software engineering, software migrations and upgrades remain fraught with problems.

Mahoney concludes with the following play on the word “legacy”,

 This situation (the gap between the old and the new) should be of common interest to computer people and to historians. Historians will want to know how it developed over several decades and why software systems have not kept pace with advances in hardware. That is, historians are interested in the legacy. Even as computer scientists wrestle with a solution to the problem the legacy poses, they must learn to live with it. It is part of their history, and the better they understand it, the better they will be able to move on from it.

This last point should be of interest to those running software development projects in corporate IT environments (and, to a lesser extent, to those developing commercial software). An often unstated (but implicit) requirement is that the delivered software must maintain continuity between the past and the present. This is true even for systems that claim to represent a clean break from the past; one never has the luxury of a completely blank slate, because there are always arbitrary constraints placed by legacy systems. As Fred Brooks mentions in his classic article No Silver Bullet,

…In many cases, the software must conform because it is the most recent arrival on the scene. In others, it must conform because it is perceived as the most conformable. But in all cases, much complexity comes from conformation to other interfaces…

So, the legacy of legacy software is to add complexity to projects intended to replace it. Mahoney’s concluding line is therefore just as valid for project managers and software designers as it is for historians and computer scientists: project managers and software designers must learn to live with and understand this complexity before they can move on from it.

Written by K

June 11, 2009 at 10:29 pm
