Archive for May 2010
Most of the research and practice literature on project management tends to view projects as being isolated from their environment. It is obvious to anyone who has worked on a project that this isn’t so. In view of this, it is useful to look at the relationship between projects and the organisations that host them. This post looks at this issue, drawing on a paper by Gernot Grabher entitled, Cool Projects, Boring Institutions: Temporary Collaboration in Social Context.
The emergence of projects
Grabher begins his discussion with a sketch of the how projects emerged as a distinct work form. Projects – i.e. time bound, goal focused activities – have always been around. The modern notion of a project, however, arose from a development philosophy that came out of the US Department of Defense in the 1950s. He states,
…Instead of fragmenting and pre-specifying the development of military technologies along functional disciplines, these technologies were described in relation to their objectives, i.e. the military parameters of these weapons. The pacing of these concentrated efforts was crucial: parameters had to be met, goals had to be accomplished according to a grand scheme (program?) to win the armament race. Development processes that earlier were seen as separate activities were now conceptualized as an integrated entity called a program, system or project. The overwhelming scale of these projects in terms of financial and scientific resources as well as their ambitious timing created formidable problems of coordination and control. Experiments with various forms of organizational control ultimately lead to the professionalization of the role of the project manager…
From thereon the concepts of projects and project management were taken up (with much enthusiasm and optimism) by business and industry. The formalization of various project management methodologies, standards , qualifications and trade journals can be seen a culmination of this process.
Given the military-industrial origins of the profession, it is easy to see why a “command and control” philosophy dominates much of project management thought and practice. Many of the early projects that are paraded as textbook examples of successful projects operated outside normal organizational oversight. They were, to a large extent, deliberately shielded from external influences. I believe this is why isolation from the environment is seen as a Good Thing by project managers – problems of coordination and control become so much simpler when one does not have to manage relationships and politics that are (perceived as being) external to a project. This practice may be necessary and workable for classified projects that run on billion dollar budgets, but it doesn’t work so well in environments that most project managers work in. Projects don’t take place in a vacuum; they are born, live and die in real-world organizations. To forget that is to see the “tree of the project” and miss the “forest of the organization.” This is particularly so because, unlike those near-mythical mega-projects of the 1950s, the efforts that you and I work on are deeply entwined with their hosting organizations.
Organisation-related characteristics of projects
Grabher then notes some characteristics of projects. I summarize these in the next few paragraphs.
First, it is interesting that the original meaning of the word “project” referred to a “proposal” or “idea”, rather than a “directed, time-bound effort.” Grabher points out that this shift in meaning was accompanied by a shift in focus: from project as idea (or goal) to project as process (or means to achieving a goal). Projects are thus seen as vehicles for achieving organisational goals.
Second, Grabher notes that projects are often hard to decompose into constituent tasks, and that such a (commonly agreed) decomposition is only possible when stakeholders interrelate with each other continually. This underscores the importance of communication in projects.
Third, Grabher highlights the importance of the project manager (he uses the term contractor) as the “lynchpin on whom trust is focused.” The role of the manager is particularly important in projects on which team members do not have the time to get to know each other well.
Fourth, the project manager / contractor is also the wielder of organizational authority as far as the project is concerned. He or she is, in this sense, a representative of the organization – a person whose presence underlines the fact that the project exists to achieve specified organizational goals.
Finally, deadlines are a defining aspect of projects. They serve several functions. For example, they ensure that a sense of urgency for action and progress remains through the duration of the project. They also might serve to legitimize execution of project work without external interference (this argument was frequently used in the military-industrial projects of the 1950s). But above all, the final deadline, which culminates in the termination of the project, also serves as a connector to the rest of the organization. It is a time in which handoffs, documentation, team disbanding etc. occurs, thus enabling the results and experiences from the project disperse into the wider organization.
Projects in organisations
The characteristics noted above highlight the dual nature of projects: on the one hand, as noted earlier, projects are seen as semi-autonomous temporary organisations, but on the other they are also firmly embedded within the hosting organisation. An effect of the latter is particularly evident in consulting and software services firms (or even corporate IT shops), which tend to do similar projects over and over. As Grabher notes,
[projects] apparently operate in a milieu of recurrent collaboration that, after several project cycles, fills a pool of resources and gels into latent networks. Project organising is mostly directed towards the actual realization of a potential that is generated and reproduced by the practice of drawing on core members of (successful) prior projects to serve on derivative successor projects. Such chains of repeated co-operation are held together (or cut off ) by the reputation members gain (or lose) in previous collaborations…
Another aspect of embedded-ness is the co-location of team members within a larger organizational milieu. The standard benefits of co-location are well known. These are:
- Savings of transactional costs such as those incurred in communication, supervision of staff at remote locations etc. See my post on a transaction cost view of outsourcing for more on this.
- Co-location improves the efficacy of communication by encouraging face-to-face interactions.
- It enables “near real-time” monitoring of the health of the project and its environment.
There’s more though. Grabher notes that in addition to the above “intentional” or “strategic” benefits, co-location also ensures that team members are exposed to the same organizational noise – which consists of a “concoction of rumours, impressions, recommendations, trade folklore and strategic misinformation (falsehoods!).” Co-location enables project teams to make collective sense of organisational noise – this shared understanding of the environment can contribute significantly to the creation of a team spirit.
A related notionis that of enculturation: that is, the process of becoming an accepted member of the group, or an insider. This has less to do with expertise and knowledge than learning the unspoken rules and norms of a community. Although becoming a member of a community has much to do with social interactions within the workplace, there is more: a lot of essential know-how and know-what is transferred through informal interactions between senior members of the team (who are often senior members of the organisation) and others.
Projects generally need to draw upon a range of organizational resources: people and physical infrastructure being the most obvious ones. Grabher notes that the increasing projectisation of organizations can be attributed to a perception that project-based management is an efficient way to allocate productive resources in a flexible manner (…whether this perception is correct, is another matter altogether). However, there are other less obvious influences that organisations exert too. For example, Grabher points out that organizational norms and rules provide the basis for the emergence of swift trust, which is trust based on roles and professional ability rather than individuals and personalities. Further, at a higher level, organizational culture plays a role in determining how a project is governed, managed and run. These explicit and implicit norms have a stabilising influence on projects.
In addition to the stabilizing influence of the hosting organisation, projects also offer opportunities to build and enhance links between organisations – for instance, strategic partnerships. This is, in effect, institution building aimed at leveraging the strengths of the participating organisations for a greater joint benefit. In such situations the participating organisations take on the role of “lynchpins” on whom trust is focused.
Grabher makes the point that firms (and institutions comprised of firms) not only provide resources that make projects possible, but also host a range of processes that are needed to organize and run projects. For one, projects are usually preceded by several organisational processes involving deliberation, selection and preparation. These activities have to occur for a project to happen, but they normally fall outside the purview of the project.
A somewhat paradoxical aspect of projects is although they offer the opportunity for enhancing organizational knowledge, this rarely happens in practice. The high pressure environment in projects leaves little time for formal training or informal learning, or even to capture knowledge in documents. To a large extent the hosting organisations are to blame: Grabher suggests that this paradox is a consequence of the lack of organizational redundancy in project-based organizing.
I’ll end this section with the observation that the social dimension of projects is often neglected. Projects are often hindered by organizational politics and inertia. Further, a large number of projects fail because of varying perceptions of project goals and the rationale behind them. Although it seems obvious that a project should not proceed unless all stakeholders have a shared understanding of objectives and the reasons for them, it is surprising how many projects drift along without it. Many project planners neglect this issue, and it invariably comes back to bite them.
In the conclusion to the paper, Grabher states:
The formation and operation of projects essentially relies on a societal infrastructure which is built on and around networks, localities, institutions and firms. Relations between temporary and permanent systems are not a matter of straightforward substitution but have to be regarded in terms of interdependence. ‘Cool’ projects, indeed, rely on ‘boring’ institutions…
This is unarguable, but it should also be kept in mind that projects are often subject to negative organisational influences which can slow them down, or even kill them altogether (which is perhaps why those early defence projects were set up as near-autonomous initiatives). So although it is true that projects are made possible and sustained by the organisations they’re embedded in, they are sometimes hindered by those very organisations .
To sum up in a line: Projects depend on organisations not only for material and human resources, but also draw sustenance from (and are affected by) the social environment and culture that exists within those organisations.
Managers make decisions based on incomplete information, so it is no surprise that the tools of probability and statistics have made their way into management practice. This trend has accelerated somewhat over the last few years, particularly with the availability of software tools that simplify much of the grunt-work of using probabilistic techniques such as Monte Carlo methods or Bayesian networks. Purveyors of tools and methodologies and assume probabilities (or more correctly, probability distributions) to be known, or exhort users to determine probabilities using relevant historical data. The word relevant is important: it emphasises that the data used to calculate probabilities (or distributions) should be from situations that are similar to the one at hand. This innocuous statement papers over a fundamental problem in the foundations of probability: the reference class problem. This post is a brief introduction to the reference class problem and its implications for project management.
I’ll begin with some background and then, after defining the problem, I’ll present a couple of illustrations of the problem drawn from project management.
Background and the Problem
The most commonly held interpretation of probability is that it is a measure of the frequency with which an event of interest occurs. In this frequentist view, as it is called, probability is defined as the ratio of the number of times the event of interest occurs to the total number of events. An example might help clarify what this means: the probability that a specific project will finish on time is given by the ratio of the number of similar projects that have finished on time to the total number of similar projects undertaken (including both on-time and not-on-time projects).
At first sight the frequentist approach seems a reasonable one. However, in this straightforward definition of probability lurks a problem: how do we determine which events are similar to the one at hand? In terms of the example: what are the criteria by which we can determine the projects that resemble the one we’re interested in? Do we look at projects with similar scope, or do we use size (in terms of budget, resources or other measure), or technology or….? There could be a range of criteria that one could use, but one never knows with certainty which one(s) is (are) the right one(s). Why is it an issue? It’s an issue because probability changes depending on the classification criteria used. This is the reference class problem.
The reference class problem arises when we want to assign a probability to a proposition (or sentence, or event) X, which may be classified in various ways, yet its probability can change depending on how it is classified.
Incidentally, in another paper entitled Conditional Probability is the Very Guide of Life, Hajek discusses how the reference class problem afflicts all major interpretations of probability, not just the frequentist approach. We’ll stick with the latter interpretation since it is the one used in project management practice and research… and virtually all the social and natural sciences to boot.
The reference class problem in project management
Let’s look at a couple of project management-related illustrations of the reference class problem.
First up, consider the technique of reference class forecasting which I’ve discussed in this post. Note that reference class forecasting technique is distinct from the reference class problem although, as we shall see in less than a minute, the technique is fatally afflicted by the problem.
What’s reference class forecasting? To quote from the post referenced earlier, the technique involves:
…creating a probability distribution of estimates based on data for completed projects that are similar to the one of interest, and then comparing the said project with the distribution in order to get a most likely outcome. Basically, [it] consists of the following steps:
- Collecting data for a number of similar past projects – these projects form the reference class. The reference class must encompass a sufficient number of projects to produce a meaningful statistical distribution, but individual projects must be similar to the project of interest.
- Establishing a probability distribution based on (reliable!) data for the reference class. The challenge here is to get good data for a sufficient number of reference class projects.
- Predicting most likely outcomes for the project of interest based on comparisons with the reference class distribution.
Now, the key assumption in reference class forecasting is that it is possible to identify a number of completed projects that are similar to the one at hand. But what does “similar” mean? Clearly the composition of the reference class depends on the similarity criteria used, and consequently so does the resulting distribution. Reference class forecasting is a victim of the reference class problem!
The reference class problem will affect any technique that uses arbitrary criteria to determine the set of all possible events. As another example, the probability distributions used in Monte Carlo simulations (of project cost, duration or whatever) are determined using historical data. Again, typically one selects projects (or tasks – if one is doing a task level simulation) that are similar to the one at hand. Defining “similar” is left to common sense or expert judgement or some other subjective approach. Yet, by the most commonly used definition, a project is a “temporary endeavor, having a defined beginning and end, undertaken to meet unique goals and objectives”. By definition, therefore, we never do the same project twice – at best we do the same project differently (and the same applies to tasks). So, despite ones best intentions and efforts, historical data can never be totally congruent to the situation at hand. There will always be differences, and one cannot tell with certainty that those differences do not matter.
Truth be told, most organizations do not retain data on completed projects – except superficial stuff that isn’t much use. The reference class problem seems to justify the position of this slack majority. After all, why bother keeping data when one isn’t able to use it to predict project performance. This argument is wrong-headed: although one cannot use it to calculate probabilities, historical data is useful because it keeps us from repeating our errors. Just don’t expect the data to yield reliable quantitative information on probabilities.
Before I close this piece, I should clarify that there are areas in which the reference class problem is not an issue. In physics, for example, the discipline of statistical mechanics is founded on the principle that the average motion of large collections of molecules can be treated statistically. Clearly, there is no problem here: molecules are indeed indistinguishable from each other, so it is clear that a particular molecule (of a gas in a container of carbon dioxide, say) belongs to the reference class of all carbon dioxide molecules in that container. In general this is true of any situation where one is dealing with a large collection of identical (or very similar) entities.
The reference class problem affects most probabilistic methods in project management and other areas of the social sciences. It is a problem because it is often impossible to know beforehand which attributes of the objects or events of interest are the most significant ones. Consequently it is impossible to determine with certainty whether or not a particular object or event belongs to a defined reference class.
I’ll end with an anecdote to illustrate my point:
Some time ago I was asked to provide estimates for design work that was superficially similar to something I’d done before. “You’ve done this before,” a manager said, “so you should be able to estimate this quite accurately.”
As many best practices and methodologies recommend, I used a mix of historical data and “expert” judgement (and added in a dash of padding) to arrive at (what I thought was) a robust estimate. To all you agilists out there, an incremental approach was not an option in this case.
I got it wrong – badly wrong. It turned out that the unique features of the project, which weren’t apparent at first, made a mockery of my estimates. I didn’t know it then, but I’d fallen victim to the reference class problem.
Finally, it should be clear that although my examples are project management focused, the arguments are quite general. They apply to all areas of management theory and practice, and indeed to most areas of inquiry that use probabilistic techniques. To use the words of Alan Hajek: the reference class problem is your problem too.
I’ll begin with an example. Assume you’re having a dishwasher installed in your kitchen. This (simple?) task requires the services of a plumber and an electrician, and both of them need to be present to complete the job. You’ve asked them to come in at 7:30 am. Going from previous experience, these guys are punctual 50% of the time. What’s the probability that work will begin at 7:30 am?
At first sight, it seems there’s a 50% chance of starting on time. However, this is incorrect – the chance of starting on time is actually 25%, the product of the individual probabilities for each of the tradesmen. This simple example illustrates the central theme of a book by Sam Savage entitled, The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty. This post is a detailed review of the book.
The key message that Savage conveys is that uncertain quantities cannot be represented by single numbers, rather they are a range of numbers each with a different probability of occurrence. Hence such quantities cannot be manipulated using standard arithmetic operations. The example mentioned in the previous paragraphs illustrate this point. This is well known to those who work with uncertain numbers (actuaries, for instance), but is not so well understood by business managers and decision makers. Hence the executive who asks his long-suffering subordinate to give him a projected sales figure for next month, with the quoted number then being taken as the 100% certain figure. Sadly such stories are more the norm than the exception, so it is clear that there is a need for a better understanding of how uncertain quantities should be interpreted. The main aim of the book is to help those with little or no statistical training achieve that understanding.
Developing an intuition for uncertainty
Early in the book, Savage presents five tools that can be used to develop a feel for uncertainty. He refers to these tools as mindles – or mind handles. His five mindles for uncertainty are:
- Risk is in the eye of the beholder, uncertainty isn’t. Basically this implies that uncertainty does not equate to risk. An uncertain event is a risk only if there is a potential loss or gain involved. See my review of Douglas Hubbard’s book on the failure of risk management for more on risk vs. uncertainty.
- An uncertain quantity is a shape (or a distribution of numbers) rather than a single number. The broadness of the shape is a measure of the degree of uncertainty. See my post on the inherent uncertainty of project task estimates for an intuitive discussion of how a task estimate is a shape rather than a number.
- A combination of several uncertain numbers is also a shape, but the combined shape is very different from those of the individual uncertainties. Specifically, if the uncertain quantities are independent, the combined shape can be narrower (i.e. less uncertain) than that of the individual shapes. This provides the justification for portfolio diversification, which tells us not to put all our money on one horse, or eggs in one basket etc. See my introductory post on Monte Carlo simulations to see an example of how multiple uncertain quantities can combine in different ways.
- If the individual uncertain quantities (discussed in the previous point) aren’t independent, the overall uncertainty can increase or decrease depending on whether the quantities are positively or negatively related. The nature of the relationship (positive or negative) can be determined from a scatter plot of the quantities. See my post on simulation of correlated project tasks for examples of scatter plots. The post also discusses how positive relationships (or correlations) can increase uncertainty.
- Plans based on average numbers are incorrect on average. Using average numbers in plans usually entails manipulating them algebraically and/or plugging them into functions. Savage explains how the form of the function can lead to an overestimation or underestimation of the planned value. Although this sounds a somewhat abstruse, the basic idea is simple: manipulating an average number using mathematical operations will amplify the error caused by the flaw of averages.
Savage explains the above concepts using simple arithmetic supplemented with examples drawn from a range of real-life business problems.
The two forms of the flaw of averages
The book makes a distinction between two forms of the flaw of averages. In its first avatar, the flaw states that the combined average of two uncertain quantities equals the sum of their individual averages, but the shape of the combined uncertainty can be very different from the sum of the individual shapes (Recall that an uncertain number is a shape, but its average is a number). Savage calls this the weak form of the flaw of averages. The weak form applies when one deals with uncertain quantities directly. An example of this is when one adds up probabilistic estimates for two independent project tasks with no lead or lag between them. In this case the average completion time is the sum of the average completion times for the individual tasks, but the shape of the distribution of the combined tasks does not resemble the shape of the individual distributions. The fact that the shape is different is a consequence of the fact that probabilities cannot be “added up” like simple numbers. See the first example in my post on Monte Carlo simulation of project tasks for an illustration of this point.
In contrast, when one deals with functions of uncertain quantities, the combined average of the functions does not equal the sum of the individual averages. This happens because functions “weight” random variables in a non-uniform manner, thereby amplifying certain values of the variable. An example of this is where we have two sequential tasks with an earliest possible start time for the second. The earliest possible start time for the second task introduces a nonlinearity in cases where the first task finishes early (essentially because there is a lag between the finish of the first task and the start of the second in this situation). The constraint causes the average of the combined tasks to be greater than the sum of the individual averages. Savage calls this the strong form of the flaw of averages. It applies whenever one deals with nonlinear functions of uncertain variables. See the second example in my post on Monte Carlo simulation of multiple project tasks for an illustration of this point.
Much of the book presents real-life illustrations of the two forms of the flaw in risk assessment, drawn from finance to the film industry and from petroleum to pharmaceutical supply chains. He also covers the average-based abuse of statistics in discussions on topical “hot-button” issues such as climate change and health care.
A layperson-friendly feature of the book is that it explains statistical terms in plain English. As an example, Savage spends an entire chapter demystifying the term correlation using scatter plots . Another term that he explains is the Central Limit Theorem (CLT), which states that the sum of independent random variables resembles the Normal (or bell-shaped) distribution. A consequence of CLT is that one can reduce investment risk by diversifying one’s investments – i.e. making several (small) independent investments rather than a single (large) one – this is essentially mindle # 3 discussed earlier.
Towards the middle of the book, Savage makes a foray into decision theory, focusing on the concept of value of information. Since decisions are (or should be) made on the basis of information, one needs to gather pertinent information prior to making a decision. Now, information gathering costs money (and time, which translates to money). This brings up the question as to how much should one spend in collecting information relevant to a particular decision? It turns out that in many cases one can use decision theory to put a dollar value on a particular piece of information. Surprisingly it turns out that organisations often over-spend in gathering irrelevant information. Savage spends a few chapters discussing how one can compute the value of information based on simple techniques of decision theory. As interesting as this section is, however, I think it is a somewhat disconnected from the rest of the book.
Curing the flaw: SIPs, SLURPS and Probability Management
The last part of the book is dedicated to outlining a solution (or as Savage calls it, a cure) to average-based – or flawed – statistical thinking. The central idea is to use pre-generated libraries of simulation trials for variables of interest. Savage calls such a packaged set of simulation trials a Stochastic Information Packet (SIP). Here’s an example of how it might work in practice:
Most business organisations worry about next year’s sales. Different divisions in the organisation might forecast sales using different of techniques. Further, they may use these forecasts as the basis for other calculations (such as profit and expenses for example). The forecasted numbers cannot be compared with each other because each calculation is based on different simulations or worse, different probability distributions. The upshot of this is that forecasted sales results can’t be combined or even compared. The problem can be avoided if everyone in the organisation uses the same SIP for forecasted sales. The results of calculations can be compared, and even combined, because they are based on the same simulation.
Calculations that are based on the same SIP (or set of SIPs) form a set of simulations that can be combined and manipulated using arithmetic operations. Savage calls such sets of simulations, Scenario Library Units with Relationships Preserved (or SLURPS). The name reflects the fact that each of the calculations is based on the same set of sales scenarios (or results of simulation trials). Regarding the terminology: I’m not a fan of laboured acronyms, but concede that they can serve as a good mnemonics.
The proposed approach ensures that the results of the combined calculations will avoid the flaw of averages,and exhibit the correct statistical behaviour. However, it assumes that there is an organisation-wide authority responsible for generating and maintaining appropriate SIPs. This authority – the probability manager – will be responsible for a “database” of SIPs that covers all uncertain quantities of interest to the business, and make these available to everyone in the organisation who needs to use them. To quote from the book, probability management involves:
…a data management system in which the entities being managed are not numbers, but uncertainties, that is, probability distributions. The central database is a Scenario Library containing thousands of potential future values of uncertain business parameters. The library exchanges information with desktop distribution processors that do for probability distributions what word processors did for words and what spreadsheets did for numbers.
Savage sees probability management as a key step towards managing uncertainty and risk in a coherent manner across organisations. He mentions that some organizations that have already started down this route (Shell and Merck, for instance). The book can thus also be seen as a manifesto for the new discipline of probability management.
I have come across the flaw of averages in various walks of organizational life ranging from project scheduling to operational risk analysis. Most often, the folks responsible for analysing uncertainty are aware of the flaw, and have the requisite knowledge of statistics to deal with it. However, such analyses can be hard to explain to those who lack this knowledge. Hence managers who demand a single number. Yes, such attitudes betray a lack of understanding of what uncertain numbers are and how they can be combined, but that’s the way it is in most organizations. The book is directed largely to that audience.
To sum up: the book is an entertaining and informative read on some common misunderstandings of statistics. Along the way the author translates many statistical principles and terms from “jargonese” to plain English. The book deserves to be read widely, especially by those who need it the most: managers and other decision-makers who need to understand the arithmetic of uncertainty.