Eight to Late

Archive for the ‘portfolio management’ Category

On the limitations of scoring methods for risk analysis

with 4 comments

Introduction

A couple of months ago I wrote an article highlighting some of the pitfalls of using risk matrices. Risk matrices are an example of scoring methods , techniques which use ordinal scales to assess risks. In these methods,  risks are ranked by some predefined criteria such as impact or expected loss, and the ranking  is then used as the basis for  decisions on how the risks should be addressed. Scoring methods are popular because they are easy to use. However,  as Douglas Hubbard points out in his critique of current risk management practices, many commonly used scoring techniques are flawed. This post – based on Hubbard’s critique and research papers quoted therein -  is a brief look at some of the flaws of risk scoring techniques.

Commonly used risk scoring techniques and problems associated with them

Scoring techniques fall under two major categories:
 

  1. Weighted scores: These use several ordered scales which are weighted according to perceived importance. For example: one might be asked to rate financial risk, technical risk and organisational risk on a scale of 1 to 5 for each, and then weight then by factors of 0.6, 0.3 and 0.1 respectively (possibly because the CFO – who happens to be the project sponsor – is more concerned about financial risk than any other risks ).
  2. Risk matrices: These rank risks along two dimensions – probability and impact – and assign them a qualitative ranking of high, medium or low depending on where they fall.  Cox’s theorem shows such categorisations are internally inconsistent because the category boundaries are arbitrarily chosen.

 Hubbard makes the point that, although both the above methods are endorsed by many standards and methodologies (including those used in project management), they should be used with caution because they are flawed. To quote from his book:

Together these ordinal/scoring methods are the benchmark for the analysis of risks and/or decisions in at least some component of most large organizations. Thousands of people have been certified in methods based in part on computing risk scores like this. The major management consulting firms have influenced virtually all of these standards. Since what these standards all have in common is the used of various scoring schemes instead of actual quantitative risk analysis methods, I will call them collectively the “scoring methods.” And all of them, without exception, are borderline or worthless. In practices, they may make many decisions far worse than they would have been using merely unaided judgements.

 What is the basis for this claim? Hubbard points to the following:

  1. Scoring methods do not make any allowance for flawed perceptions of analysts who assign scores – i.e. they do not consider the effect of cognitive bias. I won’t dwell on this as I have  previously written  about the effect of cognitive biases in project risk management -see this post and this one, for example.
  2. Qualitative descriptions assigned to each score are understood differently by different people. Further, there is rarely any objective guidance as to how an analyst is to distinguish between a high or medium risk. Such advice may not even help: research by Budescu, Broomell and Po shows that there can be huge variances in understanding of qualitative descriptions, even when people are given specific guidelines what the descriptions or terms mean.
  3. Scoring methods add their own errors.  Below are brief descriptions of some of these:
    1. In his paper on the risk matrix theorem, Cox mentions that “Typical risk matrices can correctly and unambiguously compare only a small fraction (e.g., less than 10%) of randomly selected pairs of hazards. They can assign identical ratings to quantitatively very different risks.” He calls this behaviour “range compression” – and it applies to any scoring technique that uses ranges.
    2. Assigned scores tend to cluster around the mid-low high range. Analysis by Hubbard shows that, on a 5 point scale, 75% of all responses are 3 or 4. This implies that changing a score from 3 to 4 or vice-versa can have a disproportionate effect on classification of risks.
    3. Scores implicitly assume that the magnitude of the quantity being assumed is directly proportional to the scale. For example, a score of 2 implies that the criterion being measured is twice as large as it would be for a score of 1. However, in reality, criteria are rarely linear as implied by such a scale.
    4. Scoring techniques often presume that the factors being scored are independent of each other independence – i.e. there are no correlations between factors. This assumption  is rarely tested or justified in any way. 

Many project management standards advocate the use of scoring techniques.  To be fair, in many situations they are adequate as long as they are used with an understanding of their limitations. Seen in this light, Hubbard’s book is  an admonition to standards and textbook writers to be more critical of the methods they advocate, and a warning to practitioners that an uncritical adherence to standards and best practices is not the best way to manage project risks .

Scoring done right

Just to be clear, Hubbard’s criticism is directed against  scoring methods that use arbitrary, qualitative scales which are not justified by independent analysis. There are other techniques which, though superficially similar to these flawed scoring methods, are actually quite robust because they are:

  1. Based on observations. 
  2. Use real measures (as opposed to arbitrary ones – such as “alignment with business objectives” on a scale of 1 to 5, without defining what ”alignment” means.) 
  3. Validated after the fact (and hence refined with use).

As an example  of a sound scoring technique, Hubbard quotes this paper by Dawes, which presents evidence that linear scoring models are superior to intuition in clinical judgements. Strangely, although the weights themselves can be obtained through intuition, the scoring model outperforms clinical intuition. This happens because human intuition is good at identifying important factors, but not so hot at evaluating the net effect of several, possibly competing factors. Hence simple linear scoring models can outperform intuition. The key here is that the models are validated by checking the predictions against reality.

Another class of techniques use axioms based on logic to reduce inconsistencies in decisions. An example of such a technique is multi-attribute utility theory. Since they are based on logic, these methods can also be considered to have a solid foundation unlike those discussed in the previous section.

 Conclusions

 Many commonly used scoring methods in risk analysis are based on flaky theoretical foundations – or worse, none at all. To compound the problem, they are often used without any validation.  A particularly ubiquitous example is the well-known and loved risk matrix.  In his paper on risk matrices,  Tony Cox  shows how risk matrices can sometimes lead to decisions that are worse than those made on the basis of a coin toss.   The fact that this is a possibility – even if only a  small one – should worry anyone who uses risk matrices  (or other flawed scoring techniques) without an understanding of their limitations.

Written by K

October 6, 2009 at 8:27 pm

Monte Carlo simulation of multiple project tasks – three examples and some general comments

with 9 comments

Introduction

In my previous post I  demonstrated the use of a Monte Carlo technique in simulating  a single project task with completion times  described by a triangular distribution.  My aim in that article was to:  a) describe a Monte Carlo simulation procedure in enough detail for someone interested to be able to reproduce the calculations and b) show that it gives sensible results in a situation where the answer is known.  Now it’s time to take things further.  In this post, I present simulations for two tasks chained together in various ways.  We shall see that, even with this small increase in complexity (from one task to two), the results obtained can be surprising.  Specifically, small changes in inter-task dependencies can have a huge effect on the overall (two-task) completion time distribution. Although, this is something that that most project managers have experienced in real life, it is rarely taken in to account by traditional scheduling techniques. As we shall see, Monte Carlo techniques predict such effects as a matter of course.

Background

The three simulations discussed here are built on the example that I used in my previous article, so it’s worth spending a few lines for a brief recap of that example.  The task simulated in the example was assumed to be described by a triangular distribution with  minimum completion time (t_{min}) of 2  hours,   most likely completion time (t_{ml}) of 4 hours and   a maximum completion time (t_{max}) of 8 hours.   The resulting triangular probability distribution function (PDF), p(t)  –  which gives the probability of completing the task at time t – is shown in Figure 1.

Figure 1 - PDF for triangular distribution (tmin=2, tml=4, tmax=8)

Figure 1 - PDF for triangular distribution (tmin=2, tml=4, tmax=8)

Figure 2 depicts the associated cumulative distribution function (CDF), P(t)  which gives the probability that a task will be completed by time t (as opposed to the PDF which specifies the probability of completion at time t). The value of the CDF at t=8 is 1 because the task must finish within 8 hrs.

Figure 2 - PDF for triangular distribution (tmin=2, tml=4, tmax=6)

Figure 2 - PDF for triangular distribution (tmin=2, tml=4, tmax=6)

The equations describing the PDF and CDF are listed in equations 4-7 of my previous article.  I won’t rehash them here as they don’t add anything new  to the discussion – please see the article for all the gory algebraic details and formulas.   Now, with that background, we’re ready to move on to the examples.

Two tasks in series with no inter-task delay

As a first example, let’s look at two tasks that have to be performed sequentially – i.e. the second task starts as soon as the first one is completed. To simplify things, we’ll also assume that they have identical (triangular) distributions as described earlier and shown in Figure 1  (excepting , of course,  that the distribution is shifted to the right for the second task  – since it starts after the first one finishes).  We’ll also  assume that the second task begins right after the first one is completed (no inter-task delay) – yes, this is unrealistic, but bear with me.  The simulation algorithm for the  combined tasks is very similar to the one for a single task (described in detail in my previous post). Here’s the procedure:

  1. For each of the two tasks, generate a set of N random numbers. Each random number generated corresponds to the cumulative probability of completion for a single task on that particular run.
  2. For each random number generated, find the time corresponding to the cumulative probability by solving equation 6 or 7 in my previous post.
  3. Step 2 gives N sets of completion times. Each set has two completion times – one for each tasks.
  4. Add up the two numbers in each set to yield the comple. The resulting set corresponds to N simulation runs for the combined task.

I then divided the time interval from t=4 hours  (min possible completion time for both tasks) to t=16 hours (max possible completion time for both tasks) into bins of 0.25 hrs each, and then assigned each combined completion time to the appropriate bin. For example, if the predicted completion time for a particular run was 9.806532 hrs, it was assigned to the bin corresponding to 0.975 hrs.  The resulting histogram is shown in Figure 3 below (click on image to view the full-size graphic).

Figure 3 - Frequency histogram for tasks in series with no delay

Figure 3 - Frequency histogram for tasks in series with no inter-task delay

[An aside:  compare the histogram in Figure 3 to the one for a single task (Figure 1):  the distribution for the single task is distinctly asymmetric (the peak is not at the centre of the distribution) whereas the two task histogram is nearly symmetric.  This surprising result is a consequence of the Central Limit Theorem (CLT) – which states that the sum of many identical distributions tends to resemble the Normal (Bell-shaped) distribution, regardless of the shape of the individual distributions.   Note that the CLT holds even though the two task distributions are shifted relative to each other – i.e. the second task begins after the first one is completed.]

The simulation also enables us to compute the cumulative probability of completion for the combined tasks (the CDF). This value of the cumulative probability at a particular bin equals the sum of the number of simulations runs in every bin up to (and including) the bin in question, divided by the total number of simulation runs. In mathematical terms this is:

P(t_{i})=(n_{1}+n_{2}+...+n_{i})/ N \ldots \ldots (1)

where P(t_{i})  is the cumulative probability at the time corresponding to the  ith  bin, n_{i}, the number of simulation runs in the ith  bin and  N  the total number of simulation runs. Note that this formula is an approximate one because time is treated as a constant within each bin. The approximation can be improved by making the bins smaller (and hence increasing the number of bins).

The resulting cumulative probability function is shown in Figure 4. This allows us to answer questions such as:  “What is the probability that the tasks will be completed within 10 days?”. Answer:  .698, or approximately 70%. (Note:  I obtained this number by interpolating between values obtained from equation (1), but this level of precision is uncalled for, particularly because the formula is an approximate one)

Figure 4 - CDF for tasks in series with no inter-task delay

Figure 4 - CDF for tasks in series with no inter-task delay

Many project scheduling techniques compute average completion times for component tasks and then add them up to get the expected completion time for the combined task. In the present case the average works out to 9.33 hrs (twice the average for a single task). However, we see from the CDF that there is a significant probability (.43) that we will not finish by this time – and this in a “best-case ” situation where the second task starts immediately after the first one finishes!

[An aside: If one applies the  well-known PERT formula (t_{min}+4t_{ml}+t_{max})/ 6  to each of the tasks, one gets an expected completion time  of 8.66 hrs for the combined task.  From the CDF one can show that there is a  probability of non-completion of 57%  by t=8.66 hours (see Figure 4) - i.e. there’s a greater than even chance of not finishing by this time!]

As interesting as this case is, it is somewhat unrealistic because successor tasks seldom start the instant the predecessor is completed. More often than not, there is a cut-off time before which the successor cannot start – even if there are no explicit dependencies between the two tasks.  This observation is a perfect segue into my next example, which is…

Two tasks in series with a fixed earliest start for the successor

Now we’ll introduce a slight complication: let’s assume, as before, that the two tasks are done one after the other but that the earliest the second task can start is at t= 6 hours  (as measured from the start of the first task). So, if the first task finishes within 6 hours, there will be a delay between its completion and the start of the second task. However, if the first task takes longer than 6 hours to finish, the second task will start soon after the first one finishes.  The simulation procedure is the same as described in the previous section excepting for the last step – the completion time for the combined task is given by:

t=t_{1}+t_{2}, for t \geq  6 hrs and t=6+t_{2}, for t < 6 hrs

I divided the time interval from t=4hrs to t=20 hrs into bins of 0.25 hr duration (much as I did before) and then assigned each combined completion time to the appropriate bin. The resulting histogram is shown in Figure 5.

Figure 5 - Frequency histogram for tasks in series with inter-task delay

Figure 5 - Frequency histogram for tasks in series with inter-task delay

Comparing Figure 5 to Figure 3, we see that the earliest possible finish time now increases from 4 hrs to 8 hrs. This is no surprise, as we built this in to our assumption.  Further, as one might expect, the distribution is distinctly asymmetric – with a minimum completion time of 8 hrs, a most likely time between 10 and 11 hrs and a maximum completion time of about 15 hrs.

Figure 6 shows the cumulative probability of completion for this case.

Figure 6 - CDF for tasks in series with inter-task delay

Figure 6 - CDF for tasks in series with inter-task delay

Because of the delay condition, it is impossible to calculate the average completion from the formulas for the triangular distribution – we have to obtain it from the simulation.  The average can be calculated from the simulation adding up all completion times and dividing by the total number of simulations, N. In mathematical terms this is:

t_{av} = (t_{1} + t_{2} + ...+ t_{i} + ... + t_{N})/ N \ldots \ldots (2)

where t_{av} is the average time,  t_{i}  the completion time for the ith simulation run and   N the total number of simulation runs.

This procedure gave me a value of about 10.8  hrs for the average.  From the CDF in Figure 6 one sees that the probability that the combined task will finish by this time is only 0.60 – i.e. there’s only a 60% chance that the task will finish by this time.  Any naïve estimation of time would do just as badly unless, of course, one is overly pessimistic and assumes a completion time of 15 – 16 hrs.

From the above it should be evident that the simulation allows one to associate an uncertainty (or probability) with every estimate. If management imposes a time limit of 10 hours,  the project manager can refer to the CDF in Figure 6 and figure out the probability of completing the task by that time (there’s a 40 % chance of completion by 10 hrs).  Of course, the reliability of the numbers depend on how good the distribution is. But  the assumptions that have gone into the model are known -  the optimistic, most likely and pessimistic times and the form of the distribution - and these can be refined as one gains experience.

Two tasks in parallel

My final example is the case of two identical tasks performed in parallel. As above, I’ll assume the uncertainty in each task is characterized by a triangular distribution with t_{min}, t_{ml}  and t_{max}  of 2, 4 and 8 hours respectively. The simulation procedure for this case is the same as in the first example, excepting the last step. Assuming the simulated completion times for the individual tasks are t_{1} and t_{2}, the completion time for the combined tasks is given by the greater of the two – i.e. the combined completion time t is given by t =max(t_{1},t_{2}).

To plot the histogram shown in Figure 7 , I divided the interval from t=2 hrs to t=8 hrs into bins of 0.25 hr duration each (Warning: Note the difference in the time axis scale  from Figures 3 and 5!).

Figure 7 - Frequency histogram for tasks in parallel

Figure 7 - Frequency histogram for tasks in parallel

It is interesting to compare the above  histogram with that for an individual task with the same parameters (i.e. the example that was used in my previous post). Figure 8 shows the histograms for the two examples on the same plot (the combined task in red and the single task in blue). As one might expect, the histogram for the combined task is shifted to the right, a consequence of the nonlinear condition on the completion time.

Figure 8 - Histograms for tasks in parallel (red) and single task (blue)

Figure 8 - Histograms for tasks in parallel (red) and single task (blue)

What about the average? I calculated the average as before, by using equation (2) from the previous section. This gives an average of 5.38 hrs (compared to 4.67 hrs for either task, taken individually).   Note that the method to calculate the average is the same regardless of the form of the distribution. On the other hand,  computing the average from the equations would be a complicated affair, involving a stiff dose of algebra with an optional  sprinkling of  calculus.  Even worse – the calculations would vary from distribution to distribution. There’s no question that simulations are much easier.

The CDF for the overall completion time is also computed easily using equation (1). The resulting plot is shown in Figure 9  (Note the difference in the time axis scale  from Figures  4 and 6!). There are no surprises here – excepting how easy it is to calculate once the simulation is done.

Figure 9 - CDF for tasks in parallel

Figure 9 - CDF for tasks in parallel

Let’s see what time corresponds to a 90% certainty of completion. A rough estimate for this number can be obtained from Figure 9 – just find the value of t (on the x axis) corresponding to a cumulative probability of 0.9 (on the y axis).  This is the graphical equivalent of solving the CDF for time, given the cumulative probability is 0.9. From Figure 9, we get a time of approximately 6.7 hrs. [Note: we could get a more accurate number by fitting the points obtained from equation (1) to a curve and then calculating the time corresponding to P=0.9]. The interesting thing is that the 90% certain completion time is not too different from that of a single task (as calculated from equation 7 of my previous post) – which works out to 6.45 hrs.

Comparing the two histograms in Figure 8, we expect the biggest differences in cumulative probability to occur at about the t=4 hour mark, because by that time the probability for the individual task has peaked whereas that for the combined task is yet to peak. Let’s see if this is so: from Figure 8, the cumulative probability for t=4  hrs is about .15 and from the CDF for the triangular distribution (equation 6 from my previous post), the cumulative probability at t=4 hours  (which is the most likely time) is .333 – double that of the combined task.  This, again, isn’t too surprising (once one has Figure 8 on hand). The really nice thing is that we are able to attach uncertainties to all our estimates.

Conclusion

Although the examples discussed above are simple – two identical tasks with uncertainties described by a triangular distribution – they serve to illustrate some of the non-intuitive outcomes when tasks have dependencies.   It is also worth noting that although the distribution for the individual tasks is known, the only straightforward way to obtain the distributions for the combined tasks (figures 3, 5 and 7) is through simulations. So, even these simple examples are a good demonstration of the utility of Monte Carlo techniques. Of course, real projects are way more complicated, with diverse tasks distributed in many different ways.   To simplify simulations in such cases,  one could  perform  coarse-grained simulations on a small number of high-level tasks,  each consisting of a number of  low-level, atomic tasks. The high-level tasks could be constructed in such a way as to focus attention on areas of greatest complexity, and hence greatest uncertainty.

As I have mentioned several times in this article and the previous one: simulation results are only as good as the distributions on which they are based. This begs the question: how do we know what’s an appropriate distribution for a given situation? There’s no one-size-fits-all answer to this question. However, for project tasks there are some general considerations that apply. These are:

  1. There is a minimum time (t_{min}) before which a task cannot cannot be completed.
  2. The probability will increase from 0 at t_{min} to a maximum at a “most likely” completion time, t_{ml}. This holds true for most atomic tasks – but may be not for composite tasks which consist of many smaller tasks.
  3. The probability decreases as time increases beyond t_{ml},  falling to 0 at a time much larger than t_{ml}.   This is simply another way of saying that the distribution has a long (but not necessarily infinite!) tail.

Asymmetric triangular distributions (such as the one used in my examples) are the simplest distributions that satisfy these conditions. Furthermore, a three point estimate is enought to specify a triangular distribution completely – i.e. given a three point estimate there is only one triangular distribution that can be fitted to it. That said, there are several other distributions that can be used; of particular relevance are certain long-tailed distributions.

Finally, I should mention that I make no claims about the efficiency or accuracy of the method presented here:  it should be seen  as  a demonstration rather than a definitive technique.  The many commercial Monte Carlo tools available in the market probably offer far more comprehensive, sophisticated and reliable algorithms (Note:  I ‘ve never used any of them, so I can’t make any recommendations!).  That said, it is always helpful to know the principles behind such tools,  if for no other reason than to understand how they work and, more important,  how to use them correctly.  The material discussed in this and the previous article came out of my efforts to develop an understanding Monte Carlo techniques and how they can be applied to various aspects of project management (they can also be applied to cost estimation, for example).  Over the last few weeks  I’ve spent many enjoyable evenings developing and running these simulations, and learning from them.  I’ll  leave it here with the hope that you find my articles helpful in your own explorations of the wonderful world of Monte Carlo simulations.

Written by K

September 20, 2009 at 9:34 pm

An introduction to Monte Carlo simulation of project tasks

with 4 comments

Introduction

In an essay on the uncertainty of project task estimates,  I  described how a task estimate corresponds to a  probability distribution.  Put simply, a task estimate is actually a range of possible completion times, each with a probability of occurrence specified by a distribution.   If one knows the distribution,  it is possible to answer questions  such as:  “What is the probability that the task will be completed within x days?”.   Of course, the reliability of such predictions depends on how faithfully the distribution captures the actual  spread of task durations –  and therein lie at least a couple of problems.   First,  the probability distributions for task durations are generally hard to characterise because of the lack of reliable data (estimators are not very good at estimating, and historical data is usually not available).  Second,  many realistic distributions have complicated mathematical forms which can be hard to characterise and manipulate. These problems are compounded by the fact that projects consist of several tasks, each one with its own duration estimate and  (possibly complicated) distribution.  The first issue is usually addressed by fitting distributions to  point estimates (such as optimistic, pessimistic and most likely times as in PERT)  and then  refining these estimates and distributions as one gains experience.  The second issue can be tackled by Monte Carlo techniques, which involve  simulating the task a number of times  (using an appropriate distribution) and then calculating expected completion times based on the results.   My aim in this post  is to present an  example-based  introduction to Monte Carlo simulation of project task durations.

Although my aim is to keep things reasonably simple (not too much beyond high-school maths and a basic understanding of probability), I’ll be covering a fair bit of ground. Given this,  I’d best to start with a brief description of my approach so that my readers know what coming. 

 Monte Carlo simulation is an umbrella term that  covers a range of approaches that use random sampling to simulate events that are described by known probability distributions.  The first task then, is to specify the probability distribution. However, as mentioned earlier, this is generally unknown for task durations. For simplicity, I’ll assume that task duration uncertainty can be described accurately using a triangular probability distribution – a distribution that is particularly easy to handle from the mathematical point of view.  The advantage of using the triangular distribution is that simulation results can be validated easily.  Using the triangular distribution isn’t a limitation because the method I describe can be applied to arbitrarily shaped distributions. More important, the technique can be used to simulate what happens when multiple tasks are strung together as in a project schedule (I’ll cover this in a future post).  Finally, I’ll demonstrate a Monte Carlo simulation method as applied to a single task described by a triangular distribution. Although a simulation is overkill in this case (because questions regarding durations can be answered exactly without using a simulation),  the example serves to illustrate the steps involved in simulating more complex cases – such as those comprising of more than one task and/or involving more complicated distributions. 

So, without further ado, let me begin the journey by describing the triangular distribution.

The triangular distribution

Let’s assume that there’s a project task that needs doing, and the person who is going to do it reckons it will take between 2 and 8 hours to complete it, with a most likely completion time of 4 hours. How the estimator comes up with these numbers isn’t important at this stage – maybe there’s some guesswork, maybe some padding or maybe it is really based on experience (as it should be).  What’s important is that we have three numbers corresponding to a minimum, most likely and maximum time.  To keep the discussion general, we’ll call these t_{min}, t_{ml} and t_{max} respectively, (we’ll get back to our estimator’s specific numbers later).  Now, what about the probabilities associated with each of these times? Since t_{min} and t_{max} correspond to the minimum and maximum times,  the probability associated with these is zero. Why?  Because if it wasn’t zero, then there would be a non-zero probability of completion for a time less than t_{min} or greater than t_{max} – which isn’t possible [Note: this is a consequence of the assumption that the probability varies continuously -  so if it takes on non-zero value, p_{0},  at t_{min} then it must take on a value slightly less than p_{0} - but greater than 0 -  at t slightly smaller than t_{min} ] .   As far as  the most likely time,  t_{ml},  is concerned:  by definition, the probability attains its highest value at time t_{ml}.    So, assuming the probability can be described by a triangular function, the distribution must have the form shown in Figure 1 below.

Figure 1

Figure 1: Triangular Distribution

For the simulation, we need to know the equation describing the above distribution.  Although Wikipedia will tell us the answer in a mouse-click, it is instructive to figure it out for ourselves. First, note that the area under the triangle must be equal to  1 because the task must finish at some time between t_{min} and t_{max}.   As a consequence we have:

\frac{1}{2}\times{base}\times{altitude}=\frac{1}{2}\times{(t_{max}-t_{min})}\times{p(t_{ml})}=1\ldots\ldots{(1)}

where p(t_{ml}) is the probability corresponding to time t_{ml}.  With a bit of rearranging we get,

p(t_{ml})=\frac{2}{(t_{max}-t_{min})}\ldots\ldots(2)

To derive the probability for any time t lying between t_{max} and t_{ml}, we note that:

\frac{(t-t_{min})}{p(t)}=\frac{(t_{ml}-t_{min})}{p(t_{ml})}\ldots\ldots(3)

This is a consequence of the fact that the ratios on either side of equation (3)  are  equal to the slope of the line joining the points (t_{min},0) and (t_{ml}, p(t_{ml})).

Figure 2

Figure 2

Substituting (2) in (3) and simplifying a bit, we obtain:

p(t)=\frac{2(t-t_{min})}{(t_{ml}-t_{min})(t_{max}-t_{min})}\dots\ldots(4) for t_{min}\leq t \leq t_{ml}

In a similar fashion one can show that the probability for times lying between t_{ml} and t_{max} is given by:

p(t)=\frac{2(t_{max}-t)}{(t_{max}-t_{ml})(t_{max}-t_{min})}\dots\ldots(5) for t_{ml}\leq t \leq t_{max}

Equations 4 and 5 together describe the probability distribution function (or PDF)  for all times between t_{min} and t_{max}.


Another quantity of  interest is the cumulative distribution function (or CDF) which is the probability, P,  that the task is completed by a time t. To reiterate, the PDF, p(t), is the probability of the task finishing at  time t whereas the CDF, P(t), is the probability of the task completing by  time t. The CDF, P(t),  is essentially a sum of all probabilities between t_{min} and t. For t < t_{min} this is the area under the triangle with apexes at   (t_{min}, 0), (t, 0) and (t, p(t)).  Using the formula for the area of a triangle (1/2 base times height) and equation (4) we get:

P(t)=\frac{(t-t_{min})^2}{(t_{ml}-t_{min})(t_{max}-t_{min})}\ldots\ldots(6) for t_{min}\leq t \leq t_{ml}

Noting that for t \geq t_{ml}, the area under the curve equals the total area minus the area enclosed by the triangle with base between t and t_{max}, we have:

P(t)=1- \frac{(t_{max}-t)^2}{(t_{max}-t_{ml})(t_{max}-t_{min})}\ldots\ldots(7) for t_{ml}\leq t \leq t_{max}

As expected,  P(t)  starts out with a value 0 at t_{min} and then increases monotonically, attaining a value of 1 at t_{max}.

To end this section let’s plug in the numbers quoted by our estimator at the start of this section: t_{min}=2, t_{ml}=4 and t_{max}=8.  The resulting PDF and CDF are shown in figures 3 and 4.

Figure 3 - Triangular PDF (tmin=2, tml=4, tmax=8)

Figure 3 - Triangular PDF (tmin=2, tml=4, tmax=8)

Figure 4 - Triangular CDF (tmin=2, tml=4, tmax=8)

Figure 4 - Triangular CDF (tmin=2, tml=4, tmax=8)

 

Monte Carlo Simulation of  a Single Task

OK, so now we get to the business end of this essay – the simulation.  I’ll first outline the simulation procedure and  then discuss results for the case of  the task described in the previous section (triangular distribution with t_{min}=2, t_{ml}=4 and t_{max}=8).  Note that I used TK Solver  – a mathematical package created by Universal Technical Systems - to do the simulations. TK Solver has built-in backsolving capability which is extremely helpful for solving some of the equations that come up in the simulation calculations. One could use Excel too, but my spreadsheet skills are not up to it :-( .

So, here’s my  simulation procedure:

  1. Generate a random number between 0 and 1.  Treat this number as the cumulative probability, P(t) for the simulation run. [Technical Note:  I used the random number generator that comes with the TK Solver package (the algorithm used by the generator is described here). Excel's random number generator is even better.]
  2. Find the time, t,  corresponding to P(t) by solving equations (6) or (7) for t. The resulting value of t is the time taken to complete the task. [Technical Note: Solving equation (6) or (7) for t isn't straightforward because t appears in several places in the equations. One has two options to solve for t a) Use numerical techniques such as the bisection or Newton-Raphson method or b) use the backsolve (goal seek) functionality in Excel or other mathematical packages. I used the backsolving capability of TK Solver to obtain t for each random value of P generated. TK Solver backsolves equations automatically -  no fiddling around with numerical methods - which makes it an attractive option for these kinds of calculations.]
  3. Repeat steps (1) and (2)  N  times, where N  is a “sufficiently large” number – say 10,000.

I did the calculations for N=10000 using the triangular distribution with parameters t_{min}=2, t_{ml}=4 and t_{max}=8. This gave me 10,000 values of P(t) and t

As an example of a simulation run proceeds, here’s the data from my first simulation run: the random number generator returned 0.490474700804856 (call it 0.4905). This is the value of P(t). The time corresponding to this cumulative probability is obtained by solving equation (7) numerically for t. This gave t = 4.503057452476027 (call it 4.503) as shown in Figure 5. This is the completion time for the first run.

Figure 5

Figure 5

After completing 10,000 simulation runs, I sorted these into bins corresponding to time intervals of .25 hrs, starting from t=2hrs through to t=8 hrs. The resulting histogram is shown in Figure 6. Each bar corresponds to the number of simulation runs that fall within that time interval.

Figure 6: Distribution of simulation runs

Figure 6: Distribution of simulation runs

As one might expect, this looks like the triangular distribution shown in Figure 4. There is a difference though: Figure 4 plots probability as a continuous function of time  whereas Figure 6 plots the number of simulation runs as a step function of time. To convince ourselves that the two are really the same, lets look at the cumulative probability at t_{ml}  – i.e the probability that the task will be completed within 4 hrs. From equation 6 we get P(t_{ml})=0.3333.  The corresponding number from the simulation is simply the number of simulation runs that had a completion time less than or equal to 4 hrs,  divided by the total number of simulation runs. For my simulation this comes out to be 0.3383. The agreement’s not perfect, but is convincing enough. Just to be sure, I performed the simulation a number of times – generating several sets of random numbers – and took the average of the predicted P(t_{ml}). The agreement between theory and simulation improved, as expected.

Wrap up

A limitation of the triangular distribution is that it imposes an upper cut-off at t_{max}. Long-tailed distributions may therefore be more realistic. In the end, though, the form distribution is neither here nor there because the technique can be applied to any distribution. The real question is:  how do we obtain reliable distributions for our estimates? There’s no easy answer to this one, but one can start with three point estimates (as in PERT) and then fit these to a triangular (or more complicated) distribution.  Although it is best if one  has historical data, in the absence this one can always start with reasonable guesses. The point is to refine these through experience.

Another point worth mentioning is that simulations can be done at a level higher than that of an indivdual task. In their brilliant book – Waltzing With Bears: Managing Risk on Software Projects - De Marco and Lister demonstrate the use of Monte Carlo methods to simulate various aspects of project – velocity, time, cost etc. – at the project level (as opposed to the task level). I believe it is better to perform simulations at the lowest possible level (although it is a lot more work) – the main reason being that it is easier, and less error-prone, to estimate individual tasks than entire projects. Nevertheless, high level simulations can be very useful if one has reliable data to base these on.

I would be remiss if I didn’t mention some of the various Monte Carlo packages available in the market. I’ve never used any of these, but by all accounts they’re pretty good: see this commercial package or this one, for example. Both products use random number generators and sampling techniques that are far more sophisticated than the simple ones I’ve used in my example.

Finally, I have to admit that the example described in this post is a very complicated way of demonstrating the obvious – I started out with the triangular distribution and then got back the triangular distribution via simulation. My point, however, was to illustrate the method and show that it yields expected results in a situation where the answer is known. In a future post I’ll apply the method to more complex situations- for example, multiple tasks in series and parallel, with some dependency rules thrown in for good measure.  Although, I’ll use the triangular distribution for individual tasks, the results will be far from obvious: simulation methods really start to shine as complexity increases. But all that will have to wait for later. For now, I hope my example has helped illustrate how Monte Carlo methods can be used to simulate project tasks.

Note added on 21 Sept 2009:

Follow-up to this article published here.

Written by K

September 11, 2009 at 11:05 pm

Cognitive biases as project meta-risks – part 2

without comments

Introduction 

Risk management is fundamentally about making decisions in the face of uncertainty. These decisions  are based on perceptions of future events,  supplemented by analyses of data relating to those events.  As such, these decisions  are subject to cognitive biases -  human tendencies to base judgements on flawed perceptions of events and/or data. In an earlier post,  I argued that cognitive biases are meta-risks,  i.e.  risks of  risk analysis.   An awareness of how these biases operate can pave the way towards reducing their effects on risk-related decisions. In this post I therefore look into the nature of cognitive biases. In particular:

  1. The role of intuition and rational thought in the expression of cognitive biases.
  2. The psychological process of attribute substitution which underlies judgement-related cognitive biases

I  then take a brief look at ways in which the effect of  bias in decision-making can be reduced. 

 The role of intuition and rational thought in the expression of cognitive biases

Research in psychology has established that human cognition works through two distinct processes:   System 1 which corresponds to intuitive thought and System 2 which corresponds to rational thought. In his Nobel Prize lecture, Daniel Kahneman had this to say about the two systems:

The operations of System 1 are fast, automatic, effortless, associative, and often emotionally charged; they are also governed by habit, and are therefore difficult to control or modify. The operations of System 2 are slower, serial, effortful, and deliberately controlled; they are also relatively flexible and potentially rule-governed.

 The surprise is that judgements always involve System 2 processes. In Kahneman’s words:

 …the perceptual system and the intuitive operations of System 1 generate impressions of the attributes of objects of perception and thought. These impressions are not voluntary and need not be verbally explicit. In contrast, judgments are always explicit and intentional, whether or not they are overtly expressed. Thus, System 2 is involved in all judgments, whether they originate in impressions or in deliberate reasoning.

 So, all judgements, whether intuitive or rational, are monitored by System 2. Kahneman suggests that this monitoring can be very cursory thus allowing System 1 impressions to be expressed directly, whether they are right or not. Seen in this light, cognitive biases are unedited (or at best lightly edited) expressions  of  often incorrect impressions.

 Attribute substitution: a common mechanism for judgement-related biases

 In a paper entitled Representativeness Revisited,  Kahneman and Fredrick suggest that the psychological process of attribute substitution  is the mechanism that underlies many cognitive biases.  Attribute substitution is the tendency of people to answer a difficult decision-making question by interpreting it as a simpler (but related) one. In their paper, Kahneman and Fredrick describe attribute substitution as occurring when:

 …an individual assesses a specified target attribute of a judgment object by substituting a related heuristic attribute that comes more readily to mind…

 An example might help decode this somewhat academic description.  I pick one from Kahneman’s Edge master class where he related the following:

 When I was living in Canada, we asked people how much money they would be willing to pay to clean lakes from acid rain in the Halliburton region of Ontario, which is a small region of Ontario. We asked other people how much they would be willing to pay to clean lakes in all of Ontario.

People are willing to pay the same amount for the two quantities because they are paying to participate in the activity of cleaning a lake, or of cleaning lakes. How many lakes there are to clean is not their problem. This is a mechanism I think people should be familiar with. The idea that when you’re asked a question, you don’t answer that question, you answer another question that comes more readily to mind. That question is typically simpler; it’s associated, it’s not random; and then you map the answer to that other question onto whatever scale there is—it could be a scale of centimeters, or it could be a scale of pain, or it could be a scale of dollars, but you can recognize what is going on by looking at the variation in these variables. I could give you a lot of examples because one of the major tricks of the trade is understanding this attribute substitution business. How people answer questions.

 Attribute substitution boils down to making judgements based on specific, known instances of events or issues under consideration. For example,  people often overrate their own abilities because they base their self-assessments on specific instances where they did well, ignoring situations in which their performance was below par.  Taking another example from the Edge class,

 COMMENT: So for example in the Save the Children—types of programs, they focus you on the individual.

 KAHNEMAN: Absolutely. There is even research showing that when you show pictures of ten children, it is less effective than when you show the picture of a single child. When you describe their stories, the single instance is more emotional than the several instances and it translates into the size of contributions.  People are almost completely insensitive to amount in system one. Once you involve system two and systematic thinking, then they’ll act differently. But emotionally we are geared to respond to images and to instances…

 Kahnemann sums it up in a line in his Nobel lecture: The essence of attribute substitution is that respondents offer a reasonable answer to a question that they have not been asked.

Several decision-making biases in risk analysis operate via attribute substitution -  some of these include availability, representativeness, overconfidence and selective perception (see this post for specific examples drawn from high-profile failed projects).  Armed with this understanding of how these meta-risks operate, lets look at how their effect can be minimised.

 System two to the rescue, but…

 The discussion of the previous section suggests that people often base judgements on specific instances that come to mind, ignoring the range of all possible instances. They do this because specific instances – usually concrete instances that have been experienced – come to mind more easily than the abstract “universe of possibilities.” 

Those who make erroneous judgements will correct them only if they become aware of factors that they did not take into account when making the judgement, or when they realise that their conclusions are not logical. This can only happen through deliberation:   rational analysis,  which is possible only through a deliberate invocation of System 2 thinking.

 Some of the ways in which System 2 can be helped along are: 

  1. By reframing the question or issue in terms that forces analysts to consider the range of possible instances rather than specific instances.  A common manifestation of the latter is when risk managers base their plans on the assumption that average conditions will occur – an assumption that Professor Sam Savage calls the flaw of averages (see Dr. Savage’s very entertaining and informative book for more on the flaw of averages and related statistical fallacies).
  2. By requiring analysts to come up with pros and cons for any decision they make. This forces them to consider possibilities they may not have taken into account when making the original decision.
  3. By basing decisions on relevant empirical or historical data instead of relying on intuitive impressions.
  4. By making the analysts aware of their  propensity to be overconfident (or under-confident) by evaluating their probability calibration. One way to do this is by asking them  to answer a series of trivia questions with confidence estimates for each of their answers (i.e. their self-estimated probability of being right). Their confidence estimates are then compared to the fraction of questions correctly answered. A well calibrated individual’s confidence estimates should be close to the percentage of correct answers.  There is some evidence to suggest that analysts can be trained improve their calibration through cycles of testing and feedback.  Calibration training is discussed in Douglas Hubbard’s book, The Failure of Risk Management. However, as discussed here, improved calibration by through feedback and repeated tests may not carry over to judgements in real-life situations.

Each of the above options forces  analysts to consider instances other than the ones that readily come to mind.  That said, they aren’t a sure-cure for the problem:  System 2 thinking does not guarantee correctness.  Kahneman discusses several reasons why this is so.  First, it has been found that education and training in decision-related disciplines (like statistics) does not eliminate incorrect intuitions; it only reduces them in favourable circumstances (such as when the question is reframed to make statistical cues obvious). Second, he  notes that sytem 2 thinking is easily derailed: research has shown that the efficiency of system 2 is impaired by time pressure and multi-tasking. (Managers who put their teams under time and multi-tasking pressures should take note!). Third, highly accessible values, which form the basis for initial intuitive judgements serve as anchors for subsequent system 2-based corrections. These corrections are generally insufficient – i.e. too small.  And finally, System 2 thinking is of no use if it is based on incorrect assumptions:  as a colleague once said, “Logic doesn’t get you anywhere if  your premise is wrong.” 

Conclusion

 Cognitive biases are meta-risks that are  responsible for many incorrect judgements in project (or any other) risk analysis . An apposite example is the financial crisis of 2008, which can be traced back to several biases such as groupthink, selective perception and over-optimism (among many others). An understanding of how these meta-risks operate suggest ways in which their effects can be reduced,  though not eliminated altogether.  In the end,  the message is simple and obvious: for judgements that matter, there’s no substitute for  due diligence -  careful observation and thought, seasoned with an awareness of one’s own  fallibility.

Written by K

September 3, 2009 at 11:10 pm

An introduction to the critical chain method

with 2 comments

Introduction

All project managers have to deal with uncertainty as a part of their daily work. Project schedules, so carefully constructed, are riddled with assumptions and  uncertainties – particularly in task durations. Most project management treatises (the PMBOK included) recognise this, and so exhort project managers to include uncertainties in their activity duration estimates. However, the same books have little to say on how these uncertainties should be integrated into the project schedule in a meaningful way. Sure, well-established techniques such as PERT incorporate probabilities into a schedule via an averaged or expected duration, but the final schedule is deterministic  – i.e each task is assigned a definite completion date, based on the expected duration.  Any float that appears in the schedule is purely a consequence of an activity not being on the critical path. The float, such as it is, is not an allowance for uncertainty.

Since PERT was invented in the 1950s, there have been several other attempts to incorporate uncertainty into project scheduling.  Some of these include, Monte Carlo simulation and, more recently, Bayesian Networks. Although these techniques have a more sound basis, they don’t really address the question of how uncertainty is to be managed in a project schedule, where individual tasks are strung together one after another.   What’s needed is a simple technique to protect a project schedule   from Murphy, Parkinson or any other variations that invariably occur during the execution of individual tasks. In the 1990s, Eliyahu Goldratt proposed just such a technique in his business novel, Critical Chain.  This post presents a short, yet comprehensive introduction to Goldratt’s critical chain method .

[An Aside: Before proceeding any further I should mention that Goldratt formulated the critical chain method within the framework of his Theory of Constraints (TOC). I won't discuss TOC in this article, mainly because of space limitations. Moreover, an understanding of TOC isn't really needed to understand the critical chain method. For those interested in learning about TOC, the best starting point is Goldratt's business novel, The Goal.]

I begin with a discussion of some general characteristics of activity or task estimates, highlighting the reason why task estimators tend to pad up their estimates.  This is followed by a discussion on why the buffers (or safety) that estimators build into individual activities don’t help  - i.e. why projects come in late despite the fact that most people add considerable safety factors on to their activity estimates. This then naturally leads on to the heart of the matter:  how buffers should be added in order to protect schedules effectively.

Characteristics of activity duration estimates

(Note:  Portions of this section have been published previously in my post on the inherent uncertainty of project task estimates)

Consider an activity that you do regularly – such as getting ready in the morning. You have a pretty good idea how long the activity takes on average. Say, it takes you an hour on average to get ready – from when you get out of bed to when you walk out of your front door. Clearly, on a particular day you could be super-quick and finish in 45 minutes, or even 40 minutes. However, there’s a lower limit to the early finish – you can’t get ready in 0 minutes!. On the other hand, there’s really no upper limit. On a bad day you could take a few hours. Or if you slip in the shower and hurt your back, you mayn’t make it at all.

If we were to plot the probability of activity completion for this example as a function of time, it might look something like I’ve depicted in Figure 1. The distribution starts at a non-zero cutoff (corresponding to the minimum time for the activity); increases to a maximum (corresponding to the most probable time); and then falls off rapidly at first, then with a long, slowly decaying, tail. The mean (or average) of the distribution is located to the right of the maximum because of the long tail. In the example, t_{0}(30 mins) is the minimum time for completion so the probability of finishing within 30 mins is 0%. There’s a 50% probability of completion within an hour, 80% probability of completion within 2 hours and a 90% probability of completion in 3 hours. The large values for t_{80} and t_{90}compared to t_{50}are a consequence of the long tail. OK, this particular example may be an exaggeration – but you get my point: if you want to be really really sure of completing any activity, you have to add a lot of safety because there’s a chance that you may “slip in the shower” so to speak.

Probability distribution function for an activity

It turns out that many phenomena can be modeled by this kind of long-tailed distribution. Some of the better known long-tailed distributions include lognormal and power law distributions. A quick (but admittedly informal) review of the project management literature revealed that lognormal distributions are more commonly used than power laws to model activity duration uncertainties. This may be because lognormal distributions have a finite mean and variance whereas power law distributions can have infinite values for both (see this presentation by Michael Mitzenmacher, for example). [An Aside:If you're curious as to why infinities are possible in the latter, it is because power laws decay more slowly than lognormal distributions - i.e they have "fatter" tails, and hence enclose larger (even infinite) areas.]. In any case, regardless of the exact form of the distribution for activity estimates, what’s important and non-controversial is the short cutoff, the peak and long, decaying tail.

Most activity estimators are intuitively aware of the consequences of the long tail. They therefore add a fair amount of “air” or safety in their estimates. Goldratt suggests that typical activity estimates tend to correspond to t_{80} or t_{90}. Despite this, real life projects still have difficulty in maintaining schedules. Why this is so is partially answered in the next section.

Delays accumulate; gains don’t

A schedule is essentially made up of several activities (of varying complexity and duration) connnected sequentially or in parallel. What are the implications of uncertain activity durations on a project schedule? Well, let’s look at the case of sequential and parallel steps separately:

Sequential steps: If an activity finishes early, the successor activity rarely starts right away. More often, the successor activity starts only when it was originally scheduled to. Usually this happens because the resource responsible for the successor activity is not free – or hasn’t been told about the early finish of the predecessor activity. On the other hand, if an activity finishes late, the start of the successor activity is delayed by at least the same amount as the delay. The upshot of all this is that – delays accumulate but early finishes are rarely taken advantage of. So, in a long chain of sequential activities, you can be pretty sure that there will be delays.

Parallel steps: In this case, the longest duration activity dictates the finish time. For example, if we have three parallel activities that take 5 days each. If one of them ends up taking 10 days, the net effect is that three activities, taken together, will complete only after 10 days. In contrast, an early finish will not have an effect unless all activities finish early (and by the same amount!). Again we see that delays accumulate; early finishes don’t.

The above discussion assumed that activities are independent. In a real project activities can be highly dependent. In general this tends to make things worse – a delay in an activity is usually magnified by a dependent successor activity.

This partially explains why projects come in late. However it’s not the whole story. According to Goldratt, there are a few other factors that lead to dissipation of safety. I discuss these next.

Other time wasters

In the previous section we saw that dependencies between activities can eat into safety significantly because delays accumulate while gains don’t. There are a couple of other ways safety is wasted. These are:

Multitasking It is recognised that multitasking – i.e. working on more than one task concurrently – introduces major delays in completing tasks. See these articles by Johanna Rothman and Joel Spolsky, for a discussion of why this is so. I’ve discussed techniques to manage multitasking in an earlier post.

Student syndrome This should be familiar to any one who’s been a student. When saddled with an assignment, the common tendency is to procrastinate until the last moment. This happens on projects as well. “Ah, there’s so much time. I’ll start later…” Until, of course, there isn’t very much time at all.

Parkinson’s Law states that “work expands to fill the allocated time.” This is most often a consequence of there being no incentive to finish a task early. In fact, there’s a strong disincentive from finishing early because the early finisher may be a) accused of overestimating the task or b) rewarded by being allocated more work. Consequently people tend to adjust their pace of work to just make the scheduled delivery date, thereby making the schedule a self-fulfilling prophecy.

Any effective project management system must address and resolve the above issues. The critical chain method does just that. Now with the groundwork in place, we can move on to a discussion of the technique. We’ll do this in two steps. First, we discuss the special case in which there is no resource contention – i.e. multitasking does not occur. The second, more general, case discusses the situation in which there is resource contention.

The critical chain – special case

In this section we look at the case where there’s no resource contention in the project schedule. In this (ideal) situation, where every resource is available when required, each task performer is ready to start work on a specific task just as soon as all its predecessor tasks are complete. Sure, we’ll need to put in place a process to notify successor task performers about when they need to be ready to start work. I’ll discuss this notification process a little later in this section. Let’s first tackle Parkinson and the procrastinators.

Preventing the student syndrome and Parkinson’s Law

To cure habitual procrastinators and followers of Parkinson, Goldratt suggests that project task durations estimates be based on a 50% probability of completion. This corresponds to an estimate that is equal to t_{50} for an activity (you may want to have another look at Figure 1  to remind yourself of what this means). Remember, as discussed earlier, estimates tend to be based on t_{80} or t_{90}, both of which are significantly larger than t_{50} because of the nature of the distribution. The reduction in time should encourage task performers to start the task on schedule, thereby avoiding the student syndrome. Further, it should also discourage people from deliberately slowing their work pace, thereby preventing Parkinson from taking hold.

As discussed earlier, a t_{50} estimate implies there’s a 50% chance that the task will not complete on time. So, to reassure task estimators / performers, Goldratt recommends implementing the following actions:

  1. Removal of individual activity completion dates from the schedule altogether. The only important date is the project completion date.
  2. No penalties for going over the t_{50}estimate. Management must accept that the estimate is based on t_{50}, so the activity is expected to overrun the estimate 50% of the time.

The above points should be explained clearly to project team members before attempting  to elicit t_{50}estimates from them.  

So, how does one get reliable  t_{50} estimates? Here are some approaches:

  1. Assume that the initial estimates obtained from team members are t_{80} or t_{90}, so simply halve these to get a rough t_{50}. This is the approach Goldratt recommends. However, I’m not a fan of this method because it is sure to antagonise folks.
  2. Another option is to ask the estimator how long a task is going to take. They’ll come back to you with a number. This is likely to be their t_{80} or t_{90}. Then ask them for their t_{50}, explaining what it means (i.e. estimate which you have a 50% chance of going over). They should come back to you with a smaller number. It may not be half the original estimate or less, but it should be significantly smaller.
  3. Yet another option  is to calibrate estimators’  abilities to predict task durations based on their history  (i.e. based on how good earlier estimates were).  In the absence of prior data one can  quantify an estimator’s reliability in making judgements by asking him or her to answer a series of trivia questions, giving an estimated probability of being correct along with each answer.  An individual is  said to be calibrated if  the fraction of questions correctly answered coincides (or is close to) their stated probability estimates.  In theory, a calibrated individual’s duration estimate should be pretty good. However, it is questionable as to whether calibration as determined through trivia questions carries over to real-world estimates. See this site for more on evaluating calibration.
  4. Finally, project managers can use Monte Carlo simulations to estimate task durations. The hard part here is coming up with a probability distribution for the task duration. One commonly used approach is to ask task estimators to come up with best case, worst case and most likely estimates, and then fit these to a probability distribution. There are at least two problems with this approach: a) the only sensible fit to a three point estimate is a triangular distribution, but this isn’t particularly good because it ignores the long tail and b) the estimates still need to be quality assured through independent checks (historical comparison, for example)  or via calibration as discussed above – else the distribution is worthless.  See this paper for more on the use of Monte Carlo simulations in project management.

Folks who’ve read my articles on cognitive biases in project management (see this post and this one)  may be wondering how these fit in to the above argument. According to Goldratt, most people tend to offer their t_{80} or t_{90} numbers,  rather than their t_{50} ones. The reason this happens is that folks tend to remember the instances when things went wrong,  so they pad up their estimates to avoid getting burned again – a case of the availability bias  in action.  

Getting team members to come up with reliable t_{50} numbers depends very much on how safe they feel doing so. It is important that management understands that there is a 50% chance of not meeting t_{50} deadlines for an individual tasks; the only important deadline is the completion of the project. This is why  Goldratt and other advocates of the critical chain method emphasise that a change in organisational culture is required in order for the technique to work in practice. Details of how one might implement this change is out of scope for an introductory article, but readers should be aware that the biggest challenges are not technical ones.

The resource buffer

Readers may have noticed a problem arising from the foregoing discussion of t_{50} estimates: if there is no completion date for a task, how does a successor task performer know when he or she needs to be ready to start work? This problem is handled via a notification process that works as follows: the predecessor task peformer notifies successor task performers about expected completion dates on a regular basis. These notifications occur at regular, predetermined intervals. Further, a final confirmation should be given a day or two before task completion so all successor task performers are ready to start work exactly when needed. Goldratt calls this notification process the resource buffer. It is a simple yet effective method to ensure that a task starts exactly when it should. Early finishes are no longer wasted!

The project buffer

Alright, so now we’ve reduced activity estimates, removed completion dates for individual tasks and ensured that resources are positioned to pick up tasks when they have to. What remains? Well, the most important bit really – the safety! Since tasks now only have a 50% chance of completion within the estimated time, we need to put safety in somewhere. The question is, where should it go? The answer lies in recognising that the bottleneck (or constraint) in a project is the critical path. Any delay in the critical path necessarily implies a delay in the project. Clearly, we need to add the safety somewhere on the critical path. I hope the earlier discussion has convinced you that adding safety to individual tasks is an exercise in futility. Goldratt’s insight was the following: safety should be added to the end of the critical path as a non-activity buffer. He calls this the project buffer. If any particular activity is delayed, the project manager “borrows” time from the project buffer and adds it on to the offending activity. On the other hand, if an activity finishes early the gain is added to the project buffer. Figure 2 depicts a project network diagram with the project buffer added on to the critical path (C1-C2-C3 in the figure).

cc1

What size should the buffer be? As a rule of thumb, Goldratt proposed that the buffer should be 50% of the safety that was removed from the tasks. Essentially this makes the critical path 75% as long as it would have been with the original (t_{80} or t_{90}) estimates (see this paper for example). Other methods of buffer estimation are discussed in this book on critical chain project management.

The feeding buffer

As shown in Figure 2 the project buffer protects the critical path. However, delays can occur in non-critical paths as well (A1-A2 and B1-B2 in the figure). If long enough, these delays can affect subsequent critical path. To prevent this from happening, Goldratt suggests adding buffers at points where non-critical paths join the critical path. He terms these feeding buffers

cc2

Figure 3 depicts the same project network diagram as before with feeding buffers added in. Feeding buffers are sized the same way as project buffers are – i.e. based on a fraction of the safety removed from the activities on the relevant (non-critical) path.

 The critical chain – a first definition

This completes the discussion of the case where there’s no resource contention. In this special case, the critical chain of the project is identical to the critical path. The activity durations for all tasks are based on t50 estimates, with the project buffer protecting the project from delays. In addition, the feeding buffers protect critical chain activities from delays in non-critical chain activities.

The critical chain – general case

Now for the more general case where there is contention for resources. Resource contention implies that task performers are scheduled to work on multiple tasks simultaneously, at one or more points along the project timeline. Although it is well recognised that multitasking is to be avoided, most algorithms for finding the critical path do not take resource contention into account. The first step, therefore, is to resource level the schedule – i.e ensure that tasks that are to be performed the same resource(s) are scheduled sequentially rather than simultaneously. Typically this changes the critical path from what it would otherwise be. This resource leveled critical path is the critical chain.

The above can be illustrated by modifying the example network shown in Figure 3. Assume tasks C1, B2 and A2 (marked X) are performed by the same resources. The resource leveled critical path thus changes from that shown in Figures 2 and 3 to that shown in Figure 4 (in red). As per the definition above, this is the critical chain. Notice that the feeding buffers change location, as (by definition) these have to be moved to points where non-critical paths merge with the critical path. The location of the project buffer remains unchanged.

cc3

Endnote

This completes my introduction to the critical chain method. Before closing, I should mention that there has been some academic controversy regarding the critical chain method. In practice, though, the method seems to work well as evidenced by the number of companies offering consulting and software related to critical chain project scheduling. 

I can do no better than to end with a list of online references which I’ve found immensely useful in learning about the method. Here they are, in no particular order:

Critical Chain Scheduling and Buffer Management . . . Getting Out From Between Parkinson’s Rock and Murphy’s Hard Place by Francis Patrick.

Critical Chain Project Management Improves Project Performance by Larry Leach.

Critical Chain: a hands-on project application by Ernst Meijer.

The best place to start, however, is where it all began: Goldratt’s novel, Critical Chain.

(Note:  This essay is a revised version of my article on the critical chain,  first  published in 2007)

Written by K

August 20, 2009 at 10:50 pm