Archive for February 2011
(Note to the reader: An Excel sheet showing sample calculations and plots discussed in this post can be downloaded here.)
Monte Carlo simulation techniques have been applied to areas ranging from physics to project management. In earlier posts, I discussed how these methods can be used to simulate project task durations (see this post and this one for example). In those articles, I described simulation procedures in enough detail for readers to be able to reproduce the calculations for themselves. However, as my friend Paul Culmsee mentioned, the mathematics tends to obscure the rationale behind the technique. Indeed, at first sight it seems somewhat paradoxical that one can get accurate answers via random numbers. In this post, I illustrate the basic idea behind Monte Carlo methods through an example that involves nothing more complicated than squares and circles. To begin with, however, I’ll start with something even simpler – a drunken darts player.
Consider a sozzled soul who is throwing darts at a board situated some distance from him. To keep things simple, we’ll assume the following:
- The board is modeled by the circle shown in Figure 1, and our souse scores a point if the dart falls within the circle.
- The dart board is inscribed in a square with sides 1 unit long as shown in the figure, and we’ll assume for simplicity that the dart always falls somewhere within the square (our protagonist is not that smashed).
- Given his state, our hero’s aim is not quite what it should be - his darts fall anywhere within the square with equal probability. (Note added on 01 March 2011: See the comment by George Gkotsis below for a critique of this assumption)
We can simulate the results of our protagonist’s unsteady attempts by generating two sets of uniformly distributed random numbers lying between 0 and 1 (This is easily done in Excel using the rand() function). The pairs of random numbers thus generated – one from each set – can be treated as the (x,y) coordinates of the dart for a particular throw. The result of 1000 pairs of random numbers thus generated (representing the drunkard’s dart throwing attempts) is shown in Figure 2 (For those interested in seeing the details, an Excel sheet showing the calculations for 100o trials can be downloaded here).
A trial results in a “hit” if it lies within the circle. That is, if it satisfies the following equation:
(Note: if we replace “<” by “=” in the above expression, we get the equation for a circle of radius 0.5 units, centered at x=0.5 and y=0.5.)
Now, according to the frequency interpretation of probability, the probability of the plastered player scoring a point is approximated by the ratio of the number of hits in the circle to the total number of attempts. In this case, I get an average of 790/1000 which is 0.79 (generated from 10 sets of 1000 trials each). Your result will be different from because you will generate different sets of random numbers from the ones I did. However, it should be reasonably close to my result.
Further, the frequency interpretation of probability tells us that the approximation becomes more accurate as the number of trials increases. To see why this is so, let’s increase the number of trials and plot the results. I carried out simulations for 2000, 4000, 8000 and 16000 trials. The results of these simulations are shown in Figures 3 through 6.
Since a dart is equally likely to end up anywhere within the square, the exact probability of a hit is simply the area of the dartboard (i.e. the circle) divided by the entire area over which the dart can land. In this case, since the area of the enclosure (where the dart must fall) is 1 square unit, the area of the dartboard is actually equal to the probability. This is easily seen by calculating the area of the circle using the standard formula where is the radius of the circle (0.5 units in this case). This yields 0.785398 sq units, which is reasonably close to the number that we got for the 1000 trial case. In the 16000 trial case, I get a number that’s closer to the exact result: an average of 0.7860 from 10 sets of 16000 trials.
As we see from Figure 6, in the 16000 trial case, the entire square is peppered with closely-spaced “dart marks” – so much so, that it looks as though the square is a uniform blue. Hence, it seems intuitively clear that as we increase, the number of throws, we should get a better approximation of the area and, hence, the probability.
There are a couple of points worth mentioning here. First, in principle this technique can be used to calculate areas of figures of any shape. However, the more irregular the figure, the worse the approximation – simply because it becomes less likely that the entire figure will be sampled correctly by “dart throws.” Second, the reader may have noted that although the 16000 trial case gives a good enough result for the area of the circle, it isn’t particularly accurate considering the large number of trials. Indeed, it is known that the “dart approximation” is not a very good way of calculating areas – see this note for more on this point.
Finally, let’s look at connection between the general approach used in Monte Carlo techniques and the example discussed above (I use the steps described in the Wikipedia article on Monte Carlo methods as representative of the general approach):
- Define a domain of possible inputs – in our case the domain of inputs is defined by the enclosing square of side 1 unit.
- Generate inputs randomly from the domain using a certain specified probability distribution – in our example the probability distribution is a pair of independent, uniformly distributed random numbers lying between 0 and 1.
- Perform a computation using the inputs – this is the calculation that determines whether or not a particular trial is a hit or not (i.e. if the x,y coordinates obey inequality (1) it is a hit, else it’s a miss)
- Aggregate the results of the individual computations into the final result – This corresponds to the calculation of the probability (or equivalently, the area of the circle) by aggregating the number of hits for each set of trials.
To summarise: Monte Carlo algorithms generate random variables (such as probability) according to pre-specified distributions. In most practical applications one will use more efficient techniques to sample the distribution (rather than the naïve method I’ve used here.) However, the basic idea is as simple as playing drunkard’s darts.
Thanks go out to Vlado Bokan for helpful conversations while this post was being written and to Paul Culmsee for getting me thinking about a simple way to explain Monte Carlo methods.
When developing duration estimates for a project task, it is useful to make a distinction between the uncertainty inherent in the task and uncertainty due to known risks. The former is uncertainty due to factors that are not known whereas the latter corresponds uncertainty due to events that are known, but may or may not occur. In this post, I illustrate how the two types of uncertainty can be combined via Monte Carlo simulation. Readers may find it helpful to keep my introduction to Monte Carlo simulations of project tasks handy, as I refer to it extensively in the present piece.
Setting the stage
Let’s assume that there’s a task that needs doing, and the person who is going to do it reckons it will take between 2 and 8 hours to complete it, with a most likely completion time of 4 hours. How the estimator comes up with these numbers isn’t important here – maybe there’s some guesswork, maybe some padding or maybe it is really based on experience (as it should be). For simplicity we’ll assume the probability distribution for the task duration is triangular. It is not hard to show that, given the above mentioned estimates, the probability, , that the task will finish at time is given by the equations below (see my introductory post for a detailed derivation):
for 2 hours 4 hours
for 4 hours 8 hours
These two expressions are sometimes referred to as the probability distribution function (PDF). The PDF described by equations (1) and (2) is illustrated in Figure 1. (Note: Please click on the Figures to view full-size images)
Now, a PDF tells us the probability that the task will finish at a particular time . However, we are more interested in knowing whether or not the task will be completed by time . – i.e. at or before time . This quantity, which we’ll denote by (capital P), is sometimes known as the cumulative distribution function (CDF). The CDF is obtained by summing up the probabilities from hrs to time . It is not hard to show that the CDF for the task at hand is given by the following equations:
for 2 hours 4 hours
for 4 hours 8 hours
For a detailed derivation, please see my introductory post. The CDF for the distribution is shown in Figure 2.
Now for the complicating factor: let us assume there is a risk that has a bearing on this task. The risk could be any known factor that has a negative impact on task duration. For example, it could be that a required resource is delayed or that the deliverable will fails a quality check and needs rework. The consequence of the risk – should it eventuate – is that the task takes longer. How much longer the task takes depends on specifics of the risk. For the purpose of this example we’ll assume that the additional time taken is also described by a triangular distribution with a minimum, most likely and maximum time of 1, 2 and 3 hrs respectively. The PDF for the additional time taken due to the risk is:
for 1 hour 2 hours
for 2 hrs 3 hours
The figure for this distribution is shown in Figure 3.
The CDF for the additional time taken if the risk eventuates (which we’ll denote by ) is given by:
for 1 hour 2 hours
for 2 hours 3 hours
The CDF for the risk consequence is shown in Figure 4.
Before proceeding with the simulation it is worth clarifying what all this means, and what we want to do with it.
Firstly, equations 1-4 describe the inherent uncertainty associated with the task while equations 5 through 8 describe the consequences of the risk, if it eventuates.
Secondly, we have described the task and the risk separately. In reality, we need a unified description of the two – a combined distribution function for the uncertainty associated with the task and the risk taken together. This is what the simulation will give us.
Finally, one thing I have not yet specified is the probabilty that the risk will actually occur. Clearly, the higher the probability, the greater the potential delay. Below I carry out simulations for risk probabilities of varying from 0.1 to 0.5.
That completes the specification of the problem – let’s move on to the simulation.
The simulation procedure I used for the zero-risk case (i.e. the task described by equations 1 and 2 ) is as follows :
- Generate a random number between 0 and 1. Treat this number as the cumulative probability, for the simulation run. [You can generate random numbers in Excel using the rand() function]
- Find the time, , corresponding to by solving equations (3) or (4) for . The resulting value of is the time taken to complete the task.
- Repeat steps (1) and (2) for a sufficiently large number of trials.
The frequency distribution of completion times for the task, based on 30,000 trials is shown in Figure 5.
As we might expect, Figure 5 can be translated to the probability distribution shown in Figure 1 by a straightforward normalization – i.e. by dividing each bar by the total number of trials.
What remains to be done is to incorporate the risk (as modeled in equations 5-6) into the simulation. To simulate the task with the risk, we simply do the following for each trial:
- Simulate the task without the risk as described earlier.
- Generate another random number between 0 and 1.
- If the random number is less than the probability of the risk, then simulate the risk. Note that since the risk is described by a triangular function, the procedure to simulate it is the same as that for the task (albeit with different parameters).
- If the random number is greater than the probability of the risk, do nothing.
- Add the results of 1 and 4. This is the outcome of the trial.
- Repeat steps 1-5 for as many trials as required.
I performed simulations for the task with risk probabilities of 10%, 30% and 50%. The frequency distributions of completion times for these are displayed in Figures 6-8 (in increasing order of probability). As one would expect, the spread of times increases with increasing probability. Further, the distribution takes on a distinct second peak as the probability increases: the first peak is at , corresponding to the most likely completion time of the risk-free task and the second at corresponding to the most likely additional time of 2 hrs if the risk eventuates.
It is also instructive to compare average completion times for the four cases (zero-risk and 10%, 30% and 50%). The average can computed from the simulation by simply adding up the simulated completion times (for all trials) and dividing by the total number of simulation trials (30,000 in our case). On doing this, I get the following:
Average completion time for zero-risk case = 4.66 hr
Average completion time with 10% probability of risk = 4.89 hrs
Average completion time with 30% probability of risk = 5.36 hrs
Average completion time with 50% probability of risk= 5.83 hrs
No surprises here.
One point to note is that the result obtained from the simulation for the zero-risk case compares well with the exact formula for a triangular distribution (see the Wikipedia article for the triangular distribution):
This serves as a sanity check on the simulation procedure.
It is also interesting to compare the cumulative probabilities of completion in the zero-risk and high risk (50% probability) case. The CDFs for the two are shown in Figure 9. The co-plotted CDFs allow for a quick comparison of completion time predictions. For example, in the zero-risk case, there is about a 90% chance that the task will be completed in a little over 6 hrs whereas when the probability of the risk is 50%, the 90% completion time increases to 8 hrs (see Figure 9).
Next steps and wrap up
For those who want to learn more about simulating project uncertainty and risk, I can recommend the UK MOD paper – Three Point Estimates And Quantitative Risk Analysis A Process Guide For Risk Practitioners. The paper provides useful advice on how three point estimates for project variables should be constructed. It also has a good discussion of risk and how it should be combined with the inherent uncertainty associated with a variable. Indeed, the example I have described above was inspired by the paper’s discussion of uncertainty and risk.
Of course, as with any quantitative predictions of project variables, the numbers are only as reliable as the assumptions that go into them, the main assumption here being the three point estimates that were used to derive the distributions for the task uncertainty and risk (equations 1-2 and 5-6). Typically these are obtained from historical data. However, there are well known problems associated with history-based estimates. For one, as we can never be sure that the historical tasks are similar to the one at hand in ways that matter (this is the reference class problem). As Shim Marom warns us in this post, all our predictions depend on the credibility of our estimates. Quoting from his post:
Can you get credible three point estimates? Do you have access to credible historical data to support that? Do you have access to Subject Matter Experts (SMEs) who can assist in getting these credible estimates?
If not, don’t bother using Monte Carlo.
In closing, I hope my readers will find this simple example useful in understanding how uncertainty and risk can be accounted for using Monte Carlo simulations. In the end, though, one should always keep in mind that the use of sophisticated techniques does not magically render one immune to the GIGO principle.
Over the last year or so, I’ve used IBIS (Issue-Based Information System) to map a variety of discussions at work, ranging from design deliberations to project meetings [Note: see this post for an introduction to IBIS and this one for an example of mapping dialogues using IBIS]. Feedback from participants indicated that IBIS helps to keep the discussion focused on the key issues, thus leading to better outcomes and decisions. Some participants even took the time to learn the notation (which doesn’t take long) and try it out in their own meetings. Yet, despite their initial enthusiasm, most of them gave it up after a session or two. Their reasons are well summed up by a colleague who said, “It is just too hard to build a coherent map on the fly while keeping track of the discussion.”
My colleague’s comment points to a truth about the technique: the success of a sense-making session depends rather critically on the skill of the practitioner. The question is: how do experienced practitioners engage their audience and build a coherent map whilst keeping the discussion moving in productive directions? Al Selvin, Simon Buckingham-Shum and Mark Aakhus provide a part answer to this question in their paper entitled, The Practice Level in Participatory Design Rationale : Studying Practitioner Moves and Choices. Specifically, they describe a general framework within which the practice of participatory design rationale (PDR) can be analysed (Note: more on PDR in the next section). This post is a discussion of some aspects of the framework and some personal reflections based on my (limited) experience.
A couple of caveats are in order before I proceed. Firstly, my discussion focuses on understanding the dimensions (or variables) that describe the act of creating design representations in real-time. Secondly, my comments and reflections on the model are based on my experiences with a specific design rationale technique – IBIS.
First up, it is worth clarifying the meaning of participatory design rationale (PDR). The term refers to the collective reasoning behind decisions that are made when a group designs an artifact. Generally such rationale involves consideration of various alternatives and why they were or weren’t chosen by the group. Typically this involves several people with differing views. Participatory design is thus an argumentative process, often with political overtones.
Clearly, since design involves deliberation and may also involve a healthy dose of politics, the process will work more effectively if it is structured. The structure should, however, be flexible: it must not constrain the choices and creativity of those involved. This is where the notation and practitioner (facilitator) come in: the notation lends structure to the discussion and the practitioner keeps it going in productive directions, yet in a way that sustains a coherent narrative. The latter is a creative process, much like an art. The representation (such as IBIS) – through its minimalist notation and grammar – helps keep the key issues, ideas and arguments firmly in focus. As Selvin noted in an earlier paper, this encourages collective creativity because it forces participants to think through their arguments more carefully than they would otherwise. Selvin coined the term knowledge art to refer to this process of developing and engaging with design representations. Indeed, the present paper is a detailed look at how practitioners create knowledge art – i.e. creative, expressive representations of the essence of design discussions. Quoting from the paper:
…we are looking at the experience of people in the role of caretakers or facilitators of such events – those who have some responsibility for the functioning of the group and session as a whole. Collaborative DR practitioners craft expressive representations on the fly with groups of people. They invite participant engagement, employing techniques like analysis, modeling, dialogue mapping, creative exploration, and rationale capture as appropriate. Practitioners inhabit this role and respond to discontinuities with a wide variety of styles and modes of action. Surfacing and describing this variety are our interests here.
The authors have significant experience in leading deliberations using IBIS and other design rationale methods. They propose a theoretical framework to identify and analyse various moves that practitioners make in order to keep the discussion moving in productive direction. They also describe various tools that they used to analyse discussions, and specific instances of the use of these tools. In the remainder of this post, I’ll focus on their theoretical framework rather than the specific case studies, as the former (I think) will be of more interest to readers. Further, I will focus on aspects of the framework that pertain to the practice – i.e. the things the practitioner does in order to keep the design representation coherent, the participants engaged and the discussion useful, i.e. moving in productive directions.
The dimensions of design rationale practice
So what do facilitators do when they lead deliberations? The key actions they undertake are best summed up in the authors’ words:
…when people act as PDR practitioners, they inherently make choices about how to proceed, give form to the visual and other representational products , help establish meanings, motives, and causality and respond when something breaks the expected flow of events , often having to invent fresh and creative responses on the spot.
This sentence summarises the important dimensions of the practice. Let’s look at each of the dimensions in brief:
- Ethics: At key points in the discussion, the practitioner is required to make decisions on how to proceed. These decisions cannot (should not!) be dispassionate or objective (as is often assumed), they need to be made with due consideration of “what is good and what is not good” from the perspective of the entire group.
- Aesthetics: This refers to the representation (map) of the discussion. As the authors put it, “All diagrammatic DR approaches have explicit and implicit rules about what constitutes a clear and expressive representation. People conversant with the approaches can quickly tell whether a particular artifact is a “good” example. This is the province of aesthetics.” In participatory design, representations are created as the discussion unfolds. The aesthetic responsibility of the practitioner is to create a map that is syntactically correct and expressive. Another aspect of the aesthetic dimension is that a “beautiful” map will engage the audience, much like a work of art.
- Narrative: One of the key responsibilities of the practitioner is to construct a coherent narrative from the diverse contributions made by the participants. Skilled practitioners pick up connections between different contributions and weave these into a coherent narrative. That said, the narrative isn’t just the practitioner’s interpretation; the practitioner has to ensure that everyone in the group is happy with the story; the story is to the group’ s story. A coherent narrative helps the group make sense of the discussion: specifically the issues canvassed, ideas offered and arguments for and against each of them. Building such a narrative can be challenging because design discussions often head off in unexpected directions.
- Sensemaking: During deliberations it is quite common that the group gets stuck. Progress can be blocked for a variety of reasons ranging from a lack of ideas on how to make progress to apparently irreconcilable differences of opinion on the best way to move forward. At these junctures the role of the practitioner is to break the impasse. Typically this involves conversational moves that open new ground (not considered by the group up to that point) or find ways around obstacles (perhaps by suggesting compromises or new choices). The key skill in sensemaking is the ability to improvise, which segues rather nicely into the next variable.
- Improvisation: Books such as Jeff Conklin’s classic on dialogue mapping describe some standard moves and good practices in PDR practice. In reality, however, a practitioner will inevitably encounter situations that cannot be tackled using standard techniques. In such situations the practitioner has to improvise. This could involve making unconventional moves within the representation or even using another representation altogether. These improvisations are limited only by the practitioner’s creativity and experience.
Using case studies, the authors illustrate how design rationale sessions can be analysed along the aforementioned dimensions, both at a micro and macro level. The former involves a detailed move-by-move study of the session and the latter an aggregated view, based on the overall tenor of episodes consisting of several moves. I won’t say any more about the analyses here, instead I’ll discuss the relevance of the model to the actual practice of design rationale techniques such as dialogue mapping.
Some reflections on the model
When I first heard about dialogue mapping, I felt the claims made about the technique were exaggerated: it seemed impossible that a simple notation like IBIS (which consists of just three elements and a simple grammar) could actually enhance collaboration and collective creativity of a group. With a bit of experience, I began to see that it actually did do what it claimed to do. However, I was unable to explain to others how or why it worked. In one conversation with a manager, I found myself offering hand-waving explanations about the technique – which he (quite rightly) found unconvincing. It seemed that the only way to see how or why it worked was to use it oneself. In short: I realised that the technique involved tacit rather than explicit knowledge.
Now, most practices– even the most mundane ones – involve a degree of tacitness. In fact, in an earlier post I have argued that the concept of best practice is flawed because it assumes that the knowledge involved in a practice can be extracted in its entirety and codified in a manner that others can understand and reproduce. This assumption is incorrect because the procedural aspects of a practice (which can be codified) do not capture everything – they miss aspects such as context and environmental influences, for instance. As a result a practice that works in a given situation may not work in another, even though the two may be similar. So it is with PDR techniques – they work only when tailored on the fly to the situation at hand. Context is king. In contrast the procedural aspects of PDR techniques– syntax, grammar etc – are trivial and can be learnt in a short time.
In my opinion, the value of the model is that it attempts to articulate tacit aspects of PDR techniques. In doing so, it tells us why the techniques work in one particular situation but not in another. How so? Well, the model tells us the things that PDR practitioners worry about when they facilitate PDR sessions – they worry about the form of the map (aesthetics), the story it tells (narrative), helping the group resolve difficult issues (sensemaking), making the right choices (ethics) and stepping outside the box if necessary (improvisation). These are tacit skills- they can’t be taught via textbooks, they can only be learnt by doing. Moreover, when such techniques fail the reason can usually be traced back to a failure (of the facilitator) along one or more of these dimensions.
Techniques to capture participatory design rationale have been around for a while. Although it is generally acknowledged that such techniques aid the process of collaborative design, it is also known that their usefulness depends rather critically on the skill of the practitioner. This being the case, it is important to know what exactly skilled practitioners do that sets them apart from novices and journeymen. The model is a first step towards this. By identifying the dimensions of PDR practice, the model gives us a means to analyse practitioner moves during PDR sessions – for example, one can say that this was a sensemaking move or that was an improvisation. Awareness of these types of moves and how they work in real life situations can help novices learn the basics of the craft and practitioners master its finer points.