On the interpretation of probabilities in project management
Managers have to make decisions based on imperfect and incomplete knowledge of future events. One approach to improving managerial decision-making is to quantify uncertainties using probability. But what does it mean to assign a numerical probability to an event? For example, what do we mean when we say that the probability of finishing a particular task in 5 days is 0.75? How is this number to be interpreted? As it turns out, there are several ways of interpreting probabilities. In this post I’ll look at three of these via an example drawn from project estimation.
Although the question raised above may seem somewhat philosophical, it is actually of great practical importance because of the increasing use of probabilistic techniques (such as Monte Carlo methods) in decision making. Those who advocate the use of these methods generally assume that probabilities are magically “given” and that their interpretation is unambiguous. Of course, neither is true – and hence the importance of clarifying what a numerical probability really means.
Assume there’s a task that needs doing – this may be a project task or some other job that a manager is overseeing. Let’s further assume that we know the task can take anywhere from 2 to 8 days to finish, and that we (magically!) have numerical probabilities associated with completion on each of the days (as shown in the table below). I’ll say a teeny bit more about how these probabilities might be estimated shortly.
| Task finishes on (day) | Probability |
This table is a simple example of what’s technically called a probability distribution. Distributions express probabilities as a function of some variable. In our case the variable is time.
How are these probabilities obtained? There is no set method for doing this, but commonly used techniques include:
- By using historical data for similar tasks.
- By asking experts in the field.
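For the first technique, the simplest estimate treats each past task as one observation and takes the fraction of tasks that finished on each day as that day’s probability. The Python sketch below is only an illustration: the list of 20 past durations is hypothetical (not data from this post), and `empirical_distribution` is an illustrative helper name.

```python
from collections import Counter


def empirical_distribution(durations: list[int]) -> dict[int, float]:
    """Estimate P(task finishes on day d) as the fraction of past tasks that took d days."""
    if not durations:
        raise ValueError("need at least one historical duration")
    counts = Counter(durations)
    n = len(durations)
    return {day: counts[day] / n for day in sorted(counts)}


# Hypothetical durations (in days) of 20 similar tasks done in the past.
history = [2, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 8]
print(empirical_distribution(history))
```

With only a handful of past tasks such an estimate is very rough; it is a starting point, not a substitute for judgement about how similar those tasks really are.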
Estimating probabilities is a hard problem. However, my aim in this article is to discuss what probabilities mean, not how they are obtained. So I’ll take the probabilities mentioned above as given and move on.
The rules of probability
Before we discuss the possible interpretations of probability, it is necessary to mention some of the mathematical properties we expect probabilities to possess. Rather than present these in a formal way, I’ll discuss them in the context of our example.
Here they are:
- All probabilities listed are numbers that lie between 0 (impossible) and 1 (absolute certainty).
- It is absolutely certain that the task will finish on one of the listed days. That is, the sum of all probabilities equals 1.
- It is impossible for the task not to finish on one of the listed days. In other words, the probability of the task finishing on a day not listed in the table is 0.
- The probability of finishing on any one of several days is given by the sum of the probabilities for all those days. For example, the probability of finishing on day 2 or day 3 is 0.20 (i.e. 0.05 + 0.15). This holds because the two events are mutually exclusive – that is, the occurrence of one event precludes the occurrence of the other. Specifically, if we finish on day 2 we cannot finish on day 3 (or any other day), and vice versa.
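These rules are easy to check mechanically. The Python sketch below assumes the distribution is stored as a dictionary from day to probability. Only days 2 and 3 (0.05 and 0.15) and the 0.75 total up to day 5 are stated in the text, so the values for days 4 to 8 are hypothetical placeholders chosen to be consistent with those figures; the helper names are illustrative too.

```python
import math

# Hypothetical distribution: days 2 and 3 and the 0.75 total for days 2-5
# match the text; the individual values for days 4-8 are placeholders.
completion_prob = {2: 0.05, 3: 0.15, 4: 0.25, 5: 0.30, 6: 0.15, 7: 0.07, 8: 0.03}


def satisfies_rules(dist: dict[int, float]) -> bool:
    """Check that every probability lies in [0, 1] and that they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in dist.values())
    sums_to_one = math.isclose(sum(dist.values()), 1.0)
    return in_range and sums_to_one


def prob_any_of(dist: dict[int, float], days: set[int]) -> float:
    """P(task finishes on any one of `days`). Finishing days are mutually
    exclusive, so their probabilities add; unlisted days contribute 0."""
    return sum(dist.get(day, 0.0) for day in days)


print(satisfies_rules(completion_prob))       # True
print(prob_any_of(completion_prob, {2, 3}))   # approximately 0.20
print(prob_any_of(completion_prob, {10}))     # 0: day 10 is not listed
```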
These statements illustrate the mathematical assumptions (or axioms) of probability. I won’t write them out in their full mathematical splendour; those interested should head off to the Wikipedia article on the axioms of probability.
Another useful concept is that of cumulative probability which, in our example, is the probability that the task will be completed by a particular day. For example, the probability that the task will be completed by day 5 is 0.75 (the sum of probabilities for days 2 through 5). In general, the cumulative probability of finishing by any particular day is the sum of the probabilities of completion for all days up to and including that day.
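The running sum described above can be sketched in a few lines of Python. As before, this is an illustration under assumptions: only days 2 and 3 and the cumulative 0.75 by day 5 come from the text, the remaining per-day values are hypothetical, and `prob_done_by` is an illustrative name.

```python
# Hypothetical distribution consistent with the figures quoted in the text.
completion_prob = {2: 0.05, 3: 0.15, 4: 0.25, 5: 0.30, 6: 0.15, 7: 0.07, 8: 0.03}


def prob_done_by(dist: dict[int, float], day: int) -> float:
    """Cumulative probability: P(task is finished on or before `day`)."""
    return sum(p for d, p in dist.items() if d <= day)


for day in range(2, 9):
    print(day, round(prob_done_by(completion_prob, day), 2))
```

Because the sum simply skips days outside the table, the function also gives 0 for any day before day 2 and (up to rounding) 1 for any day from day 8 onwards.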
Interpretations of probability
With that background out of the way, we can get to the main point of this article which is:
What do these probabilities mean?
We’ll explore this question using the cumulative probability example mentioned above, and by drawing on a paper by Glen Shafer entitled What is Probability?
OK, so what is meant by the statement “There is a 75% chance that the task will finish in 5 days”?
It could mean that:
- If this task is done many times over, it will be completed within 5 days in 75% of the cases. Following Shafer, we’ll call this the frequency interpretation.
- It is believed that there is a 75% chance of finishing this task in 5 days. Note that belief can be tested by seeing if the person who holds the belief is willing to place a bet on task completion with odds that are equivalent to the believed probability. Shafer calls this the belief interpretation.
- Based on a comparison with similar tasks, this particular task has a 75% chance of finishing in 5 days. Shafer refers to this as the support interpretation.
(Aside: The belief and support interpretations involve subjective and objective states of knowledge about the events of interest, respectively. These are often referred to as subjective and objective Bayesian interpretations because knowledge about these events can be refined using Bayes’ theorem, provided one has relevant data regarding the occurrence of events.)
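As a minimal sketch of such refinement (my choice of technique, not something taken from Shafer’s paper), a belief about the chance of finishing within 5 days can be held as a Beta distribution and updated with Bayes’ theorem as outcomes of similar tasks come in. With a Beta prior and finish/don’t-finish outcomes, the update just adds the counts to the two parameters. The prior Beta(3, 1), whose mean is 0.75, and the new data are hypothetical, as is the `BetaBelief` class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Hypothetical Beta(alpha, beta) belief about P(task finishes within 5 days)."""
    alpha: float
    beta: float

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, finished_in_time: int, total: int) -> "BetaBelief":
        """Bayes' theorem for a Beta prior and Bernoulli outcomes (conjugate update)."""
        if not 0 <= finished_in_time <= total:
            raise ValueError("need 0 <= finished_in_time <= total")
        return BetaBelief(self.alpha + finished_in_time,
                          self.beta + (total - finished_in_time))


# Hypothetical prior: "75%, held about as firmly as if we had seen 4 tasks".
prior = BetaBelief(3, 1)
# Hypothetical new evidence: 5 of the next 10 similar tasks finished in time.
posterior = prior.update(finished_in_time=5, total=10)
print(prior.mean(), posterior.mean())
```

Here the data pull the belief from 0.75 down to 8/14 (about 0.57); with more observations the prior matters less and less.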
The interesting thing is that all the above interpretations can be shown to satisfy the axioms of probability discussed earlier (see Shafer’s paper for details). However, it is clear from the above that each of these interpretations has a very different meaning. We’ll take a closer look at this next.
More about the interpretations and their limitations
The frequency interpretation appears to be the most rational one because it interprets probabilities in terms of the results of experiments, i.e. as experimental facts rather than beliefs. In Shafer’s words:
According to the frequency interpretation, the probability of an event is the long-run frequency with which the event occurs in a certain experimental setup or in a certain population. This frequency is a fact about the experimental setup or the population, a fact independent of any person’s beliefs.
However, there is a big problem here: it assumes that such an experiment can actually be carried out. This definitely isn’t possible in our example: tasks cannot be repeated in exactly the same way – there will always be differences, however small.
There are other problems with the frequency interpretation. Some of these include:
- There are questions about whether a sequence of trials will converge to a well-defined probability.
- What if the event cannot be repeated?
- How does one decide what makes up the population of all events? This is sometimes called the reference class problem.
See Shafer’s article for more on these.
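A real task can’t be repeated identically, but if we are willing to treat the distribution as known, a computer can “repeat” it as often as we like. The Python sketch below samples completion days from an assumed distribution (only days 2 and 3 and the 0.75 cumulative value come from the text; the rest are placeholders) and reports the fraction of runs that finish by day 5. It illustrates the idealized long-run frequency; it does not answer the objections above, since the simulation simply assumes the very probabilities it appears to measure.

```python
import random

# Hypothetical distribution consistent with the figures quoted in the text.
completion_prob = {2: 0.05, 3: 0.15, 4: 0.25, 5: 0.30, 6: 0.15, 7: 0.07, 8: 0.03}


def simulated_frequency(dist: dict[int, float], deadline: int,
                        trials: int, seed: int = 0) -> float:
    """Fraction of simulated repetitions of the task that finish by `deadline`."""
    rng = random.Random(seed)
    days = list(dist)
    weights = list(dist.values())
    outcomes = rng.choices(days, weights=weights, k=trials)
    return sum(day <= deadline for day in outcomes) / trials


# Small runs wander; large runs settle close to 0.75.
for n in (10, 1_000, 100_000):
    print(n, simulated_frequency(completion_prob, deadline=5, trials=n))
```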
The belief interpretation treats probabilities as betting odds. In this interpretation a 75% probability of finishing in 5 days means that we’re willing to put up 75 cents to win a dollar if the task finishes in 5 days (or, equivalently, 25 cents to win a dollar if it doesn’t). Note that this says nothing about how the bettor arrives at his or her odds. These are subjective (personal) beliefs. However, they are experimentally determinable: one can determine people’s subjective odds by finding out how they actually place bets.
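Reading “put up 75 cents to win a dollar” as paying 75 cents for a ticket that pays one dollar if the task finishes within 5 days, the fair price of the ticket is the probability times the payout: at that price the expected gain is zero. A minimal Python sketch with hypothetical function names:

```python
def fair_price(prob: float, payout: float = 1.0) -> float:
    """Price at which a ticket paying `payout` if the event occurs has zero expected gain."""
    return prob * payout


def expected_gain(prob: float, price: float, payout: float = 1.0) -> float:
    """Expected net gain from paying `price` for a ticket that pays `payout` if the event occurs."""
    return prob * payout - price


p = 0.75
print(fair_price(p))                  # 0.75: ticket on "finishes within 5 days"
print(fair_price(1 - p))              # 0.25: ticket on "does not finish within 5 days"
print(expected_gain(p, price=0.75))   # 0.0: a fair bet
```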
There is a good deal of debate about whether the belief interpretation is normative or descriptive: that is, do the rules of probability tell us what people’s beliefs should be, or do they tell us what those beliefs actually are? Most people trained in statistics would claim the former – that the rules impose conditions that beliefs should satisfy. In contrast, in management and behavioural science, probabilities based on subjective beliefs are often assumed to describe how the world actually is. However, the wealth of literature on cognitive biases suggests that people’s actual beliefs, as reflected in their decisions, do not conform to the rules of probability. The latter observation seems to favour the normative option, but arguments can be made in support (or refutation) of either position.
The problem mentioned in the previous paragraph is a perfect segue into the support interpretation, according to which the probability of an event occurring is the degree to which we should believe that it will occur (based on available evidence). This seems fine until we realize that evidence can come in many “shapes and sizes.” For example, compare the statements “the last time we did something similar we finished in 5 days, based on which we reckon there’s a 70-80% chance we’ll finish in 5 days” and “based on historical data gathered for 50 projects, we believe that we have a 75% chance of finishing in 5 days.” The two pieces of evidence offer very different levels of support. Therefore, although the support interpretation appears to be more objective than the belief interpretation, it isn’t actually so, because it is difficult to determine which evidence one should use. So, unlike the case of subjective beliefs (where one only has to ask people about their personal odds), it is not straightforward to determine these probabilities empirically.
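One rough way to put a number on the difference in support (my choice of yardstick, not part of Shafer’s account) is to treat each past task as an independent trial and compute the usual standard error of an estimated proportion, sqrt(p(1 - p)/n), for a single task versus 50 projects. The sketch assumes the same 75% figure in both cases; the approximation behind a standard error is crude for tiny samples, so the n = 1 value is only indicative.

```python
import math


def standard_error(p_hat: float, n: int) -> float:
    """Standard error of a proportion p_hat estimated from n independent trials."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


# Both statements quote roughly 75%; only the amount of evidence differs.
for n in (1, 50):
    print(f"n = {n:2d}: 0.75 ± {standard_error(0.75, n):.3f}")
```

The single-task estimate carries an uncertainty of about 0.43, the 50-project estimate about 0.06: a factor of sqrt(50), roughly 7.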
So we’re left with a situation in which we have three interpretations, each of which addresses specific aspects of probability but also has major shortcomings.
Is there any way to break the impasse?
Shafer suggests that the three interpretations of probability are best viewed as highlighting different aspects of a single situation: that of an idealized case where we have a sequence of experiments with known probabilities. Let’s see how this statement (which is essentially the frequency interpretation) can be related to the other two interpretations.
Consider my belief that the task has a 75% chance of finishing in 5 days. This is analogous to saying that if the task were done several times over, I believe it would finish in 5 days in 75% of the cases. My belief can be objectively confirmed by testing my willingness to put up 75 cents to win a dollar if the task finishes in five days. Now, when I place this bet I have my (personal) reasons for doing so. However, these reasons ought to relate to knowledge of the fair odds involved in the bet. Such fair odds can only be derived from knowledge of what would happen in a (possibly hypothetical) sequence of experiments.
The key assumption in the above argument is that my personal odds aren’t arbitrary – I should be able to justify them to another (rational) person.
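The link between fair odds and a hypothetical sequence of experiments can be illustrated with a simulation, again as a sketch under assumptions: suppose the task really does finish within 5 days in 75% of (imaginary) repetitions. A bettor who pays 75 cents per one-dollar ticket then breaks even on average, while one who pays more steadily loses. The 85-cent price and the function name are hypothetical.

```python
import random


def average_gain_per_bet(true_prob: float, price: float,
                         bets: int, seed: int = 1) -> float:
    """Average net gain from repeatedly paying `price` for a ticket that pays $1
    whenever the event occurs; the event occurs independently with
    probability `true_prob` in each simulated repetition."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(bets):
        payout = 1.0 if rng.random() < true_prob else 0.0
        total += payout - price
    return total / bets


# Assumption: the task really finishes within 5 days in 75% of repetitions.
print(average_gain_per_bet(0.75, price=0.75, bets=200_000))  # close to 0
print(average_gain_per_bet(0.75, price=0.85, bets=200_000))  # close to -0.10
```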
Let’s look at the support interpretation. In this case I have hard evidence for stating that there’s a 75% chance of finishing in 5 days. I can take this hard evidence as my personal degree of belief (remember, as stated in the previous paragraph, any personal degree of belief should have some such rationale behind it.) However, since it is based on hard evidence, it should be rationally justifiable and hence can be associated with a sequence of experiments.
The main point from the above is the following: probabilities may be interpreted in different ways, but they have an underlying unity. That is, when we state that there is a 75% probability of finishing a task in 5 days, we are implying all the following statements (with no preference for any particular one):
- If we were to do the task several times over, it would finish within five days in three-fourths of the cases. Of course, this will hold only if the task is done a sufficiently large number of times (which may not be practical in most cases).
- We are willing to place a bet given 3:1 odds of completion within five days.
- We have some hard evidence to back up the first statement and our betting belief in the second.
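The conversion between the 75% probability and the 3:1 odds above is odds = p/(1 - p), and back again p = odds/(1 + odds). A small Python sketch using exact fractions (the helper names are illustrative):

```python
from fractions import Fraction


def odds_in_favour(prob: Fraction) -> Fraction:
    """Odds in favour of an event, p : (1 - p), expressed as a single ratio."""
    if not 0 < prob < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    return prob / (1 - prob)


def prob_from_odds(odds: Fraction) -> Fraction:
    """Inverse conversion: odds of k:1 correspond to a probability of k / (1 + k)."""
    if odds <= 0:
        raise ValueError("odds must be positive")
    return odds / (1 + odds)


print(odds_in_favour(Fraction(3, 4)))   # 3, i.e. 3:1
print(prob_from_odds(Fraction(3)))      # 3/4
```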
In reality, however, we tend to latch on to one particular interpretation depending on the situation. One is unlikely to think in terms of hard evidence when buying a lottery ticket, but hard evidence is a must when estimating a project. When tossing a coin one might instinctively use the frequency interpretation, but when estimating a task that hasn’t been done before one might use personal belief. Nevertheless, it is worth remembering that regardless of the interpretation we choose, all three are implied. So the next time someone gives you a probabilistic estimate, by all means ask them if they have the evidence to back it up, but don’t forget to ask if they’d be willing to accept a bet based on their own stated odds. :-)