Archive for the ‘Probability’ Category
The stats on the 200+ posts I’ve written since I started blogging make it pretty clear that:
- Much of what I write does not get much attention – i.e. it is not of interest to most readers.
- An interesting post – a rare occurrence in itself - is invariably followed by a series of uninteresting ones.
In this post, I ignore the very real possibility that my work is inherently uninteresting and discuss how the above observations can be explained via concepts of probability.
Base rate of uninteresting ideas
A couple of years ago I wrote a piece entitled, Trumped by Conditionality, in which I used conditional probability to show that majority of the posts on this blog will be uninteresting despite my best efforts. My argument was based on the following observations:
- There are many more uninteresting ideas than interesting ones. In statistical terminology one would say that the base rate of uninteresting ideas is high. This implies that if I write posts without filtering out bad ideas, I will write uninteresting posts far more frequently than interesting ones.
- The base rate as described above is inapplicable in real life because I do attempt to filter out the bad ideas. However, and this is the key: my ability to distinguish between interesting and uninteresting topics is imperfect. In other words, although I can generally identify an interesting idea correctly , there is a small (but significant) chance that I will incorrectly identify an uninteresting topic as being interesting.
Now, since uninteresting ideas vastly outnumber interesting ones and my ability to filter out uninteresting ideas is imperfect, it follows that the majority of the topics I choose to write about will be uninteresting. This is essentially the first point I made in the introduction.
Regression to the mean
The observation that good (i.e. interesting) posts are generally followed by a series of not so good ones is a consequence of a statistical phenomenon known as regression to the mean. In everyday language this refers to the common observation that an extreme event is generally followed by a less extreme one. This is simply a consequence of the fact that for many commonly encountered phenomena extreme events are much less likely to occur than events that are close to the average.
In the case at hand we are concerned with the quality of writing. Although writers might improve through practice, it is pretty clear that they cannot write brilliant posts every time they put fingers to keyboard. This is particularly true of bloggers and syndicated columnists who have to produce pieces according to a timetable – regardless of practice or talent, it is impossible to produce high quality pieces on a regular basis.
It is worth noting that people often incorrectly ascribe causal explanations to phenomena that can be explained by regression to the mean. Daniel Kahneman and Amos Tversky describe the following example in their classic paper on decision-related cognitive biases:
…In a discussion of flight training, experienced instructors noted that praise for an exceptionally smooth landing is typically followed by a poorer landing on the next try, while harsh criticism after a rough landing is usually followed by an improvement on the next try. The instructors concluded that verbal rewards are detrimental to learning, while verbal punishments are beneficial, contrary to accepted psychological doctrine. This conclusion is unwarranted because of the presence of regression toward the mean. As in other cases of repeated examination, an improvement will usually follow a poor performance and a deterioration will usually follow an outstanding performance, even if the instructor does not respond to the trainee’s achievement on the first attempt…
So, although I cannot avoid the disappointment that follows the high of writing a well-received post, I can take (perhaps, false) comfort in the possibility that I’m a victim of statistics.
Finally, l would be remiss if I did not consider an explanation which, though unpleasant, may well be true: there is the distinct possibility that everything I write about is uninteresting. Needless to say, I reckon the explanations (rationalisations?) offered above are far more likely to be correct :-)
Project estimates are generally based on assumptions about future events and their outcomes. As the future is uncertain, the concept of probability is sometimes invoked in the estimation process. There’s enough been written about how probabilities can be used in developing estimates; indeed there are a good number of articles on this blog – see this post or this one, for example. However, most of these writings focus on the practical applications of probability rather than on the concept itself – what it means and how it should be interpreted. In this article I address the latter point in a way that will (hopefully!) be of interest to those working in project management and related areas.
Uncertainty is a shape, not a number
Since the future can unfold in a number of different ways one can describe it only in terms of a range of possible outcomes. A good way to explore the implications of this statement is through a simple estimation-related example:
Assume you’ve been asked to do a particular task relating to your area of expertise. From experience you know that this task usually takes 4 days to complete. If things go right, however, it could take as little as 2 days. On the other hand, if things go wrong it could take as long as 8 days. Therefore, your range of possible finish times (outcomes) is anywhere between 2 to 8 days.
Clearly, each of these outcomes is not equally likely. The most likely outcome is that you will finish the task in 4 days. Moreover, the likelihood of finishing in less than 2 days or more than 8 days is zero. If we plot the likelihood of completion against completion time, it would look something like Figure 1.
Figure 1 begs a couple of questions:
- What are the relative likelihoods of completion for all intermediate times – i.e. those between 2 to 4 days and 4 to 8 days?
- How can one quantify the likelihood of intermediate times? In other words, how can one get a numerical value of the likelihood for all times between 2 to 8 days? Note that we know from the earlier discussion that this must be zero for any time less than 2 or greater than 8 days.
The two questions are actually related: as we shall soon see, once we know the relative likelihood of completion at all times (compared to the maximum), we can work out its numerical value.
Since we don’t know anything about intermediate times (I’m assuming there is no historical data available, and I’ll have more to say about this later…), the simplest thing to do is to assume that the likelihood increases linearly (as a straight line) from 2 to 4 days and decreases in the same way from 4 to 8 days as shown in Figure 2. This gives us the well-known triangular distribution.
Note: The term distribution is simply a fancy word for a plot of likelihood vs. time.
Of course, this isn’t the only possibility; there are an infinite number of others. Figure 3 is another (admittedly weird) example.
Further, it is quite possible that the upper limit (8 days) is not a hard one. It may be that in exceptional cases the task could take much longer (say, if you call in sick for two weeks) or even not be completed at all (say, if you leave for that mythical greener pasture). Catering for the latter possibility, the shape of the likelihood might resemble Figure 4.
From the figures above, we see that uncertainties are shapes rather than single numbers, a notion popularised by Sam Savage in his book, The Flaw of Averages. Moreover, the “shape of things to come” depends on a host of factors, some of which may not even be on the radar when a future event is being estimated.
Making likelihood precise
Thus far, I have used the word “likelihood” without bothering to define it. It’s time to make the notion more precise. I’ll begin by asking the question: what common sense properties do we expect a quantitative measure of likelihood to have?
Consider the following:
- If an event is impossible, its likelihood should be zero.
- The sum of likelihoods of all possible events should equal complete certainty. That is, it should be a constant. As this constant can be anything, let us define it to be 1.
In terms of the example above, if we denote time by and the likelihood by then:
Where denotes the sum of all non-zero likelihoods – i.e. those that lie between 2 and 8 days. In simple terms this is the area enclosed by the likelihood curves and the x axis in figures 2 to 4. (Technical Note: Since is a continuous variable, this should be denoted by an integral rather than a simple sum, but this is a technicality that need not concern us here)
is , in fact, what mathematicians call probability- which explains why I have used the symbol rather than . Now that I’ve explained what it is, I’ll use the word “probability” instead of ” likelihood” in the remainder of this article.
With these assumptions in hand, we can now obtain numerical values for the probability of completion for all times between 2 and 8 days. This can be figured out by noting that the area under the probability curve (the triangle in figure 2 and the weird shape in figure 3) must equal 1. I won’t go into any further details here, but those interested in the maths for the triangular case may want to take a look at this post where the details have been worked out.
The meaning of it all
(Note: parts of this section borrow from my post on the interpretation of probability in project management)
So now we understand how uncertainty is actually a shape corresponding to a range of possible outcomes, each with their own probability of occurrence. Moreover, we also know, in principle, how the probability can be calculated for any valid value of time (between 2 and 8 days). Nevertheless, we are still left with the question as to what a numerical probability really means.
As a concrete case from the example above, what do we mean when we say that there is 100% chance (probability=1) of finishing within 8 days? Some possible interpretations of such a statement include:
- If the task is done many times over, it will always finish within 8 days. This is called the frequency interpretation of probability, and is the one most commonly described in maths and physics textbooks.
- It is believed that the task will definitely finish within 8 days. This is called the belief interpretation. Note that this interpretation hinges on subjective personal beliefs.
- Based on a comparison to similar tasks, the task will finish within 8 days. This is called the support interpretation.
Note that these interpretations are based on a paper by Glen Shafer. Other papers and textbooks frame these differently.
The first thing to note is how different these interpretations are from each other. For example, the first one offers a seemingly objective interpretation whereas the second one is unabashedly subjective.
So, which is the best – or most correct – one?
A person trained in science or mathematics might claim that the frequency interpretation wins hands down because it lays out an objective, well -defined procedure for calculating probability: simply perform the same task many times and note the completion times.
Problem is, in real life situations it is impossible to carry out exactly the same task over and over again. Sure, it may be possible to almost the same task, but even straightforward tasks such as vacuuming a room or baking a cake can hold hidden surprise (vacuum cleaners do malfunction and a friend may call when one is mixing the batter for a cake). Moreover, tasks that are complex (as is often the case in the project work) tend to be unique and can never be performed in exactly the same way twice. Consequently, the frequency interpretation is great in theory but not much use in practice.
“That’s OK,” another estimator might say,” when drawing up an estimate, I compared it to other similar tasks that I have done before.”
This is essentially the support interpretation (interpretation 3 above). However, although this seems reasonable, there is a problem: tasks that are superficially similar will differ in the details, and these small differences may turn out to be significant when one is actually carrying out the task. One never knows beforehand which variables are important. For example, my ability to finish a particular task within a stated time depends not only on my skill but also on things such as my workload, stress levels and even my state of mind. There are many external factors that one might not even recognize as being significant. This is a manifestation of the reference class problem.
So where does that leave us? Is probability just a matter of subjective belief?
No, not quite: in reality, estimators will use some or all of three interpretations to arrive at “best guess” probabilities. For example, when estimating a project task, a person will likely use one or more of the following pieces of information:
- Experience with similar tasks.
- Subjective belief regarding task complexity and potential problems. Also, their “gut feeling” of how long they think it ought to take. These factors often drive excess time or padding that people work into their estimates.
- Any relevant historical data (if available)
Clearly, depending on the situation at hand, estimators may be forced to rely on one piece of information more than others. However, when called upon to defend their estimates, estimators may use other arguments to justify their conclusions depending on who they are talking to. For example, in discussions involving managers, they may use hard data presented in a way that supports their estimates, whereas when talking to their peers they may emphasise their gut feeling based on differences between the task at hand and similar ones they have done in the past. Such contradictory representations tend to obscure the means by which the estimates were actually made.
Estimates are invariably made in the face of uncertainty. One way to get a handle on this is by estimating the probabilities associated with possible outcomes. Probabilities can be reckoned in a number of different ways. Clearly, when using them in estimation, it is crucial to understand how probabilities have been derived and the assumptions underlying these. We have seen three ways in which probabilities are interpreted corresponding to three different ways in which they are arrived at. In reality, estimators may use a mix of the three approaches so it isn’t always clear how the numerical value should be interpreted. Nevertheless, an awareness of what probability is and its different interpretations may help managers ask the right questions to better understand the estimates made by their teams.
The essential idea behind group estimation is that an estimate made by a group is likely to be more accurate than one made by an individual in the group. This notion is the basis for the Delphi method and its variants. In this post, I use arguments involving probabilities to gain some insight into the conditions under which group estimates are more accurate than individual ones.
An insight from conditional probability
Let’s begin with a simple group estimation scenario.
Assume we have two individuals of similar skill who have been asked to provide independent estimates of some quantity, say a project task duration. Further, let us assume that each individual has a probability of making a correct estimate.
Based on the above, the probability that they both make a correct estimate, , is:
This is a consequence of our assumption that the individual estimates are independent of each other.
Similarly, the probability that they both get it wrong, , is:
Now we can ask the following question:
What is the probability that both individuals make the correct estimate if we know that they have both made the same estimate?
This can be figured out using Bayes’ Theorem, which in the context of the question can be stated as follows:
In the above equation, is the probability that both individuals get it right given that they have made the same estimate (which is what we want to figure out). This is an example of a conditional probability – i.e. the probability that an event occurs given that another, possibly related event has already occurred. See this post for a detailed discussion of conditional probabilities.
Similarly, is the conditional probability that both estimators make the same estimate given that they are both correct. This probability is 1.
Answer: If both estimators are correct then they must have made the same estimate (i.e. they must both within be an acceptable range of the right answer).
Finally, is the probability that both make the same estimate. This is simply the sum of the probabilities that both get it right and both get it wrong. Expressed in terms of this is, .
Now lets apply Bayes’ theorem to the following two cases:
- Both individuals are good estimators – i.e. they have a high probability of making a correct estimate. We’ll assume they both have a 90% chance of getting it right ().
- Both individuals are poor estimators – i.e. they have a low probability of making a correct estimate. We’ll assume they both have a 30% chance of getting it right ()
Consider the first case. The probability that both estimators get it right given that they make the same estimate is:
Thus we see that the group estimate has a significantly better chance of being right than the individual ones: a probability of 0.9878 as opposed to 0.9.
In the second case, the probability that both get it right is:
The situation is completely reversed: the group estimate has a much smaller chance of being right than an individual estimate!
In summary: estimates provided by a group consisting of individuals of similar ability working independently are more likely to be right (compared to individual estimates) if the group consists of competent estimators and more likely to be wrong (compared to individual estimates) if the group consists of poor estimators.
Assumptions and complications
I have made a number of simplifying assumptions in the above argument. I discuss these below with some commentary.
- The main assumption is that individuals work independently. This assumption is not valid for many situations. For example, project estimates are often made by a group of people working together. Although one can’t work out what will happen in such situations using the arguments of the previous section, it is reasonable to assume that given the right conditions, estimators will use their collective knowledge to work collaboratively. Other things being equal, such collaboration would lead a group of skilled estimators to reinforce each others’ estimates (which are likely to be quite similar) whereas less skilled ones may spend time arguing over their (possibly different and incorrect) guesses. Based on this, it seems reasonable to conjecture that groups consisting of good estimators will tend to make even better estimates than they would individually whereas those consisting of poor estimators have a significant chance of making worse ones.
- Another assumption is that an estimate is either good or bad. In reality there is a range that is neither good nor bad, but may be acceptable.
- Yet another assumption is that an estimator’s ability can be accurately quantified using a single numerical probability. This is fine providing the number actually represents the person’s estimation ability for the situation at hand. However, typically such probabilities are evaluated on the basis of past estimates. The problem is, every situation is unique and history may not be a good guide to the situation at hand. The best way to address this is to involve people with diverse experience in the estimation exercise. This will almost often lead to a significant spread of estimates which may then have to be refined by debate and negotiation.
Real-life estimation situations have a number of other complications. To begin with, the influence that specific individuals have on the estimation process may vary – a manager who is a poor estimator may, by virtue of his position, have a greater influence than others in a group. This will skew the group estimate by a factor that cannot be estimated. Moreover, strategic behaviour may influence estimates in a myriad other ways. Then there is the groupthink factor as well.
…and I’m sure there are many others.
Finally I should mention that group estimates can depend on the details of the estimation process. For example, research suggests that under certain conditions competition can lead to better estimates than cooperation.
In this post I have attempted to make some general inferences regarding the validity of group estimates based on arguments involving conditional probabilities. The arguments suggest that, all other things being equal, a collective estimate from a bunch of skilled estimators will generally be better than their individual estimates whereas an estimate from a group of less skilled estimators will tend to be worse than their individual estimates. Of course, in real life, there are a host of other factors that can come into play: power, politics and biases being just a few. Though these are often hidden, they can influence group estimates in inestimable ways.
Thanks go out to George Gkotsis and Craig Brown for their comments which inspired this post.
(Note: An Excel sheet showing sample calculations and plots discussed in this post can be downloaded here.)
Some months ago, I wrote a post explaining the basics of Monte Carlo simulation using the example of a drunkard throwing darts at a board. In that post I assumed that the darts could land anywhere on the dartboard with equal probability. In other words, the hit locations were assumed to be uniformly distributed. In a comment on the piece, George Gkotsis challenged this assumption, arguing that that regardless of the level of inebriation of the thrower, a dart would be more likely to land near the centre of the board than away from it (providing the player is at least moderately skilled). He also suggested using the Normal Distribution to model the spread of hits, with the variance of the distribution serving as a rough measure of the inaccuracy (or drunkenness!) of the drunkard. In George’s words:
I would propose to introduce a ‘skill’ factor, which represents the circle/square ratio (maybe a normal-Gaussian distribution). Of course, this skill factor would be very low (high variance) for a drunken player, but would still take into account the fact that throwing darts into a square is not purely random.
In this post I revisit the drunkard’s dartboard, taking into account George’s suggestions.
Setting the stage
To keep things simple, I’ll make the following assumptions:
- The dartboard is a circle of radius 0.5 units centred at the origin (see Figure 1)
- The chance of a hit is greatest at the centre of the dartboard and falls off as one moves away from it.
- The distribution of hits is a function of distance from the centre but does not depend on direction. In mathematical terms, for a given distance from the centre of the dartboard, the dart can land at any angle with equal probability, being the angle between the line joining the centre of the board to the dart and the x axis. See Figure 2 for graphical representations of a hit location in terms of and . Note that that the and coordinates can be obtained using the formulas and as s shown in Figure 2.
- Hits are distributed according to the Normal distribution with maximum at the centre of the dartboard.
- The variance of the Normal distribution is a measure of inaccuracy/drunkenness of the drunkard: the more drunk the drunk, the greater the variation in his aim.
These assumptions are consistent with George’s suggestions.
[Note to the reader: you may want to download the demo before continuing.]
The steps of a simulation run are as follows:
- Generate a number that is normally distributed with a zero mean and a specified standard deviation. This gives the distance, , of a randomly thrown dart from the centre of the board for a player with a “inaccuracy factor” represented by the standard deviation. Column A in the demo contains normally distributed random numbers with zero mean and a standard deviation of 0.2 . Note that I selected the latter number for no other reason than the results show up clearly on a fixed-axis plot shown in Figure 2.
- Generate a uniformly distributed random number lying between 0 and . This represents the angle . This is the content of column B of the demo.
- The numbers obtained from steps 1 and 2 for completely specify the location of a hit. The location’s and coordinates can be worked out using the formulas and . These are listed in columns C and D in the Excel demo.
- Re-run steps 1 through 4 as many times as needed. Note that the demo is set up for 5000 runs. You can change this manually or, better yet, automate it. The latter is left as an exercise for you.
It is instructive to visualize the resulting hits using a scatter plot. Among other things this can tell you, at a glance, if the results make sense. For example, we would expect hits to be symmetrically distributed about the origin because the drunkard’s throws are not biased in any particular direction around the centre). A non-symmetrical distribution is thus an indication that there is an error in the calculations.
Now, any finite collection of hits is unlikely to be perfectly symmetrical because of outliers. Nevertheless, the distributions should be symmetrical on average. To test this, run the demo a few times (hit F9 with the demo open). Notice how the position of outliers and the overall shape of the distribution of points changes randomly from simulation to simulation. In all cases, however, there is a clear maximum at the centre of the dartboard with the probability of a hit falling with distance from the centre.
Figure 3 shows the results of simulations for a standard deviation of 0.2. Figures 4 and 5 show the results of simulations for standard deviations of 0.1 and 0.4.
Note that the plot has fixed axes- i.e. the area depicted is the 1×1 square that encloses the dartboard, regardless of the standard deviation. Consequently, for larger standard deviations (such as 0.4) many hits will be out of range and will not show up on the plot.
As I have stressed in my previous posts on Monte Carlo simulation, the usefulness of a simulation depends on the choice of an appropriate distribution. If the selected distribution does not reflect reality, neither will the simulation. This is true regardless of whether one is simulating a drunkard’s wayward aim or the duration of project task. You may have noted that the assumption of normally-distributed hits has no justification whatsoever; it is just as arbitrary as my original assumption of uniformity. In fact, the hit locations of drunken dart throws is highly unlikely to be either uniform or Normal. Nevertheless, I hope that some of my readers will find the above example to be of pedagogical value.
Thanks to George Gkotsis for his comment which got me thinking about this post.
More often than not, managerial decisions are made on the basis of uncertain information. To lend some rigour to the process of decision making, it is sometimes assumed that uncertainties of interest can be quantified accurately using probabilities. As it turns out, this assumption can be incorrect in many situations because the probabilities themselves can be uncertain. In this post I discuss a couple of ways in which such uncertainty about uncertainty can manifest itself.
The problem of vagueness
In a paper entitled, “Is Probability the Only Coherent Approach to Uncertainty?”, Mark Colyvan made a distinction between two types of uncertainty:
- Uncertainty about some underlying fact. For example, we might be uncertain about the cost of a project – that there will be a cost is a fact, but we are uncertain about what exactly it will be.
- Uncertainty about situations where there is no underlying fact. For example, we might be uncertain about whether customers will be satisfied with the outcome of a project. The problem here is the definition of customer satisfaction. How do we measure it? What about customers who are neither satisfied nor dissatisfied? There is no clear-cut definition of what customer satisfaction actually is.
The first type of uncertainty refers to the lack of knowledge about something that we know exists. This is sometimes referred to as epistemic uncertainty – i.e. uncertainty pertaining to knowledge. Such uncertainty arises from imprecise measurements, changes in the object of interest etc. The key point is that we know for certain that the item of interest has well-defined properties, but we don’t know what they are and hence the uncertainty. Such uncertainty can be quantified accurately using probability.
Vagueness, on the other hand, arises from an imprecise use of language. Specifically, the term refers to the use of criteria that cannot distinguish between borderline cases. Let’s clarify this using the example discussed earlier. A popular way to measure customer satisfaction is through surveys. Such surveys may be able to tell us that customer A is more satisfied than customer B. However, they cannot distinguish between borderline cases because any boundary between satisfied and not satisfied customers is arbitrary. This problem becomes apparent when considering an indifferent customer. How should such a customer be classified – satisfied or not satisfied? Further, what about customers who choose not to respond? It is therefore clear that any numerical probability computed from such data cannot be considered accurate. In other words, the probability itself is uncertain.
Ambiguity in classification
Although the distinction made by Colyvan is important, there is a deeper issue that can afflict uncertainties that appear to be quantifiable at first sight. To understand how this happens, we’ll first need to take a brief look at how probabilities are usually computed.
An operational definition of probability is that it is the ratio of the number of times the event of interest occurs to the total number of events observed. For example, if my manager notes my arrival times at work over 100 days and finds that I arrive before 8:00 am on 62 days then he could infer that the probability my arriving before 8:00 am is 0.62. Since the probability is assumed to equal the frequency of occurrence of the event of interest, this is sometimes called the frequentist interpretation of probability.
The above seems straightforward enough, so you might be asking: where’s the problem?
The problem is that events can generally be classified in several different ways and the computed probability of an event occurring can depend on the way that it is classified. This is called the reference class problem. In a paper entitled, “The Reference Class Problem is Your Problem Too”, Alan Hajek described the reference class problem as follows:
“The reference class problem arises when we want to assign a probability to a proposition (or sentence, or event) X, which may be classified in various ways, yet its probability can change depending on how it is classified.”
Consider the situation I mentioned earlier. My manager’s approach seems reasonable, but there is a problem with it: all days are not the same as far as my arrival times are concerned. For example, it is quite possible that my arrival time is affected by the weather: I may arrive later on rainy days than on sunny ones. So, to get a better estimate my manager should also factor in the weather. He would then end up with two probabilities, one for fine weather and the other for foul. However, that is not all: there are a number of other criteria that could affect my arrival times – for example, my state of health (I may call in sick and not come in to work at all), whether I worked late the previous day etc.
What seemed like a straightforward problem is no longer so because of the uncertainty regarding which reference class is the right one to use.
Before closing this section, I should mention that the reference class problem has implications for many professional disciplines. I have discussed its relevance to project management in my post entitled, “The reference class problem and its implications for project management”.
In this post we have looked at a couple of forms of uncertainty about uncertainty that have practical implications for decision makers. In particular, we have seen that probabilities used in managerial decision making can be uncertain because of vague definitions of events and/or ambiguities in their classification. The bottom line for those who use probabilities to support decision-making is to ensure that the criteria used to determine events of interest refer to unambiguous facts that are appropriate to the situation at hand. To sum up: decisions made on the basis of probabilities are only as good as the assumptions that go into them, and the assumptions themselves may be prone to uncertainties such as the ones described in this article.