Eight to Late

Sensemaking and Analytics for Organizations

Improving project forecasts

Many projects are plagued by cost overruns and benefit shortfalls, so much so that a quick search on Google News almost invariably returns a recent item reporting a high-profile cost overrun. In a 2006 paper entitled From Nobel Prize to Project Management: Getting Risks Right, Bent Flyvbjerg discusses the use of reference class forecasting to reduce inaccuracies in project forecasting. This technique, which is based on theories of decision-making in uncertain (or risky) environments [1], forecasts the outcome of a planned action based on actual outcomes in a collection of actions similar to the one being forecast. In this post I present a brief overview of reference class forecasting and its application to estimating projects. The discussion is based on Flyvbjerg’s paper.

According to Flyvbjerg, the reasons for inaccuracies in project forecasts fall into one or more of the following categories:

  • Technical – These are reasons pertaining to unreliable data or the use of inappropriate forecasting models.
  • Psychological – These are reasons pertaining to the inability of most people to judge future events objectively. Typically this manifests itself as undue optimism, unsubstantiated by facts; behaviour that is sometimes referred to as optimism bias. This is the reason for statements like, “No problem, we’ll get this to you in a day,” when the actual time is more like a week.
  • Political – These are reasons pertaining to the tendency of people to misrepresent things for their own gain – e.g. one might understate costs and/or overstate benefits in order to get a project funded. Such behaviour is sometimes called strategic misrepresentation (commonly known as lying!).

Technical explanations are often used to explain inaccurate forecasts. However, Flyvbjerg rules these out as valid explanations for the following reasons. Firstly, inaccuracies attributable to data errors (technical errors) should be normally distributed with average zero, but actual inaccuracies were shown to be non-normal in a variety of cases. Secondly, if inaccuracies in data and models were the problem, one would expect this to get better as models and data collection techniques get better. However, this clearly isn’t the case, as projects continue to suffer from huge forecasting errors.

Based on the above Flyvbjerg concludes that technical explanations do not account for forecast inaccuracies as comprehensively as psychological and political explanations do.   Both the latter involve human bias. Such bias is inevitable when one takes an inside view, which focuses on the internals of a project – i.e. the means (or processes) through which a project will be implemented.  Instead, Flyvbjerg suggests taking an outside view – one which focuses on outcomes of similar (already completed) projects rather than on the current project. This is precisely what reference class forecasting does, as I explain below.  

Reference class forecasting is a systematic way of taking an outside view of planned activities, thereby eliminating human bias. In the context of projects this amounts to creating a probability distribution of estimates based on data for completed projects that are similar to the one of interest, and then comparing the said project with the distribution in order to get a most likely outcome. Basically, reference class forecasting consists of the following steps:

  1. Collecting data for a number of similar past projects – these projects form the reference class. The reference class must encompass a sufficient number of projects to produce a meaningful statistical distribution, but individual projects must be similar to the project of interest.
  2. Establishing a probability distribution based on (reliable!) data for the reference class.  The challenge here is to get good data for a sufficient number of reference class projects.
  3. Predicting most likely outcomes for the project of interest based on comparisons with the reference class distribution.
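
As a rough illustration, the three steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure from Flyvbjerg's paper or the guidance document: the overrun figures are made up, the function names are hypothetical, and it reads values straight off the raw empirical distribution for an assumed acceptable risk of overrun rather than fitting a model.

```python
def empirical_quantile(values, p):
    """Linearly interpolated quantile of `values` at probability p (0 <= p <= 1)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("reference class is empty")
    position = p * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def uplift_for_risk(overruns, acceptable_risk):
    """Fractional uplift that only about `acceptable_risk` of the
    reference projects exceeded."""
    return empirical_quantile(overruns, 1.0 - acceptable_risk)


# Step 1: the reference class. HYPOTHETICAL overruns for similar completed
# projects, expressed as (actual - forecast) / forecast.
reference_overruns = [-0.05, 0.00, 0.05, 0.10, 0.10, 0.20,
                      0.25, 0.40, 0.60, 1.00, 1.50]

# Step 2: the sorted sample serves as the (empirical) distribution.
median_overrun = empirical_quantile(reference_overruns, 0.5)

# Step 3: place the new project on that distribution. Here we accept an
# assumed 20% chance of exceeding the budget.
base_estimate = 1_000_000
uplift = uplift_for_risk(reference_overruns, acceptable_risk=0.20)
budget = base_estimate * (1 + uplift)

print(f"median overrun: {median_overrun:.0%}")
print(f"uplift at 20% accepted risk: {uplift:.0%}")
print(f"budget with uplift: {budget:,.0f}")
```

With only a handful of reference projects the upper quantiles are very sensitive to individual data points, which is one reason the reference class needs to be reasonably large.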

In the paper, Flyvbjerg describes an application of reference class forecasting to large scale transport infrastructure projects. The processes and procedures used are published in a guidance document entitled Procedures for Dealing with Optimism Bias in Transport Planning, so I won’t go into details here. The trick, of course, is to get reliable data for similar projects. Not an easy task.

To conclude, project forecasts are often off the mark by a wide margin. Reference class forecasting is an objective technique that eliminates human bias from the estimating process. However, because of the cost and effort involved in building the reference distribution, it may only be practical to use it on megaprojects.



[1] Daniel Kahneman received the Nobel Prize in Economics in 2002 for his work on how people make decisions in uncertain situations. His work, which is called Prospect Theory, forms the basis of Reference Class Forecasting.

Written by K

June 15, 2008 at 12:18 pm

Posted in Bias, Project Management

17 Responses


  1. Readers should be cautioned that bias risk is only one of many risks, and bias is not the dominant cost risk driver on most projects. Bias mostly impacts megaprojects where government is involved. With corporate survival at risk, private industry cannot afford to kid itself to the same extent. In fact, one of the few solid empirical studies referenced by Flyvbjerg (e.g., Merrow RAND study of 1981) showed that “technical” reasons explained 83% of the cost overruns for pioneer process plant projects. The research I have reviewed to date does not at all justify Flyvbjerg’s conclusion. Unlike the RAND study, most of the research into the causes of risk has been shallow – in other words, it is not fair to say that technical risks are not there.
    Non-normal variations in cost are a well known outcome of authorizing projects on the basis of too little information (i.e., if you don’t know the scope, you are going to miss a lot of it in the estimate). No “improvement” in the base estimating method can address missing scope.
    It is lazy and detrimental to improving outcomes to rely solely on this method to understand cost risk. For the vast majority of projects, the drivers of cost risks are identifiable and manageable.
    Estimate validation using empirically based reference data (validation has been around for decades) is always recommended. Just don’t expect it to help you really understand and mitigate the risks it uncovers.


    John Hollmann

    June 24, 2008 at 6:16 am

  2. John,

    Thank you for your comments.

    I agree – no estimating technique is going to help if you don’t know the scope. It also makes sense that non-normal cost variations are a consequence of authorising projects on the basis of too little information. However, scope definition is really something that precedes (or should precede!) estimation. It is the responsibility of those involved to ensure that it is properly defined before estimating time and costs.

    Regarding bias, I have seen a fair number of senior managers in the corporate world who ignore evidence that contradicts or detracts from their pet projects or initiatives. Having said that, I do agree that government megaprojects are more likely to suffer from bias. Further, the stakes involved are also much higher (scope, cost, time etc.).

    You make an excellent point about reference class forecasting not being of any help in understanding and mitigating risks. That is indeed a separate exercise (regardless of what the literature may say on this).

    Finally, I should say that I’m no expert in reference class forecasting. So, thanks again for your comments, which will provide readers a practical view on the use of reference class forecasting in projects.





    June 24, 2008 at 9:28 am

  3. […] to be objective in his or her estimates. There are objective estimation methods – see my post on reference class forecasting for example – but these can be hard to apply in practice. Another technique that has been used to […]


  4. In response to the two comments above, let me correct some points.
    First hypothesis of reference class forecasting: There is no non-normal cost variation.
    Second hypothesis: All experts bias their forecasts.

    Optimism bias influences all forecasts, costs and risks alike. Even if you use historical records and cost databases to plan your project, it is more than likely that something unexpected will hit you. In the PMBOK these painful experiences are called unknown-unknowns. There are always unexpected delays in delivery, strikes, demonstrations by environmentalist groups etc. You name it. These things happen, and yet they might not happen at all.

    Reference Class Forecasting regresses your forecast to the mean of previous forecasts. In the flavor described in this article, it is the mean of cost overruns previously experienced.

    Example 1:
    We have a perfectly planned project, with extensive validation against historical records and cost databases, a well-defined and evaluated risk register etc. Using the technique described, you might find that 80% of these well-planned projects overrun their budget by +20%. As such, this method suggests that you plan another buffer/contingency of 20% for these unknown unknowns.

    Example 2:
    Let’s assume the other extreme. A project is decided upon with little or no information. This technique would then suggest building a reference class of similar projects. To no surprise, this will uncover that you need a risk uplift of, for instance, +250%. It’s all in the reference class. As such, you automatically account for environmental influences that are typical for the project type, the industry and the specific organisation. There is an endless list of quantitative and qualitative research on the origins and sources of risks on projects. Instead of analysing them all one by one, this method applies a little short cut to the problem.

    A final word: This technique does not try to manage the risks of a single project. As the calculation example in the article suggests, there is a contingency interval to your reference class forecast, so you still might be wrong in 20% or 50% of projects (depending on the contingency level you choose). This is a planning and forecasting technique ideally suited to minimizing risks in a programme or portfolio.
    That a project manager has to deal with issues, risks, and politics on a daily basis is a whole different animal, one which this technique acknowledges and budgets for, but which it neither explains nor solves.




    September 24, 2008 at 6:36 am

  5. Alex,

    Thanks for your detailed comments and insights. I particularly like the examples you have used to illustrate your points. A few remarks follow:

    Although all experts bias their estimates, it seems intuitively clear that using historical data will improve the objectivity (if not accuracy!) of one’s estimates – that is, it will reduce optimism bias. Clearly, this depends (rather critically!) on choosing history that is congruent to the project at hand.

    You make a very important point that managing unknown-unknowns is a separate exercise. RCF will only account for known factors (to the extent that these are captured in the historical data). In this connection, I like your interpretation of RCF as a short cut to incorporating known risks – one doesn’t need to know the individual risks, one just uses historical data to work out the overall consequences.

    Finally, as you point out, it is clear that RCF predictions are to be treated as probabilistic. It therefore follows that one will be “wrong” in specific cases. You make an excellent point about using RCF to manage programme/portfolio risks. However, it will be even more challenging to obtain reliable historical data for a mix of projects in a programme or portfolio (as compared to a single project).

    Nice blog, BTW. I’ll definitely be visiting often.





    September 24, 2008 at 12:53 pm

  6. […] Base estimates on historical data for similar tasks. This is the basis of reference class forecasting which I have written about in an earlier post.  […]


  7. […] how the risks posed by optimism bias can be addressed using reference class forecasting (see my post on improving project forecasts for more on this).  However, as suggested in the introduction, one can go further. The first point […]


  8. I have some questions regarding how to use the method practically.

    How exactly do you perform an RCF? I have of course read the article, but Flyvbjerg does not go into detail on how you calculate the probability distribution of cost overrun once you have collected valid data. I know that collecting valid data is another question. What I am interested in is how you calculate this probability distribution.

    Since I am no mathematician can anyone tell me how or where to find out?



    November 13, 2009 at 11:33 pm

    • Once you have valid data, you can use a variety of statistical techniques to fit the data to a distribution. Generally one assumes a particular distribution and then finds the best-fit parameters for the given data. A good place to find a quick introduction to the technique is in the documentation for mathematical/statistical software (see this link from the Matlab documentation, for example). If one doesn’t know the distribution a priori, one could also run through the exercise for a whole bunch of distributions and pick the one that fits best.

      Hope this helps.





      November 14, 2009 at 10:26 am
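
The fitting step described in the reply above (assume a distribution, then find its best-fit parameters) can be sketched with the Python standard library. This is only an illustrative sketch under stated assumptions: the cost ratios are made up, the lognormal form is an assumption chosen because cost ratios are positive and right-skewed, and the function names are hypothetical. For a lognormal, the maximum-likelihood parameters are simply the mean and population standard deviation of the logged data.

```python
import math
from statistics import NormalDist, fmean, pstdev


def fit_lognormal(cost_ratios):
    """Fit a lognormal by maximum likelihood: model log(ratio) as normal."""
    logs = [math.log(r) for r in cost_ratios]
    return NormalDist(mu=fmean(logs), sigma=pstdev(logs))


def ratio_at_probability(log_model, p):
    """Cost ratio that the fitted model says is not exceeded with probability p."""
    return math.exp(log_model.inv_cdf(p))


# HYPOTHETICAL data: actual cost / forecast cost for completed projects.
cost_ratios = [0.95, 1.00, 1.05, 1.10, 1.20, 1.25, 1.40, 1.60, 2.00, 2.50]

model = fit_lognormal(cost_ratios)
p50 = ratio_at_probability(model, 0.50)
p80 = ratio_at_probability(model, 0.80)

print(f"fitted mu={model.mean:.3f}, sigma={model.stdev:.3f} (log scale)")
print(f"median cost ratio: {p50:.2f}")
print(f"80th percentile cost ratio: {p80:.2f}")
```

To compare candidate distributions, one could repeat the fit for each and compare their log-likelihoods on the same data, though with small samples the choice is rarely clear-cut.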

  9. Thank you Kailash
    I will look into it.
    Best regards



    November 15, 2009 at 1:45 am

  10. Hi everybody. I now understand how to make the probability distribution. What I do not understand is how to calculate the uplift, which is a vital part of making an RCF. Can anyone please help me with how to calculate it?
    Best regards



    November 18, 2009 at 8:16 am

  11. […] problem. First up, consider the technique of reference class forecasting which I’ve discussed in this post. Note that reference class forecasting technique is distinct from the reference class problem […]


  12. […] such as reference class forecasting have been proposed to improve estimation for projects where incremental approaches are not possible […]


  13. Bent Flyvbjerg starts by stating as fact a very doubtful hypothesis: “inaccuracies attributable to data errors (technical errors) should be normally distributed with average zero”, and then uses this to support the rest of his thesis. This is patently wrong in a large number of cases – just think about forecasting the time for your journey to work: it has a very long overrun tail but a short under-run.



    February 1, 2013 at 4:50 pm

    • Hi pineyk,

      Thanks for your comment.

      As I understand it, Flyvbjerg assumes that the errors in measuring the variable of interest (such as cost or completion time) are normally distributed, not the variable itself. This is a reasonable assumption if the errors in measurement are random.

      Whether errors are random or systematic in specific cases is another matter altogether: there are many situations in which they are not random – an obvious (and dare I say, not uncommon) example being when data points are selectively chosen to get an estimate that will be deemed acceptable.





      February 1, 2013 at 10:34 pm

  14. […] While making decisions (based on estimates) come naturally to us when we cross the road, the reason why we are comfortable making these probability assessments is because we have learned to execute them based on past experience. From an early childhood our parents have taken our hand and instilled in us the need to accumulate experience and knowledge we could utilize at a later age, without giving it much thought. Statisticians and Economists would call this process Reference Class Forecasting (and see Kailash Awati’s elaboration on this topic in Improving Project Forecasts). […]

