Eight to Late

Sensemaking and Analytics for Organizations

On the limitations of scoring methods for risk analysis

Introduction

A couple of months ago I wrote an article highlighting some of the pitfalls of using risk matrices. Risk matrices are an example of scoring methods, techniques that use ordinal scales to assess risks. In these methods, risks are ranked by some predefined criteria such as impact or expected loss, and the ranking is then used as the basis for decisions on how the risks should be addressed. Scoring methods are popular because they are easy to use. However, as Douglas Hubbard points out in his critique of current risk management practices, many commonly used scoring techniques are flawed. This post – based on Hubbard’s critique and the research papers quoted therein – is a brief look at some of the flaws of risk scoring techniques.

Commonly used risk scoring techniques and problems associated with them

Scoring techniques fall under two major categories:

  1. Weighted scores: These use several ordered scales which are weighted according to perceived importance. For example, one might be asked to rate financial risk, technical risk and organisational risk on a scale of 1 to 5 each, and then weight them by factors of 0.6, 0.3 and 0.1 respectively (possibly because the CFO – who happens to be the project sponsor – is more concerned about financial risk than about any other risk). The point is that the scores and weights assigned can be highly subjective – more on that below. (A small worked sketch of such a calculation follows this list.)
  2. Risk matrices: These rank risks along two dimensions – probability and impact – and assign them a qualitative ranking of high, medium or low depending on where they fall. Cox’s theorem shows such categorisations are internally inconsistent because the category boundaries are arbitrarily chosen.
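
To make the weighted-score calculation concrete, here is a minimal sketch in Python. The three risk categories, the 1-to-5 ratings and the 0.6/0.3/0.1 weights are simply the illustrative values from the example above – they are not a recommended scheme, and the arbitrariness of these inputs is precisely the problem discussed below.

```python
# Minimal weighted-score sketch, using the illustrative ratings and weights
# from the example above. Both the ratings and the weights are subjective inputs.

ratings = {"financial": 4, "technical": 2, "organisational": 3}   # each on a 1-5 ordinal scale
weights = {"financial": 0.6, "technical": 0.3, "organisational": 0.1}

weighted_score = sum(weights[k] * ratings[k] for k in ratings)
print(f"Weighted risk score: {weighted_score:.1f}")   # 0.6*4 + 0.3*2 + 0.1*3 = 3.3
```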

Hubbard makes the point that, although both the above methods are endorsed by many standards and methodologies (including those used in project management), they should be used with caution because they are flawed. To quote from his book:

Together these ordinal/scoring methods are the benchmark for the analysis of risks and/or decisions in at least some component of most large organizations. Thousands of people have been certified in methods based in part on computing risk scores like this. The major management consulting firms have influenced virtually all of these standards. Since what these standards all have in common is the use of various scoring schemes instead of actual quantitative risk analysis methods, I will call them collectively the “scoring methods.” And all of them, without exception, are borderline or worthless. In practice, they may make many decisions far worse than they would have been using merely unaided judgements.

What is the basis for this claim? Hubbard points to the following:

  1. Scoring methods do not make any allowance for the flawed perceptions of the analysts who assign scores – i.e. they do not consider the effect of cognitive bias. I won’t dwell on this as I have previously written about the effect of cognitive biases in project risk management – see this post and this one, for example.
  2. Qualitative descriptions assigned to each score are understood differently by different people. Further, there is rarely any objective guidance on how an analyst is to distinguish between, say, a high and a medium risk. Such guidance may not even help: research by Budescu, Broomell and Po shows that there can be huge variation in the understanding of qualitative descriptions, even when people are given specific guidelines as to what the descriptions or terms mean.
  3. Scoring methods add their own errors.  Below are brief descriptions of some of these:
    1. In his paper on risk matrices, Cox notes that “Typical risk matrices can correctly and unambiguously compare only a small fraction (e.g., less than 10%) of randomly selected pairs of hazards. They can assign identical ratings to quantitatively very different risks.” He calls this behaviour “range compression” – and it applies to any scoring technique that uses ranges (a small numerical sketch follows this list).
    2. Assigned scores tend to cluster in the middle of the range. Analysis by Hubbard shows that, on a 5-point scale, 75% of all responses are 3 or 4. This implies that changing a score from 3 to 4 (or vice versa) can have a disproportionate effect on the classification of risks.
    3. Scores implicitly assume that the magnitude of the quantity being assessed is directly proportional to the scale. For example, a score of 2 implies that the criterion being measured is twice as large as it would be for a score of 1. In reality, however, the underlying quantities rarely vary linearly in this way.
    4. Scoring techniques often presume that the factors being scored are independent of each other – i.e. that there are no correlations between factors. This assumption is rarely tested or justified in any way.
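
To illustrate range compression, here is a small sketch of a risk matrix categorisation. The probability and impact bin boundaries are invented purely for illustration – the point is only that two quantitatively very different risks can land in the same cell.

```python
# Range compression sketch: two quantitatively very different risks receive the same
# qualitative rating because the (invented) category boundaries are coarse.

def probability_category(p):
    """Map a probability to an ordinal category (illustrative boundaries)."""
    return "high" if p > 0.5 else "medium" if p > 0.1 else "low"

def impact_category(loss):
    """Map a monetary loss to an ordinal category (illustrative boundaries)."""
    return "high" if loss > 1_000_000 else "medium" if loss > 100_000 else "low"

risks = {
    "Risk A": (0.11, 150_000),     # expected loss ~ $16,500
    "Risk B": (0.50, 1_000_000),   # expected loss ~ $500,000 -- roughly 30 times larger
}

for name, (p, loss) in risks.items():
    print(name, probability_category(p), impact_category(loss), "expected loss:", p * loss)

# Both risks are rated (medium, medium) despite a ~30-fold difference in expected loss.
```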

Many project management standards advocate the use of scoring techniques. To be fair, in many situations they are adequate, as long as they are used with an understanding of their limitations. Seen in this light, Hubbard’s book is an admonition to standards and textbook writers to be more critical of the methods they advocate, and a warning to practitioners that uncritical adherence to standards and best practices is not the best way to manage project risks.

Scoring done right

Just to be clear, Hubbard’s criticism is directed against scoring methods that use arbitrary, qualitative scales which are not justified by independent analysis. There are other techniques which, though superficially similar to these flawed scoring methods, are actually quite robust because they:

  1. Are based on observations.
  2. Use real measures (as opposed to arbitrary ones such as “alignment with business objectives” on a scale of 1 to 5, without defining what “alignment” means).
  3. Are validated after the fact (and hence refined with use).

As an example of a sound scoring technique, Hubbard quotes this paper by Dawes, which presents evidence that linear scoring models are superior to unaided intuition in clinical judgements. Interestingly, the model outperforms clinical intuition even though the weights themselves can be obtained through intuition. This happens because human intuition is good at identifying important factors, but not so hot at evaluating the net effect of several, possibly competing, factors. Hence simple linear scoring models can outperform intuition. The key point is that the models are validated by checking their predictions against reality – a rough sketch of what such a validation might look like follows.
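
As a rough illustration of such after-the-fact validation, here is a sketch of a simple linear scoring model checked against historical project outcomes. The factors, weights, threshold and data below are entirely hypothetical – the point is only that the model’s predictions are compared with what actually happened.

```python
# Sketch of validating a simple linear scoring model against observed outcomes.
# The factor weights and the historical data below are invented for illustration only.

weights = {"requirements_volatility": 0.5, "team_experience": -0.3, "vendor_dependence": 0.2}

# Hypothetical history: factor scores for past projects, plus whether each overran badly.
history = [
    ({"requirements_volatility": 4, "team_experience": 2, "vendor_dependence": 3}, True),
    ({"requirements_volatility": 1, "team_experience": 5, "vendor_dependence": 1}, False),
    ({"requirements_volatility": 3, "team_experience": 4, "vendor_dependence": 2}, False),
    ({"requirements_volatility": 5, "team_experience": 1, "vendor_dependence": 4}, True),
]

def score(factors):
    """Simple linear model: weighted sum of factor scores."""
    return sum(weights[k] * factors[k] for k in weights)

threshold = 1.5  # classify projects scoring above this as "high risk"
hits = sum((score(f) > threshold) == overran for f, overran in history)
print(f"Model agreed with outcomes on {hits} of {len(history)} past projects")
```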

Another class of techniques uses logic-based axioms to reduce inconsistencies in decisions. An example of such a technique is multi-attribute utility theory. Since they are grounded in logic, these methods can also be considered to have a solid foundation, unlike those discussed in the previous section.

Conclusions

Many commonly used scoring methods in risk analysis are based on flaky theoretical foundations – or worse, none at all. To compound the problem, they are often used without any validation. A particularly ubiquitous example is the well-known and much-loved risk matrix. In his paper on risk matrices, Tony Cox shows how risk matrices can sometimes lead to decisions that are worse than those made on the basis of a coin toss. The fact that this is a possibility – even if only a small one – should worry anyone who uses risk matrices (or other flawed scoring techniques) without an understanding of their limitations.

Written by K

October 6, 2009 at 8:27 pm

12 Responses

  1. Thanks for the post about my book. I would add one clarification, though. I wouldn’t necessarily say I focus my criticism on “ad hoc” scoring methods. Many of the scoring methods I discuss claim a high degree of “formality” and “structure”. And they are right. They are every bit as formal and structured as, say, astrology. Of course, a formal structure is not the same as having measurable effectiveness. My criticism is directed at methods that have no theoretical or empirical basis for actually improving decisions, regardless of how formal or structured they appear to be.

    Thanks,
    Doug Hubbard

    I have some articles coming out later this year that include additional research not available when I wrote the book. It shows how even completely ineffectual methods can still have a strong placebo effect and users will be convinced the method is “working”.

    I’m also updating my first book, How to Measure Anything, with some of the newer research I incorporated into The Failure of Risk Management.

    Douglas W. Hubbard

    October 6, 2009 at 10:23 pm

  2. Doug,

    Thanks for the clarification. I’ve used the term ad-hoc in the sense that the use of these techniques is often “justified” using arbitrary hypotheses that cannot be refuted easily. However, I take your point that my use of the term could be misleading, so I’ve updated the post to fix this.

    I’ve enjoyed reading your book – a big thank you for writing it. I intend to do a longer write up on the book in a future post.

    I look forward to your other articles on risk management. Will you be making them available on your site?

    Thanks again for your comments.

    Regards,

    Kailash.

    K

    October 7, 2009 at 4:05 am

  3. Kailash,

    Very interesting article. If I understood you correctly, you support using the familiar risk metrics, but emphasise the need for some kind of ‘scale table’ which will define what the impact levels are?

    Also, do you have any recommendations when scaling the mitigation plans in order to get the weighted score?

    Gilad

    http://giladlsh.wordpress.com/about/

    giladlsh

    October 14, 2009 at 5:42 pm

  4. Gilad,

    Thanks for your comments.

    The problem with many of these methods is that they have no sound theoretical basis. As an example, Cox’s theorem shows that risk matrices, as they are commonly used, are inconsistent – i.e. they can result in an incorrect ranking of risks. Ad-hoc add-ons such as scale tables cannot fix such problems.

    Secondly, a host of scoring techniques (weighted scores etc.) use subjective methods to assign scores – i.e. the score assigned is whatever the analyst thinks it should be (without any independent justification). Such assignments are subject to cognitive bias, and hence cannot be trusted unless they are calibrated.

    The third problem Hubbard highlights is the lack of empirical observation or validation: scores are often assigned arbitrarily, and the performance of the techniques is rarely checked after the project is done and dusted.

    I’m not sure what you mean by a scale table, but any technique that addresses the above issues would be an improvement.

    Regards,

    Kailash.

    K

    October 15, 2009 at 6:09 am

  5. […] my posts limitations of scoring methods in risk analysis and cognitive biases as project meta-risks for more on the above […]

  6. […] why a very popular scoring method (risk matrices) is “worse than useless.”  See my posts on the  limitations of scoring techniques and Cox’s risk matrix theorem for detailed discussions of these […]

  7. K,
    Could you give the full citation for your ref to Budescu, Broomell and Po, please? I can’t track the paper from the link you’ve provided.

    David

    April 28, 2011 at 2:31 pm

    • David,

      Thanks for pointing out the broken link. The abstract and full citation can be found here.

      Regards,

      K.

      K

      April 28, 2011 at 4:56 pm

  8. […] ad-hoc techniques abound in risk analysis:  see my posts on Cox’s risk matrix theorem and limitations of risk scoring methods for more on these.  Risk metrics based on such techniques can be misleading.  As Glen Alleman […]

  9. […] scales on qualitative data can lead to meaningless numerical measures. See my post on the limitations of scoring methods in risk analysis for an example of this […]

  10. […] because they use quantitative measures to rate decision options However, as I have pointed out in  this post, measures are often misleading. There are those who claim that this can be fixed by “doing it […]

