Eight to Late

Sensemaking and Analytics for Organizations

Archive for November 2011

On the limitations of business intelligence systems


Introduction

One of the main uses of business intelligence  (BI) systems is to support decision making in organisations.  Indeed, the old term Decision Support Systems is more descriptive of such applications than the term BI systems (although the latter does have more pizzazz).   However, as Tim Van Gelder pointed out in an insightful post,  most BI tools available in the market do not offer a means to clarify the rationale behind decisions.   As he stated, “[what] business intelligence suites (and knowledge management systems) seem to lack is any way to make the thinking behind core decision processes more explicit.”

Van Gelder is absolutely right: BI tools do not support the process of decision-making directly; all they do is present data or information on which a decision can be based. But there is more: BI systems are based on the view that data should be the primary consideration when making decisions. In this post I explore some of the (largely tacit) assumptions that flow from such a data-centric view. My discussion builds on some points made by Terry Winograd and Fernando Flores in their wonderful book, Understanding Computers and Cognition.

As we will see, the assumptions regarding the centrality of data are questionable, particularly when dealing with complex decisions. Moreover, since these assumptions are implicit in all BI systems, they highlight the limitations of using BI systems for making business decisions.

An example

To keep the discussion grounded, I’ll use a scenario to illustrate how assumptions of data-centrism can sneak into decision making. Consider a sales manager who creates sales action plans for representatives based on reports extracted from his organisation’s BI system. In doing this, he makes a number of tacit assumptions. They are:

  1. The sales action plans should be based on the data provided by the BI system.
  2. The data available in the system is relevant to the sales action plan.
  3. The information provided by the system is objectively correct.
  4. The  side-effects of basing decisions (primarily) on data are negligible.

The assumptions and why they are incorrect

Below I state some of the key assumptions of the data-centric paradigm of BI and discuss their limitations using the example of the previous section.

Decisions should be based on data alone: BI systems promote the view that decisions can be made based on data alone. The danger in such a view is that it overlooks social, emotional, intuitive and qualitative factors that can and should influence decisions. For example, a sales representative may have qualitative information regarding sales prospects that cannot be inferred from the data. Such information should be factored into the sales action plan, provided the representative can justify it or is willing to stand by it.

The available data is relevant to the decision being made: Another tacit assumption made by users of BI systems is that the information provided is relevant to the decisions they have to make. However, most BI systems are designed to answer specific, predetermined questions. In general these cannot cover all possible questions that managers may ask in the future.

More important is the fact that the data itself may be based on assumptions that are not known to users. For example, our sales manager may be tempted to incorporate market forecasts simply because they are available in the BI system.  However, if he chooses to use the forecasts, he will likely not take the trouble to check the assumptions behind the models that generated the forecasts.

The available data is objectively correct:  Users of BI systems tend to look upon them as a source of objective truth. One of the reasons for this is that quantitative data tends to be viewed as being more reliable than qualitative data.  However, consider the following:

  1. In many cases it is impossible to establish the veracity of quantitative data, let alone its accuracy. In extreme cases, data can be deliberately distorted or fabricated (over the last few years there have been some high profile cases of this that need no elaboration…).
  2. The imposition of arbitrary quantitative scales on qualitative data can lead to meaningless numerical measures. See my post on the limitations of scoring methods in risk analysis for a deeper discussion of this point.
  3. The information that a BI system holds is based on the subjective choices (and biases) of its designers.

In short, the data in a BI system does not represent an objective truth. It is based on subjective choices of users and designers, and thus may not be an accurate reflection of the reality it allegedly represents. (Note added on 16 Feb 2013:  See my essay on data, information and truth in organisations for more on this point).

Side-effects of data-based decisions are negligible:  When basing decisions on data, side-effects are often ignored. Although this point is closely related to the first one, it is worth making separately.  For example, judging a sales representative’s performance on sales figures alone may motivate the representative to push sales at the cost of building sustainable relationships with customers.  Another example of such behaviour is observed in call centers where employees are measured by number of calls rather than call quality (which is much harder to measure). The former metric incentivizes employees to complete calls rather than resolve issues that are raised in them. See my post entitled, measuring the unmeasurable, for a more detailed discussion of this point.

Although I have used a scenario to highlight the problems with the above assumptions, these problems are independent of the specifics of any particular decision or system. In short, they are inherent in BI systems that are based on data – which includes most systems in operation.

Programmable and non-programmable decisions

Of course, BI systems are perfectly adequate – even indispensable – for certain situations. Examples include financial reporting (when done right!) and other operational reporting (inventory, logistics etc). These tend to be routine situations with clear-cut decision criteria and well-defined processes. Simply put, they are the kinds of decisions that can be programmed.

On the other hand, many decisions cannot be programmed: they have to be made based on incomplete and/or ambiguous information that can be interpreted in a variety of ways. Examples include issues such as what an organization should do in response to increased competition or formulating a sales action plan in a rapidly changing business environment. These issues are wicked: among other things, there is a diversity of viewpoints on how they should be resolved. A business manager and a sales representative are likely to have different views on how sales action plans should be adjusted in response to a changing business environment. The shortcomings of BI systems become particularly obvious when dealing with such problems.

Some may argue that it is naïve to expect BI systems to be able to handle such problems. I agree entirely. However, it is easy to overlook the limitations of these systems, particularly when called upon to make snap decisions on complex matters. Moreover, any critical reflection regarding what BI ought to be is drowned in a deluge of vendor propaganda and advertisements masquerading as independent advice in the pages of BI trade journals.

Conclusion

In this article I have argued that BI systems have some inherent limitations as decision support tools because they focus attention on data to the exclusion of other, equally important factors.  Although the data-centric paradigm promoted by these systems is adequate for routine matters, it falls short when applied to complex decision problems.

Written by K

November 24, 2011 at 6:20 am

Not on the same page, not even reading the same book


In the course of a project it is not uncommon to have stakeholders with conflicting viewpoints on a particular issue. Some examples of this include:

  • The sponsor who wants a set of reports done in a day and the report writer who reckons it will take a week.
  • The project manager who believes that tasks can be tracked to a very fine level and the developer who “knows” they can’t.
  • The developer who is convinced that method A is the best way to go and her colleague who is equally certain that method B is the way to go.

These are but a small selection of the conflicts I have encountered in my work. Most project professionals would undoubtedly have had similar experiences. It can be difficult to reconcile such conflicting viewpoints because they are based on completely different worldviews. Unless these are made explicit, it is difficult for those involved to understand each other, let alone agree.

Consider, for example, the first case above: the sponsor’s worldview is likely based on his reality, perhaps a deadline imposed on him by his boss, whereas the report writer’s view is based on what she thinks is a reasonable time to create the reports requested.

Metaphorically, the two parties are not on the same page.  Worse, they are not even reading the same book. The sponsor’s reality – his “book” – is based on an imposed deadline whereas the report writer’s is based on an estimate.

So, how does one get the two sides to understand each other’s point of view?

The metaphor gives us a clue – we have to first get them to understand that they are “reading from different books.”  Only then do they have a hope in hell of understanding each other’s storylines.

This isn’t easy because people tend to believe their views are reasonable (even when they aren’t!). The only way to resolve these differences is through dialogue or collective deliberation. As I have written in my post on rational dialogue in project environments:

Someone recently mentioned to me that the problem in project meetings (and indeed any conversation) is that participants  see their own positions  as being rational, even when they are not.  Consequently, they stick to their views, even when faced with evidence to the contrary. However, such folks aren’t being rational because they do not subject their positions and views to “trial by argumentation.”  Rationality lies in dialogue, not in individual statements or positions. A productive discussion is one in which conflicting claims are debated until they converge on an optimal decision.  The best (or most rational) position is one that emerges from such collective deliberation.

The point is a simple one: we have to get the two sides talking to each other, with each one accepting that their views may need to be revised in the light of the arguments presented by the other. Dialogue Mapping, which I have discussed in many posts on this blog, is a great way to facilitate such dialogue.

In our forthcoming book entitled, The Heretic’s Guide to Best Practices, Paul Culmsee and I describe Dialogue Mapping and a host of other techniques that can help organisations tackle problems associated with people who are “not on the same page” or “reading different books.”

The book is currently in the second round of proofs. We’ll soon be putting up a website with excerpts, review comments, pricing, release dates and much more – stay tuned!

Written by K

November 11, 2011 at 6:03 am

The drunkard’s dartboard revisited: yet another Excel-based example of Monte Carlo simulation


(Note: An Excel sheet showing sample calculations and plots discussed in this post can be downloaded here.)

Introduction

Some months ago, I wrote a post explaining the basics of Monte Carlo simulation using the example of a drunkard throwing darts at a board. In that post I assumed that the darts could land anywhere on the dartboard with equal probability. In other words, the hit locations were assumed to be uniformly distributed. In a comment on the piece, George Gkotsis challenged this assumption, arguing that regardless of the level of inebriation of the thrower, a dart would be more likely to land near the centre of the board than away from it (provided the player is at least moderately skilled). He also suggested using the Normal Distribution to model the spread of hits, with the variance of the distribution serving as a rough measure of the inaccuracy (or drunkenness!) of the drunkard. In George’s words:

I would propose to introduce a ‘skill’ factor, which represents the circle/square ratio (maybe a normal-Gaussian distribution). Of course, this skill factor would be very low (high variance) for a drunken player, but would still take into account the fact that throwing darts into a square is not purely random.

In this post I revisit the drunkard’s dartboard, taking into account George’s suggestions.

Setting the stage

To keep things simple, I’ll make the following assumptions:

Figure 1: The dartboard

  1. The dartboard is a circle of radius 0.5 units centred at the origin (see Figure 1)
  2. The chance of a hit is greatest at the centre of the dartboard and falls off as one moves away from it.
  3. The distribution of hits is a function of distance from the centre but does not depend on direction. In mathematical terms, for a given distance r from the centre of the dartboard, the dart can land at any angle \theta with equal probability, \theta being the angle between the line joining the centre of the board to the dart and the x axis. See Figure 2 for a graphical representation of a hit location in terms of r and \theta. Note that the x and y coordinates can be obtained using the formulas x = r\cos\theta and y = r\sin\theta, as shown in Figure 2.
  4. Hits are distributed according to the Normal distribution with maximum at the centre of the dartboard.
  5. The variance of the Normal distribution is a measure of inaccuracy/drunkenness of the drunkard: the more drunk the drunk, the greater the variation in his aim.

Figure 2: The coordinates of a hit location

These assumptions are consistent with George’s suggestions.

The simulation

[Note to the reader: you may want to download the demo before continuing.]

The steps of a simulation run are as follows:

  1. Generate a number that is normally distributed with a zero mean and a specified standard deviation. This gives the distance, r, of a randomly thrown dart from the centre of the board for a player with an “inaccuracy factor” represented by the standard deviation. Column A in the demo contains normally distributed random numbers with zero mean and a standard deviation of 0.2. Note that I selected the latter number for no other reason than that the results show up clearly on the fixed-axis plot shown in Figure 3.
  2. Generate a uniformly distributed random number lying between 0 and 2\pi. This represents the angle \theta. This is the content of column B of the demo.
  3. The numbers obtained from steps 1 and 2 completely specify the location of a hit. The location’s x and y coordinates can be worked out using the formulas x = r\cos\theta and y = r\sin\theta. These are listed in columns C and D in the Excel demo.
  4. Re-run steps 1 through 3 as many times as needed. Note that the demo is set up for 5000 runs. You can change this manually or, better yet, automate it. The latter is left as an exercise for you.
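
For readers who would rather follow along in code than in Excel, the steps above can be sketched in Python. This is a minimal sketch, not a translation of the demo spreadsheet: the function name simulate_hits, its parameters and the optional seed are assumptions made for illustration, and it uses only the standard library random module.

```python
import math
import random


def simulate_hits(n_throws=5000, sigma=0.2, seed=None):
    """Return a list of (x, y) hit locations for n_throws darts.

    sigma is the standard deviation of the distance from the centre,
    i.e. the "inaccuracy factor" of the thrower.
    """
    rng = random.Random(seed)  # the seed only makes a run repeatable
    hits = []
    for _ in range(n_throws):
        # Step 1: distance from the centre, Normal with zero mean.
        # r can be negative; that simply places the hit on the
        # opposite side of the centre along the same line.
        r = rng.gauss(0.0, sigma)
        # Step 2: angle, uniformly distributed between 0 and 2*pi.
        theta = rng.uniform(0.0, 2.0 * math.pi)
        # Step 3: convert (r, theta) to x and y coordinates.
        hits.append((r * math.cos(theta), r * math.sin(theta)))
    return hits  # Step 4 is the loop itself


hits = simulate_hits(n_throws=5000, sigma=0.2, seed=1)
on_board = sum(1 for x, y in hits if math.hypot(x, y) <= 0.5)
print(f"{on_board} of {len(hits)} darts landed on the board")
```

Changing n_throws is the code equivalent of changing the number of runs in the spreadsheet, and the returned list of (x, y) pairs can be passed to any plotting tool to produce a scatter plot.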

It is instructive to visualize the resulting hits using a scatter plot. Among other things this can tell you, at a glance, if the results make sense. For example, we would expect hits to be symmetrically distributed about the origin because the drunkard’s throws are not biased in any particular direction around the centre. A non-symmetrical distribution is thus an indication that there is an error in the calculations.

Now, any finite collection of hits is unlikely to be perfectly symmetrical because of outliers. Nevertheless, the distributions should be symmetrical on average. To test this, run the demo a few times (hit F9 with the demo open). Notice how the position of outliers and the overall shape of the distribution of points changes randomly from simulation to simulation. In all cases, however, there is a clear maximum at the centre of the dartboard with the probability of a hit falling with distance from the centre.
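
A rough numerical counterpart to eyeballing the scatter plot is to count the hits that fall in each quadrant: if the throws are unbiased, each quadrant should hold about a quarter of them. The sketch below is one possible check under that assumption. It is written in Python rather than Excel, and the function name quadrant_counts and its parameters are hypothetical.

```python
import math
import random
from collections import Counter


def quadrant_counts(n_throws=5000, sigma=0.2, seed=None):
    """Count simulated hits in each quadrant (1 to 4) of the plane."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_throws):
        r = rng.gauss(0.0, sigma)                # distance from centre
        theta = rng.uniform(0.0, 2.0 * math.pi)  # direction of the hit
        x, y = r * math.cos(theta), r * math.sin(theta)
        if x >= 0 and y >= 0:
            counts[1] += 1
        elif x < 0 and y >= 0:
            counts[2] += 1
        elif x < 0 and y < 0:
            counts[3] += 1
        else:
            counts[4] += 1
    return counts


# Each pass of this loop plays the role of one press of F9.
for run in range(3):
    counts = quadrant_counts(seed=run)
    print(run, [counts[q] for q in (1, 2, 3, 4)])
```

The four counts will differ from run to run, but with 5000 throws each should stay close to 1250. A quadrant that is consistently over- or under-populated would point to an error in the calculations.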

Figure 3: Scatter plot for standard deviation=0.2

Figure 3 shows the results of simulations for a standard deviation of 0.2. Figures 4 and 5 show the results of simulations for standard deviations of 0.1 and 0.4.

Figure 4: Scatter plot for standard deviation=0.1

Note that the plot has fixed axes, i.e. the area depicted is the 1×1 square that encloses the dartboard, regardless of the standard deviation. Consequently, for larger standard deviations (such as 0.4) many hits will be out of range and will not show up on the plot.

Figure 5: Scatter plot for standard deviation=0.4
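
The share of hits that go out of range can be estimated directly. Because the distance r is Normally distributed with zero mean, the chance that a dart lands on the board (|r| ≤ 0.5) is erf(0.5/(σ√2)), where σ is the standard deviation. The share that lands inside the square plot area has no equally tidy formula, so the sketch below simulates it. The function names and parameters are assumptions made for illustration; they are not part of the Excel demo.

```python
import math
import random


def fraction_on_board(sigma, radius=0.5):
    """Exact chance that |r| <= radius when r is Normal(0, sigma)."""
    return math.erf(radius / (sigma * math.sqrt(2.0)))


def fraction_in_plot(sigma, n_throws=5000, half_width=0.5, seed=None):
    """Simulated share of hits inside the fixed-axis square plot."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_throws):
        r = rng.gauss(0.0, sigma)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x, y = r * math.cos(theta), r * math.sin(theta)
        # A hit is visible only if both coordinates are within range.
        if abs(x) <= half_width and abs(y) <= half_width:
            inside += 1
    return inside / n_throws


for sigma in (0.1, 0.2, 0.4):
    print(f"sigma={sigma}: on board {fraction_on_board(sigma):.3f}, "
          f"in plot {fraction_in_plot(sigma, seed=0):.3f}")
```

For a standard deviation of 0.4 the formula gives roughly 0.79, so about a fifth of the throws miss the board altogether. The simulated share inside the square is somewhat higher, because the square also covers the corners that lie outside the circular board.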

Closing remarks

As I have stressed in my previous posts on Monte Carlo simulation, the usefulness of a simulation depends on the choice of an appropriate distribution. If the selected distribution does not reflect reality, neither will the simulation. This is true regardless of whether one is simulating a drunkard’s wayward aim or the duration of a project task. You may have noted that the assumption of normally-distributed hits has no justification whatsoever; it is just as arbitrary as my original assumption of uniformity. In fact, the hit locations of drunken dart throws are highly unlikely to be either uniform or Normal. Nevertheless, I hope that some of my readers will find the above example to be of pedagogical value.

Acknowledgement

Thanks to George Gkotsis for his comment which got me thinking about this post.

Written by K

November 3, 2011 at 4:59 am