Eight to Late

Sensemaking and Analytics for Organizations


On the origin of power laws in organizational phenomena


Introduction

Uncertainty is a fact of organizational life – managers often have to make decisions based on uncertain or incomplete information. Typically such decisions are based on a mix of intuition, experience and blind guesswork or “gut feel”. In recent years, probabilistic (or statistical) techniques have entered mainstream organizational practice. These have enabled managers to base their decisions and consequent actions on something more than mere subjective judgement – or so the theory goes.

Much of the statistical analysis in organisational theory and research is based on the assumption that the variables of interest have a Normal (aka Gaussian) distribution. That is, the probability of a variable taking on a particular value can be reckoned from the familiar bell-shaped curve. In a paper entitled Beyond Gaussian averages: redirecting organizational science towards extreme events and power laws, Bill McKelvey and Pierpaolo Andriani suggest that many (if not most) organizational variables aren’t normally distributed, but are better described by power law or fat-tailed (aka long-tailed or heavy-tailed) distributions. If correct, this has major consequences for quantitative analysis in many areas of organizational theory and practice. To quote from their paper:

Quantitative management researchers tend to presume Gaussian (normal) distributions with matching statistics – for evidence, study any random sample of their current research. Suppose this premise is mostly wrong. It follows that (1) publication decisions based on Gaussian statistics could be mistaken, and (2) advice to managers could be misguided.

Managers generally assume that their actions will not have extreme outcomes. However, if organisational phenomena exhibit power law behaviour, it is possible that seemingly minor actions could have disproportionate results. It is therefore important to understand how such extreme outcomes can come about. This post, based on the aforementioned paper and some of the references therein, discusses a couple of general mechanisms via which power laws can arise in organizational phenomena.

I’ll begin by outlining the main differences between normal and power law distributions, and then present a few social phenomena that display power law behaviour. Following that, I get to my main point – a discussion of general mechanisms that underlie power-law type behaviour in organisational phenomena. I conclude by outlining the implications of power-law phenomena for managerial actions and their (intended) outcomes.

Power laws vs. the Normal distribution

Probabilistic variables that are described by the normal distribution tend to take on values that cluster around the average, with the probability dropping off to zero rapidly on either side of the average. In contrast, for long-tailed distributions, there is a small but significant probability that the variable will take on a value that is very far from the average (what is sometimes called a black swan event). Long-tailed distributions are often described by power laws. In such cases, the probability of a variable taking a value x is described by a function like x^{-\alpha}, where \alpha is called the power law exponent. A well-known power law distribution in business and marketing theory is the Pareto distribution. An important characteristic of power law distributions is that, for small enough exponents, they have infinite variances (\alpha \le 3) and even infinite means (\alpha \le 2), implying that outliers cannot be ignored and that sample averages can be unstable or meaningless.
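To get a feel for how different the two tails are, here is a minimal sketch in Python. It assumes a standard normal variable on one side and, on the other, a power law whose density is proportional to x^{-\alpha} above a cutoff x_min. The function names, the cutoff of 1 and the exponent of 2.5 are my own illustrative choices, not values taken from the paper, and the two scales (standard deviations versus multiples of x_min) are not strictly comparable.

```python
import math


def normal_tail(k: float) -> float:
    """P(X > mean + k * sigma) for a normally distributed X."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def power_law_tail(x: float, x_min: float = 1.0, alpha: float = 2.5) -> float:
    """P(X > x) when the density is proportional to x**(-alpha) for x >= x_min.

    Requires alpha > 1 so that the density can be normalised.
    x_min and alpha are illustrative assumptions.
    """
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1")
    if x <= x_min:
        return 1.0
    return (x / x_min) ** (1.0 - alpha)


if __name__ == "__main__":
    # Compare how quickly each tail probability shrinks as we move outwards.
    for k in (2, 5, 10, 20):
        print(f"{k:>3}  normal: {normal_tail(k):.3e}  "
              f"power law: {power_law_tail(k):.3e}")
```

The normal tail collapses faster than exponentially, whereas the power law tail shrinks only polynomially, which is why far-out values remain plausible under a power law.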

Power laws in social phenomena

In their paper McKelvey and Andriani mention a number of examples of power laws in natural and social phenomena. Examples of the latter include:

  1. The sizes of US firms: the probability that a firm is larger than size N (where N is the number of employees) is inversely proportional to N.
  2. The number of criminal acts committed by individuals: the frequency of conviction is a power law function of the ranked number of convictions.
  3. Information access on the Web: The access rate of new content on the web decays with time according to a power law.
  4. Frequency of family names: Frequency of family names has a power law dependence on family size (number of people with the same family name).

Given the ubiquity of power laws in social phenomena, McKelvey and Andriani suggest that they may be common in organizational phenomena as well. If this is so, managerial decisions based on the assumption of normality could be wildly incorrect. In effect, such an assumption treats extreme events as aberrations and ignores them. But extreme events have extreme business implications and hence must be factored into any sensible analysis.

If power laws are indeed as common as claimed, there must be some common underlying mechanism(s) that give rise to them.  We look at a couple of these in the following sections.

Positive feedback

In a classic paper entitled The Second Cybernetics: Deviation-Amplifying Mutual Causal Processes, published in 1963, Magoroh Maruyama pointed out that small causes can have disproportionate effects if they are amplified through positive feedback. Audio feedback is a well-known example of this process. What is, perhaps, less well appreciated is that mutually dependent deviation-amplifying processes can cause qualitative changes in the phenomenon of interest. A classic example is the phenomenon of a run on a bank: as people withdraw money in bulk, the likelihood of bank insolvency increases, thus causing more people to make withdrawals. The qualitative change at the end of this positive feedback cycle is, of course, the bank going bust.

Maruyama also draws attention to the fact that the law of causality – that similar causes lead to similar effects – needs to be revised in light of positive feedback effects. To quote from his paper:

A sacred law of causality in the classical philosophy stated that similar conditions produce similar effects. Consequently, dissimilar results were attributed to dissimilar conditions. Many scientific researches were dictated by this philosophy. For example, when a scientist tried to find out why two persons under study were different, he looked for a difference in their environment or in their heredity. It did not occur to him that neither environment nor heredity may be responsible for the difference – He overlooked the possibility that some deviation-amplifying interactional process in their personality and in their environment may have produced the difference.

In the light of the deviation-amplifying mutual causal process, the law of causality is now revised to state that similar conditions may result in dissimilar products. It is important to note that this revision is made without the introduction of indeterminism and probabilism. Deviation-amplifying mutual causal processes are possible even within the deterministic universe, and make the revision of the law of causality even within the determinism. Furthermore, when the deviation-amplifying mutual causal process is combined with indeterminism, here again a revision of a basic law becomes necessary. The revision states:

A small initial deviation, which is within the range of high probability, may develop into a deviation of very low probability or more precisely, into a deviation which is very improbable within the framework of probabilistic unidirectional causality.
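Maruyama’s point that similar conditions may result in dissimilar products can be illustrated with a toy simulation. The sketch below uses a Pólya urn, which is my own choice of example and does not appear in Maruyama’s paper: every draw adds another ball of the colour just drawn, so an early chance lead reinforces itself. For contrast, the hypothetical independent_share function adds balls by a fair coin toss, with no feedback at all.

```python
import random
import statistics


def polya_urn_share(steps: int, rng: random.Random) -> float:
    """Final share of red balls in an urn that starts with 1 red and 1 blue.

    Each step draws a ball at random and returns it with one extra ball of
    the same colour, so early deviations are amplified (positive feedback).
    """
    red, blue = 1, 1
    for _ in range(steps):
        if rng.random() < red / (red + blue):
            red += 1
        else:
            blue += 1
    return red / (red + blue)


def independent_share(steps: int, rng: random.Random) -> float:
    """Same starting urn, but each new ball is red with probability 0.5."""
    red, blue = 1, 1
    for _ in range(steps):
        if rng.random() < 0.5:
            red += 1
        else:
            blue += 1
    return red / (red + blue)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so that runs are repeatable
    with_feedback = [polya_urn_share(1000, rng) for _ in range(500)]
    no_feedback = [independent_share(1000, rng) for _ in range(500)]
    print("spread with feedback:   ", statistics.pstdev(with_feedback))
    print("spread without feedback:", statistics.pstdev(no_feedback))
```

Every run starts from an identical urn, yet the runs with feedback end up scattered across a wide range of final shares, while the runs without feedback all finish close to one half.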

The effect of positive feedback can be further amplified if the variable of interest is made up of several interdependent (rather than independent) effects. We’ll look at what this means next.

Interdependence, not independence

Typically we invoke probabilities when we are uncertain about outcomes. As an example from project management, the uncertainty in the duration of a project task can be modeled using a probability distribution. In this case the probability distribution is a characterization of our uncertainty regarding how long it is going to take to complete the task. Now, the accuracy of one’s predictions depends on whether the probability distribution is a good representation of (the yet to materialize) reality. Where does the distribution come from? Generally one fits the data to an assumed distribution. This is an important point: the fit is an assumption – one can fit historical data to any reasonable distribution, but one can never be sure that it is the right one. To get the form of the distribution from first principles one has to understand the mechanism behind the quantity of interest. To do that one has to first figure out what the quantity depends on. It is hard to do this for organisational phenomena because they depend on several factors.

I’ll explain using an example: what does a  project task duration depend on?  There are several possibilities – developer productivity, technology used, working environment or even the quality of the coffee!  Quite possibly it depends on  all of the above and many more factors. Further still, the variables that affect task duration can depend on each other – i.e. they can be correlated.  An example of correlation is the link between productivity and working environment. Such dependencies are  a key difference between Normal and power law distributions. To quote from the paper:

The difference lies in assumptions about the correlations among events. In a Gaussian distribution the data points are assumed to be independent and additive. Independent events generate normal distributions, which sit at the heart of modern statistics. When causal elements are independent-multiplicative they produce a lognormal distribution (see this paper for several examples drawn from science), which turns into a Pareto distribution as the causal complexity increases. When events are interdependent, normality in distributions is not the norm. Instead Paretian distributions dominate because positive feedback processes leading to extreme events occur more frequently than ‘normal’, bell-shaped Gaussian-based statistics lead us to expect. Further, as tension imposed on the data points increases to the limit, they can shift from independent to interdependent.

So, variables that are made up of many independent, additive causes will tend to be normally distributed, whereas those that are made up of many interdependent (or correlated) variables can have a power law distribution, particularly if the variables display a positive feedback effect. See my posts entitled Monte Carlo simulation of multiple project tasks and the effect of task duration correlations on project schedules for illustrations of the effects of interdependence and correlations on variables.
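The first two cases in the quote (independent-additive and independent-multiplicative causes) are easy to check numerically. The following is a rough sketch, not an analysis from the paper: it assumes 20 independent factors, each drawn uniformly between 0.5 and 1.5, and compares their sum with their product. The function names, the number of factors and the range are all illustrative assumptions. It does not attempt the interdependent case.

```python
import math
import random
import statistics


def skewness(values: list[float]) -> float:
    """Sample skewness: roughly 0 for a symmetric, bell-shaped sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate(n_samples: int, n_factors: int, rng: random.Random):
    """Return (sums, products) of n_factors independent uniform factors."""
    sums, products = [], []
    for _ in range(n_samples):
        factors = [rng.uniform(0.5, 1.5) for _ in range(n_factors)]
        sums.append(sum(factors))            # additive combination
        products.append(math.prod(factors))  # multiplicative combination
    return sums, products


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are repeatable
    sums, products = simulate(5000, 20, rng)
    print("skewness of sums:         ", skewness(sums))
    print("skewness of products:     ", skewness(products))
    print("skewness of log(products):",
          skewness([math.log(p) for p in products]))
```

The sums come out nearly symmetric, as the Gaussian picture suggests. The products are strongly right-skewed, and only their logarithms look symmetric, which is the signature of a lognormal-like distribution.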

Wrapping up

We’ve looked at a couple of general mechanisms which can give rise to power laws in organisations.  In particular, we’ve seen that power laws may lurk in phenomena that are subject to positive feedback and correlation effects. It is important to note that these effects are quite general, so they can apply to diverse organizational phenomena.  For such phenomena, any analysis based on the assumption of Normal statistics will be flawed.

Most management theories assume simple cause-effect relationships between managerial actions and macro-level outcomes. This assumption is flawed because positive feedback effects can cause qualitative changes in the phenomena studied. Moreover, it is often difficult to know with certainty all the factors that affect a macro-level quantity because such quantities are typically composed of several interdependent factors. In view of this it’s no surprise that managerial actions sometimes lead to unexpected extreme consequences.


Scientific management theories assume a simple cause-effect relationship between managerial actions and macro-level outcomes. In reality, however, it is difficult to know with certainty all the factors that affect a macro-level quantity; it is typically influenced by several interdependent factors. In view of this it’s no surprise that simplistic prescriptions hawked by management gurus and bestsellers seldom help in fixing organisational problems.

Written by K

July 28, 2010 at 11:43 pm
