Eight to Late

Sensemaking and Analytics for Organizations

The Heretic’s Guide to Management – understanding ambiguity in the corporate world

with 2 comments

I am delighted to announce that my new business book, The Heretic’s Guide to Management: The Art of Harnessing Ambiguity is now available in e-book format (The print edition is a couple of weeks away). The book, co-written with Paul Culmsee, is a loose sequel to our previous tome, The Heretics Guide to Best Practices.

Many reviewers liked the writing style of our first book, which combined rigour with humour. This book continues in the same vein, so if you enjoyed the first one we hope you might like this one too. The new book is half the size of the first one and I considerably less idealistic too. In terms of subject matter, I could say “Ambiguity, Teddy Bears and Fetishes” and leave it at that…but that might leave you thinking that it’s not the kind of book you would want anyone to see on your desk!

Rest assured, The Heretic’s Guide to Management is not a corporate version of Fifty Shades of Grey. Instead, it aims to delve into the complex but fascinating ways in which ambiguity affects human behaviour. More importantly, it discusses how ambiguity can be harnessed in ways that achieve positive outcomes.  Most management techniques (ranging from strategic planning to operational budgeting) attempt to reduce ambiguity and thereby provide clarity. It is a profound irony of modern corporate life that they often end up doing the opposite: increasing ambiguity rather than reducing it.

On the surface, it is easy enough to understand why: organizations are complex entities so it is unreasonable to expect management models, such as those that fit neatly into a 2*2 matrix or a predetermined checklist, to work in the real world. In fact, expecting them to work as advertised is like colouring a paint-by-numbers Mona Lisa, expecting to recreate Da Vinci’s masterpiece. Ambiguity therefore invariably remains untamed, and reality reimposes itself no matter how alluring the model is.

It turns out that most of us have a deep aversion to situations that involve even a hint of ambiguity. Recent research in neuroscience has revealed the reason for this: ambiguity is processed in the parts of the brain which regulate our emotional responses. As a result, many people associate it with feelings of anxiety. When kids feel anxious, they turn to transitional objects such as teddy bears or security blankets. These objects provide them with a sense of stability when situations or events seem overwhelming. In this book, we show that as grown-ups we don’t stop using teddy bears – it is just that the teddies we use take a different, more corporate, form. Drawing on research, we discuss how management models, fads and frameworks are actually akin to teddy bears. They provide the same sense of comfort and certainty to corporate managers and minions as real teddies do to distressed kids.

A plain old Teddy

A Plain Teddy

Most children usually outgrow their need for teddies as they mature and learn to cope with their childhood fears. However, if development is disrupted or arrested in some way, the transitional object can become a fetish – an object that is held on to with a pathological intensity, simply for the comfort that it offers in the face of ambiguity. The corporate reliance on simplistic solutions for the complex challenges faced is akin to little Johnny believing that everything will be OK provided he clings on to Teddy.

When this happens, the trick is finding ways to help Johnny overcome his fear of ambiguity.

Ambiguity is a primal force that drives much of our behaviour. It is typically viewed negatively, something to be avoided or to be controlled.

A Sith Teddy

A Sith Teddy

The truth, however, is that ambiguity is a force that can be used in positive ways too. The Force that gave the Dark Side their power in the Star Wars movies was harnessed by the Jedi in positive ways.

A Jedi Teddy

A Jedi Teddy

Our book shows you how ambiguity, so common in the corporate world, can be harnessed to achieve the results you want.

The e-book is available via popular online outlets. Here are links to some:

Amazon Kindle

Google Play

Kobo

For those who prefer paperbacks, the print version will be out in a few weeks – stay tuned!

Thanks for your support:)

Written by K

July 12, 2016 at 10:30 pm

The story before the story – a data science fable

with 4 comments

It is well-known that data-driven stories are a great way to convey results of data science initiatives. What is perhaps not as well-known is that data science projects often have to begin with stories too. Without this “story before the story” there will be no project, no results and no data-driven stories to tell….

 

For those who prefer to read, here’s a transcript of the video in full:

In the beginning there is no data, let alone results…but there are ideas. So, long before we tell stories about data or results, we have to tell stories about our ideas. The aim of these stories is to get people to care about our ideas as much as we do and, more important, invest in them. Without their interest or investment there will be no results and no further stories to tell.

So one of the first things one has to do is craft a story about the idea…or the story before the story.

Once upon a time there was a CRM system. The system captured every customer interaction that occurred, whether it was by phone, email or face to face conversation. Many quantitative details of interactions were recorded, time, duration, type. And if the interaction led to a sale, the details of the sale were recorded too.

Almost as an aside, the system also gave sales people the opportunity to record their qualitative impressions as free text notes. As you might imagine, this information, though potentially valuable, was never analysed. Sure managers looked at notes in isolation from time to time when referring.to specific customer interactions, but there was no systematic analysis of the corpus as a whole. Nobody had thought it worthwhile to do this, possibly because it is difficult if not quite impossible to analyse unstructured information in the world of relational databases and SQL.

One day, an analyst was browsing data randomly in the system, as good analysts sometimes do. He came across a note that to him seemed like the epitome of a good note…it described what the interaction was about, the customer’s reactions and potential next steps all in a logical fashion.

This gave him an idea. Wouldn’t it be cool, he thought, if we could measure the quality of notes? Not only would this tell us something about the customer and the interaction, it may tell us something about the sales person as well.

The analyst was mega excited…but he realised he’d need help. He was an IT guy and as we all know, business folks in big corporations stopped listening to their IT guys long ago. So our IT guy had his work cut out for him.

After much cogitation, he decided to enlist the help of his friend, a strategic business analyst in the marketing department. This lady, who worked in marketing had the trust of the head of marketing. If she liked the idea, she might be able to help sell it to the head of marketing.

As it turned out, the business analyst loved the idea…more important, since she knew what the sales people do on a day to day basis, she could give the IT guy more ideas on how he could build quantitative measures of the quality of notes. For example, she suggested looking for  emotion-laden words or mentions of competitor’s products and so on. The IT guy now had some concrete things to work on.  The initial results gave them even more ideas, and soon they had more than enough to make a convincing pitch to the head of marketing.

It would take us too far afield to discuss details of the pitch, but what we will say is this: they avoided technical details, instead focusing on the strategic and innovative aspects of the work.

The marketing head liked the idea…what was there not to like? He agreed to support the effort, and the idea became a project….

…and yes, within months the project resulted in new insights into customer behaviour. But that is another story.

Written by K

June 15, 2016 at 10:00 pm

The hidden costs of IT outsourcing

with one comment

Many outsourcing arrangements fail because customers do not factor in hidden costs. In 2009, I wrote a post on these hard-to-quantify transaction costs. The following short video (4 mins 45 secs) summarises the main points of that post in a (hopefully!) easy-to-understand way:

Note: Here’s the full script, for those who prefer to read instead of watching…

One of the questions that organisations grapple with is whether or not to outsource IT work to external vendors. The work of Oliver Williamson  a Nobel Laureate in Economics – provides some insight into this issue.  This video is a brief look at how Williamson’s work on transaction cost economics can be applied to the question of outsourcing IT development or implementation.

A firm has two choices for any economic activity: it can either perform the activity in-house or go to market. In either case, the cost of the activity can be decomposed into production costs, which are direct and indirect costs of producing the good or service, and transaction costs, which are costs associated with making the economic exchange (more on this in a minute).

In the case of in-house IT work production costs include salaries, equipment costs etc whereas transaction costs include costs relating to building an IT team (with the right skills, attitude and knowledge).

In the case of outsourced IT work, production costs are similar to those in the in-house case – except that they are now incurred by the vendor and passed on to the client.  The point is, these costs are generally known upfront.

The transaction costs, however, are significantly different. They include things such as:

  1. Search costs: cost of searching for a suitable vendor
  2. Bargaining costs: effort incurred in agreeing on an acceptable price.
  3. Enforcement costs: costs of ensuring compliance with the contract
  4. Costs of coordinating work : this includes costs of managing the vendor.
  5. Cost of uncertainty: cost associated with unforeseen changes (scope change is a common example)

Now, there are a couple of things to note about transaction costs for outsourcing arrangements:

Firstly, they are typically the client’s problem, not the vendors. Secondly, they can be very hard to figure out upfront. They are the therefore the hidden costs of outsourcing.

According to Williamson, the decision as to whether or not an economic activity should be outsourced depends critically on these hidden transaction costs. In his words, “The most efficient institutional arrangement for carrying out a particular economic activity would be the one that minimized transaction costs.”

The most efficient institutional arrangement for IT development work is often the market, but in-house arrangements are sometimes better.

The potentially million dollar question is: when are in-house arrangements better?

Williamson’s work provides an answer to this question. He argues that the cost of completing an economic transaction in an open market depends on two factors

  1. Complexity of the transaction – for example, implementing an ERP system is more complex than implementing a new email system.
  2. Asset specificity – this refers to the degree of customization of the service or product. Highly customized services or products are worth more to the two parties than to anyone else. For example, custom IT services, tailored to the requirements of a specific company have more value client and provider than to anyone else.

In essence, the transaction costs increase with complexity and degree of customization. From this we can conclude that in-house arrangements may be better for work that is complex or highly customized.  The reason for this is simple: it is difficult to specify such systems in detail upfront. Consequently, contracts for such work tend to be complex…and worse, they invariably leave out important details.

Such contracts will work only if interpreted in a farsighted manner, with disputes being settled directly between the vendor and client instead of resorting to litigation.  When this becomes too hard to do, it makes sense to carry out the activity in-house. Note that this does not mean that it has to be done by internal staff…one can still hire contractors, but it is important ensure that they remain under internal supervision.

If one chooses to outsource such work it is important to ensure that the contract is as unambiguous and transparent as possible.  Moreover, both the client and the vendor should expect omissions in contracts, and be flexible whenever there are disagreements over the interpretation of contract terms. In this end, this is possible only if there is a trust-based relationship between the client and vendor…and trust, of course, is impossible to contractualise.

To sum up: be wary of outsourcing work that is complex or highly customized…and if you must, be sure to go with a vendor you trust.

Written by K

May 3, 2016 at 4:59 pm

What is sensemaking?

with 10 comments

I’ve recently set up a consulting practice specializing in sensemaking and analytics. Most people understand the analytics bit, but many have questions about sensemaking. I got that question so many times that I decided to do a short (2.5 minute) whiteboard video explaining what the term means to me (and my definition is not the same as Wikipedia’s).

Here it is:

 

For those who prefer the written word, here’s the script (minus the advertising):

“Most organizations are very good at solving problems. This is no surprise: much of training, right from school to university, focuses on teaching us the skills required to solve problems. Now regardless of the specific technique used, the problem-solving process is essentially a logical or analytical one. It goes something like this:

  • Gather information about the problem.
  • Analyse the information.
  • Formulate candidate solutions.
  • Implement the solution of choice.

This so-called GAFI method works by breaking problems down into manageable parts, solving each of the parts separately and then assembling these into a solution. The method works very well for most scientific and engineering problems – even one  as complicated as sending a spacecraft to Saturn. Indeed, it is so successful that it underpins all of science and modern technology.

However, there is a serious gap in the GAFI method – it assumes that problems are given, it does not tell us how to formulate problems. And as the management luminary, Russell Ackoff once said:

Outside of school, problems are seldom given; they have to be taken, extracted from complex situations…”

The art of taking problems is what sensemaking is all about.

Unlike analytical thinking, which is purely logical, sensemaking involves such as collaboration, imagination and a healthy tolerance for ambiguity. It is an art that is absolutely essential for surviving…no, thriving, in the increasingly complex world of the 21st century.

The two modes of thinking – sensemaking and analytical – are as different as chalk and cheese but both are necessary for a successful outcome. We like to think of them as lying at opposite ends of a spectrum of thinking styles. When approaching a new situation or problem, one should always begin at the sensemaking end and move towards the analytical end as one understands the problem better. Unfortunately time pressures in corporate environments often force managers and employees into analytical mode without a full appreciation of the problem they are attempting solve. As a result the solutions are often less than optimal. Sensemaking techniques equip organisations with tools that cover the entire problem lifecycle, from definition to solution.”

As a closing remark (that might be construed as advertising…) I’ll mention that I’ve discussed a number of these techniques on Eight to Late. Here are a couple of examples:

The Approach: a dialogue mapping story

The dilemmas of enterprise IT

…and, of course, you can always have a look at my book or ping me for a no-obligation chat to find out more:)

Written by K

March 15, 2016 at 6:02 pm

A gentle introduction to decision trees using R

with 3 comments

Introduction

Most techniques of predictive analytics have their origins in probability or statistical theory (see my post on Naïve Bayes, for example).  In this post I’ll look at one that has more a commonplace origin: the way in which humans make decisions.  When making decisions, we typically identify the options available and then evaluate them based on criteria that are important to us.  The intuitive appeal of such a procedure is in no small measure due to the fact that it can be easily explained through a visual. Consider the following graphic, for example:

Figure 1: Example of a simple decision tree (Courtesy: Duncan Hull)

Figure 1: Example of a simple decision tree (Courtesy: Duncan Hull)

(Original image: https://www.flickr.com/photos/dullhunk/7214525854, Credit: Duncan Hull)

The tree structure depicted here provides a neat, easy-to-follow description of the issue under consideration and its resolution. The decision procedure is based on asking a series of questions, each of which serve to further reduce the domain of possibilities. The predictive technique I discuss in this post,classification and regression trees (CART), works in much the same fashion. It was invented by Leo Breiman and his colleagues in the 1970s.

In what follows, I will use the open source software, R. If you are new to R,   you may want to follow this link for more on the basics of setting up and installing it. Note that the R implementation of the CART algorithm is called RPART (Recursive Partitioning And Regression Trees). This is essentially because Breiman and Co. trademarked the term CART. As some others have pointed out, it is somewhat ironical that the algorithm is now commonly referred to as RPART rather than by the term coined by its inventors.

A bit about the algorithm

The rpart algorithm works by splitting the dataset recursively, which means that the subsets that arise from a split are further split until a predetermined termination criterion is reached.  At each step, the split is made based on the independent variable that results in the largest possible reduction in heterogeneity of the dependent (predicted) variable.

Splitting rules can be constructed in many different ways, all of which are based on the notion of impurity-  a measure of the degree of heterogeneity of the leaf nodes. Put another way, a leaf node that contains a single class is homogeneous and has impurity=0.   There are three popular impurity quantification methods: Entropy (aka information gain), Gini Index and Classification Error.  Check out this article for a simple explanation of the three methods.

The rpart algorithm offers the entropy  and Gini index methods as choices. There is a fair amount of fact and opinion on the Web about which method is better. Here are some of the better articles I’ve come across:

https://www.quora.com/Are-gini-index-entropy-or-classification-error-measures-causing-any-difference-on-Decision-Tree-classification

http://stats.stackexchange.com/questions/130155/when-to-use-gini-impurity-and-when-to-use-information-gain

https://www.garysieling.com/blog/sklearn-gini-vs-entropy-criteria

http://www.salford-systems.com/resources/whitepapers/114-do-splitting-rules-really-matter

The answer as to which method is the best is: it depends.  Given this, it may be prudent to try out a couple of methods and pick the one that works best for your problem.

Regardless of the method chosen, the splitting rules partition the decision space (a fancy word for the entire dataset) into rectangular regions each of which correspond to a split. Consider the following simple example with two predictors x1 and x2. The first split is at x1=1 (which splits the decision space into two regions x1<1 and x1>1), the second at x2=2, which splits the (x1>1) region into 2 sub-regions, and finally x1=1.5 which splits the (x1>1,x2>2) sub-region further.

Figure 2: Example of partitioning

Figure 2: Example of partitioning

It is important to note that the algorithm works by making the best possible choice at each particular stage, without any consideration of whether those choices remain optimal in future stages. That is, the algorithm makes a locally optimal decision at each stage. It is thus quite possible that such a choice at one stage turns out to be sub-optimal in the overall scheme of things.  In other words,  the algorithm does not find a globally optimal tree.

Another important point relates to well-known bias-variance tradeoff in machine learning, which in simple terms is a tradeoff between the degree to which a model fits the training data and its predictive accuracy.  This refers to the general rule that beyond a point, it is counterproductive to improve the fit of a model to the training data as this increases the likelihood of overfitting.  It is easy to see that deep trees are more likely to overfit the data than shallow ones. One obvious way to control such overfitting is to construct shallower trees by stopping the algorithm at an appropriate point based on whether a split significantly improves the fit.  Another is to grow a tree unrestricted and then prune it back using an appropriate criterion. The rpart algorithm takes the latter approach.

Here is how it works in brief:

Essentially one minimises the cost,  C_{\alpha}(T), a quantity that is a  linear combination of the error (essentially, the fraction of misclassified instances, or variance in the case of a continuous variable), R(T)  and the number of leaf nodes in the tree, |\tilde{T} |:

C_{\alpha}(T) = R(T) + \alpha |\tilde{T} |

First, we note that when \alpha = 0, this simply returns the original fully grown tree. As \alpha increases, we incur a penalty that is proportional to the number of leaf nodes.  This tends to cause the minimum cost to occur for a tree that is a subtree of the original one (since a subtree will have a smaller number of leaf nodes). In practice we vary \alpha and pick the value that gives the subtree that results in the smallest cross-validated prediction error.  One does not have to worry about programming this because the rpart algorithm actually computes the errors for different values of \alpha for us. All we need to do is pick the value of the coefficient that gives the lowest cross-validated error. I will illustrate this in detail in the next section.

An implication of their tendency to overfit data is that decision trees tend to be sensitive to relatively minor changes in the training datasets. Indeed, small differences can lead to radically different looking trees. Pruning addresses this to an extent, but does not resolve it completely.  A better resolution is offered by the so-called ensemble methods that average over many differently constructed trees. I’ll discuss one such method at length in a future post.

Finally, I should also mention that decision trees can be used for both classification and regression problems (i.e. those in which the predicted variable is discrete and continuous respectively).  I’ll demonstrate both types of problems in the next two sections.

Classification trees using rpart

To demonstrate classification trees, we’ll use the Ionosphere dataset available in the mlbench package in R. I have chosen this dataset because it nicely illustrates the points I wish to make in this post. In general, you will almost always find that algorithms that work fine on classroom datasets do not work so well in the real world…but of course, you know that already!

We begin by setting the working directory, loading the required packages (rpart and mlbench) and then loading the Ionosphere dataset.

#set working directory if needed (modify path as needed)
setwd(“C:/Users/Kailash/Documents/decisiontrees”)
#load required libraries – rpart for classification and regression trees
library(rpart)
#mlbench for Ionosphere dataset
library(mlbench)
#load Ionosphere
data(“Ionosphere”)

Next we separate the data into training and test sets. We’ll use the former to build the model and the latter to test it. To do this, I use a simple scheme wherein I randomly select 80% of the data for the training set and assign the remainder to the test data set. This is easily done in a single R statement that invokes the uniform distribution (runif) and the vectorised function, ifelse. Before invoking runif, I set a seed integer to my favourite integer in order to ensure reproducibility of results.

#set seed to ensure reproducible results
set.seed(42)
#split into training and test sets
Ionosphere[,”train”] <- ifelse(runif(nrow(Ionosphere))<0.8,1,0)
#separate training and test sets
trainset <- Ionosphere[Ionosphere$train==1,]
testset <- Ionosphere[Ionosphere$train==0,]
#get column index of train flag
trainColNum <- grep(“train”,names(trainset))
#remove train flag column from train and test sets
trainset <- trainset[,-trainColNum]
testset <- testset[,-trainColNum]

In the above, I have also removed the training flag from the training and test datasets.

Next we  invoke rpart. I strongly recommend you take some time to go through the documentation and understand the parameters and their defaults values.  Note that we need to remove the predicted variable from the dataset before passing the latter on to the algorithm, which is why we need to find the column index of the  predicted variable (first line below). Also note that we set the method parameter to “class“, which simply tells the algorithm that the predicted variable is discrete.  Finally, rpart uses Gini rule for splitting by default, and we’ll stick with this option.

#get column index of predicted variable in dataset
typeColNum <- grep(“Class”,names(Ionosphere))
#build model
rpart_model <- rpart(Class~.,data = trainset, method=”class”)
#plot tree
plot(rpart_model);text(rpart_model)

 

The resulting plot is shown in Figure 3 below.  It is  quite self-explanatory so I  won’t dwell on it here.

Figure 3: A classification tree for Ionosphere dataset

Figure 3: A classification tree for Ionosphere dataset

Next we see how good the model is by seeing how it fares against the test data.

#…and the moment of reckoning
rpart_predict <- predict(rpart_model,testset[,-typeColNum],type=”class”)
mean(rpart_predict==testset$Class)
[1] 0.8450704
#confusion matrix
table(pred=rpart_predict,true=testset$Class)
pred true bad good
bad 17 2
good 9 43

 

Note that we need to verify the above results by doing multiple runs, each using different training and test sets. I will  do this later, after discussing pruning.

Next, we prune the tree using the cost complexity criterion. Basically, the intent is to see if a shallower subtree can give us comparable results. If so, we’d be better of choosing the shallower tree because it reduces the likelihood of overfitting.

As described earlier, we choose the appropriate pruning parameter (aka cost-complexity parameter) \alpha by picking the value that results in the lowest prediction error. Note that all relevant computations have already been carried out by R when we built the original tree (the call to rpart in the code above). All that remains now is to pick the value of \alpha:

#cost-complexity pruning
printcp(rpart_model)
CP nsplit rel error xerror xstd
1 0.57 0 1.00 1.00 0.080178
2 0.20 1 0.43 0.46 0.062002
3 0.02 2 0.23 0.26 0.048565
4 0.01 4 0.19 0.35

It is clear from the above, that the lowest cross-validation error (xerror in the table) occurs for \alpha =0.02 (this is CP in the table above).   One can find CP programatically like so:

# get index of CP with lowest xerror
opt <- which.min(rpart_model$cptable[,”xerror”])
#get its value
cp <- rpart_model$cptable[opt, “CP”]

Next, we prune the tree based on this value of CP:

#prune tree
pruned_model <- prune(rpart_model,cp)
#plot tree
plot(pruned_model);text(pruned_model)

Note that rpart will use a default CP value of 0.01 if you don’t specify one in prune.

The pruned tree is shown in Figure 4 below.

Figure 4: A pruned classification tree for Ionosphere dataset

Figure 4: A pruned classification tree for Ionosphere dataset

Let’s see how this tree stacks up against the fully grown one shown in Fig 3.

#find proportion of correct predictions using test set
rpart_pruned_predict <- predict(pruned_model,testset[,-typeColNum],type=”class”)
mean(rpart_pruned_predict==testset$Class)
[1] 0.8873239

This seems like an improvement over the unpruned tree, but one swallow does not a summer make. We need to check that this holds up for different training and test sets. This is easily done by creating multiple random partitions of the dataset and checking the efficacy of pruning for each. To do this efficiently, I’ll create a function that takes the training fraction, number of runs (partitions) and the name of the dataset as inputs and outputs the proportion of correct predictions for each run. It also optionally prunes the tree. Here’s the code:

#function to do multiple runs
multiple_runs_classification <- function(train_fraction,n,dataset,prune_tree=FALSE){
fraction_correct <- rep(NA,n)
set.seed(42)
for (i in 1:n){
dataset[,”train”] <- ifelse(runif(nrow(dataset))<0.8,1,0)
trainColNum <- grep(“train”,names(dataset))
typeColNum <- grep(“Class”,names(dataset))
trainset <- dataset[dataset$train==1,-trainColNum]
testset <- dataset[dataset$train==0,-trainColNum]
rpart_model <- rpart(Class~.,data = trainset, method=”class”)
if(prune_tree==FALSE) {
rpart_test_predict <- predict(rpart_model,testset[,-typeColNum],type=”class”)
fraction_correct[i] <- mean(rpart_test_predict==testset$Class)
}else{
opt <- which.min(rpart_model$cptable[,”xerror”])
cp <- rpart_model$cptable[opt, “CP”]
pruned_model <- prune(rpart_model,cp)
rpart_pruned_predict <- predict(pruned_model,testset[,-typeColNum],type=”class”)
fraction_correct[i] <- mean(rpart_pruned_predict==testset$Class)
}
}
return(fraction_correct)
}

Note that in the above,  I have set the default value of the prune_tree to FALSE, so the function will execute the first branch of the if statement unless the default is overridden.

OK, so let’s do 50 runs with and without pruning, and check the mean and variance of the results for both sets of runs.

#50 runs, no pruning
unpruned_set <- multiple_runs_classification(0.8,50,Ionosphere)
mean(unpruned_set)
[1] 0.8772763
sd(unpruned_set)
[1] 0.03168975
#50 runs, with pruning
pruned_set <- multiple_runs_classification(0.8,50,Ionosphere,prune_tree=TRUE)
mean(pruned_set)
[1] 0.9042914
sd(pruned_set)
[1] 0.02970861

So we see that there is an improvement of about 3% with pruning. Also, if you were to plot the trees as we did earlier, you would see that this improvement is achieved with shallower trees. Again, I point out that this is not always the case. In fact, it often happens that pruning results in worse predictions, albeit with better reliability – a classic illustration of the bias-variance tradeoff.

Regression trees using rpart

In the previous section we saw how one can build decision trees for situations in which the predicted variable is discrete.  Let’s now look at the case in which the predicted variable is continuous. We’ll use the Boston Housing dataset from the mlbench package.  Much of the discussion of the earlier section applies here, so I’ll just display the code, explaining only the differences.

#load Boston Housing dataset
data(“BostonHousing”)
#set seed to ensure reproducible results
set.seed(42)
#split into training and test sets
BostonHousing[,”train”] <- ifelse(runif(nrow(BostonHousing))<0.8,1,0)
#separate training and test sets
trainset <- BostonHousing[BostonHousing$train==1,]
testset <- BostonHousing[BostonHousing$train==0,]
#get column index of train flag
trainColNum <- grep(“train”,names(trainset))
#remove train flag column from train and test sets
trainset <- trainset[,-trainColNum]
testset <- testset[,-trainColNum]

Next we invoke rpart, noting that the predicted variable is medv (median value of owner-occupied homes in $1000 units) and that we need to set the method parameter to “anova“. The latter tells rpart that the predicted variable is continuous (i.e that this is a regression problem).

#build model
rpart_model <- rpart(medv~.,data = trainset, method=”anova”)
#plot tree
plot(rpart_model);text(rpart_model)

The plot of the tree is shown in Figure 5 below.

Figure 5: A regression tree for Boston Housing dataset

Figure 5: A regression tree for Boston Housing dataset

Next, we need to see how good the predictions are. Since the dependent variable is continuous, we cannot compare the predictions directly against the test set. Instead, we calculate the root mean square (RMS) error. To do this, we request rpart to output the predictions as a vector – one prediction per record in the test dataset. The RMS error can then easily be calculated by comparing this vector with the medv column in the test dataset.

Here is the relevant code:

#…the moment of reckoning
rpart_test_predict <- predict(rpart_model,testset[,-resultColNum],type = “vector” )
#calculate RMS error
rmsqe <- sqrt(mean((rpart_test_predict-testset$medv)^2)))
rmsqe
[1] 4.586388

Again, we need to do multiple runs to check on the  reliability of the predictions. However, you already know how to do that so I will leave it to you.

Moving on, we prune the tree using the cost complexity criterion as before.  The code is exactly the same as in the classification problem.

# get index of CP with lowest xerror
opt <- which.min(rpart_model$cptable[,”xerror”])
#get its value
cp <- rpart_model$cptable[opt, “CP”]
#prune tree
pruned_model <- prune(rpart_model,cp)
#plot tree
plot(pruned_model);text(pruned_model)

The tree is unchanged so I won’t show it here. This means, as far as the cost complexity pruning is concerned, the optimal subtree is the same as the original tree. To confirm this, we’d need to do multiple runs as before – something that I’ve already left as as an exercise for you:).  Basically, you’ll need to write a function analogous to the one above, that computes the root mean square error instead of the proportion of correct predictions.

Wrapping up

This brings us to the end of my introduction to classification and regression trees using R.  Unlike some articles on the topic I have attempted to describe each of the steps in detail and provide at least some kind of a rationale for them. I hope you’ve found the description and code snippets useful.

I’ll end by reiterating a couple points I made early in this piece. The nice thing about decision trees is that they are easy to explain to the users of our predictions. This is primarily because they  reflect the way we think about how decisions are made in real life – via a set of binary choices based on appropriate criteria. That  said, in many practical situations decision trees turn out to be unstable:  small changes in the dataset can lead to wildly different trees. It turns out that this limitation can be addressed by building a variety of trees using different starting points and then averaging over  them.  This is the domain of the so-called random forest algorithm.We’ll make the journey from decision trees to random forests in a future post.

Until then, thanks for reading.  Your comments & criticisms are welcomed, as always.

Written by K

February 16, 2016 at 6:33 pm

Improving decision-making in projects

with 5 comments

An irony of organisational life is that the most important decisions on projects (or any other initiatives) have to be made at the start, when ambiguity is at its highest and information availability lowest. I recently gave a talk at the Pune office of BMC Software on improving decision-making in such situations.

The talk was recorded and simulcast to a couple of locations in India. The folks at BMC very kindly sent me a copy of the recording with permission to publish it on Eight to Late. Here it is:


Based on the questions asked and the feedback received, I reckon that a number of people found the talk  useful. I’d welcome your comments/feedback.

Acknowledgements: My thanks go out to Gaurav Pal, Manish Gadgil and Mrinalini Wankhede for giving me the opportunity to speak at BMC, and to Shubhangi Apte for putting me in touch with them. Finally, I’d like to thank the wonderful audience at BMC for their insightful questions and comments.

Evolution, obsolescence and enterprise architecture

with 2 comments

Introduction

Enterprise architects are seldom (never?) given a blank canvas on which they can draw as they please. They invariably have to begin with an installed base of systems over which they have no control.  As I wrote in a piece on the legacy of legacy systems:

An often unstated (but implicit) requirement [on new systems] is that [they] must maintain continuity between the past and present. This is true even for systems that claim to represent a clean break from the past; one never has the luxury of a completely blank slate, there are always arbitrary constraints placed by legacy systems.

Indeed the system landscape of any large organization is a palimpsest, always retaining traces of what came before.  Those who actually maintain systems  – usually not architects – are painfully aware of this simple truth.

The IT landscape of an organization is therefore a snapshot, a picture that begins to age the instant is taken. Practicing enterprise architects will say they know this “of course”, and pay due homage to it in their words…but often not their actions.  The conflicts and contradictions between legacy and their aspirational architectures are hard to deal with and hence easier to ignore. In this post, I draw a parallel between this central conundrum of enterprise architecture and the process of biological evolution.

A Batesonian perspective on evolution

I’ve recently been re-reading Mind and Nature: A Necessary Unity, a book that Gregory Bateson wrote towards the end of his life of eclectic scholarship. Tucked away in the appendix of the book is an essay lamenting the fragmentation of knowledge and the lack of transdisciplinary thinking within universities.  Central to the essay is the notion of obsolescence. Bateson argued that much of what was taught in universities lagged behind the practical skills and mindsets that were needed to tackle the problems of that time.  Most people would agree that this is as true today as it was in Bateson’s time, perhaps even more so.

Bateson had a very specific idea of obsolescence in mind. He suggested that the educational system is lopsided because it invariably lags behind what is needed in the “real world”. Specifically, there is a lag between the typical university curriculum and the attitudes, dispositions, knowledge and skills needed to the problems of an ever-changing world. This lag is what Bateson referred to as obsolescence. Indeed, if the external world did not change there would be no lag and hence no obsolescence. As he noted:

I therefore propose to analyze the lopsided process called “obsolescence” which we might more precisely call “one-sided progress.” Clearly for obsolescence to occur there must be, in other parts of the system, other changes compared with which the obsolete is somehow lagging or left behind. In a static system, there would be no obsolescence…

This notion of obsolescence-as-lag has a direct connection with the contrasting process of developmental and evolutionary biology. The process of development of an embryo is inherently conservative – it develops according predetermined rules and is relatively robust to external stimuli. On the other hand, after birth, individuals are continually subject to a wide range of external factors (e.g. climate, stress etc.) that are unpredictable. If exposed to such factors over an extended period, they may change their characteristics in response to them (e.g. the tanning effect of sunlight, adaptability etc).  However, these characteristics are not inheritable.  They are passed on (if at all) by a much slower process of natural selection.  As a consequence, there is a significant lag between external stimuli and the inheritability of the associated characteristics.

As Bateson puts it:

Survival depends upon two contrasting phenomena or processes, two ways of achieving adaptive action. Evolution must always, Janus-like, face in two directions: inward towards the developmental regularities and physiology of the living creature and outward towards the vagaries and demands of the environment. These two necessary components of life contrast in interesting ways: the inner development-the embryology or “epigenesis”-is conservative and demands that every new thing shall conform or be compatible with the regularities of the status quo ante. If we think of a natural selection of new features of anatomy or physiology-then it is clear that one side of this selection process will favor those new items which do not upset the old apple cart. This is minimal necessary conservatism.

In contrast, the outside world is perpetually changing and becoming ready to receive creatures which have undergone change, almost insisting upon change. No animal or plant can ever be “readymade.” The internal recipe insists upon compatibility but is never sufficient for the development and life of the organism. Always the creature itself must achieve change of its own body. It must acquire certain somatic characteristics by use, by disuse, by habit, by hardship, and by nurture. These “acquired characteristics” must, however, never be passed on to the offspring. They must not be directly incorporated into the DNA. In organisational terms, the injunction – e.g. to make babies with strong shoulders who will work better in coal mines- must be transmitted through channels, and the channel in this case is via natural external selection of those offspring who happen (thanks to the random shuffling of genes and random creation of mutations) to have a greater propensity for developing stronger shoulders under the stress of working in coal mine.

The upshot of the above is that the genetic code of any species is inherently obsolete because it is, in at least a few ways, maladapted to its environment.  This is a good thing. Sustainable and lasting change to the genome of a population should occur only through the trial-and-error process of natural selection over many generations. It is only through such a gradual process that one can be sure that that a) the adaptation is necessary and b) that it occurs with minimal disruption to the existing order.

…and so to enterprise architecture

In essence, the aim of enterprise architecture is to come up with a strategy and plan to move from an old system landscape to a new one. Typically, architectures are proposed based on current technology trends and extrapolations thereof. Frameworks such as The Open Group Architecture Framework (TOGAF) present a range of options for migrating from legacy architecture.

Here’s an excerpt from Chapter 13 of the TOGAF Guide:

[The objective is to] create an overall Implementation and Migration Strategy that will guide the implementation of the Target Architecture, and structure any Transition Architectures. The first activity is to determine an overall strategic approach to implementing the solutions and/or exploiting opportunities. There are three basic approaches as follows:

  • Greenfield: A completely new implementation.
  • Revolutionary: A radical change (i.e., switches on, switch off).
  • Evolutionary: A strategy of convergence, such as parallel running or a phased approach to introduce new capabilities.

What can we say about these options in light of the discussion of the previous sections?

Firstly, from the discussion of the introduction, it is clear that Greenfield situations can be discounted on grounds rarity alone.  So let’s look at the second option – revolutionary change – and ask if it is viable in light of the discussion of the previous section.

In the case of a particular organization, the gap between an old architecture and technology trends/extrapolations is analogous to the lag between inherited characteristics and external forces. The former resist change; the latter insist on it.  The discussion of the previous section tells us that the former cannot be wished away, they are a natural consequence of “technology genes” embedded in the organization. Because this is so, changes are best introduced in a gradual way that permits adaptation through the slow and painful process of trial and error. This is why the revolutionary approach usually fails.

It follows from the above that the only viable approach to enterprise architecture is an evolutionary one. This process is necessarily gradual. Architects may wish for green fields and revolutions, but the reality is that lasting and sustainable change in an organisation’s technology landscape can only be achieved incrementally, akin to the way in which an aspiring marathon runner’s physiology adapts to the extreme demands of the sport.

The other, perhaps more subtle point made by this analogy is that a particular organization is but one member of a “species” which, in the present context, is a population of organisations that have a certain technology landscape. Clearly, a new style of architecture will be deemed a success only if it is adopted successfully by a significant number of organisations within this population. Equally clear is that this eventuality is improbable because new architectural paradigms are akin to random mutations. Most of these are rightly rejected by organizations, but only after exacting a high price. This explains why most technology fads tend to fade away.

Some consequences

The analogy between the evolution of biological systems and organizational technology landscapes has some interesting implications for enterprise architects. Here are a few that are worth highlighting:

  1. Enterprise architects are caught between a rock and a hard place: to demonstrate value they have to change things rapidly, but rapid changes are more likely to fail than succeed.
  2. The best chance of success lies in an evolutionary approach that accepts trial and error as a natural part of the process. The trick lies in selling that to management…and there are ways to do that.
  3. A corollary of (2) is that old and new elements of the landscape will necessarily have to coexist, often for periods much longer than one might expect. One must therefore design for coexistence. Above all, the focus here should be on the interfaces for these are the critical elements that enable the old and the new to “talk” to each other.
  4. Enterprise architects should be skeptical of cutting edge technologies. It almost always better to bet on proven technologies because they have the benefit of the experience of others.
  5. One of the consequences of an evolutionary process of trial and error is that benefits (or downsides) are often not evident upfront. One must therefore always keep an eye out for these unexpected features.

Finally, it is worth pointing out that an interesting feature of all the above points is that they are consistent with the principles of emergent design.

Wrapping up

In this article I’ve attempted to highlight a connection between the evolution of organizational technology landscapes and the process of biological evolution. At the heart of both lie a fundamental tension between inherent conservatism (the tendency to preserve the status quo change) and the imperative to evolve in order to adapt to changes imposed by the environment. There is no question that maintaining the status quo is never an option. The question is how to evolve in order to ensure the best chance of success. Evolution tells us that the best approach is a gradual one, via a process of trial, error and learning.

Written by K

December 16, 2015 at 7:26 am

Follow

Get every new post delivered to your Inbox.

Join 469 other followers

%d bloggers like this: