The politics of data warehousing revisited
An enterprise IT initiative generally affects a range of stakeholder groups, each with their own take on why the project is being undertaken and what the result should look like. This diversity of views is no surprise: an organisation-wide effort affects many divisions and departments, so there are bound to be differing – even conflicting – views regarding the initiative and its expected outcome.
The existence of many irreconcilable viewpoints is one of the main symptoms of a wicked problem – a problem that is hard to define, let alone solve. Paul Culmsee has written about the inherent wickedness of projects that involve collaborative platforms such as SharePoint. In this post I discuss how another class of enterprise-scale initiatives – efforts to consolidate and harmonise organizational data for analytical and reporting purposes (so-called data warehouse projects) – displays characteristics of wickedness. I also briefly discuss a couple of approaches that can be used to manage this issue.
As some of my readers may not be familiar with the terms data warehouse or wicked problem, I’ll start with a short introduction to the two terms in order to set the stage for the main topic.
A data warehouse is a repository of data that a business deems important for reporting and analysis. Ideally, a data warehouse integrates data from multiple sources – for example, CRM and financial systems – thereby serving as an authoritative source for management reports (often referred to as a “single point of truth”). There are at least a couple of different design philosophies for data warehouses, but I won’t go into these as they are not relevant to the discussion. What’s interesting is that most of the literature on data warehousing deals with its technical aspects – things such as data modelling and extract-transform-load processes – yet, as anyone who has been involved in an enterprise-scale data warehousing effort will tell you, the biggest challenges are political, not technical. To be fair, this was recognized a while ago – Marc Demarest wrote an article on the politics of data warehousing in 1997. However, it is worth revisiting this issue because there are techniques to handle it that weren’t widely known at the time Demarest wrote his article. I discuss these briefly later, but first let’s look at what wickedness means and its relevance to data warehouse projects.
The term wicked problem was coined by Horst Rittel and Melvin Webber in a now-classic paper entitled Dilemmas in a General Theory of Planning. The paper is essentially a critique of the traditional approach to social planning, wherein decisions are made by experts who, by virtue of their specialist knowledge and training, are assumed to know best. Such an approach often doesn’t work because it ends up alienating stakeholders who are adversely affected by the “solution.” This is a symptom of social complexity – messiness and conflict arising from diverse opinions as to what the problem is and how it should be solved. Those involved in enterprise-scale IT initiatives – whether as users, managers or technical specialists – would have had first-hand experience of this social complexity.
How do we know that a problem is socially complex (or wicked)? That’s easy: In the paper, Rittel and Webber describe ten criteria for wickedness – so a problem is wicked if it satisfies some or all of the Rittel-Webber criteria. We’ll take a look at the criteria and their relevance to data warehousing next.
The wickedness of data warehouse initiatives
To support my claim about the wickedness of data warehousing initiatives, I’ll simply list the ten Rittel-Webber criteria (in their original form) along with a brief commentary on how they can crop up in data warehouse projects. Here we go:
- There is no definitive formulation of a wicked problem: Those who have worked on organisation-wide efforts at integrating data will know that the first problem is to decide “what’s in and what’s out” – that is, what data sources are considered in scope for integration. The problem arises because different business stakeholders have different views on what is important. For example, data that is critical to HR may not be a priority for the marketing function.
- Wicked problems have no stopping rule: Data warehouse initiatives are never definitively completed: there are always new data sources that need to be integrated; old ones to be turned off; business rules to be changed and so on. Any stopping rule that one might define will need to be revised as new business requirements come up and new data sources are revealed.
- Solutions to wicked problems are not true or false, but better or worse: This is simply an expression of the truism that there is no right or wrong way to build a data warehouse. There is a range of different architectures and approaches that can be chosen, each with its pros and cons (see this paper for a comparison of the two most popular approaches). The problem is that one often cannot tell beforehand which approach is going to be best for a particular situation.
- There is no immediate or ultimate test of a solution to a wicked problem: This is a statement of the fact that one cannot tell whether or not a particular implementation can completely solve the problem of data integration. As Rittel and Webber put it, “…any solution, after being implemented, will generate waves of consequences over an extended – virtually unbounded – period of time. Moreover, the next day’s consequences of the solution may yield utterly undesirable consequences…” Although these words are somewhat over-the-top, the message isn’t: for example, I have seen situations where programming errors that remained undetected for years (yes, years) have led to incorrect data being used in reports.
- Every solution to a wicked problem is a “one-shot” operation; because there is no opportunity to learn by trial and error, every attempt counts significantly: Because of the high costs of implementation, enterprise-scale IT initiatives tend to be one-shot affairs. Another limiting factor is that there is usually a very short window of time in which the project must be completed – as the cliché goes, “users need these reports yesterday.” Among other things, this precludes the option of learning by trial and error.
- Wicked problems do not have an enumerable (or exhaustively describable) set of potential solutions, nor is there a set of well-describable options that may be incorporated into the plan: This point may seem like it doesn’t apply to data warehousing initiatives – all data warehousing projects have a plan, right? Nevertheless, those who have worked on such projects will attest to the fact that the plan – such as it is – needs frequent revision because of surprises that crop up along the way. Iterative/incremental development approaches can address these issues to some extent, but cannot eliminate them completely. Because of time constraints, it is inevitable that solutions to unexpected roadblocks occur through improvisation rather than planning.
- Every wicked problem is essentially unique: This one is easy to see: every organisation is unique, and so are its data integration requirements. Methodologists and consultants may try to convince you otherwise, and tempt you into following generic approaches – but don’t be fooled: generic approaches will come unstuck. Your data is unique; treat it with the respect and seriousness it deserves.
- Every wicked problem can be considered to be a symptom of another problem: One of the key drivers of data warehouse projects is that organizations tend to have the same (or similar) data residing in multiple databases. As a consequence there are several different “sources of truth” for reports. These different sources of truth arise because systems used in different departments may have different definitions of the same business entity. For example, a customer might be defined in one way within the financial system but in another way in a CRM system. Seen in this light, the problem of multiple sources of truth is actually a symptom of lack of communication between different departments, what is sometimes called silo mentality.
- The existence of a discrepancy representing a wicked problem can be explained in numerous ways. The choice of explanation determines the nature of the problem’s resolution: As discussed in the previous point, the discrepancy in the case of a data integration problem is the lack of congruency between different data sources. There can be a range of explanations for the discrepancy. For example, one explanation may be that the data is actually different – a customer in the CRM system is not the same as the customer in the finance system; another explanation may be that the two entities are the same but their definitions differ because the systems were developed independently of each other. The data integration solution in the two cases will differ – in other words, the solution to the problem depends on which explanation is seen as the correct one.
- The planner has no right to be wrong: The data warehouse designer is in a difficult position: he or she may have to reconcile contradictory requirements. Following from the example of the previous point, whatever design decisions the designer makes regarding the definition of a customer, there will be some parties that will not be happy: if she goes with the finance definition, sales will be ticked off; if she chooses the sales definition, finance will not be happy; if she chooses to define a single common entity, neither will be pleased. Yet, her mandate is to satisfy all business requirements. This criterion is essentially an expression of the political aspect of data warehouse projects.
I find it quite amazing that criteria that were framed in the context of social planning problems can apply word-for-word to data consolidation initiatives.
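To make the customer example from the list above a little more concrete, here is a minimal Python sketch of how two systems can disagree on a simple customer count. Everything in it is an assumption made for illustration: the records, the field names, and in particular the premise that the CRM treats prospects as customers while finance only recognises those it has invoiced.

```python
# Hypothetical extracts from two source systems. The records and field
# names are illustrative assumptions, not taken from any real system.
crm_customers = [
    {"id": "C1", "name": "Acme Pty Ltd", "has_purchased": True},
    {"id": "C2", "name": "Globex", "has_purchased": False},  # a prospect
    {"id": "C3", "name": "Initech", "has_purchased": True},
]
finance_customers = [
    {"id": "C1", "name": "Acme Pty Ltd", "invoiced": True},
    {"id": "C3", "name": "Initech", "invoiced": True},
]


def customer_ids(records):
    """Return the set of customer ids present in a list of records."""
    return {record["id"] for record in records}


def reconcile(crm, finance):
    """Compare the two sources and report where they disagree."""
    crm_ids = customer_ids(crm)
    finance_ids = customer_ids(finance)
    return {
        "crm_count": len(crm_ids),
        "finance_count": len(finance_ids),
        "only_in_crm": sorted(crm_ids - finance_ids),
        "only_in_finance": sorted(finance_ids - crm_ids),
    }


report = reconcile(crm_customers, finance_customers)
print(report)
```

The code can surface the discrepancy, but it cannot say which count is "right". Whether the prospect should be counted as a customer is exactly the kind of question that has to be settled between stakeholders rather than in the load scripts.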
Managing wickedness in data warehousing
As should be evident from the above, wicked problems can’t be solved in the usual sense of the word, but they can be managed. Although there are many techniques to manage wickedness, they all focus on the same end: to help all stakeholder groups reach a shared understanding of the problem and make a shared commitment to action. Such a shared understanding is absolutely critical because business and IT folks often have differing views on what a data warehouse ought to be.
One approach that I have used to help stakeholders get to a shared understanding in data warehouse projects is dialogue mapping, a facilitation technique that maps out the conversation between stakeholders as it occurs. Dialogue mapping uses the Issue-Based Information System (IBIS) notation which was invented by Rittel as a means to document the different facets of a wicked problem. See this post for a data warehouse related example of dialogue mapping and this one for more on the IBIS notation.
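For readers who think in data structures, an IBIS map can be pictured as a tree of questions, ideas and arguments for or against those ideas. The following is a rough Python sketch under that assumption; the class name, the method names and the example conversation are all hypothetical, and it is not meant to represent any particular dialogue mapping tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One element of an IBIS map (hypothetical minimal representation)."""
    kind: str  # "question", "idea", "pro" or "con"
    text: str
    children: List["Node"] = field(default_factory=list)

    def add(self, kind, text):
        """Attach a child node and return it so calls can be nested."""
        child = Node(kind, text)
        self.children.append(child)
        return child


def render(node, depth=0):
    """Return the map as a list of indented outline lines."""
    lines = ["  " * depth + f"[{node.kind}] {node.text}"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# An invented fragment of a stakeholder conversation.
root = Node("question", "How should the warehouse define a customer?")
finance = root.add("idea", "Use the finance definition")
finance.add("pro", "Matches invoiced revenue")
finance.add("con", "Excludes prospects tracked by sales")
crm = root.add("idea", "Use the CRM definition")
crm.add("con", "Does not reconcile with the ledger")

outline = render(root)
print("\n".join(outline))
```

The point of the structure is that every position and objection is recorded against the question it addresses, so no stakeholder's view gets lost in the discussion.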
Shared understanding and commitment to action are all well and good, but in the end success is measured by deliverables: the data warehouse and accompanying reports must be built. One of the challenges with a data warehouse initiative is that customers have to wait a long (very long!) time before they see any tangible benefits. Agile approaches to data warehousing offer a way to address this issue. For those interested in the nuts and bolts of agile data warehousing, I recommend Ralph Hughes’ book, which discusses how Scrum can be adapted for data warehousing projects.
Although the juxtaposition of the terms “agile” and “data warehouse” may sound oxymoronic to some, there is evidence that it works (see this case study, for example). Of course, no approach is a silver bullet; those who want to read about potential problems may want to look at this thesis for a research-based view of the pros and cons of an agile approach to data warehousing.
In the end, though, one has to keep in mind that no development technique – agile or otherwise – will succeed unless all stakeholders have a shared understanding of what the data warehouse is intended to achieve. The biggest issues are organisational rather than technical.
As we have seen, corporate data integration problems satisfy many – if not all – of the criteria for wickedness. The main implication of this is that data consolidation at an enterprise level is not just a difficult technical problem; it is also a socially complex one. Although tackling this requires skills and techniques that are outside of the standard repertoire of technical staff and managers, these skills can be learnt. What’s more, they are critical for success: those who undertake data warehouse projects without an understanding of the conflicting agendas of stakeholder groups may fail for reasons that have nothing to do with technology.