Data science and sensemaking – tales from two hackathons
“It isn’t that they can’t see the solution. It is that they can’t see the problem” – GK Chesterton
Examples of vendor-generated hype about data science are not hard to find; I found one on the very first site I visited: a large technology and services vendor who, in their own words, claim their analytics solutions help organisations “engage with data to answer the toughest business questions, uncover patterns and pursue breakthrough ideas.” I’ve deliberately avoided linking to the guilty party because there are many others that spout similar rhetoric.
Unfortunately it seems to work: according to Gartner, “by 2020, predictive and prescriptive analytics will attract 40% of enterprises’ net new investment in business intelligence and analytics.” This trend is accompanied by a concomitant increase in demand for data science education, fuelled by remarks along the lines that data science is “The Sexiest Job of the 21st Century.”
By and large, data science education tends to focus on algorithms and technology, but its practice involves much more. The vendor who claims that technology can help organisations grapple with “toughest business questions” and “pursue breakthrough ideas” is singularly silent about where these questions or ideas come from. Data is meaningless without a meaningful hypothesis. Problem is, in the real world questions or hypotheses aren’t obvious; one has to work to formulate them. As the management icon Russell Ackoff once said, “Outside of school, problems are seldom given; they have to be taken, extracted from complex situations…”
The art of taking problems is what sensemaking is all about.
Unfortunately, it is a skill that is typically ignored by data science educators.
Probably because it is hard to teach…but the good news is that it can be learnt. Like most tacit skills, sensemaking is best learnt by doing, that is, by formulating problems in real-world situations. Before I get to that, however, let’s take a brief detour.
Real world problems are characterised by ambiguity
An important aspect of real-world problems – as opposed to classroom ones – is that they are invariably fraught with ambiguity. For example, a customer’s requirements may be vague or the available data incomplete and messy. What this means is that there is no guarantee one will be able to formulate a well-posed problem, let alone get a useful answer. Worse, unlike a risk-based situation in which uncertainty can be quantified, one cannot even figure out the odds of success.
The human brain processes quantifiable uncertainty (aka risk) and ambiguity very differently. The former, which can be calculated, is dealt with by the prefrontal cortex which is responsible for decision making and goal-oriented thinking. Ambiguity, on the other hand, is processed by the amygdala, which deals with emotions. The upshot of this is that ambiguity evokes an emotional response, the most common one being anxiety.
Although some people are innately better at coping with anxiety than others, it is possible to get better at it by repeatedly putting oneself in high-pressure (yet safe) situations that are ambiguous. For data science students, hackathons provide a perfect opportunity to do this.
Ambiguity in data science – tales from two hackathons
Over the last two months, I’ve had the privilege of being a part of the Master of Data Science Innovation (MDSI) program run by the Connected Intelligence Centre at UTS. The course director, Theresa Anderson, sees hackathons as a great way for students to learn how to handle ambiguity. So, apart from regular coursework assignments, students are encouraged to participate in external hackathons sponsored by industry and government organisations. This gives them opportunities to gain practical experience in formulating problems in ambiguous and high-pressure environments.
Datacake at GovHack
A few MDSI student teams participated in a GovHack event earlier this year. Here’s what William Azevedo, a member of a team that called themselves Datacake, wrote about his team’s problem formulation journey at the event:
The challenge is simple: the competitors should form teams, identify a problem and use data from government agencies from Australia and New Zealand to present a solution to the problem. Naturally, this solution should bring some benefit to the society.
I’m not sure I’d use the word simple…but the importance of problem formulation comes through quite clearly. Here’s how he and his team went about it:
As a starting point, our team published an online survey to understand how safe people feel when walking on the streets, especially at night. As we didn’t have much time, we spread the message via social networks. In a couple of hours, we received 44 answers. It gave us enough information to back our idea.
Notice the process used in defining the problem – the team realised they did not know enough to define a meaningful problem so they went and got relevant data. Following this:
Our team analysed the answers of the survey, engaged in passionate discussions, took tips from the mentors, had lots of coffee and designed some cool diagrams on the blackboard.
…and then his description of the Aha moment when a good idea emerged:
Then the magic happened. We had this idea of merging information about crime, demographics, weather, land zoning and street illumination to provide a map of the safe and unsafe areas within a suburb.
An important point is that sensemaking is best done collaboratively. Since the problem is ambiguous or even undefined (as in this case), no individual has privileged access to the “truth.” It is therefore important to bring diverse perspectives to bear on the problem. Indeed, sensemaking may be thought of as collaborative problem formulation and solving. In view of this, it is interesting to hear what other members of Team Datacake had to say about their problem formulation process. Here’s a comment from Anthony So:
During the whole weekend we really forced ourselves to go deep and asked “Why is it happening? Why is it happening? Why is it happening?” every time we found an interesting pattern. We really wanted to understand the true root causes of those accidents. We didn’t want to stay at a descriptive level. We knew the answers were behavioural. We knew there were multiple problems and therefore require different answers and solutions. We did different techniques to do so: machine learning, stats, data visualisation. It didn’t matter which we used the only important point was how can we get to answers of those questions.
The specific area they looked at was pedestrian safety. They found that obvious variables, such as driver fatigue and hazards, were not significant, so they started looking for other potential factors. Here’s how Anthony put it:
For instance we built a classification model on the severity of the accidents involving children but we didn’t use it to make predictions. We used it to identify the important features (and unimportant) for those cases. We found out that some of the variables related to the environment (Primary_hazardous_feature, Surface_condition, Weather…) and to the drivers (Fatigue_involved_in_crash…) were not important. This gave us a good indication that those accidents are mostly related directly to the behaviour of the children. So we kept diving further and further and found 3 postcodes with higher numbers of accidents than others. We focused on those 3 areas and we kept going deeper and deeper…
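For readers who want to see the idea of using a model to rank variables rather than to predict, here is a minimal sketch in Python. It is not the team’s code. It swaps their classification model for a simpler stand-in, the information gain of each variable with respect to severity (the quantity a decision tree uses to choose its splits), and it runs on made-up records. The column `Child_ran_onto_road`, the `Severity` labels and all the values are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, feature, target):
    """Drop in entropy of `target` once the rows are split by `feature`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def rank_features(rows, features, target):
    """Return (feature, gain) pairs, most informative feature first."""
    gains = {f: information_gain(rows, f, target) for f in features}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Made-up records: one row for every combination of the three variables.
# Severity is set from the hypothetical behavioural flag alone, so the
# environmental variables carry no information about it by construction.
FEATURES = ["Weather", "Surface_condition", "Child_ran_onto_road"]
rows = [
    {
        "Weather": weather,
        "Surface_condition": surface,
        "Child_ran_onto_road": ran,
        "Severity": "serious" if ran == "yes" else "minor",
    }
    for weather, surface, ran in product(
        ["fine", "rain"], ["dry", "wet"], ["yes", "no"]
    )
]

ranked = rank_features(rows, FEATURES, "Severity")
for name, gain in ranked:
    print(f"{name}: {gain:.2f} bits")
```

On this toy data the behavioural flag comes out on top and the two environmental variables score zero, which is the kind of signal that tells a team where to keep digging. On real data the differences would be far less clear-cut.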
In the end Datacake came up with a few suggestions for improving pedestrian safety. They were awarded a prize for their efforts, so the problem they formulated and solved was clearly valuable to the sponsors.
A couple of weekends ago, Pepper Money, Australia’s largest non-bank lender, sponsored a day-long internal hackathon for MDSI students, with a hefty winner-take-all prize as an incentive. The challenge was quite open-ended, and had to do with helping the organisation develop a consistent brand voice. Participants were given a small corpus of text files from the organisation’s public and social media sites, along with very general guidelines on how to proceed. Details were left entirely to the teams.
As one might expect, most teams spent the first few hours struggling to define a relevant and tractable problem – relevance being paramount for the client and tractability for the teams. Being a mentor at the event, I was able to observe how different teams handled this. Among other things, I was particularly impressed by how some teams with very little text mining experience were able to – in a few hours – come up with a good problem, an approach to solve it…and, most importantly, make decent progress by day’s end.
I won’t go into details except to say that the approaches were diverse, ranging from the somewhat philosophical to the very technical. A couple of examples:
- Using Aristotle’s notion of modes of persuasion to analyse and evaluate marketing material.
- Drawing on deep learning technology (Recurrent Neural Networks via Theano) to build a brand voice generator.
I was amazed at the diversity of solutions the groups came up with, and so were the other mentors and the sponsor. Blair Hudson, Innovation Portfolio Manager at Pepper Money, summed the day up very well when he said:
#PepxUTS was our first hackathon event, challenging students to build data science solutions in a day to allow everyone at Pepper to communicate using a consistent brand voice. Our Co-Group CEOs both joined in for judging and awarded the winners. It was a rewarding day for all involved
(For some vignettes from the day, check out the #PepxUTS hashtag on Twitter.)
The day’s experiences left me ever more convinced that hackathons are an excellent vehicle for learning and demonstrating the practical utility of sensemaking skills.
The two case studies highlight the benefits of sensemaking skills, both for students and organisations. On the one hand, students who participated got valuable experience in formulating problems collaboratively in high-pressure, high-ambiguity situations. This is a skill that cannot be learnt in classrooms, MOOCs or even in online data challenges (like Kaggle) where problems tend to be clearly defined. On the other hand, sponsoring organisations have benefited from new insights into longstanding problems.
Finally, it should be clear that although I’ve focused on educational settings, what I’ve said for students applies to organisational settings too: there’s nothing to stop organisations from using hackathons as a means to help their employees learn sensemaking skills.
To conclude, the main point I want to make is that the most important situations we encounter at work (and even in our personal lives) are usually fraught with ambiguity. Our first reaction is to jump into problem solving mode because it feels like the right thing to do. In reality, one is generally better off stepping back and taking the time to think the situation through, preferably with a group of diversely skilled individuals. All too often this sensemaking step is neglected, and teams end up solving an irrelevant problem.
To paraphrase Chesterton, in order to see the right solution, one must first see the right problem.
Many thanks to Blair Hudson, William Azevedo and Anthony So for their contributions to this piece.