The dark side of data science
Data scientists are sometimes blind to the possibility that the predictions of their algorithms can have unforeseen negative effects on people. Ethical or social implications are easy to overlook when one finds interesting new patterns in data, especially if they promise significant financial gains. The Centrelink debt recovery debacle, recently reported in the Australian media, is a case in point.
Here is the story in brief:
Centrelink is an Australian Government organisation responsible for administering welfare services and payments to those in need. A major challenge such organisations face is ensuring that their clients are paid no less and no more than what is due to them. This is difficult because it involves crosschecking client income details across multiple systems owned by different government departments, a process that necessarily involves many assumptions. In July 2016, Centrelink unveiled an automated compliance system that compares income self-reported by clients to information held by the taxation office.
The problem is that the algorithm is flawed: it makes strong (and incorrect!) assumptions regarding the distribution of income across a financial year and, as a consequence, unfairly penalizes a number of legitimate benefit recipients. It is very likely that the designers and implementers of the algorithm did not fully understand the implications of their assumptions. Worse, from the errors made by the system, it appears they may not have adequately tested it either. But this did not stop them (or, quite possibly, their managers) from unleashing their algorithm on an unsuspecting public, causing widespread stress and distress. More on this a bit later.
Algorithms like the one described above are the subject of Cathy O’Neil’s aptly titled book, Weapons of Math Destruction. In the remainder of this article I discuss the main themes of the book. Just to be clear, this post is more riff than review. However, for those seeking an opinion, here’s my one-line version: I think the book should be read not only by data science practitioners, but also by those who use or are affected by their algorithms (which means pretty much everyone!).
Abstractions and assumptions
‘O Neil begins with the observation that data algorithms are mathematical models of reality, and are necessarily incomplete because several simplifying assumptions are invariably baked into them. This point is important and often overlooked so it is worth illustrating via an example.
When assessing a person’s suitability for a loan, a bank will want to know whether the person is a good risk. It is impossible to model creditworthiness completely because we do not know all the relevant variables and those that are known may be hard to measure. To make up for their ignorance, data scientists typically use proxy variables, i.e. variables that are believed to be correlated with the variable of interest and are also easily measurable. In the case of creditworthiness, proxy variables might be things like gender, age, employment status, residential postcode etc. Unfortunately many of these can be misleading, discriminatory or worse, both.
The Centrelink algorithm provides a good example of such a “double-whammy” proxy. The key variable it uses is the difference between the client’s annual income reported by the taxation office and self-reported annual income stated by the client. A large difference is taken to be an indicative of an incorrect payment and hence an outstanding debt. This simplistic assumption overlooks the fact that most affected people are not in steady jobs and therefore do not earn regular incomes over the course of a financial year (see this article by Michael Griffin, for a detailed example). Worse, this crude proxy places an unfair burden on vulnerable individuals for whom casual and part time work is a fact of life.
Worse still, for those wrongly targeted with a recovery notice, getting the errors sorted out is not a straightforward process. This is typical of a WMD. As ‘O Neil states in her book, “The human victims of WMDs…are held to a far higher standard of evidence than the algorithms themselves.” Perhaps this is because the algorithms are often opaque. But that’s a poor excuse. This is the only technical field where practitioners are held to a lower standard of accountability than those affected by their products.
‘O Neil’s sums it up rather nicely when she calls algorithms like the Centrelink one weapons of math destruction (WMD).
Self-fulfilling prophecies and feedback loops
A characteristic of WMD is that their predictions often become self-fulfilling prophecies. For example a person denied a loan by a faulty risk model is more likely to be denied again when he or she applies elsewhere, simply because it is on their record that they have been refused credit before. This kind of destructive feedback loop is typical of a WMD.
An example that ‘O Neil dwells on at length is a popular predictive policing program. Designed for efficiency rather than nuanced judgment, such algorithms measure what can easily be measured and act by it, ignoring the subtle contextual factors that inform the actions of experienced officers on the beat. Worse, they can lead to actions that can exacerbate the problem. For example, targeting young people of a certain demographic for stop and frisk actions can alienate them to a point where they might well turn to crime out of anger and exasperation.
As Goldratt famously said, “Tell me how you measure me and I’ll tell you how I’ll behave.”
This is not news: savvy managers have known about the dangers of managing by metrics for years. The problem is now exacerbated manyfold by our ability to implement and act on such metrics on an industrial scale, a trend that leads to a dangerous devaluation of human judgement in areas where it is most needed.
A related problem – briefly mentioned earlier – is that some of the important variables are known but hard to quantify in algorithmic terms. For example, it is known that community-oriented policing, where officers on the beat develop relationships with people in the community, leads to greater trust. The degree of trust is hard to quantify, but it is known that communities that have strong relationships with their police departments tend to have lower crime rates than similar communities that do not. Such important but hard-to-quantify factors are typically missed by predictive policing programs.
Ironically, although WMDs can cause destructive feedback loops, they are often not subjected to feedback themselves. O’Neil gives the example of algorithms that gauge the suitability of potential hires. These programs often use proxy variables such as IQ test results, personality tests etc. to predict employability. Candidates who are rejected often do not realise that they have been screened out by an algorithm. Further, it often happens that candidates who are thus rejected go on to successful careers elsewhere. However, this post-rejection information is never fed back to the algorithm because it impossible to do so.
In such cases, the only way to avoid being blackballed is to understand the rules set by the algorithm and play according to them. As ‘O Neil so poignantly puts it, “our lives increasingly depend on our ability to make our case to machines.” However, this can be difficult because it assumes that a) people know they are being assessed by an algorithm and 2) they have knowledge of how the algorithm works. In most hiring scenarios neither of these hold.
Just to be clear, not all data science models ignore feedback. For example, sabermetric algorithms used to assess player performance in Major League Baseball are continually revised based on latest player stats, thereby taking into account changes in performance.
Driven by data
In recent years, many workplaces have gradually seen the introduction to data-driven efficiency initiatives. Automated rostering, based on scheduling algorithms is an example. These algorithms are based on operations research techniques that were developed for scheduling complex manufacturing processes. Although appropriate for driving efficiency in manufacturing, these techniques are inappropriate for optimising shift work because of the effect they have on people. As O’ Neil states:
Scheduling software can be seen as an extension of just-in-time economy. But instead of lawn mower blades or cell phone screens showing up right on cue, it’s people, usually people who badly need money. And because they need money so desperately, the companies can bend their lives to the dictates of a mathematical model.
She correctly observes that an, “oversupply of low wage labour is the problem.” Employers know they can get away with treating people like machine parts because they have a large captive workforce. What makes this seriously scary is that vested interests can make it difficult to outlaw such exploitative practices. As ‘O Neil mentions:
Following [a] New York Times report on Starbucks’ scheduling practices, Democrats in Congress promptly drew up bills to rein in scheduling software. But facing a Republican majority fiercely opposed to government regulations, the chances that their bill would become law were nil. The legislation died.
Commercial interests invariably trump social and ethical issues, so it is highly unlikely that industry or government will take steps to curb the worst excesses of such algorithms without significant pressure from the general public. A first step towards this is to educate ourselves on how these algorithms work and the downstream social effects of their predictions.
Messing with your mind
There is an even more insidious way that algorithms mess with us. Hot on the heels of the recent US presidential election, there were suggestions that fake news items on Facebook may have influenced the results. Mark Zuckerberg denied this, but as this Casey Newton noted in this trenchant tweet, the denial leaves Facebook in “the awkward position of having to explain why they think they drive purchase decisions but not voting decisions.”
Be that as it may, the fact is Facebook’s own researchers have been conducting experiments to fine tune a tool they call the “voter megaphone”. Here’s what ‘O Neil says about it:
The idea was to encourage people to spread the word that they had voted. This seemed reasonable enough. By sprinkling people’s news feeds with “I voted” updates, Facebook was encouraging Americans – more that sixty-one million of them – to carry out their civic duty….by posting about people’s voting behaviour, the site was stoking peer pressure to vote. Studies have shown that the quiet satisfaction of carrying out a civic duty is less likely to move people than the possible judgement of friends and neighbours…The Facebook started out with a constructive and seemingly innocent goal to encourage people to vote. And it succeeded…researchers estimated that their campaign had increased turnout by 340,000 people. That’s a big enough crowd to swing entire states, and even national elections.
And if that’s not scary enough, try this:
For three months leading up to the election between President Obama and Mitt Romney, a researcher at the company….altered the news feed algorithm for about two million people, all of them politically engaged. The people got a higher proportion of hard news, as opposed to the usual cat videos, graduation announcements, or photos from Disney world….[the researcher] wanted to see if getting more [political] news from friends changed people’s political behaviour. Following the election [he] sent out surveys. The self-reported results that voter participation in this group inched up from 64 to 67 percent.
This might not sound like much, but considering the thin margins of recent presidential elections, it could be enough to change a result.
But it’s even more insidious. In a paper published in 2014, Facebook researchers showed that users’ moods can be influenced by the emotional content of their newsfeeds. Here’s a snippet from the abstract of the paper:
In an experiment with people who use Facebook, we test whether emotional contagion occurs outside of in-person interaction between individuals by reducing the amount of emotional content in the News Feed. When positive expressions were reduced, people produced fewer positive posts and more negative posts; when negative expressions were reduced, the opposite pattern occurred. These results indicate that emotions expressed by others on Facebook influence our own emotions, constituting experimental evidence for massive-scale contagion via social networks.
As you might imagine, there was a media uproar following which the lead researcher issued a clarification and Facebook officials duly expressed regret (but, as far as I know, not an apology). To be sure, advertisers have been exploiting this kind of “mind control” for years, but a public social media platform should (expect to) be held to a higher standard of ethics. Facebook has since reviewed its internal research practices, but the recent fake news affair shows that the story is to be continued.
Disarming weapons of math destruction
The Centrelink debt debacle, Facebook mood contagion experiments and the other case studies mentioned in the book illusrate the myriad ways in which Big Data algorithms have a pernicious effect on our day-to-day lives. Quite often people remain unaware of their influence, wondering why a loan was denied or a job application didn’t go their way. Just as often, they are aware of what is happening, but are powerless to change it – shift scheduling algorithms being a case in point.
This is not how it was meant to be. Technology was supposed to make life better for all, not just the few who wield it.
So what can be done? Here are some suggestions:
- To begin with, education is the key. We must work to demystify data science, create a general awareness of data science algorithms and how they work. O’ Neil’s book is an excellent first step in this direction (although it is very thin on details of how the algorithms work)
- Develop a code of ethics for data science practitioners. It is heartening to see that IEEE has recently come up with a discussion paper on ethical considerations for artificial intelligence and autonomous systems and ACM has proposed a set of principles for algorithmic transparency and accountability. However, I should also tag this suggestion with the warning that codes of ethics are not very effective as they can be easily violated. One has to – somehow – embed ethics in the DNA of data scientists. I believe, one way to do this is through practice-oriented education in which data scientists-in-training grapple with ethical issues through data challenges and hackathons. It is as Wittgenstein famously said, “it is clear that ethics cannot be articulated.” Ethics must be practiced.
- Put in place a system of reliable algorithmic audits within data science departments, particularly those that do work with significant social impact.
- Increase transparency a) by publishing information on how algorithms predict what they predict and b) by making it possible for those affected by the algorithm to access the data used to classify them as well as their classification, how it will be used and by whom.
- Encourage the development of algorithms that detect bias in other algorithms and correct it.
- Inspire aspiring data scientists to build models for the good.
It is only right that the last word in this long riff should go to ‘O Neil whose work inspired it. Towards the end of her book she writes:
Big Data processes codify the past. They do not invent the future. Doing that requires moral imagination, and that’s something that only humans can provide. We have to explicitly embed better values into our algorithms, creating Big Data models that follow our ethical lead. Sometimes that will mean putting fairness ahead of profit.
Excellent words for data scientists to live by.