Each month, Dr. Paola Cecchi-Dimeglio, a behavioral scientist and senior research fellow for Harvard Law School’s Center on the Legal Profession and the Harvard Kennedy School, will be answering questions about how law firms and legal service firms can navigate a dramatically changing legal environment by using data analytics and behavioral science to create incentives for their lawyers and others to change their behavior. (You can follow Paola on Twitter at @HLSPaola.)
On to this month’s question…
Ask Dr. Paola: We hear a lot about Artificial Intelligence and machine learning and all the positive change they will bring to businesses, including the legal industry, but what about the negatives? We hear that AI can perpetuate existing biases in organizations, is that true? How can organizations combat that?
Dr. Paola Cecchi-Dimeglio: Unfortunately, it is true that AI, machine learning and deep learning can be biased. And we are just becoming very aware of that.
We know from computer science and technology research that our data and algorithms can carry biases, and those biases can disproportionately affect some people within a group, much like traditional biases.
For example, consider software that predicts the likelihood of certain crimes being committed in the future. If the coders or creators of the software designed it to predict crime based on people’s gender and race, then, as some researchers have found, the algorithms can be biased against certain groups, especially minorities.
And that happens because the system in itself is already tainted with biases, so any other data or trends we extrapolate from that system contain these biases — so the new data would be based on the pre-existing biases within the old data.
For instance, we know from data on sentencing and court rulings that people from a minority group are more likely to receive harsher court judgements than people who are from a non-minority group. Substantial empirical work confirms that phenomenon.
So, if you build your algorithms with that data, and if you take the data just at face value without realizing how easily biases can disadvantage certain groups, then the machine learning and the deep learning that come out of that data will keep confirming an incorrect image of society because of these biases. Simply put, if you base your analysis strictly on past results, and those results were biased, you’re going to replicate the bias in the future.
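To make that concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the records are made up, the group labels "A" and "B" are placeholders, and the "model" is deliberately naive (it just memorizes past rates). It is not how any real sentencing or risk tool works; it only shows how a rule learned from skewed past outcomes hands the skew back as a prediction.

```python
from collections import defaultdict

# Made-up historical records: (group, received_harsh_sentence).
# Assumption for this sketch: both groups behaved the same way,
# and only the recorded past outcomes differ.
history = (
    [("A", True)] * 6 + [("A", False)] * 4
    + [("B", True)] * 3 + [("B", False)] * 7
)

def learn_rates(records):
    """Estimate the harsh-sentence rate per group from past outcomes."""
    totals = defaultdict(int)
    harsh = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        harsh[group] += int(outcome)
    return {g: harsh[g] / totals[g] for g in totals}

def predict(rates, group, threshold=0.5):
    """Flag a person when their group's past rate meets the threshold."""
    return rates[group] >= threshold

rates = learn_rates(history)
for group in sorted(rates):
    print(group, rates[group], predict(rates, group))
```

With these invented numbers, every member of group A is flagged and no member of group B is, purely because of what the old records say.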
To get around this and combat the perpetuation of these biases, there are two important points to keep in mind. First, you have to determine how accurate your data is: check whether it treats members of the group of people you are examining differently, and check how accurate it is for the larger group overall. In a sense, the question you should be asking yourself is really, how inaccurate is my data?
And that brings us to the second point: controlling for these biases. That’s why it’s vital, when you’re trying to examine your data, that you have professionals who understand this issue, and who can design algorithms so that they, first, understand the diversity of the group and, second, can control well for inaccuracies and for mislabeling of people that might occur.
So, back to our crime-predicting software example: if your algorithms have determined, because of existing data, that members of minority groups are at a higher risk of going to or returning to jail, will the algorithms then determine that non-minority group members are less likely to commit crimes than minority group members, and subsequently mislabel who is at higher risk of committing crime?
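One way to start on that first check is to compare accuracy for each group against accuracy overall. The sketch below assumes you already have, for each person, a group label, what the system predicted, and what actually happened; the function name `accuracy_by_group` and the sample records are hypothetical.

```python
def accuracy_by_group(records):
    """records: iterable of (group, predicted, actual) tuples.

    Returns (accuracy per group, overall accuracy).
    """
    stats = {}  # group -> (number correct, number seen)
    for group, predicted, actual in records:
        correct, total = stats.get(group, (0, 0))
        stats[group] = (correct + int(predicted == actual), total + 1)
    per_group = {g: c / t for g, (c, t) in stats.items()}
    all_correct = sum(c for c, _ in stats.values())
    all_total = sum(t for _, t in stats.values())
    return per_group, all_correct / all_total

# Made-up records for illustration only.
records = [
    ("A", True, True), ("A", True, False),
    ("A", True, False), ("A", False, False),
    ("B", False, False), ("B", False, False),
    ("B", True, True), ("B", False, False),
]

per_group, overall = accuracy_by_group(records)
print(per_group, overall)
```

In this toy data the overall figure looks respectable while one group fares much worse than the other, which is exactly the gap a single overall number hides.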
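Mislabeling can be measured directly. A common way to do so, offered here as one option rather than the method any particular tool uses, is to compare false positive rates across groups: among people who did not go on to offend, how many were still labeled high risk? The function name and data below are hypothetical.

```python
def false_positive_rate_by_group(records):
    """records: iterable of (group, predicted_high_risk, reoffended).

    For each group, returns the share of people who did NOT reoffend
    but were still labeled high risk.
    """
    false_pos = {}
    negatives = {}
    for group, predicted, actual in records:
        if not actual:  # only people who did not reoffend
            negatives[group] = negatives.get(group, 0) + 1
            false_pos[group] = false_pos.get(group, 0) + int(predicted)
    return {g: false_pos[g] / negatives[g] for g in negatives}

# Made-up records for illustration only.
records = [
    ("A", True, False), ("A", True, False), ("A", False, False),
    ("A", False, False), ("A", True, True),
    ("B", True, False), ("B", False, False), ("B", False, False),
    ("B", False, False), ("B", True, True),
]

fpr = false_positive_rate_by_group(records)
print(fpr)
```

A large gap between groups on this measure is a signal that the labels need a closer look before anyone acts on them.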
By the way, this can happen based on gender as well. But in the case of gender inequality, it may not be about committing crime, but about who will take better care of a child, for example. Is it a woman or a man?
Professionals in the criminology, sociology and statistical fields know that some of the software being used is naturally tainted by racial and gender inequality and that these biases have to be controlled.
So, even as you set out to determine something from your existing data, you have to ask whether the variable you will use to define your data is a neutral one. That is not an easy task, because data are never neutral and there is never a neutral way of looking at things. If you think about a simple question, such as how many convictions a person has received for crime, and you plan to take that as a neutral variable, you should think twice. You need to examine more closely what you’re asking the algorithms to do. In this case, you’d have to understand, and embed that understanding in your algorithms, that the number of arrests among minority group members may be disproportionate compared to the number for non-minority group members living in the same area.
Without closely examining and controlling for this bias, you continue to perpetuate it, because the algorithms have lost the context and take at face value what the data says.
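A first step toward that kind of examination is simply to put the variable in context before using it. The sketch below compares arrests per 1,000 residents for two groups living in the same area. The counts, the population figures, and the function names are all invented for illustration, and a gap in the rates does not by itself say what caused it.

```python
def arrest_rates(arrests, population, per=1000):
    """Arrests per `per` residents, for each group."""
    return {g: arrests[g] * per / population[g] for g in arrests}

def disparity_ratio(rates, group, reference):
    """How many times higher one group's rate is than the reference's."""
    return rates[group] / rates[reference]

# Invented figures for one hypothetical area.
arrests = {"minority": 90, "non_minority": 120}
population = {"minority": 1000, "non_minority": 4000}

rates = arrest_rates(arrests, population)
ratio = disparity_ratio(rates, "minority", "non_minority")
print(rates, ratio)
```

Here the raw counts make the minority group look less often arrested, while the per-resident rate is several times higher. An algorithm fed only the raw variable would never see that context.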