
Choice architecture resolves systemic bias found in data science and machine learning

Updated: Jun 11, 2023


Introduction: We are pleased to present to James Madison University's business analytics program. We discuss the differences between data science and decision science for overcoming bias and noise. We walk through Definitive's decision science applications and give students access to our choice architecture app, Definitive Choice. We discuss why, in a competition between data science and decision science, decision science wins easily. In fact, it is not even a fair fight!


The challenge of data science


Data science helps predict the future. It has many applications, such as:

  • The next best product to purchase on Amazon;

  • The next best movie to watch on Netflix; or,

  • The best loan product and credit risk of a borrower.

Data science algorithms, using artificial intelligence, neural networks, and machine learning, are potent and getting more powerful.


The challenge is data. Simply put, data can be defined as:

"The representation of our past reality."

For example, let's say you watched 20 movies on Netflix over the last year. Those 20 movies, along with your buying behavior, when you watched them, and so on, are the "past reality" data that Netflix has stored in its database. Netflix's data science algorithms learn from that data and integrate it into their "next best product" models for you and other customers.
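To make the "past reality" idea concrete, here is a minimal sketch of a co-occurrence recommender ("viewers who watched what you watched also watched..."). It is not Netflix's actual method, which is not described here; the users, titles, and the `recommend` function are hypothetical. What it illustrates is that every suggestion is assembled only from histories already stored in the database.

```python
from collections import Counter

# Hypothetical stored viewing histories: each user's "past reality" on the platform.
histories = {
    "ana": ["Drama A", "Drama B", "Comedy C"],
    "ben": ["Drama A", "Drama B"],
    "cai": ["Drama A", "Comedy C", "Documentary D"],
}

def recommend(user_history, all_histories, k=2):
    """Rank unseen titles by how many overlapping viewers also watched them."""
    seen = set(user_history)
    scores = Counter()
    for other in all_histories.values():
        other_titles = set(other)
        # Only viewers who share at least one title with this user count as evidence.
        if seen & other_titles:
            for title in other_titles - seen:
                scores[title] += 1
    # Sort by count (descending), then alphabetically, so ties break deterministically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:k]]

print(recommend(["Drama B"], histories))
print(recommend([], histories))
```

A user with no stored history gets an empty list: the model has nothing to say about anyone whose past reality it never recorded.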


So, do Netflix, Amazon, or other platforms have data on enough people who look like us to help us make a "next best product" purchase decision? How do we know that other people's behavior matches ours? Their past reality could not possibly be the same; people are unique! People respond differently in different situations. Different social groups, genders, ethnic groups, etc., have had dramatically different American experiences.


Also, what about incentives? Is Amazon incented to help me make the best decision or to help Amazon make the most money off me? Sometimes these incentives are aligned. But often incentives are NOT aligned!


Then, what about the makeup of the data? Does Netflix have data representing all of its customers, or just the data it observes on the Netflix platform? You may be thinking, "This is no big deal; if I don't like the movie Netflix recommends, I'll just move on to the next one." Good point. But what if the decision has a bigger life impact, like a loan or a parole decision?


If you are a borrower who does not typically use the banking system, how could a bank accurately predict payment behavior for people whose payment data is not in its credit database? Just because banks do not have access to your payment data does not mean you did not pay your bills on time. People outside the banking system may both a) greatly benefit from and b) responsibly repay loans that provide life-improving opportunities, like buying a car or a house.


These questions about the representativeness, incentives, and completeness of the data demonstrate the systemic bias challenge of data science. Systemic bias occurs when the data needed to make an accurate prediction is not fully available. It may affect anyone, but it is particularly acute for social groups that have not typically been represented in the data. Think of systemic bias as walls built to help certain people that also keep others out.


Systemic biases in statistical models can be very difficult to see. In fact, traditional measures of statistical precision, like R-squared or the K-S statistic, can be deceptively positive. The model can be very precise! The challenge is that the model is only great for a systemically biased subset of the population. Precise models may provide false confidence and lead to very inaccurate decisions.


As algorithms become more powerful, the systemic bias walls only get higher. Our past reality is fixed, so if a bias existed in the past, more powerful algorithms only amplify it. As a result, given that racism and other "isms" are part of our past reality, related biases must, by definition, reside in the data. Thus, algorithms trained on that data must also be biased.
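The "deceptively positive" precision point can be sketched with a toy regression. Under made-up data, a straight line is fit only to group A, the group well represented in the historical data, and is then scored with R-squared on both group A and group B, a group whose data was never collected and whose relationship between feature and outcome differs. The function names and numbers are illustrative assumptions, not a real credit model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def r_squared(xs, ys, a, b):
    """Coefficient of determination of the line y = a + b*x on the given data."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data. Group A dominates the historical data set;
# group B was never collected, and its feature-outcome relationship differs.
group_a_x = [1, 2, 3, 4, 5, 6]
group_a_y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
group_b_x = [2, 4, 6]
group_b_y = [9.0, 8.0, 9.0]

a, b = fit_line(group_a_x, group_a_y)  # the model only "sees" group A
r2_a = r_squared(group_a_x, group_a_y, a, b)
r2_b = r_squared(group_b_x, group_b_y, a, b)
print(f"R-squared on group A: {r2_a:.3f}")
print(f"R-squared on group B: {r2_b:.3f}")
```

On these numbers the line scores an R-squared of roughly 0.998 on group A and far below zero on group B, meaning the same "precise" model does worse for group B than simply guessing group B's average.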


The late Harvard-trained scientist and systems researcher Donella Meadows said:

“…. most of what goes wrong in systems goes wrong because of biased, late, or missing information.”

Decision Science as the answer

Think of decision science as a way to create a unique preference model, one far better than any model offered by data science. It is not really a fair fight. Think of data science as one of the blind mice trying to understand the elephant by touching only a single part of it. Decision science is the whole elephant.


In economics, aggregate demand sums the demand of all individuals, and each individual's demand reflects that person's utility for a good or service. An individual's utility reflects a set of preferences about that good or service. Your preferences are a weighted collection of "what is important to me or us" about buying something, whether that something is on Amazon, on Netflix, or a loan to buy a house.
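One common way to write down a weighted preference model is a weighted additive utility: score each alternative on each criterion, multiply by how much that criterion matters, and sum. The sketch below assumes that form; the criteria, weights, scores, and loan names are hypothetical, and it is not presented as the model inside Definitive Choice.

```python
def utility(scores, weights):
    """Weighted additive utility: sum of weight * score across criteria."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)

def rank(alternatives, weights):
    """Return alternative names ordered from highest to lowest utility."""
    return sorted(alternatives, key=lambda name: utility(alternatives[name], weights), reverse=True)

# Hypothetical 0-10 scores for each loan on each criterion (10 = best for this borrower).
alternatives = {
    "Loan X": {"monthly_payment": 8, "total_cost": 5, "lender_service": 6},
    "Loan Y": {"monthly_payment": 6, "total_cost": 9, "lender_service": 7},
    "Loan Z": {"monthly_payment": 9, "total_cost": 4, "lender_service": 4},
}

# Hypothetical weights ("what is important to me or us"); each set sums to 1.
balanced = {"monthly_payment": 0.5, "total_cost": 0.3, "lender_service": 0.2}
tight_budget = {"monthly_payment": 0.8, "total_cost": 0.1, "lender_service": 0.1}

print(rank(alternatives, balanced))
print(rank(alternatives, tight_budget))
```

Shifting weight toward the monthly payment, as someone on a tight budget might, changes the winner from Loan Y to Loan Z even though no alternative's scores changed. Only the preferences did.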


What economists do not tell you is that understanding our own utility is very challenging! Much of the challenge concerns how our brains operate and some of our decision quirks, called "cognitive biases."


The challenge with data science is that no model can have enough data to predict accurately and consistently. On the flip side, if a model is tuned too closely to the details of its historical data, it risks overfitting to a unique situation that does not persist in the future. You are unique, and your preferences regularly change based on situational framing. Data looks to the past. Humans look to the future.
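Overfitting can be illustrated with a small, made-up example. Six historical points follow an upward trend plus noise. A straight line captures the trend; a degree-5 polynomial passed exactly through all six points (Lagrange interpolation) "memorizes" the noise as well. Both are then asked about a new point just beyond the data. The data, the assumed future value, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def lagrange_predict(xs, ys, x):
    """Evaluate the unique polynomial that passes through every (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Hypothetical "past reality": a trend of about 2 per step plus noise.
past_x = [0, 1, 2, 3, 4, 5]
past_y = [0.3, 1.6, 4.5, 5.7, 8.4, 9.5]
# Hypothetical next observation that neither model has seen.
future_x, future_y = 6, 12.1

a, b = fit_line(past_x, past_y)
line_pred = a + b * future_x
poly_pred = lagrange_predict(past_x, past_y, future_x)

# The polynomial reproduces the past exactly; the line only approximately.
poly_train_error = max(abs(lagrange_predict(past_x, past_y, x) - y) for x, y in zip(past_x, past_y))
print(f"polynomial: training error {poly_train_error:.2f}, future error {abs(poly_pred - future_y):.2f}")
print(f"line: future error {abs(line_pred - future_y):.2f}")
```

The interpolating polynomial has zero error on the past data yet misses the new point by more than 20, while the simpler line misses by well under 1.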


Decision science saves the day by providing tools to the decision-making team or individuals to easily and quickly develop their utility model. Decision science provides the tools to help you make a confidence-inspiring decision, grounded in the "what is important to me or us" model. These tools are easily updated as the situation changes.


Data science models can be a helpful input when building the utility model, for example to narrow down alternatives or to understand risks. They can be a beneficial, precise input to your overarching, bias-reducing decision science process. But data science is only a tool. Decision science is how you best understand individual or group preferences in the context of making a significant decision, and it is essential for making an accurate decision.


Data science can be a tool to enable the best decision science-based process. A decision science-based process is also known as choice architecture. Do not confuse data science with the best decision.

 

Jeff Hulett provided a presentation to JMU's Business Analytics program on March 30, 2023. Thanks to Drs. Raktim Pal and Rhonda Syler for hosting us.




