Updated: Sep 24
In the world of lending, algorithms drive much of the credit decisioning. This is particularly true when lending to individuals, as in mortgage and consumer lending products. While policy experts set policy cutoffs, credit algorithms perform much of the credit assessment. The most popular algorithm is the classic FICO credit score.
This article focuses on resolving cultural biases that may exist within traditional credit data and the systems that render and manage that data. We take a systems-level approach: we believe the path to significant and lasting change is to address system rules and goals. (1) Our implicit system goal is that all people, regardless of past inequities, should receive an unbiased credit assessment.
Jeff Hulett and several collaborators authored this article. Jeff’s career includes financial services and consulting leadership roles in the decision sciences, behavioral modeling, loan operations management, and risk and compliance. Jeff has advanced degrees in Mathematics, Finance, and Economics. Jeff’s journey in the algorithm-enabled financial services world is summarized in the appendix.
A color-blind algorithm
There is good news from a lending bias standpoint. Since the algorithm is "color-blind," it will not provide a decision recommendation other than that rendered by the scorecard. (Please see note (2) for a definition of “protected class” and “color-blind.”) The data driving the scorecard comes from large, historical U.S. credit data sets provided by one or more of the primary credit bureaus. (Please see note (3) - aka “CRA” or “Credit Reporting Agencies.”) The data is professionally maintained and governed via regulations like the Fair Credit Reporting Act (FCRA). By design, the scorecard is the algorithm's set of instructions for calculating a credit score from an individual's credit data. Notwithstanding periodic data breaches, the data held by the three largest credit bureaus is widely regarded as one of the highest quality and deepest data repositories in the world. On its face, credit bureau data seems "color-blind," as the bureaus are limited by Fair Lending regulation to warehousing only attributes associated with credit risk. For example, typical credit attributes, also known as "tradeline data," are
the number of times delinquent,
the number of credit inquiries,
dates opened / closed,
payments, etc. (4)
No data explicitly identifies legally protected classes. (5)
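As a rough illustration of how a linear scorecard turns tradeline attributes into a score, here is a minimal, points-based sketch. The attribute bins, point weights, and base score below are invented for illustration only and bear no relation to the actual FICO scorecard:

```python
# Hypothetical, simplified points-based scorecard. All bins and point
# weights are illustrative assumptions -- NOT the real FICO model.

def scorecard_points(times_delinquent: int, inquiries_last_year: int,
                     months_since_oldest_tradeline: int) -> int:
    """Sum points across binned tradeline attributes, like a linear scorecard."""
    points = 600  # illustrative base score
    # Fewer delinquencies earn more points.
    if times_delinquent == 0:
        points += 120
    elif times_delinquent <= 2:
        points += 40
    # Fewer recent credit inquiries earn more points.
    if inquiries_last_year <= 1:
        points += 60
    elif inquiries_last_year <= 3:
        points += 20
    # A longer credit history earns more points.
    if months_since_oldest_tradeline >= 120:
        points += 70
    elif months_since_oldest_tradeline >= 36:
        points += 30
    return points

print(scorecard_points(0, 1, 150))  # clean file, long history -> 850
print(scorecard_points(3, 5, 24))   # delinquent, thin file -> 600
```

Notice the scorecard's inputs are payment behaviors only; group membership never appears, which is the sense in which the algorithm is "color-blind."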
The use of more sophisticated credit algorithms is increasing. Credit modeling companies like Fair Isaac Corporation (maker of the FICO Score), and banks with a permissible purpose to access the credit data, regularly test Artificial Intelligence and Machine Learning-based algorithms to improve the credit risk separating ability of the existing algorithms. Because of Fair Lending laws like the Equal Credit Opportunity Act (ECOA) (6) and consumer reporting laws, banks have been slow to move away from the more standard, transparent linear modeling techniques, such as the classic FICO score. But competitive pressures are driving banks to use more sophisticated models to improve the algorithms’ credit risk separating ability. (7)
To summarize: 1) The algorithms, by their nature, are color-blind and are regularly used to make individual credit assessments;
2) There is competitive pressure to use even more sophisticated, but less transparent algorithms to further improve the quality of credit assessments.
All good news, right? Unfortunately, there is a fly in the ointment that enables bias and reduces credit assessment accuracy.
While the algorithms are color-blind, there is a significant body of literature suggesting the data used to train the algorithms may be structurally (aka systemically) biased. (8) So, how can an algorithm not be biased while the data is biased? Ultimately, algorithms are just a tool. A very sophisticated tool, but still just a tool. Data, on the other hand, is a representation of past reality. For example, we know the U.S. has a history of racism. This is certainly not a secret and persists at some level today. (9) So, since the data used for credit algorithms comes from our recent past, and our recent past contains racism, it follows that the data itself may be structurally biased.
Please note: structural bias operates at the system level, not necessarily the individual level. So it is quite possible that the individual lending participants are not biased, yet the system in which they participate contains structural bias in its rules, habits, and outcomes.
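The distinction between a biased algorithm and biased data can be made concrete with a toy simulation. In this sketch (every distribution and magnitude is an illustrative assumption), two groups have identical underlying repayment behavior, but one group's recorded delinquencies are inflated by structural headwinds. A scorecard that never sees group membership still produces lower average scores for that group:

```python
import random

random.seed(42)

# Toy simulation: groups A and B have IDENTICAL true repayment behavior,
# but group B faces structural headwinds (e.g., fewer banking options)
# that add extra recorded delinquencies to the data.

def simulate_person(structural_headwind: float) -> int:
    """Observed delinquencies = true behavior + structurally caused extras."""
    true_delinquencies = random.choices([0, 1, 2], weights=[70, 20, 10])[0]
    extra = 1 if random.random() < structural_headwind else 0
    return true_delinquencies + extra

def color_blind_score(delinquencies: int) -> int:
    """The scorecard never sees group membership -- only the recorded data."""
    return 700 - 80 * delinquencies

group_a = [color_blind_score(simulate_person(0.0)) for _ in range(10_000)]
group_b = [color_blind_score(simulate_person(0.3)) for _ in range(10_000)]

print(sum(group_a) / len(group_a))  # higher average score
print(sum(group_b) / len(group_b))  # lower average score, same true ability
```

The tool did its job faithfully; the gap came entirely from how the data was formed, which is the article's central point.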
Structural bias may persist although banks and non-bank lenders follow Fair Lending, HMDA (Home Mortgage Disclosure Act), FCRA, and related laws. The operative questions relate to the direction of the causality arrow connected to poor credit behavior.
1) Did the poor credit behavior of a protected class citizen cause their low credit score?
2) Was the poor credit behavior of a protected class citizen caused by structural bias and a lack of opportunity?
Causality is really important because the validity of credit algorithms and associated credit assessments depend upon the “A->B” causal direction from statement 1.
Then, the questions remain:
To what degree does structural bias exist in the data used for credit algorithms?
What if the causality arrow goes in both directions? What are its impacts on the credit assessment?
How do we know the degree to which bias impacts credit decisions?
Should lenders use different data or algorithms?
If the large, government-based buyers and insurers of mortgages* all require credit algorithms exposed to structural bias, how much influence do lenders even have to use different data or algorithms? * Collectively these organizations include the Government Sponsored Enterprises and Federal Loan Insurers (GSE / FLI - like Freddie Mac, Fannie Mae, Ginnie Mae, FHA, VA, and USDA)
The following proposals address these questions.
Proposals to address Lending Data Bias
Proposal 1: Create a bias-free level data playing field
We can address most of these questions with a simple hypothesis, test that hypothesis, and utilize the existing credit backstop from the U.S. government.
The simple hypothesis is this: white people, especially white men, have on average historically received more opportunities than members of protected classes. This advantage, while possibly smaller today, still exists. This proposal proactively tests this hypothesis and provides a solution based on the test outcome.
The proposal is to redevelop the predominant credit-granting algorithm, the classic FICO score. It will be independently redeveloped using two distinct data sets based on Fair Housing Act-defined protected classes. One data set will contain no protected classes ("NPC" - only white men) and the other will contain one or more protected classes ("PC" - people of color, women, etc.). The data will be sourced from the credit bureaus, HMDA (10) data, and census data. In effect, there will now be two comparable FICO scores, an "unprotected class FICO" and a "protected class FICO." Upon completion of the new FICO scores, apply them to a performance validation data set. If the NPC and PC FICO scores are identical in terms of separating risk and the credit assessment decision, then there is no bias. Returning to the hypothesis, a “no bias” result invalidates the hypothesis and this proposal becomes unnecessary. However, if the result shows lower credit granting and/or higher credit loss attributable to the protected class FICO, then there is structural bias.
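The validation comparison could be sketched as follows. A common measure of risk-separating ability is AUC: the probability that a randomly chosen "good" borrower outscores a randomly chosen "bad" one. The scores below are invented placeholders for the NPC- and PC-trained scorecards, not real FICO output:

```python
# Sketch of the validation step. The scores are illustrative placeholders;
# a real test would use a large held-out validation data set.

def auc(scores_good: list[float], scores_bad: list[float]) -> float:
    """Probability a random 'good' outscores a random 'bad' (ties count half)."""
    wins = sum((g > b) + 0.5 * (g == b)
               for g in scores_good for b in scores_bad)
    return wins / (len(scores_good) * len(scores_bad))

# Hypothetical validation scores under the NPC- and PC-trained scorecards.
npc_good, npc_bad = [720, 700, 680, 660], [640, 620, 650]
pc_good,  pc_bad  = [715, 705, 690, 655], [645, 615, 660]

npc_auc = auc(npc_good, npc_bad)
pc_auc = auc(pc_good, pc_bad)
# If the two AUCs (and the resulting approve/decline decisions) are
# materially equal, the "no bias" result holds; a material gap suggests
# structural bias per the hypothesis.
print(npc_auc, pc_auc)
```

AUC here is equivalent to the Mann-Whitney U statistic and is closely related to the KS separation measures commonly used in credit model validation.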
As a matter of policy, the GSE/FLIs could compel all lenders to only utilize the protected class FICO scorecard for all credit decisions, regardless of the applicant's protected class status. This would facilitate a “de-biased” decision because the scorecard attributes are tuned to those that have not received a social advantage. In effect, it would level the credit granting playing field with credit algorithms trained via data sets associated with the same (lower) social advantage.
The costs of credit losses associated with these programs are generally borne by the borrowers in the form of mortgage insurance premiums (MIP) or an interest rate pass-through associated with loan guarantee fees (G-Fees). As such, there should be little effect on loan salability in the secondary markets. This approach will accelerate the Federal Government’s and the GSE / FLIs’ social and fairness mandates.
Similarly, banks could also use the protected class FICO for their portfolio credit decisions. This gets a little trickier because a bank may end up with higher credit losses than if it had used the traditional credit score. This could be managed similarly to how the FDIC loss share mechanism is used for banks that purchase failed banks. (11) The banks could make a claim to the U.S. Treasury for the higher credit loss and associated management costs by determining which defaulted loans would have been declined originally had they used the traditional score. This approach holds the bank harmless for a social program to eliminate lending data bias.
Proposal 2: Reduce data bias with non-traditional credit data
This proposal seeks to increase and improve the data available for credit algorithms. The incremental data is intended to be appropriately representative of all U.S. citizens, not just those traditionally “banked.” The idea is to utilize valid but non-traditional "non-bank payment" sources of credit performance information. This may include rental payment data, utility payment data, and smartphone app payment data (like PayPal, Venmo, or Zelle). The expectation is that bias will be reduced by including a more complete representation of individual payment behaviors in the CRA data repositories. Also, an important part of the proposal is to leverage the existing CRA data repository infrastructure. This will ensure data quality, including customer dispute mechanisms.
What’s in our wallet? A larger % of protected class members make payments outside the CRA bank payments system.
Using non-traditional data for algorithm development is not a new idea. What's new is focusing on system incentive alignment and existing CRA infrastructure, instead of "pushing on the string" of misaligned incentives and lower quality data. (12) - Please see this note for a discussion concerning the importance of incentives.
The idea is to include verified, FCRA-compliant non-bank payment data in the CRAs and on consumer bureau reports as separate tradelines. The incentive includes providing non-bank payment companies access to all CRA data and CRA-enabled information products in exchange for the inclusion of verified non-bank payment data, subject to FCRA-compliant dispute validation. This will enable non-bank payment data to be included in the CRA data process and in CRA-enabled data products (like the FICO Score).
We are assuming non-bank payment data providers have an economic incentive to exchange verified non-bank data for access to the full CRA library of data, including traditional bank data and non-bank payment data. CRAs have a high-quality, secure, and scalable process to include verified, FCRA compliant non-bank payment data. FCRA may either be interpreted or amended to expand “permissible purpose” requirements to include non-bank payment providers. New permissible purpose rules will enable non-bank payment companies to utilize credit data to make credit-related decisions and to evaluate current customers' credit risk and credit product usage likelihood.
This could be done by expanding the Metro 2 (13) input file to include non-bank payment data. Expand dispute processing to include non-bank payment companies. Develop training and related CRA information through non-bank payment industry groups. Include non-bank payment data in CRA-enabled information products (like the FICO Score).
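To make the data-exchange idea concrete, here is a purely hypothetical sketch of a consumer report that carries non-bank tradelines next to traditional ones. This is NOT the real Metro 2® field layout; every field name below is an invented illustration of the concept:

```python
# Hypothetical sketch only -- NOT the actual Metro 2(R) record format.
# It illustrates carrying non-bank payment tradelines (rent, utilities,
# p2p payments) alongside traditional bank tradelines in one report.

from dataclasses import dataclass

@dataclass
class Tradeline:
    furnisher: str          # bank or non-bank payment company
    account_type: str       # e.g., "mortgage", "rent", "utility", "p2p_payment"
    months_reviewed: int    # length of reported payment history
    times_30d_late: int     # delinquency count
    disputed: bool = False  # FCRA consumer-dispute flag

report = [
    Tradeline("First Bank", "mortgage", 48, 0),
    Tradeline("Acme Property Mgmt", "rent", 24, 1),  # non-traditional
    Tradeline("City Utility Co", "utility", 36, 0),  # non-traditional
]

# A score built on this report now "sees" on-time rent and utility payments.
non_bank = [t for t in report
            if t.account_type in ("rent", "utility", "p2p_payment")]
print(len(non_bank))  # 2 non-bank tradelines included
```

The dispute flag matters: keeping non-bank data inside the existing FCRA dispute machinery is what preserves the data-quality advantage of the CRA infrastructure the proposal relies on.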
Finally, think of proposal 1 as an initial step. Proposal 1 serves as an approach for reducing the current impact of bias in the data used for algorithm training. It also helps us understand the degree to which structural bias is resident in the data.
Proposal 2 is more of a long-term fix, to ensure all borrower classes, protected or otherwise, have exposure to well-trained algorithms. These credit decision rendering algorithms should be trained by high quality, professionally managed, and unbiased data.
Also, proposal 1 will serve as a feedback mechanism for the effectiveness of proposal 2. That is, as proposal 2 is implemented, we will know how well it is working based on how close the proposal 1 bias hypothesis is to being invalidated.
The focus on credit data is the essence of reducing lending bias. If we wish to reduce bias in lending, getting the data right is critical to enabling algorithmic success.
An algorithm is just a tool. Data is a representation of past reality. Don’t blame the tool for our past mistakes... fix the mistakes as found in our past reality.
By definition, the job of the algorithm is to “discriminate” or separate data by some dependent variable. The question becomes that of the data itself.
Was there bias in the formation of the data set? Is past “bad” (racial) discrimination hidden within the “good” (credit) discrimination data?
If we get the data correct, the algos will do their job with a more accurate credit decision.
If we get the data wrong, the algos will still do their job but with a less accurate decision.
It is easy to confuse the precision of an algo with the data-impacted accuracy of a credit decision... sadly, they are not always aligned. We propose to align the precision of the credit scoring algorithms with the accuracy of our data, and to provide the means to resolve our past mistakes.
About Jeff Hulett
“I started in banking over 30 years ago; my first banking experience was in a bank branch as a loan officer. We were located in predominantly black neighborhoods of Richmond, Virginia. I made loans to many folks from the local community. It was very old school. People came to the branch to apply in person. I pulled credit reports on fax-like paper. Our credit decisions were purely judgmental.
Since then, I have been part of a massive automation- and decision sciences-led banking revolution. My roles evolved: I went on to develop statistical models, implement algorithm-enabled decision platforms, lead enterprise algorithm-integrated risk management, and lead bank lending divisions. I have also worked for large banking- and decision sciences-related consulting and software firms. I feel fortunate to have participated in the Artificial Intelligence / Machine Learning revolution from the very beginning. I also appreciate my foundational lending experiences in Richmond.
From a social standpoint, I have come to appreciate that our focus on credit algorithms is both helpful and high risk. Helpful, because a credit algo judges the current applicant only on their ability to repay a loan. High risk, because racism and other “isms” still exist, and related bias may be subtly present in the data used to train our models. As algorithms become more powerful, the importance of getting the data right increases substantially. Structural bias is difficult to eliminate and must be managed proactively and systematically.”
(1) The late Harvard and MIT trained scientist and systems researcher Donella Meadows said:
“…. most of what goes wrong in systems goes wrong because of biased, late, or missing information.”
(2) By "color-blind," I'm referring to the lending protections afforded “protected classes” as defined by the Fair Housing Act. Protected classes include Race, Color, National Origin, Religion, Sex, Familial Status, and Disability.
(3) The primary credit bureaus are also known as “Credit Reporting Agencies” (CRAs). The three largest U.S. CRAs are Equifax, Experian, and TransUnion.
(4) For a nice primer on credit bureau data and usage, please see the Consumer Financial Protection Bureau’s (CFPB) whitepaper Key Dimensions and Processes in the U.S. Credit Reporting System
(5) Fair Lending laws make it unlawful for creditors to commit "Disparate Treatment" of protected classes.
(6) The ECOA, as implemented by Regulation B, makes it unlawful for “any creditor to discriminate against any applicant concerning any aspect of a credit transaction (1) on the basis of race, color, religion, national origin, sex or marital status, or age (provided the applicant has the capacity to contract); (2) because all or part of the applicant’s income derives from any public assistance program; or (3) because the applicant has in good faith exercised any right under the Consumer Credit Protection Act.”
On point for this article, the ECOA requires lenders to provide applicants an "Adverse Action Notice" in the event they are declined for credit. This is a notice that explains the cause of the loan decline decision. In the linear-based classic FICO score world, decline reasons are straightforward: the weighting of the FICO scorecard itself renders prioritized, individual decline reasons. In the non-linear Machine Learning / Artificial Intelligence world, decline reasons are not easy to provide. By the very nature of these newer algorithms, causality is not a natural by-product of the machine learning process. Some "workaround" techniques have been proposed, like Shapley values, but consistent decline reasons continue to be a challenge.
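For a linear score, Shapley values have a simple closed form, which is one reason linear scorecards make adverse action reasons straightforward. In this sketch (the weights and population means are invented for illustration, not FICO's), each attribute's contribution is its weight times the applicant's deviation from the population mean, and the most negative contribution becomes the top decline reason:

```python
# Shapley values for a linear model with independent features reduce to
# weight * (value - population mean). Weights and means are illustrative
# assumptions -- NOT the actual FICO scorecard.

weights = {"delinquencies": -40.0, "inquiries": -10.0, "utilization_pct": -1.5}
population_means = {"delinquencies": 0.5, "inquiries": 2.0, "utilization_pct": 30.0}

def shapley_decline_reasons(applicant: dict[str, float]) -> list[tuple[str, float]]:
    """Rank attributes by how much each pulled the score below average."""
    contributions = {
        attr: weights[attr] * (applicant[attr] - population_means[attr])
        for attr in weights
    }
    # Most negative contribution first = leading adverse action reason.
    return sorted(contributions.items(), key=lambda kv: kv[1])

reasons = shapley_decline_reasons(
    {"delinquencies": 3, "inquiries": 6, "utilization_pct": 85}
)
print(reasons[0][0])  # "delinquencies" -- the leading decline reason
```

For non-linear ML models, no such closed form exists; Shapley values must be approximated over feature coalitions, which is why decline reason consistency is harder to guarantee.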
(7) Another reason the adoption of more sophisticated credit assessment techniques has not accelerated more quickly has to do with the robustness of the historical modeling approach. The FICO models, sold by Fair Isaac Corporation, have been in use for about 25 years. As such, much of the nonlinear, dynamic signal captured today with nonlinear, automated Machine Learning techniques has already been captured within the linear-based traditional FICO models. This was done with "Machine Teaching" via years of nonlinear variable transformation-based human learning and model updating. Please see this article for more information: Can Machine Learning Build a Better FICO Score?
(8) Structural / Systemic racism and credit data related sources include:
Blattner, L. and Nelson, S., How Costly is Noise? Data and Disparities in Consumer Credit
The following is an analogy I find helpful to understand structural bias:
Imagine two people with similar fitness goals: they want to be fit and are willing to work out, eat healthily, and generally live a healthy lifestyle. They were both born with similar bodies, and their bodies respond similarly to fitness. The first person, Sheila, has access to excellent equipment, information about healthy workouts, and an encouraging community. The second person, Liam, has little access to equipment, little information about healthy workouts, and a health-indifferent community.
While Sheila and Liam share similar desires and physical proclivities regarding health (endogenous factors), the healthy road will be much tougher for Liam. In fact, on average, people who have the same structural environment (exogenous factors) as Liam tend to lead less healthy lives with more health problems. THIS IS AN EXAMPLE OF STRUCTURAL BIAS. In this example, it is the availability of opportunity that drives healthy outcomes, not the individual characteristics of Sheila or Liam.
(9) "Throughout this country’s history, the hallmarks of American democracy – opportunity, freedom, and prosperity – have been largely reserved for white people through the intentional exclusion and oppression of people of color."
(10) The Home Mortgage Disclosure Act (HMDA) requires lenders to collect loan-level protected class data.
(11) The FDIC primarily uses the Loss Share mechanism during times of economic crisis, like the bank failures following the 2007-08 Financial Crisis.
(12) Why are aligned incentives important?
In our view, aligned incentives are at the heart of the success of the current CRA-based system. Aligned incentives enable the CRAs to successfully a) collect banking data, b) utilize that data to build information products like scoring models (like the FICO score used to rate consumer creditworthiness), and c) provide "permissible purpose" (as defined by FCRA) information to banks for credit decisions, portfolio monitoring, and marketing.
Aligned incentives mean the banks are motivated to provide data to the CRAs because they generally get more out of it (ability to evaluate creditworthiness) than the cost of validating and submitting the data (operational cost and competitive cost). Generally, it would not make sense for a bank to go it alone. Meaning, for a bank to access the large library of CRA credit data and CRA-enabled information products, they must provide existing customer data as a trade.
(13) If a company furnishes consumer credit account data regularly to credit reporting agencies, they have duties under the Fair Credit Reporting Act (FCRA) to correct and update that consumer credit history information.
To assist data furnishers (such as banks, credit unions, consumer credit card companies, retailers, and auto finance companies) in this process, the credit reporting industry has adopted a standard electronic data reporting format called the Metro 2® Format.