When Well-Intentioned Incentives Go Bad: A ‘you get what you pay for’ risk management case study

Jeff Hulett
Jan 19, 2023

Updated: Jan 23, 2023

How did Wells Fargo lose $3 billion and a sterling reputation? What are the nuances between good risk governance and organizational incentives? What do statistics and quality control have to do with incentives? Statistically-nimble economist and banker Jeff Hulett provides a case study to answer these very important questions!

The following is part of our article series: Companies Need A Nudge: Create a nudge unit flywheel to drive happy customers and business success

Within an organization, mistakes sometimes happen, and those mistakes may create organizational risks. As organizations scale, evaluating the operational quality of every customer interaction becomes a challenge. Quality Control is the means by which organizations test the quality of those customer interactions. This testing often goes by a generic-sounding title, such as "QC testing," "transactional testing," or "transactional file testing." Almost all quality control requirements have some minimum quality threshold, like 95% error-free. That means up to 5% of the customer transactions may have had some risk or compliance error. An "Error" group is a sample group of customer transactions that failed a particular transactional risk or compliance test. The "Good" group is the error-free customer transaction sample group from the same test. In the operational world, a wide variety of transactional testing may be needed.
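As a minimal sketch of the threshold idea, the Python below computes the error-free rate for a sample of test outcomes and compares it to a 95% minimum. The function names and the 100-transaction sample are hypothetical and only for illustration; a real QC program will define its own sampling and scoring rules.

```python
def error_free_rate(results):
    """Share of sampled transactions labeled "Good" (error-free)."""
    if not results:
        raise ValueError("sample is empty")
    good = sum(1 for outcome in results if outcome == "Good")
    return good / len(results)


def meets_threshold(results, threshold=0.95):
    """True when the sample's error-free rate is at or above the threshold."""
    return error_free_rate(results) >= threshold


# Hypothetical sample: 96 "Good" transactions and 4 "Error" transactions.
sample = ["Good"] * 96 + ["Error"] * 4
rate = error_free_rate(sample)
passed = meets_threshold(sample)
print(f"error-free rate: {rate:.1%}, meets 95% threshold: {passed}")
```

Note that this score only reflects the errors the test actually identified, a limitation that matters for the incentive discussion that follows.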

A testing example: With consumer loans, compliance testing against Regulation B (The Equal Credit Opportunity Act) and Regulation Z (The Truth In Lending Act) is common. These will be a series of customer transactional tests to confirm that Reg B and Reg Z rules were followed in the origination or servicing of a loan. The test outcome will be either a pass ("Good") or a test fail ("Error").

"Risk testing" is a subset of a wider set of model-based activities intended to predict an outcome. Examples of other risk testing-like activities include: modeling a trading strategy to detect the most profitable strategy or cancer testing to detect life-threatening medical challenges.

In risk testing, it is important to discern potential type II errors (False Negatives) and type I errors (False Positives). Earlier, we suggested there are only two outcomes, the "Good" and "Error" groups. In fact, there are four potential outcomes. The two additional outcomes occur because quality testing uses a model of operational reality and that modeled reality is sometimes wrong or incomplete. The "5% error group" discussed earlier contains the identified errors. If it turns out some of these errors are actually not errors, then this is known as a type I error (False Positive). This occurs when errors initially identified by the risk testing group are challenged and overturned by business operations. Compared to manual human testing, good automation routines are generally able to appropriately identify errors, reducing the incidence of false positives.

But what if an error actually did occur but was not identified in the risk test? This is very different from the false positive. These more dangerous errors are known as type II errors or false negatives. In my experience in large companies, hidden errors occur all the time. Corporate testing regimes are good but not perfect. As we discuss next, incentives may discourage uncovering type II errors. Also, type II errors are likely fodder for costly regulatory actions. In the appendix, we contribute an "Automated customer communication testing example," providing an approach to uncover previously undiscovered errors.

But first, let's discuss the practical organizational incentives associated with this testing approach.

Interpretation: You may think of the "operational reality" dimension as what actually happened regarding past customer interactions. You may think of the "testing outcome" dimension as the modeled estimate of what should have happened. Incentives have a way of impacting both what actually happened AND the testing of what should have happened.
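The four outcomes can be sketched in a few lines of Python by crossing the two dimensions: what actually happened and what the test reported. This is an illustration only. The labels, the `classify` function, and the transaction counts are hypothetical, and in practice the "actual" column is only known after further review.

```python
from collections import Counter


def classify(actual_error, flagged_error):
    """Map (operational reality, testing outcome) to one of four outcomes."""
    if flagged_error and actual_error:
        return "found error"
    if flagged_error and not actual_error:
        return "type I (false positive)"
    if actual_error and not flagged_error:
        return "type II (false negative)"
    return "good"


# Hypothetical sample of 100 transactions as (actual_error, flagged_error).
transactions = (
    [(False, False)] * 90   # no error, none reported
    + [(True, True)] * 4    # real error, correctly reported
    + [(False, True)] * 1   # reported, later overturned
    + [(True, False)] * 5   # real error the test missed
)

counts = Counter(classify(actual, flagged) for actual, flagged in transactions)
for outcome, n in counts.items():
    print(f"{outcome}: {n}")
```

In this made-up sample, the test reports 5 errors, yet 9 errors actually occurred. The 5 missed errors are the type II group.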

Type I errors seem reasonable. Testing is not perfect and the validation of test errors generally turns up important learning about data, processes, testing routines, etc. However, business operations really dislike Type I errors. If risk management criticizes business operations, operational leaders may perceive false positives as unnecessarily "calling my baby ugly." Also, quality errors are often used as input for incentive compensation**. Thus, individual business participants have incentives to aggressively challenge these errors. They are literally fighting for their money.

[** "Incentive compensation" may be in the form of a direct incentive, like a bonus directly tied to certain quality goals. Also, "incentive compensation" may be an indirect incentive, where bonus compensation includes quality goals as one of a basket of goals evaluated for bonus payout.]

Type II errors are dangerous. These are the unseen errors that may lead to regulatory action or even existential challenges to a company. Type II errors can start small, but left unchecked, may metastasize like cancer. But there is generally little incentive to discover Type II errors. People do not like to spend energy on something they do not feel accountable for! In fact, the business unit has a plausible defense for not checking for false negatives, such as: "Hey, the risk management organization did not even find this error and that is their job! I was too busy taking care of customers and chasing the errors they did find."

Also, there could be an incentive misalignment, causing a disincentive to uncover type II errors. The anatomy of this incentive challenge looks something like this: Business participants may be paid on quality. That quality measure is generally based only on found errors. In this case, they may be incented to "look the other way" on errors not discovered in the formal risk management testing processes. From a practical standpoint, finding undetected type II errors takes work. Most operating folks are already over capacity and "filling a 5-pound bag with 10 pounds." Just from a work capacity standpoint, the addition of "one more thing" is often unwelcome.
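The misalignment described above can be shown with simple arithmetic. The sketch below assumes a hypothetical quality score that counts only found errors that survive challenge; the formula and all numbers are illustrative and are not drawn from any actual compensation plan.

```python
def measured_quality(total, found, overturned):
    """Quality as the incentive plan sees it: only found, upheld errors count."""
    return 1 - (found - overturned) / total


def true_quality(total, found, overturned, hidden):
    """Quality including errors the formal test never found (type II)."""
    return 1 - (found - overturned + hidden) / total


total = 1000  # hypothetical transactions tested

# Starting point: 50 found errors, 10 overturned as false positives,
# and 30 hidden errors that the test did not detect.
score_before = measured_quality(total, found=50, overturned=10)
actual = true_quality(total, found=50, overturned=10, hidden=30)

# If the business surfaces the 30 hidden errors, they become found errors.
score_after = measured_quality(total, found=80, overturned=10)

print(f"measured quality, hidden errors left alone: {score_before:.1%}")
print(f"measured quality, hidden errors surfaced:   {score_after:.1%}")
print(f"true quality in both cases:                 {actual:.1%}")
```

Under these assumptions, overturning a false positive raises the paid-on score, while surfacing a hidden error lowers it even though true quality has not changed. That is the "look the other way" incentive in numeric form.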

Next, we cite a prototypical example of a type II error that may have started small, but metastasized to become a near-existential enterprise challenge. At the time of this U.S. Department of Justice legal action, Wells Fargo was one of the largest banks in the United States.

Wells Fargo Agrees to Pay $3 Billion to Resolve Criminal and Civil Investigations into Sales Practices Involving the Opening of Millions of Accounts without Customer Authorization

Those of us who have been involved in responding to similar enforcement actions appreciate that "$3 Billion" is only the headline number. The final cost will likely be a significant multiple of $3 Billion when you include employee, consulting, new technology, and other related costs.

The facts and circumstances of this scandal have all the trappings of type II error-based misaligned incentives. In a bank the size of Wells Fargo, there is a large and diffuse group of employees who “should have known” that millions of fake accounts were opened over many years for unsuspecting customers. It may seem unbelievable that the “should have known” bank employees were unable to put a stop to this fake account sales practice. It may seem equally unbelievable that risk management quality testing was unable to detect the practice and escalate the issue to put a stop to it.

But such is the power of incentives. Misaligned incentives can be particularly nefarious in large diffuse organizations, where individual accountability is less clear.

Thus, as a rule of thumb, organizational incentives have a tendency to overemphasize type I errors and underemphasize dangerous type II errors. Next are best practice suggestions for overcoming incentive and testing-borne challenges:

Good testing automation processes help to reduce type I and type II errors.
Being VERY thoughtful about organizational incentives is really important. Misaligned incentives have a way of leading to "you get what you pay for" unintended consequences associated with type II errors.
Create a culture that rewards creative risk thinking. Type II errors are detected by "out of the box" thinking. Leaders should encourage the creative thinking necessary for detecting previously unknown challenges.
Finally, utilizing a structured risk portfolio decision process will optimize your limited risk testing resources across your risk portfolio. Optimized risk resources help to reduce the impact of potential type I and type II errors.

We discuss choice architecture in the following resource section as a means to enhance your risk portfolio decision process.

Resources

Definitive Pro: For corporate and larger organizations - This is an enterprise-level, cloud-based group decision-making platform. Confidence is important in corporate or other professional environments. Certainly, this includes risk planning and prioritizing risk activities. Most major decisions are done in teams. Group dynamics play a critical role in driving confidence-enabled outcomes for those making the decisions and those responsible for implementing the decisions.

Definitive Pro provides a well-structured and configurable choice architecture. This includes integrating and weighing key criteria, overlaying judgment, integrating objective business case and risk information, then providing a means to prioritize and optimize decision recommendations. There is a virtually endless number of uses, just as there is an almost endless number of important decisions. The most popular use cases include M&A, Supplier Risk Management, Risk and audit planning, Technology and Strategic portfolio management, and Capital planning.

Next are a few whitepapers and examples of how to make the best organizational decisions:

Appendix

Automated customer communication testing example

The benefits of automated testing include:

increase compliance testing coverage,
decrease testing costs, and
improve testing quality.

By the way, this example is a composite of actual recent experiences across multiple company departments or divisions.

From a customer and regulator standpoint, customer communication and documents (letters, statements, emails, texts, promissory notes, disclosures, etc.) are the customer's "system of record." That is, customer communication and documentation are the ultimate confirmation source that the company has met various regulatory, investor, and other obligations. Because customer communication is often stored as unstructured data, it requires cost-effective automation capabilities to interpret documents, ingest data, and evaluate regulatory obligations. See the following graphic to compare the bank or company perspective with the customer's perspective.

Also, an operational complication could arise if third parties are involved in creating and transmitting customer communication and documentation. Given this, the ability to structure data and apply obligation tests is critical for testing the “customer view” and is the essence of automated compliance testing.
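To illustrate "structure data and apply obligation tests," here is a minimal Python sketch that scans the text of a customer letter for required content. The two obligations, the regular expressions, and the sample letters are hypothetical placeholders; they are not actual Reg B or Reg Z test rules, and real document testing would need far more robust parsing.

```python
import re

# Hypothetical obligation tests: each maps a name to a pattern that must
# appear somewhere in the customer communication.
OBLIGATIONS = {
    "states_apr": re.compile(
        r"annual percentage rate[^.\n]*?\d+(?:\.\d+)?\s*%", re.IGNORECASE
    ),
    "gives_contact_phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def check_communication(text):
    """Return a pass/fail result for each hypothetical obligation."""
    return {name: bool(pattern.search(text)) for name, pattern in OBLIGATIONS.items()}


def failed_obligations(text):
    """List the obligations the communication did not satisfy."""
    return [name for name, ok in check_communication(text).items() if not ok]


# Made-up letters standing in for unstructured customer communication.
letter_a = (
    "Dear customer, your Annual Percentage Rate (APR) is 6.25%. "
    "Questions? Call 800-555-0100."
)
letter_b = "Dear customer, your loan terms have changed. Call 800-555-0100 with questions."

print("letter A failures:", failed_obligations(letter_a))
print("letter B failures:", failed_obligations(letter_b))
```

Because the checks run against what the customer actually received, a routine like this can be applied to every letter rather than a sample, which is one way previously undiscovered errors may surface.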

In general, automated testing is an updating process as communication, documents, and algorithms are validated. Below are key automation outcome categories, resolutions, and success suggestions depending on the nature of the automated testing outcomes.

For more information, please see our article Making the most of Statistics and Automation. [vii]

Notes

Citations are found in the article: Companies Need A Nudge: Create a nudge unit flywheel to drive happy customers and business success

Stay Curious.
