
Being a data explorer is essential for the data abundance era

Updated: Apr 9

This article is about data, not algorithms.


I offer this disclaimer because data and algorithms are often confused. Data represents our past reality. Algorithms are used to transform data. They are different. Data has already happened. An algorithm is a tool that transforms data with the intent to predict and impact the future. Sometimes that data-transforming algorithm is helpful to you. More often today, that algorithm is even more helpful to an organization trying to sell you something, like goods, services, or political candidates. An organization's algorithm may be helpful to you, but it often serves other purposes, including maximizing shareholder profit or filling political party coffers. Please see the appendix for more context.


About the author: Jeff Hulett is a career banker, data scientist, behavioral economist, and choice architect. Jeff has held banking and consulting leadership roles at Wells Fargo, Citibank, KPMG, and IBM. Today, Jeff is an executive with the Definitive Companies. He teaches personal finance at James Madison University and provides personal finance seminars. Check out his new book -- Making Choices, Making Money: Your Guide to Making Confident Financial Decisions -- at jeffhulett.com.


In my undergraduate personal finance class, part of the curriculum is to help students understand the interaction of data, the power of algorithms, and how to leverage or overcome them with a robust decision process.


From data scarcity to data abundance


In the last half of the 20th century, the world shifted from the industrial era to the information era. The changing of eras is very subtle. For those of us who lived through the change, there was no official government notice or “Welcome to the information era” party to usher in the new era. It just slowly happened, like the “boiling frog” parable, as innovation accelerated and our cultures adapted. Era changeovers are identified in hindsight. It is more like a historian observing that so much had changed that they decided to mark the late 20th century as the start of the information era.

 

This big change requires people to rethink their relationship with data, beliefs, and decision-making. Prior to the information age, data was scarce. Over many millennia, our mindset evolved to best handle data scarcity. In just the last few decades, the information age required us to flip our mindset 180 degrees. Today, the data abundance mindset is necessary for success. Our genome WILL catch up some day… perhaps in a thousand or more years, as evolution does its inevitable job. Until then, we need to train our brains to handle data abundance. The objective of this article is to make the case for how best to handle data abundance. Cognitive gaps, such as the one created by the difference between our data scarcity-based genome and our data abundance-expecting culture, have only grown during the information era.


In the industrial era, computing power was needed but not yet widely available. As a result, math education taught people to do the work of computers. In many ways, people were the gap fillers for society's increasing computational needs. Our education system trained people to provide the needed computational power before powerful computers and incredible data bandwidth became available.


Over time, digital data storage has been increasing.  However, even during the industrial era, those data stores still took effort to locate.  Data was often only available to those with a need to know or those willing to provide payment for access.  The average person during the industrial era did not regularly interact with data outside that observed in their local, analog life.


The information era is different. Today, powerful computers exist, and computing power is both ubiquitous and inexpensive. Digital data stores are no longer like islands protected by vast oceans. Data stores now sit on easy-to-access cloud networks. Also, many consumers are willing to trade personal data for some financial gain or entertainment. While this attitude is subject to change, the trade seems to be working both for consumers and for the companies providing the incentives. Data abundance is the defining characteristic of today's information era. Success comes from understanding your essential data and leveraging that data with available computing technology.

dopamine trade

See: A Content Creator Investment Thesis - How Disruption, AI, and Growth Create Opportunity. This article provides background on why people are willing to give up their data to social media platforms.


For most people, today's challenge is less about learning to do the work of a computer. Today's challenge concerns using abundant data and leveraging technology to serve human-centered decisions. Our formal math education systems have been slow to change and tend to favor former industrial era-based computation needs over information era-based data usage. [i] This is unfortunate but only emphasizes the need to build and practice your statistical understanding even if you did not learn it in your formal education.


The big change – From data scarcity to data abundance


Data scarcity was when the most challenging part of a decision was collecting data.  The data was difficult to track down.  It was like people were data foragers, where they filled a basket with a few pieces of difficult-to-obtain data they needed for a decision.  Since there was not much data, it was relatively easy to weigh and decide once the data was located.


Data abundance has changed our relationship with data 180 degrees in just the last few decades. Consider your smartphone. It is like the end of a data firehose. Once the smartphone is opened, potentially millions of pieces of data come spewing out. Plus, it is not just smartphones; data is everywhere. And it is not just the volume of data; it is also the motivation of the data-focused firms. Their data usage has a purpose, and that purpose is probably not your welfare.


"The best minds of my generation are thinking about how to make people click ads. That sucks." - Jeff Hammerbacher, a former Facebook data leader.




The challenge is no longer foraging for data.  Our neurobiology, as tuned by evolution, is still calibrated to the data scarcity world.  It is like no one told our brains that how we make decisions is dramatically different today. The challenge is now being clear about which of the overwhelming flood of data is actually needed.  The challenge is now to curate data, subtract the unneeded data, and use the best decision process.  Unfortunately, the education curriculum often teaches students as if we are still in the data scarcity world.


For a "Go West, Young Man" decision made during the 1800s as compared to a similar decision today, please see the article:


The big change – From data scarcity to data abundance

Our past reality is diverse


Our world can be interpreted through data. After all, data helps people form and update their beliefs. Often, our family of origin and communities help people form their initial beliefs, especially when they are young. As we gather more data, statistics becomes the language for interpreting our past reality in the service of updating those beliefs. Like any other language, the language of statistics has grammar rules. Think of statistical moments as the grammar for interpreting our past realities. The better we understand the grammar rules, the better we can:

  1. Learn from our past reality,

  2. Update our beliefs, and

  3. Make confidence-inspired decisions for our future.


‘Past reality’ may be a nanosecond ago, as long as it takes the light of the present to reach our eyes. Alternatively, ‘past reality’ could be what we learned from our distant ancestors. A group of people is known as a population. Populations are mostly described by diverse distributions rather than by a single typical value. While people may share some similarities, each of us is also incredibly unique.


Diversity goes beyond typical characteristics, like gender, race, and eye color. Even more important is our behavior given the uncertainty from:

a) the incomplete and imperfect information impacting most situations,

b) the dynamic, interrelated nature of many situations, and

c) the unseen neurobiological uniqueness we each possess.


This means even rationality itself has been redefined. Instead of rationality being robotically assigned to a single point, rationality is better understood through the eyes of the beholder. The same individual often behaves differently across situations because of uncertainty, framing, and anchors. This means the “you” of one situation is often different from the “you” of another, because of our state of mind when the situation is experienced and because situations inevitably differ. Certainly, different people in the same situation also diverge, owing to individual neurodiversity.

behavioral economics redefined rationality

Our hunt is to understand the population by learning of its past reality. But rarely can data be gathered on the entire population. More often, we must rely on samples to make an inference about the population. 
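To make the sample-to-population step concrete, here is a minimal Python sketch. It assumes a synthetic population of made-up values (not real data). The snippet draws a random sample and uses the sample mean and its standard error to estimate the population mean. The function name `estimate_mean` is hypothetical, and the roughly 95% interval relies on a normal approximation.

```python
import random
import statistics

def estimate_mean(sample):
    """Return the sample mean and its standard error (s / sqrt(n))."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / n ** 0.5
    return mean, std_err

# Hypothetical population: 100,000 synthetic values, not real data.
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(100_000)]

# In practice we rarely see the whole population, only a sample of it.
sample = rng.sample(population, 400)
mean, std_err = estimate_mean(sample)

# Roughly 95% interval for the population mean (normal approximation).
low, high = mean - 1.96 * std_err, mean + 1.96 * std_err
print(f"sample mean {mean:.2f}, 95% interval [{low:.2f}, {high:.2f}]")
print(f"true population mean {statistics.mean(population):.2f}")
```

In real life, the last line is the one we usually cannot print. The interval is what lets a sample speak for the population with honest uncertainty.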

 

Tricky samples and cognitive bias


Samples can be tricky. The sample data from others in the population may be challenging to interpret. Even more troublesome, our own brains may play tricks on us. These tricks have grown in significance as the information era has evolved. They may lead us to conclude that the sample data we used to confirm a belief is representative and appropriate for making an inference. It takes careful inspection to guard against this trick, called confirmation bias. Next is a typical decision narrative describing the environment that leads to confirmation bias and a less-than-accurate decision:

decision narrative

The challenge is that a past outcome is a single observation from the total population. Your sample size of one is likely too small to make a robust inference. To be clear, this does NOT mean your past experience has no decision value... of course it does. However, blindly following our own past experiences as a guide to the future ignores other past realities that could help inform our decisions.
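One way to see why a sample of one is fragile is to simulate it. The following is a minimal sketch, assuming a synthetic population of past outcomes (made-up values, not real data). It repeatedly draws samples of size n and measures how much the sample mean wobbles from draw to draw. The helper `spread_of_sample_means` is hypothetical.

```python
import random
import statistics

def spread_of_sample_means(population, n, trials, rng):
    """Standard deviation of the sample mean across repeated samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(trials)]
    return statistics.stdev(means)

rng = random.Random(7)
# Hypothetical population of past outcomes (synthetic, for illustration only).
population = [rng.gauss(0, 1) for _ in range(10_000)]

for n in (1, 10, 100):
    spread = spread_of_sample_means(population, n, trials=2_000, rng=rng)
    print(f"n={n:>3}: typical error of the estimate about {spread:.3f}")
```

The spread shrinks roughly in proportion to 1/sqrt(n). A single past experience (n=1) therefore carries about ten times the noise of a sample of one hundred.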


Robyn Dawes (1936-2010) was a psychology researcher and professor. He formerly taught and researched at the University of Oregon and Carnegie Mellon University. Dr. Dawes said:


"(One should have) a healthy skepticism about 'learning from experience.' In fact, what we often must do is to learn how to avoid learning from experience."

Properly understanding your past reality in the present decision context is doable with the appropriate decision process. Part of being a good data explorer is using a belief-updating process that suitably integrates both our own and others' past realities. A proper decision process helps you avoid confirmation bias and make your decision with well-founded confidence.

 

Think of confirmation bias as a mental shortcut gone bad. Most mental shortcuts provide effective or at least neutral heuristic-based signals. But confirmation bias occurs when a mental shortcut leads us to make a poor decision. As the next graphic illustrates, confirmation bias occurs when only a subset of evidence is used to make a decision. While the current set of information may be convenient and apparently confirms a previous belief, the decision-maker ignores a fuller set of data that may contradict the existing belief. This kind of cherry-picking leads to a reasoning error called an error of omission. Errors of omission are tricky because, technically, the subset of information is not wrong; it is simply too incomplete to support the appropriate conclusion.
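The following toy sketch shows an error of omission in numbers. It assumes a hypothetical list of returns from a strategy someone already believes in. No individual value is false. Keeping only the confirming evidence still flips the conclusion. The helper `cherry_pick` is hypothetical.

```python
import statistics

# Hypothetical returns (%) from a strategy someone already believes in.
# No single value is wrong; the problem is which values get kept.
outcomes = [8.0, -4.0, 12.0, -9.0, 3.0, -6.0, 10.0, -11.0, 2.0, -7.0]

def cherry_pick(data, belief_is_positive=True):
    """Keep only the evidence that agrees with the existing belief."""
    return [x for x in data if (x > 0) == belief_is_positive]

confirming = cherry_pick(outcomes)
print("cherry-picked mean:", statistics.mean(confirming))  # 7.0
print("full-evidence mean:", statistics.mean(outcomes))    # -0.2
```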


A politician example of reasoning errors: Fact-checking is often done to detect incorrect statements about the data a politician provides. A false statement is also known as an error of commission. However, the challenge is not necessarily what the politician said, but what the politician did NOT say. Politicians regularly provide incomplete fact sets. Errors of omission are a) different from their error-of-commission cousins and b) generally tolerated or undetected by the public. Politicians regularly and conveniently leave out data, an error of omission, when trying to sell a particular policy or campaign plank.


Could you imagine a politician saying, “Here are all the reasons why this is a great policy decision! But wait! Here are several other reasons that may make this policy decision risky and potentially not effective. There are many tradeoffs. The chance of success depends greatly on the complex and unknowable future!” A politician who honestly presented all the facts and tradeoffs necessary to make a great decision would likely struggle to get elected. Political theater and a complete rendering of complex policy decisions are very different.


Bertrand Russell (1872-1970): the late, great mathematician and philosopher's timeless aphorism reminds us of the politician's reasoning challenge:

"The whole problem with the world is that fools and fanatics are always so certain of themselves, and wiser people so full of doubts."

 

confirmation bias

Being on the lookout for confirmation bias is essential for the successful data explorer. Confirmation bias is a type of cognitive trick called cognitive bias.  All people are subject to cognitive biases.  Mental shortcuts, also known as heuristics, and their related cognitive bias cousins are a feature of the human species and something we all share. 

 

The biggest challenge of our cognitive biases is that they come from the emotional part of our brain, which lacks language. [iii] This means that, other than vague feelings, we have no signal to warn us when we are under the spell of a cognitive bias. In the typical decision narrative above, the pain or joy of past outcomes was remembered. The challenge is that those emotions carry no defined weight as an input to the current decision. Also, that feeling has no way to integrate with all the other data you need to make the best decision. Confirmation bias occurs when we do not weigh the emotional signal correctly. Inaccurate weighting goes both ways: one may be under-confident or over-confident when interpreting emotion-based data.

 

In order to learn and infer from our past reality, one must either have a) an unbiased sample or b) at least understand the bias so inferential corrections can be made.  Statistics help us use a wider set of data and properly integrate our own experience. This is in the service of taking a less biased, outside-in view to better understand our data. 
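When the bias is understood, it can be corrected. Post-stratification weighting is one common correction, and the following sketch applies it. The sketch assumes a hypothetical survey that over-represents one group, with population shares assumed known (for example, from a census). All group names and numbers are illustrative, not real data.

```python
# Hypothetical example: a survey sample that over-represents one group.
# Group shares in the population are assumed known (e.g., from a census).
population_share = {"urban": 0.5, "rural": 0.5}
sample_counts = {"urban": 80, "rural": 20}
sample_group_means = {"urban": 70.0, "rural": 40.0}

def naive_mean(counts, means):
    """Average every respondent equally, ignoring how the sample was drawn."""
    total = sum(counts.values())
    return sum(counts[g] * means[g] for g in counts) / total

def reweighted_mean(shares, means):
    """Post-stratification: weight each group by its known population share."""
    return sum(shares[g] * means[g] for g in shares)

print("naive:", naive_mean(sample_counts, sample_group_means))            # 64.0
print("reweighted:", reweighted_mean(population_share, sample_group_means))  # 55.0
```

The design choice here is to keep each group's own average and fix only the mix of groups. This works only if the true population shares are known.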

Helpful fast-brain heuristics often include inaccurate cognitive biases

Please see the following VidCast for more information on how confirmation bias leads to reasoning errors. This VidCast shows the slippery slope by which confirmation bias may devolve into cancel culture and into allowing others to determine an individual’s self-worth. Political leaders may aspire to this level of followership. Social media echo chambers are a hotbed for confirmation bias and cancel culture.


 

Being Bayesian and the statistical moments' map


Belief updating is a challenge in today’s information-overloaded world. Being a successful data explorer often requires us to actively manage our cognitive biases by curating and refining valid data and subtracting the data that is irrelevant or wrong. 

 

Please see the following article for an example of using Bayesian inference to make a job change decision. Bayesian inference is a time-tested belief-updating approach VERY relevant to today’s world.  Bayesian inference enables us to make good decisions by understanding our priors and appropriately using new information to update our beliefs.  Bayesian inference helps us use our good judgment and overcome our cognitive biases. In this article, the Definitive Choice app is presented to implement a Bayesian approach to your day-to-day decision-making.
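The following is a minimal sketch of Bayesian belief updating, using Bayes' rule on a yes/no belief. All the numbers are hypothetical, chosen only for illustration. This is not the method of the linked job-change article or the Definitive Choice app. The function `bayes_update` is hypothetical.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(belief | evidence) via Bayes' rule.

    posterior = P(E|H) * P(H) / [P(E|H) * P(H) + P(E|not H) * P(not H)]
    """
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1 - prior)
    return numerator / denominator

# Hypothetical job-change numbers, chosen only for illustration.
belief = 0.5  # prior: 50% confident the new job is a better fit
# Each signal: (P(signal | better fit), P(signal | not a better fit))
signals = [(0.8, 0.3),   # e.g., a strong conversation with the future team
           (0.4, 0.6)]   # e.g., a mixed review from a former employee

for p_true, p_false in signals:
    belief = bayes_update(belief, p_true, p_false)
    print(f"updated belief: {belief:.3f}")  # 0.727, then 0.640
```

Each signal moves the belief in proportion to how much more likely it is under one hypothesis than the other. The prior is neither ignored nor treated as final.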

 


For an example of using Bayesian inference to help make a decision after a scary terrorist attack, please see the article:



Hopefully, I have successfully made the case that being a data explorer is a) important, b) tricky to manage, and c) dependent on statistical understanding and a robust decision process. The rest of the article is an intuitive primer on a core descriptive statistics framework called statistical moments. We will start by placing the statistical moments in the context of scientific inquiry. Mathematician William Byers defines science as a continuum. [ii] At one extreme is the science of certainty, and at the other extreme is the science of wonder. The statistical moments' grammar rules fall along this science continuum. At the left end of the continuum, the initial statistical moments describe a more certain world. As we move along the continuum from left to right, risk and variability enter the picture. Then, uncertainty and unknowable fat tails give way to wonder.

How statistical moments map to science
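The next sketch computes the first four statistical moments in Python, moving left to right along that continuum:

- Mean: the "certain" center.
- Variance: risk and variability.
- Skewness and kurtosis: the shape of the tails.

The sketch uses the standard population (divide-by-n) definitions. Sample-adjusted versions differ slightly. The data set is made up for illustration, and the function `moments` is hypothetical.

```python
def moments(data):
    """Return the first four statistical moments of a data set.

    Uses population (divide-by-n) formulas:
    1st mean, 2nd variance, 3rd skewness, 4th excess kurtosis.
    """
    n = len(data)
    mean = sum(data) / n
    deviations = [x - mean for x in data]
    variance = sum(d ** 2 for d in deviations) / n
    sd = variance ** 0.5
    # Skewness: asymmetry of the distribution (0 for a symmetric one).
    skewness = sum(d ** 3 for d in deviations) / (n * sd ** 3)
    # Excess kurtosis: tail weight relative to a normal distribution (0).
    excess_kurtosis = sum(d ** 4 for d in deviations) / (n * sd ** 4) - 3
    return mean, variance, skewness, excess_kurtosis

print(moments([2, 4, 4, 4, 5, 5, 7, 9]))  # (5.0, 4.0, 0.65625, -0.21875)
```

Positive excess kurtosis signals fatter tails than a normal distribution, the region where the text says uncertainty gives way to wonder.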

Just like grammar rules for language, statistical moments are essential for understanding and capturing the benefits of our past reality. And, just like grammar rules for language, statistical moments take practice. Practice leads to an effective understanding of our past reality and makes statistical moments a permanent feature of your information-era success. Data, as a representation of our past reality, contains nuance and exceptions that add context to that historical understanding. There are also additional grammar rules that guide us in more unusual circumstances. Building statistical intuition is your superpower in the Information Age.


Please see the following article to step deeper into being a data explorer. This article explores the statistical moments, proceeding from the science of certainty and concluding with the science of wonder.



Appendix - How well are algorithms aligned to you?


This appendix supports the "This article is about data, not algorithms" disclaimer found at the beginning of the article.


Generally, public companies have 4 major stakeholders or "bosses to please" and you - the customer - are only one of the bosses. Those stakeholders are:

  1. The shareholders,

  2. The customers (YOU),

  3. The employees, and

  4. The communities in which they work and serve.


Company management makes trade-off decisions to please the unique needs of these stakeholder groups. In general, available capital for these stakeholders is a zero-sum game. For example, if you give an employee a raise, these are funds that could have gone to shareholder profit or one of the other stakeholders.


This means the unweighted organizational investment and attention devoted to your benefit as a customer is one in four, or 25%. The customer weight could certainly be below 25%, especially during earnings season. Objectively, given the competing interests and tradeoffs, a commercial organization's algorithms are not explicitly aligned with customer welfare. Often, the organization's misaligned algorithm behavior is obscured from view, and the organization's marketing department often facilitates that obscuring. Why do you think Amazon's brand image is a happy smiley face? :) For more context on large consumer brands and their use of algorithms, please see the next article's section 5, "Big consumer brands provide choice architecture designed for their own self-interests."
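The one-in-four arithmetic can be written out in a few lines. This is a minimal sketch. The equal-weight case comes from the text. The "earnings season" weights are hypothetical, chosen only to show how the customer share drops when another stakeholder is weighted more heavily. The helper `customer_share` is hypothetical.

```python
def customer_share(weights):
    """Share of attention going to customers, given stakeholder weights."""
    return weights["customers"] / sum(weights.values())

# Unweighted case from the text: four stakeholders, equal weights.
equal = {"shareholders": 1, "customers": 1, "employees": 1, "communities": 1}
# Hypothetical earnings-season weights, for illustration only.
earnings_season = {"shareholders": 2, "customers": 1, "employees": 1, "communities": 0.5}

print(f"{customer_share(equal):.0%}")            # 25%
print(f"{customer_share(earnings_season):.0%}")  # 22%
```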



This article’s focus on data will help you make algorithms useful to you and identify those algorithms and organizations that are not as helpful. Understanding your data in the service of an effective decision process is the starting point for making data and algorithms useful.

While this article is focused on the data, please see the next article links for more context on algorithms:


An approach to determine algorithm and organizational alignment in the Information Age:



How credit and lending use color-blind algorithms but accelerate systemic bias found in the data:


 

Notes and a word about citations


Citations:  There are many, many references supporting this article. Truly, the author stands on the shoulders of giants! This article is a summarization of the author's earlier articles. Many of the citations for this article are found in the linked supporting articles provided throughout.


[i] The challenge of how high school math is taught in the information age is well known. The good news is that it is increasingly recognized that the traditional, industrial age-based high school "math sandwich" of algebra, geometry, trigonometry, and calculus is not as relevant as it used to be, whereas information age-based data science and statistics have dramatically increased in relevance and necessity. The curriculum debate comes down to purpose and weight.


Purpose: If the purpose of high school is to a) prepare students for entrance to prestigious colleges requiring the math sandwich, then the math sandwich may be more relevant. If the purpose of high school is to b) provide general mathematical intuition to be successful in the information age, then the math sandwich is much less relevant. I argue the purpose of high school should be b, with perhaps an option to add a for a small minority of students. Also, it is not clear whether going beyond a should be taught in high school or be part of the general college education curriculum or other post-secondary curriculum. Today, the math sandwich curriculum alone lacks relevance for most high schoolers. As many educators appreciate, students are unlikely to learn material that lacks relevance to them.


Weight: Certainly, the basics of math are necessary to be successful in statistics or data science. To be successful in b), one must have a grounding in a). The reality is that high school has a fixed 8-semester time limit. By the way, education entrepreneurs like Sal Khan of Khan Academy argue against tying mastery to a fixed time period. But, for now, let's assume the 'tyranny of the semester' must be obeyed. As such, the courses that are taught must be weighed within the fixed time budget. The practical question then is: "If statistics and data science become required in high school, which course comes out?" I suggest the math sandwich curriculum be condensed to 4 to 5 semesters, with the information age curriculum emphasized in 3 to 4 semesters.


The tyranny of the semester can be overcome with education platforms like Khan Academy. Since the high school math curriculum increasingly lacks relevance, an enterprising learner or their family can take matters into their own hands. Use Khan Academy outside of regular class to learn the data science and statistics topics you actually need to be successful in the information era.



[iii] See Our Brain Model to explore 1) the parts of the brain lacking language, called the fast brain, and 2) people’s abilities to see through the big block. 

 

1)  The Fast Brain:  The human ability to quickly process information through our emotions is anchored in the right hemispheric attention center of our brain.  Please see the “The high emotion tag & low language case” for an example.

 

2)  The Big Block:  The human ability to forecast the future based on past inputs is anchored in the left hemispheric attention center of our brain.  Please see the “The low emotion tag & high language case” for an example.

 

Hulett, Our Brain Model, The Curiosity Vine, 2020

