Building data availability and lessons from an iconic T.V. show
The following is part of our article series, Every bank needs a nudge....
The T.V. sitcom M*A*S*H ran in the 1970s and was a funny show about an Army hospital unit in the Korean War. (In case you are wondering, I only watched the re-runs....) What could data science possibly have in common with this T.V. show? Turns out, quite a bit. But first, let me build related perspectives on data science in the banking world.
Bank incrementalism impact on data science
Banks become big banks mostly through consolidations. The catalyst for a consolidation may come from several sources:
Often a big economic downturn is a cause,
Sometimes law changes are a cause (think of the reduced interstate banking restrictions in the 1990s), or
Sometimes a regulatory change is a cause, resulting in different scale economies (think of the CFPB requirements that kick in at the $10B bank asset size).
The following graphic shows consolidations for some of the biggest U.S. banks from 1990 to 2010. Certainly, a similar consolidation trend exists for most U.S. banks. "Eat or be eaten!" seems to be the mantra.
So, what does this mean for data science in banking? In a word, banking suffers from "incrementalism." This occurs for a multitude of reasons, including 1) our human nature to think shorter term (i.e., Recency Bias), 2) the quarterly reporting required of SEC registrants, and 3) the consolidation norm specific to banking.
In the data science world, data is the raw material enabling analytical success. Access to data is critical. Unfortunately, in the incremental banking context, data can be hard to locate, access, and utilize. This is one of my favorite relevant aphorisms about data:
“Where is the wisdom?
Lost in the knowledge.
Where is the knowledge?
Lost in the information.” - T. S. Eliot
“Where is the information?
Lost in the data.
Where is the data?
Lost in the damn database!” - Joe Celko
Generally, in bigger banks, data can be heavily siloed across different operating groups and different operating systems (aka, systems of record), and maintained with varying levels of care. Also, because of bank incrementalism, acquired bank systems are not always fully integrated into legacy bank systems. Today, with an increasing focus on information security, data accessibility is generally more restricted and may require special permissions. All this creates friction for the data scientist. Often, doing really interesting data analysis and driving actionable business insight is only about 20% of the data scientist's job; the remaining time is spent wrangling data and handling other administrative tasks.
So, this is the data scientist's reality. Is it getting better?
Some days, yes --> better data warehousing, APIs, or tool access occurs,
Some days, no --> the next wave of bank consolidations or more info security rules occur.
If you are in a data science group, especially one focused on operational analysis and compliance testing, this reality is likely particularly acute, as you are most closely tied to the operating system's data variability. For example, Compliance Testing, especially specific to customer obligations, requires access to core system-of-record data and documents. The gold standard is to directly test the customer's communication media (letters, texts, statements, online, auto agent, etc.) against the regulatory, investor, or related obligations. Because of organizational complexity, separate systems, third-party involvement, infosec requirements, etc., automation-enabled testing of customer media may be very challenging. Please see our article AI and automated testing are changing our workplace for more information.
Building your data science shop like a M*A*S*H unit
A practical solution to enhance data availability may be found in the following metaphor. A M*A*S*H unit runs with a couple of primary operating groups. Those include the expert doctors, nurses, and orderlies who attend to the patients - think of Hawkeye or Margaret Houlihan. The M*A*S*H unit also includes leadership, like Colonel Blake or Colonel Potter. Naturally, data science shops also have both experts and leaders (the data scientists and the data science leadership).
So far, so good. But what I see missing most often in data science shops is the single most important factor in making a M*A*S*H unit run. That is, Radar O'Reilly. Radar is not just a company clerk; he is the grease that makes the M*A*S*H unit run. Radar is the one who knows how to get things done, who knows all the Army supply sergeants, and who knows the company clerks at the other Army M*A*S*H units. As such, Radar knows where to get the raw material to ensure the M*A*S*H unit operates effectively. Radar knows his way around the Army and how to work back channels. In the context of data science, Radar knows where the data is, whom to contact to get the data, how to get the metadata/data dictionary/data ontologies, the nuances of the infosec rules, and how to stay ahead of the next big change affecting data availability. To me, asking a data scientist to run down data is like asking a surgeon to procure sutures.
For whatever reason, big bank data science organizations do not seem to hire the Radar O'Reilly types. If I were starting a new data science organization in a big bank, my first hire would be Radar O'Reilly. Sure, I would hire a crack team of data scientists, junior analysts, and application engineers, and line up Python, R, SAS, RPA / OCR engines, or related tools and storage. But Radar would come first, since it is hard to analyze anything if your raw material, the data, is elusive and regularly at risk.