
Top 10 suggestions for growing a data science career



Data science can mean many things to many people. Below is a "day-in-the-life" insight from data scientist Joel Grus. It gives you an idea of the varied activities a data scientist may take on.

 

“a data scientist should be able to run a regression, write a sql query, scrape a website, design an experiment, factor matrices, use a data frame, pretend to understand deep learning, steal from the d3 gallery, argue r versus python, think in mapreduce, update a prior, build a dashboard, clean up messy data, test a hypothesis, talk to a businessperson, script a shell, code on a whiteboard, hack a p-value, machine-learn a model. specialization is for engineers.”

 

However, these insights relate to technical skills and practice. There is more to it. I led data science-related organizations for over two decades. Whether you are fresh out of school or already established in your career, the following suggestions go beyond technical skills. They relate more to the behaviors and attitudes that make for a successful data science career.

 

I hope this helps!


- Jeff Hulett

 


Go beyond "The Matrix" data view of your customers. All data scientists, and those in related roles, should have some kind of regular customer interaction. This helps make the messy world of emotions and behavior real for the data scientist. It also helps the data scientist understand the sales and operational realities their teammates are facing.







This could be an insight from existing data or it could be new and unique data. Unique customer knowledge will help separate your company from the competition.






Data scientists often do not like data digging. Data digging is code for the messy ETL-related data processes needed for less structured data sets. It can be grinding work. I call this the "meta metadata," that is, building the story behind the data dictionary. It can be time-consuming and take away from primary data analysis. While I hope data scientists spend the majority of their time analyzing, some data digging can be instructive and can lead to a "digging for gold" outcome by surfacing unique competitive insights.




New analytical techniques are not always better than "tried and true" techniques like regression and decision trees. However, we always learn something new and useful in the process, beyond the fact that new AI techniques are not always effective. It is worth the exploration, just not for the reasons you may expect.





Commit the resources for proper test execution. Testing systems may include:

- Testing program guides
- Coding to differentiate test and control groups
- Collecting performance results
- Scripting for agents or customers
- Availability of characteristic data and related testing information
- Analytical resources to analyze and provide post-test results and recommendations
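As an illustration of the "coding to differentiate test and control groups" item, here is a minimal Python sketch of one common approach: hashing a customer ID so that each customer lands in the same group every time. The function name `assign_group`, the test name `pricing-test`, and the `cust-` ID format are hypothetical, and the hash is only a stand-in for whatever randomization your testing system actually uses.

```python
import hashlib


def assign_group(customer_id: str, test_name: str, test_share: float = 0.5) -> str:
    """Deterministically assign a customer to 'test' or 'control'.

    Hypothetical helper: hashing the test name with the ID gives each test
    its own pseudo-random split, and the same customer always gets the
    same answer for the same test.
    """
    digest = hashlib.sha256(f"{test_name}:{customer_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    bucket = int(digest[:8], 16) / 16**8
    return "test" if bucket < test_share else "control"


# Made-up customer IDs, for illustration only.
customers = [f"cust-{i}" for i in range(10_000)]
groups = [assign_group(c, "pricing-test") for c in customers]
test_fraction = groups.count("test") / len(groups)
print(f"Share of customers in the test group: {test_fraction:.3f}")
```

Storing the assigned group alongside each performance record makes the later test-versus-control comparison straightforward.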




This means answering the question: "Assuming success, how will this test be rolled out and scaled in our base business?" Unfortunately, I know of too many tests that produced promising results but failed to keep that promise because they could not scale. Take the time to plan for future success. Hoping that "If you build it, they will come" only works in the movies!





Randomized controlled trials (RCTs) are necessary to drive confidence in the causal nature of your results. They also help business leaders understand the value of testing and of your organization's analytical work. Often, a small but statistically significant percentage test gain will lead to a significant bottom-line improvement. By the way, not every test is suitable for an RCT. If you test without a control group, be very explicit about what you hope to learn and about the potential learning limitations. Natural experiments are challenging to construct but may be necessary.
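To make the "small but statistically significant" point concrete, here is a minimal Python sketch of a two-proportion z-test comparing a test group against a control group. Every number in it (group sizes, conversions, customer base, value per conversion) is made up for illustration, and the z-test is just one reasonable choice of significance test, not a prescribed method.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_test, n_test, conv_control, n_control):
    """Return (lift, z, two-sided p-value) for a difference in conversion rates."""
    p_test = conv_test / n_test
    p_control = conv_control / n_control
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_test + conv_control) / (n_test + n_control)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_control))
    z = (p_test - p_control) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_test - p_control, z, p_value


# Hypothetical RCT: 5.3% conversion in test versus 5.0% in control.
lift, z, p_value = two_proportion_z_test(5_300, 100_000, 5_000, 100_000)

# Hypothetical rollout: 2,000,000 customers, $100 of value per extra conversion.
annual_gain = lift * 2_000_000 * 100
print(f"lift={lift:.4f}, z={z:.2f}, p={p_value:.4f}, gain=${annual_gain:,.0f}")
```

In this made-up case, a lift of 0.3 percentage points clears a conventional 5% significance threshold and, at scale, is worth roughly $600,000.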




AI, machine learning, natural language processing, and neural network results have become increasingly difficult to understand. They are "black boxes," even to the data scientists who build them. Just because a model result is directionally predictive and shows low error does not mean it is stable or properly fit. Be determined to understand why. Feature engineering of independent variables can be hugely insightful. Ask for Shapley values or other techniques to reverse engineer what drives the model result. Business and behavioral intuition is your best friend and an investment in your career.
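For readers who have not seen Shapley values in action, here is a minimal Python sketch that computes them exactly for a tiny toy model. The model, its feature names, and the all-zero baseline are hypothetical, and replacing "missing" features with baseline values is only one of several conventions. Real projects usually rely on a dedicated library rather than this brute-force enumeration, which grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for a single prediction.

    Features outside a coalition are set to their baseline value.
    """
    n = len(instance)
    features = range(n)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in features]
        return predict(x)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(x):
    """Hypothetical score with an interaction between the last two features."""
    income, debt_ratio, late_payments = x
    return 0.5 * income - 2.0 * debt_ratio - 1.0 * late_payments + 0.5 * debt_ratio * late_payments


instance = [4.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
print(dict(zip(["income", "debt_ratio", "late_payments"], phi)))
```

The contributions sum to the gap between the prediction for this instance and the prediction for the baseline, which is what lets you tell a per-feature story about a single model result.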




There is a continuum of company and industry needs for causality. At one end, the high-causality side, industries have regulatory requirements, customer demands, or incentives to understand why. My home industry of banking is a great example. Reg B (ECOA) requires banks to tell loan applicants why their loan was declined. Saying “the black box made me!” is not a legitimate answer. So banking and similar industries need modeling and AI techniques that can explain their results. Consumer goods are on the low-causality side. If Amazon provides a list of products based on a customer’s search, and that list is unique to each customer based on a black box neural network optimized for Amazon’s profitability, no one seems to care!





With today's information technology, data science organizations can be useful to many product or service companies. Keep your eyes open for opportunities to grow your career.

 




For more insight on building a data science organization, please see:



For more data science and statistical insights, please see the article:



This linked article tells the data story via the time-tested statistical moments framework. There are many links and citations you may find interesting.



About the author: Jeff Hulett is a career banker, data scientist, behavioral economist, and choice architect. Jeff has held banking and consulting leadership roles at Wells Fargo, Citibank, KPMG, and IBM. Today, Jeff is an executive with the Definitive Companies. He teaches personal finance at James Madison University and provides personal finance seminars. Check out his new book -- Making Choices, Making Money: Your Guide to Making Confident Financial Decisions -- at jeffhulett.com.

