
(Please see the attached document for ML and AI trends in Banking and Credit)
Background
The math and the core tools of Machine Learning have not changed significantly over time.
René Descartes lived around 400 years ago and formalized the linear equations that underlie matrix algebra and common regression techniques. Linear regression is a bedrock modeling approach used in bank demand and credit risk algorithms.
Thomas Bayes lived over 250 years ago, and the conditional probability that follows from Bayes' theorem is at the core of “next product” algorithms.
Neural networks are the “babies” of the group, with only about 30 years of practical use.
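To make the conditional probability behind “next product” algorithms concrete, here is a minimal sketch that estimates the chance a customer holds a credit card given that they hold a checking account. The customer records, product names, and the `conditional_probability` helper are all hypothetical and only illustrate Bayes' rule, P(A | B) = P(B | A) P(A) / P(B). They are not taken from any bank's actual algorithm.

```python
# Hypothetical customer records: the set of products each customer holds.
customers = [
    {"checking", "credit_card"},
    {"checking", "credit_card", "mortgage"},
    {"checking"},
    {"checking", "savings"},
    {"savings"},
    {"credit_card"},
    {"checking", "credit_card", "savings"},
    {"savings", "mortgage"},
]

def conditional_probability(records, target, given):
    """Estimate P(target | given) with Bayes' rule from raw counts.

    Assumes at least one record holds `target` and one holds `given`.
    """
    n = len(records)
    p_target = sum(target in r for r in records) / n          # P(A)
    p_given = sum(given in r for r in records) / n            # P(B)
    holders = [r for r in records if target in r]
    p_given_if_target = sum(given in r for r in holders) / len(holders)  # P(B | A)
    return p_given_if_target * p_target / p_given             # P(A | B)

p = conditional_probability(customers, target="credit_card", given="checking")
print(f"P(credit_card | checking) = {p:.2f}")
```

With these made-up records, three of the five checking customers also hold a credit card, so the estimate is 0.60.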
What has changed is:
⁃Ability to handle larger data sets
⁃Speed of processing
⁃Automation of quantitative processes that analysts used to handle manually
⁃Use of Python to speed up and simplify the implementation of new models
AI / ML maturity model:
Manual learning => Supervised Learning => Unsupervised Learning
Manual - People (data scientists) perform the learning tasks themselves and manually teach (augment) the algorithms to be implemented. They focus primarily on functional shape and data relationships, and they apply various algorithmic techniques.
Supervised - Algorithms apply what has been learned from past data with known outcomes to predict the outcome for new data. This is common in credit risk modeling, where it often gives more accurate results than traditional methodologies such as logistic or linear regression.
Unsupervised - Algorithms draw inferences from, or look for patterns in, datasets without known outcomes. These are more advanced forms of AI, and their use in credit risk is limited.
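The difference between the supervised and unsupervised stages can be sketched with a toy example. The debt-to-income ratios, the default labels, and the helper names `predict` and `two_means` are hypothetical. The nearest-class-mean rule and the one-dimensional k-means loop are stand-ins chosen for brevity, not the methods a production credit model would use.

```python
from statistics import mean

# Hypothetical debt-to-income ratios; labels: 1 = defaulted, 0 = repaid.
ratios = [0.10, 0.15, 0.20, 0.25, 0.60, 0.65, 0.70, 0.80]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Supervised: use the known labels to learn one mean per class,
# then predict the class whose mean is closest to a new value.
class_means = {
    c: mean(r for r, y in zip(ratios, labels) if y == c) for c in set(labels)
}

def predict(ratio):
    return min(class_means, key=lambda c: abs(ratio - class_means[c]))

# Unsupervised: ignore the labels and look for two groups (1-D k-means).
def two_means(values, iterations=10):
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

print(predict(0.30), predict(0.75))
print(two_means(ratios))
```

The supervised rule needs the `labels` list to learn anything, while `two_means` finds the same two groups from the ratios alone but cannot say which group means "default".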
Best Practices
Best Practices in the Data Preparation Stage
Completely understand the project goal (note: this includes a deep dive into, and documentation of, the business and technical assumptions)
Collect all fields that are relevant
Maintain consistency of field values
Deal with missing data
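As an illustration of the last two data preparation items, the sketch below maps inconsistent field values to one canonical form and fills missing incomes with the median. The records, the `EMPLOYED_MAP` lookup, and the `clean` helper are hypothetical, and median imputation is only one of several reasonable choices for handling missing data.

```python
from statistics import median

# Hypothetical raw loan records with inconsistent values and gaps.
raw_records = [
    {"income": 52000, "employed": "Y"},
    {"income": None, "employed": "yes"},
    {"income": 61000, "employed": "NO"},
    {"income": 48000, "employed": " n "},
    {"income": None, "employed": "Yes"},
]

# Map every spelling seen in the data to one canonical value.
EMPLOYED_MAP = {"y": True, "yes": True, "n": False, "no": False}

def clean(records):
    known_incomes = [r["income"] for r in records if r["income"] is not None]
    fill_value = median(known_incomes)  # median is less sensitive to outliers
    cleaned = []
    for r in records:
        missing = r["income"] is None
        cleaned.append({
            "income": fill_value if missing else r["income"],
            "income_was_missing": missing,  # keep the fact that we imputed
            "employed": EMPLOYED_MAP[r["employed"].strip().lower()],
        })
    return cleaned

for row in clean(raw_records):
    print(row)
```

Keeping an `income_was_missing` flag is a design choice: it lets a later model learn whether missingness itself carries information.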
Best Practices in the Training Sets Generation Stage
Determine which features with numerical values are actually categorical
Decide on whether or not to encode categorical features
Decide on whether or not to reduce dimensionality and if so how
Reducing dimensionality will likely improve performance, as prediction models learn from data with fewer redundant or correlated features (note: very important; PCA and other variable transformation approaches can help)
Decide on whether or not to scale features
Perform feature engineering with domain expertise
Perform feature engineering without domain expertise
Document how each feature is generated (note: this is critical! It relates to learning as an asset.)
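Several of the steps in this stage (encoding, scaling, dimensionality reduction, and documenting features) can be sketched together with NumPy. The applicant data and all variable names are hypothetical, and PCA is computed here directly from a singular value decomposition of the scaled data. In practice, scikit-learn's `OneHotEncoder`, `StandardScaler`, and `PCA` classes cover the same steps.

```python
import numpy as np

# Hypothetical applicants: two correlated numeric fields and one category.
income = np.array([40.0, 55.0, 70.0, 85.0, 100.0, 115.0])   # in thousands
credit_limit = np.array([4.1, 5.4, 7.2, 8.3, 10.1, 11.4])   # in thousands
region = np.array(["north", "south", "north", "east", "south", "east"])

# 1. One-hot encode the categorical field (one 0/1 column per category).
categories = np.unique(region)  # sorted unique values
one_hot = (region[:, None] == categories[None, :]).astype(float)

# 2. Scale numeric features to zero mean and unit variance.
numeric = np.column_stack([income, credit_limit])
scaled = (numeric - numeric.mean(axis=0)) / numeric.std(axis=0)

# 3. PCA via SVD on the centered, scaled numeric block.
_, singular_values, components = np.linalg.svd(scaled, full_matrices=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)
first_component = scaled @ components[0]  # one column summarizing two fields

features = np.column_stack([first_component, one_hot])

# 4. Record how each output column was produced.
feature_docs = {"pc1": "first principal component of scaled income and credit_limit"}
feature_docs.update(
    {f"region_{c}": f"1.0 if region == '{c}' else 0.0" for c in categories}
)

print(np.round(explained_ratio, 3))
print(features.shape)
for name, description in feature_docs.items():
    print(f"{name}: {description}")
```

Because the two numeric fields are almost perfectly correlated, the first principal component carries nearly all of their variance, which is the redundancy that dimensionality reduction is meant to remove.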
Best Practices in the Model Training, Evaluation, and Selection Stage
Choose the right algorithm(s) to start with
Naive Bayes
Logistic regression
SVM
Random forest (or decision tree)
Neural networks (reduce overfitting; diagnose overfitting and underfitting)
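Whichever algorithm is chosen, overfitting and underfitting are usually diagnosed by comparing the error on training data with the error on held-out data. The sketch below uses polynomial regression on synthetic data as a simple stand-in for the algorithms listed above. The data, the chosen degrees, and the `mse` helper are assumptions made for illustration only.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Hypothetical data: a smooth nonlinear signal plus noise.
x = np.linspace(-1.0, 1.0, 40)
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=x.size)

# Hold out every other point for validation.
x_train, y_train = x[::2], y[::2]
x_val, y_val = x[1::2], y[1::2]

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return float(np.mean((model(xs) - ys) ** 2))

results = {}
for degree in (1, 3, 12):
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_val, y_val))

for degree, (train_err, val_err) in results.items():
    print(f"degree {degree:>2}: train MSE {train_err:.3f}, validation MSE {val_err:.3f}")
```

A model that is too simple tends to show high error on both sets (underfitting). A model that is too flexible keeps lowering the training error, and a growing gap between training and validation error is the usual sign of overfitting.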
Best Practices in the Deployment and Monitoring Stage
Save, load, and reuse models
Update models regularly
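A minimal sketch of both deployment practices, using only the Python standard library: the "model" is a hypothetical dictionary of weights, `score` and `needs_retraining` are illustrative helpers, and the 0.05 tolerance is an arbitrary assumption. A real project would save the fitted model object produced by its own library in the same way.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical trained model: linear weights plus version metadata.
model = {
    "version": "2024-01",
    "intercept": -1.5,
    "weights": {"debt_to_income": 3.0, "late_payments": 0.8},
}

def score(model, features):
    """Linear score; a real model would apply its own prediction function."""
    return model["intercept"] + sum(
        w * features[name] for name, w in model["weights"].items()
    )

applicant = {"debt_to_income": 0.4, "late_payments": 1}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "credit_model.pkl"
    with path.open("wb") as f:
        pickle.dump(model, f)      # save
    with path.open("rb") as f:
        reloaded = pickle.load(f)  # load

# Reuse: the reloaded model must score exactly like the original.
assert score(reloaded, applicant) == score(model, applicant)

def needs_retraining(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """Flag a model whose recent accuracy fell by more than `tolerance`."""
    return baseline_accuracy - recent_accuracy > tolerance

print(reloaded["version"], needs_retraining(0.90, 0.82))
```

Storing a version string with the model makes it easier to tell which model produced a given prediction once models are updated regularly. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.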
Source: Steve Blair’s book, Python Machine Learning for Beginners