Machine Learning Best Practices



From Steve Blair’s book, Python Machine Learning for Beginners 

Liked the book, a great refresher. Good to see that the math and the core tools have not changed much. What has changed:

⁃Ability to handle larger data sets

⁃Speed of processing

⁃Automation of quantitative processes that used to be handled manually by the analysts. 

⁃Use of Python to speed / simplify implementation of new models

Manual learning => Supervised Learning => Unsupervised Learning

Best Practices in the Data Preparation Stage 

⁃Completely understand the project goal (note: This includes a deep dive / documentation of business and technical assumptions)

⁃Collect all fields that are relevant

⁃Maintain consistency of field values 

⁃Deal with missing data 
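
To make the last two points concrete, here is a minimal sketch using only the Python standard library. The records, the field names, and the CANONICAL mapping are made up for illustration, and median imputation is just one of several reasonable ways to fill gaps.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
records = [
    {"country": "USA", "age": 34},
    {"country": " usa ", "age": None},
    {"country": "U.S.A.", "age": 41},
    {"country": "Canada", "age": 29},
]

# Hypothetical mapping from cleaned-up spellings to one canonical value.
CANONICAL = {"usa": "US", "u.s.a.": "US", "canada": "CA"}


def normalize_country(value):
    """Trim whitespace, lowercase, then map known spellings to one code."""
    key = value.strip().lower()
    return CANONICAL.get(key, key)


def impute_median(rows, field):
    """Return new rows where missing values of `field` are filled with the median."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = median(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


# Step 1: make field values consistent. Step 2: fill the missing ages.
cleaned = [{**r, "country": normalize_country(r["country"])} for r in records]
cleaned = impute_median(cleaned, "age")

for row in cleaned:
    print(row)
```

In a real project a dataframe library would likely do this work, but the two steps stay the same: agree on one spelling per value, then decide explicitly how each gap is filled.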

Best Practices In The Training Sets Generation Stage 

⁃Determine categorical features with numerical values 

⁃Decide on whether or not to encode categorical features 

⁃Decide on whether or not to reduce dimensionality and if so how

⁃Reducing dimensionality will likely improve performance, since prediction models learn from data with fewer redundant or correlated features (Note: Very important. PCA and other variable transformation approaches can help.)

⁃Decide on whether or not to scale features 

⁃Perform feature engineering with domain expertise 

⁃Perform feature engineering without domain expertise 

⁃Document how each feature is generated (note: This is critical! Relates to learning as an asset.)
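
A small sketch of two of the decisions above, encoding and scaling, in plain Python. The store_ids and incomes values are invented for the example. The store IDs are numbers but carry no order, which is the "categorical feature with numerical values" case, so they get one-hot encoded instead of being fed in as-is.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical data: numeric store codes (categorical) and incomes (continuous).
store_ids = [3, 1, 3, 2]
incomes = [30_000, 50_000, 70_000, 90_000]

categories, encoded = one_hot(store_ids)
scaled = standardize(incomes)

print(categories)  # column order of the encoded rows
print(encoded)
print(scaled)
```

Whether scaling helps depends on the algorithm: tree-based models are largely indifferent to it, while distance-based and gradient-based models usually benefit.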

Best Practices In The Model Training, Evaluation, and Selection Stage 

⁃Choose the right algorithm(s) to start with 

⁃- Naive Bayes

⁃- Logistic regression

⁃- SVM 

⁃- Random forest (or decision tree) 

⁃- Neural networks (Reduce overfitting, Diagnose overfitting and underfitting)
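
Diagnosing overfitting and underfitting usually comes down to comparing the score on the training data with the score on held-out data. The sketch below shows a hand-rolled k-fold split and a simple rule of thumb. The function names and the min_score and max_gap thresholds are my own assumptions, not values from the book, and would need tuning per problem.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds.

    Real data should be shuffled first; this sketch keeps the order
    so the folds are easy to inspect.
    """
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, val_idx
        start = stop


def diagnose(train_score, val_score, min_score=0.8, max_gap=0.1):
    """Rule of thumb with hypothetical thresholds.

    Low training score -> the model cannot even fit what it has seen.
    High training score but much lower validation score -> it memorized.
    """
    if train_score < min_score:
        return "underfitting"
    if train_score - val_score > max_gap:
        return "overfitting"
    return "ok"


folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print(len(train_idx), val_idx)

print(diagnose(0.99, 0.70))
print(diagnose(0.60, 0.58))
print(diagnose(0.90, 0.87))
```

The same comparison works for any of the algorithms listed above: fit on the training indices, score on both sets, and look at the gap.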

Best Practices In The Deployment And Monitoring Stage 

⁃Save, load, and reuse models

⁃Update models regularly
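
A minimal sketch of the save / load / reuse cycle with the standard library's pickle module. The "model" here is just a dict of made-up linear weights standing in for whatever a real training run produces, and the version field is a hypothetical way to keep track of regular updates.

```python
import pickle
import tempfile
from pathlib import Path


def predict(params, x):
    """Linear prediction: dot(weights, x) + bias."""
    return sum(w * xi for w, xi in zip(params["weights"], x)) + params["bias"]


# Hypothetical learned parameters; "version" helps track model updates.
params = {"weights": [0.5, -1.0], "bias": 2.0, "version": 1}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"

    # Save the trained parameters to disk.
    with path.open("wb") as f:
        pickle.dump(params, f)

    # Load them back, as a separate serving process would.
    with path.open("rb") as f:
        restored = pickle.load(f)

# Reuse: the restored model gives the same prediction as the original.
print(predict(restored, [2.0, 1.0]))
```

One caution: unpickling can execute arbitrary code, so only load model files that come from a source you trust.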
