Random Forest
Objective
Learn Random Forest algorithms
Prerequisite Reading
Essential Reading
Random Forest
RF Feature Importance
Implementing Random Forest in Scikit-Learn
Knowledge Check
- What problems can RF solve: classification, regression, or both?
- Which issues with decision trees (DT) does RF address?
- What are the strengths and weaknesses of RF?
- What are the tuning parameters for RF? Which is the most important one?
- How do we calculate feature importance from RF?
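As a starting point for the last two questions, here is a minimal sketch of where tuning parameters are set and how feature importances can be read from a fitted forest. The dataset is synthetic and the parameter values are illustrative assumptions, not recommendations. It shows scikit-learn's impurity-based `feature_importances_` next to `permutation_importance`, which is one alternative way to measure importance.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 8 features, 3 of them informative.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=3, n_redundant=0,
    random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# n_estimators, max_depth and max_features are commonly tuned parameters.
# The values below are illustrative only.
rf = RandomForestClassifier(
    n_estimators=200, max_depth=None, max_features="sqrt", random_state=42
)
rf.fit(X_train, y_train)

# Impurity-based importances: averaged over the trees, normalized to sum to 1.
importances = rf.feature_importances_
for idx in importances.argsort()[::-1]:
    print(f"feature {idx}: {importances[idx]:.3f}")

# Permutation importance: drop in test score when one feature is shuffled.
perm = permutation_importance(
    rf, X_test, y_test, n_repeats=5, random_state=42
)
print(perm.importances_mean.round(3))
```

Impurity-based importances are computed from the training data, while permutation importance here is measured on held-out data, so the two rankings may differ.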
Exercises
We will use RF on the same exercises we did in the Decision Trees section.
Difficulty Level
★☆☆ - Easy
★★☆ - Medium
★★★ - Challenging
★★★★ - Bonus
EX-1: RF Classification - Synthetic data (★☆☆)
Use scikit-learn's make_blobs or make_classification to generate some sample data.
Try to separate the classes using RF.
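One possible starting point for this exercise is sketched below. The sample size, number of centers and split ratio are arbitrary assumptions chosen for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Three 2-D clusters; all values here are illustrative.
X, y = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# score() returns mean accuracy on the held-out data.
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Increasing `cluster_std` makes the clusters overlap more, which is a simple way to see how accuracy degrades on harder data.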
EX-2: RF Classification (★★☆)
- Use the Bank marketing dataset
- You may want to encode the categorical variables
- Use RF to predict the yes/no binary decision
- Visualize one of the trees in the forest
- Create a confusion matrix
- What is the accuracy of the model?
- Run cross-validation to gauge the accuracy of this model
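The steps above (encode, fit, confusion matrix, accuracy, cross-validation) can be sketched as follows. The data frame here is a made-up stand-in, not the Bank marketing dataset: the column names `age`, `job` and `y`, their values, and the rule that generates the target are all hypothetical. Swap in the real data and its actual columns.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in data: names and values are invented for illustration.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "job": rng.choice(["admin", "technician", "retired"], size=n),
})
# Invented target rule, only so the model has some signal to learn.
df["y"] = np.where((df["job"] == "retired") | (df["age"] < 30), "yes", "no")

X, y = df[["age", "job"]], df["y"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# One-hot encode the categorical column; pass numeric columns through.
encode = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["job"])],
    remainder="passthrough",
)
model = make_pipeline(
    encode, RandomForestClassifier(n_estimators=100, random_state=0)
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=["no", "yes"])
acc = accuracy_score(y_test, y_pred)
print(cm)
print(f"accuracy: {acc:.3f}")

# 5-fold cross-validation; the encoder is refit inside each fold.
cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

Putting the encoder inside the pipeline keeps the preprocessing within each cross-validation fold, so nothing fitted on the validation part leaks into training.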
EX-3: RF Regression - Synthetic data (★☆☆)
Use scikit-learn's make_regression to generate some sample data.
Use RandomForestRegressor to model it.
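A minimal sketch of this exercise, assuming arbitrary values for the sample size, feature count and noise level:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression problem; all values here are illustrative.
X, y = make_regression(
    n_samples=500, n_features=5, n_informative=3, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

reg = RandomForestRegressor(n_estimators=200, random_state=0)
reg.fit(X_train, y_train)

# For regressors, score() returns R^2 on the given data.
r2 = reg.score(X_test, y_test)
print(f"test R^2: {r2:.3f}")
```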
EX-4: RF Regression (★★☆)
- Use the Bike sharing data
- Use RandomForestRegressor to predict bike demand
- Visualize one of the trees in the forest
- Use RMSE and R² to evaluate the model
- Use cross-validation to thoroughly test the model's performance
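The evaluation steps above can be sketched as follows. Synthetic data stands in for the Bike sharing data, and the feature names `feature_0` to `feature_3` are placeholders. A forest has no single tree to draw, so the sketch prints the top levels of the first tree in `estimators_` as text.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import export_text

# Synthetic stand-in data; replace with the real features and target.
X, y = make_regression(
    n_samples=500, n_features=4, noise=15.0, random_state=1
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholders

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)
reg = RandomForestRegressor(n_estimators=100, random_state=1)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

# RMSE is the square root of the mean squared error.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:.2f}, R^2: {r2:.3f}")

# scikit-learn scorers are "higher is better", so RMSE comes back negated.
cv_rmse = -cross_val_score(
    reg, X, y, cv=5, scoring="neg_root_mean_squared_error"
)
print(f"CV RMSE: {cv_rmse.mean():.2f} +/- {cv_rmse.std():.2f}")

# Text rendering of the first tree in the forest, limited to depth 2.
tree_text = export_text(
    reg.estimators_[0], feature_names=feature_names, max_depth=2
)
print(tree_text)
```

For a graphical view, `sklearn.tree.plot_tree` accepts the same individual estimator and draws it with matplotlib.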
More Exercises