Self-learning guide for machine learning
Learn to encode categorical variables
pd.get_dummies is a quick and easy way to one-hot encode a column. Here are some examples.
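As one illustration of pd.get_dummies, here is a minimal sketch. The toy frame and its column names (color, price) are made up for this example and are not part of the guide's exercises.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the call
toy = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "price": [10, 12, 9, 11],
})

# One-hot encode the 'color' column; other columns pass through unchanged.
# dtype=int gives 0/1 indicator columns instead of booleans.
encoded = pd.get_dummies(toy, columns=["color"], dtype=int)
print(encoded)
```

Each distinct value in color becomes its own indicator column, named with the original column name as a prefix (for example color_green).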
Scikit-learn offers the following encoders:
sklearn.preprocessing.LabelEncoder
sklearn.preprocessing.OneHotEncoder
Try these examples:
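A minimal sketch of the two scikit-learn encoders listed above might look like the following. The toy color column is an assumption made for illustration only.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data, used only to illustrate the encoders
toy = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# LabelEncoder maps each distinct value to an integer.
# The learned classes are stored in sorted order in classes_.
label_enc = LabelEncoder()
color_codes = label_enc.fit_transform(toy["color"])
print(label_enc.classes_, color_codes)

# OneHotEncoder expects 2-D input (note the double brackets) and
# returns a sparse matrix by default; toarray() makes it dense.
onehot_enc = OneHotEncoder()
color_onehot = onehot_enc.fit_transform(toy[["color"]]).toarray()
print(onehot_enc.categories_[0])
print(color_onehot)
```

Note that the scikit-learn documentation describes LabelEncoder as intended for target labels; it is used here on a feature column purely to show integer encoding.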
Category Encoders is a library of easy-to-use encoders.
After completing the exercises below, you should be comfortable with encoding categorical variables.
★☆☆ - Easy
★★☆ - Medium
★★★ - Challenging
★★★★ - Bonus
Create the following data frame
import pandas as pd

df = pd.DataFrame({
    "age": [65, 32, 24, 55, 45, 30, 35],
    "gender": ["Male", "Male", "Female", "Male", "Male", "Female", "Female"],
    "status": ["married", "single", "single", "divorced", "married", "single", "married"],
})
   age  gender    status
0   65    Male   married
1   32    Male    single
2   24  Female    single
3   55    Male  divorced
4   45    Male   married
5   30  Female    single
6   35  Female   married
Index encode (integer encode) the status column and the gender column
One-hot encode the status column and the gender column
Read prosper-data-simplified.csv
Inspect the data and identify the categorical variables to encode. Use an appropriate encoding for each.
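One possible way to start the inspection is to list the non-numeric columns and count their distinct values. The helper name summarize_categoricals and the toy frame below are hypothetical; the sketch does not assume anything about the actual columns in prosper-data-simplified.csv.

```python
import pandas as pd


def summarize_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Count distinct values in each non-numeric column (hypothetical helper)."""
    cat_cols = df.select_dtypes(exclude="number").columns
    return (
        df[cat_cols]
        .nunique()
        .rename("n_unique")
        .to_frame()
        .sort_values("n_unique")
    )


# Hypothetical toy frame standing in for the real file
toy = pd.DataFrame({
    "color": ["red", "green", "blue", "red"],
    "size": ["S", "M", "S", "M"],
    "price": [10.0, 12.5, 9.0, 11.0],
})

summary = summarize_categoricals(toy)
print(summary)
```

For the exercise itself, the same helper could be applied to the frame returned by pd.read_csv("prosper-data-simplified.csv"), assuming the file sits in the working directory. The distinct-value counts can help when choosing an encoding, though numeric columns that are really category codes would not show up in this summary and need a manual look.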