ML-learning-path

Self-learning guide for machine learning

Project on GitHub: elephantscale/ML-learning-path

Feature Engineering - Variable Encoding


Objective

Learn to encode categorical variables

Reference

Essential Reading

Encoding Options

Pandas

pd.get_dummies is a quick and easy way to one-hot encode categorical columns. Here are some examples:
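Here is a minimal sketch of `pd.get_dummies` on a small made-up DataFrame. The `color`, `size`, and `price` columns are illustrative only and are not part of the course data. Passing `columns` restricts encoding to the listed columns. Passing `dtype=int` requests 0/1 integers, because recent pandas versions otherwise return boolean columns.

```python
import pandas as pd

# Small made-up frame: "color" and "size" are categorical, "price" is numeric
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["S", "M", "L", "M"],
    "price": [10, 12, 15, 11],
})

# One-hot encode only the listed columns; other columns pass through unchanged.
# dtype=int yields 0/1 integers instead of booleans.
encoded = pd.get_dummies(df, columns=["color", "size"], dtype=int)
print(encoded)

# drop_first=True drops the first (alphabetical) category of each variable,
# giving k-1 dummies instead of k and avoiding perfectly collinear columns.
encoded_k_minus_1 = pd.get_dummies(df, columns=["color", "size"], drop_first=True, dtype=int)
print(encoded_k_minus_1.columns.tolist())
```

Columns not listed in `columns` (here `price`) are kept as-is and placed before the new dummy columns. The dummy columns are named `<column>_<category>`.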

SKLearn Encoders

Scikit-learn offers the following encoders:

Try these examples:

Category Encoders

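Category Encoders (the category_encoders Python package) provides a collection of easy-to-use encoders that follow the scikit-learn fit/transform API.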

Extra Reading

Checklist

After completing the exercises below, you should be comfortable with:

Exercises

Difficulty Level

★☆☆ - Easy
★★☆ - Medium
★★★ - Challenging
★★★★ - Bonus

A - Encoding

A-1: Index Encoding (★☆☆)

Create the following DataFrame:

import pandas as pd

df = pd.DataFrame({
    "age": [65, 32, 24, 55, 45, 30, 35],
    "gender": ['Male', 'Male', 'Female', 'Male', 'Male', 'Female', 'Female'],
    "status": ['married', 'single', 'single', 'divorced', 'married', 'single', 'married'],
})
print(df)
   age  gender    status
0   65    Male   married
1   32    Male    single
2   24  Female    single
3   55    Male  divorced
4   45    Male   married
5   30  Female    single
6   35  Female   married

Index-encode (integer-encode) the status and gender columns.

A-2: One-Hot Encoding (★☆☆)

One-hot encode the status and gender columns.

A-3: Encode Prosper Data (★★☆)

Read prosper-data-simplified.csv.

Inspect the data, identify the categorical variables, and encode each one using an appropriate method.
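One way to start A-3 is to list the non-numeric columns and count their distinct values. The sketch below assumes nothing about the real columns of prosper-data-simplified.csv. Instead, it runs on a tiny hypothetical stand-in frame with made-up column names and values. The helper `categorical_summary` and the cardinality cutoff of 10 are also illustrative assumptions. Categorical variables stored as numeric codes will not be picked up this way, so inspect the numeric columns manually as well.

```python
import pandas as pd

def categorical_summary(df: pd.DataFrame) -> pd.Series:
    """Distinct-value counts for every non-numeric column, smallest first."""
    # exclude="number" keeps strings, booleans, categories (and datetimes, if any)
    non_numeric = df.select_dtypes(exclude="number").columns
    return df[non_numeric].nunique().sort_values()

# For the exercise: df = pd.read_csv("prosper-data-simplified.csv")
# Hypothetical stand-in frame; these columns are NOT taken from the real file.
df = pd.DataFrame({
    "loan_amount": [5000, 12000, 7500, 3000],
    "term": ["36 months", "60 months", "36 months", "36 months"],
    "grade": ["A", "B", "A", "C"],
    "state": ["CA", "NY", "TX", "CA"],
})

summary = categorical_summary(df)
print(summary)

# Assumption: columns with at most 10 distinct values are one-hot encoded.
# Higher-cardinality columns would need another strategy and are left as-is here.
low_cardinality = summary[summary <= 10].index.tolist()
encoded = pd.get_dummies(df, columns=low_cardinality, dtype=int)
print(encoded.columns.tolist())
```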

More Exercises

