ML-learning-path

Self-learning guide for machine learning

View the Project on GitHub elephantscale/ML-learning-path

K-Means



Objective

Learn clustering with k-means

Prerequisite Reading

Essential Reading

Extra Reading


Implementing K-Means in Scikit-Learn

Knowledge Check

Exercises

Difficulty Level

★☆☆ - Easy
★★☆ - Medium
★★★ - Challenging
★★★★ - Bonus

EX-1: Clustering with synthetic data (★☆☆)

Use scikit-learn's make_blobs to generate synthetic data.

Cluster it using KMeans.

Start with this notebook: kmeans-1-intro
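
As a rough starting point, the two steps above might look like the sketch below. The sample count, number of centers, cluster spread, and random seeds are illustrative assumptions, not values taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate 300 two-dimensional points around 3 centers (assumed values).
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.60, random_state=42)

# Fit k-means with the same number of clusters and get a label per point.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print("Cluster centers:\n", kmeans.cluster_centers_)
print("Inertia (within-cluster sum of squares):", kmeans.inertia_)
```

In practice the number of clusters is usually unknown, so it is worth rerunning this with different n_clusters values and comparing the inertia.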

EX-2: Clustering cars dataset (★★☆)

We are going to cluster the cars dataset.

Here is the cars data set

The data looks like this:

            model   mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  
   Ford Pantera L  15.8    8  351.0  264  4.22  3.170  14.50   0   1     5   
        Merc 280C  17.8    6  167.6  123  3.92  3.440  18.90   1   0     4   
       Volvo 142E  21.4    4  121.0  109  4.11  2.780  18.60   1   1     4   
         Merc 230  22.8    4  140.8   95  3.92  3.150  22.90   1   0     4   

Use only the mpg and cyl columns to cluster the cars.

You can start with this notebook: kmeans-2-mtcars
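
A minimal sketch of this exercise, using only the four sample rows shown above rather than the full dataset. The choice of two clusters and the random seed are assumptions made for illustration; in the notebook you would load the complete file instead of typing rows by hand.

```python
import pandas as pd
from sklearn.cluster import KMeans

# The four sample rows shown above (mpg and cyl columns only).
cars = pd.DataFrame({
    "model": ["Ford Pantera L", "Merc 280C", "Volvo 142E", "Merc 230"],
    "mpg": [15.8, 17.8, 21.4, 22.8],
    "cyl": [8, 6, 4, 4],
})

# Cluster on the two selected features; k=2 is an assumed value.
features = cars[["mpg", "cyl"]]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cars["cluster"] = kmeans.fit_predict(features)

print(cars)
```

The features are left unscaled here to keep the sketch short. With more columns on very different scales, standardizing them before clustering is usually a good idea.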

EX-3: Clustering Uber Trips (★★☆)

This is a fun lab. We will cluster Uber pickup locations and figure out where the demand hot spots are.

Here is the Uber dataset

You can start with this notebook: kmeans-3-uber-pickups
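
The idea can be sketched as follows. This is not the real data: the pickups are synthetic points generated around two made-up locations, and the column names Lat and Lon are hypothetical stand-ins for whatever the actual dataset uses. The hot spot is taken to be the center of the cluster with the most pickups.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for pickup data (assumption): 500 pickups near one
# location and 100 near another. Column names are hypothetical.
busy = rng.normal(loc=[40.75, -73.98], scale=0.01, size=(500, 2))
quiet = rng.normal(loc=[40.65, -73.78], scale=0.01, size=(100, 2))
pickups = pd.DataFrame(np.vstack([busy, quiet]), columns=["Lat", "Lon"])

# Cluster the coordinates; k=2 matches the synthetic setup above.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
pickups["cluster"] = kmeans.fit_predict(pickups[["Lat", "Lon"]])

# The demand hot spot is the center of the most populated cluster.
counts = pickups["cluster"].value_counts()
hot_cluster = counts.idxmax()
hot_center = kmeans.cluster_centers_[hot_cluster]

print("Pickups per cluster:\n", counts)
print("Hot-spot center (Lat, Lon):", hot_center)
```

Treating latitude and longitude as plain Euclidean coordinates is an approximation that works reasonably well within a single city.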

More Exercises

