K-Means

Objective

Learn clustering with k-means

Prerequisite Reading

Brush up on Unsupervised learning

Essentials Reading

A good intro to kmeans
A Friendly Introduction to K-Means clustering algorithm
Visualizing kmeans
visualizing Kmeans
K-means intro video: a nice 17-minute video by Luis Serrano

Extra Reading

Section 10.3 “Clustering”, p. 385 in An Introduction to Statistical Learning

Implementing K-Means in Scikit-Learn

Knowledge Check

What are the strengths and weaknesses of k-means?
Can k-means predict the optimal value of K?
How do we find the optimal value of K?
How do outliers affect k-means?

Exercises

Difficulty Level

★☆☆ - Easy
★★☆ - Medium
★★★ - Challenging
★★★★ - Bonus

EX-1: Clustering with synthetic data (★☆☆)

Use scikit-learn’s make_blobs to generate some data.

Cluster it using k-means.

Start with this notebook: kmeans-1-intro
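
If you want a feel for the overall shape before opening the notebook, here is a minimal sketch. It is not the notebook's solution; the sample count, number of centers, spread, and random seed are arbitrary values picked for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate 300 two-dimensional points scattered around 3 centers.
# All of these numbers are illustrative assumptions, not exercise requirements.
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Fit k-means, asking for as many clusters as we generated.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print("Cluster centers:\n", kmeans.cluster_centers_)
print("Inertia (within-cluster sum of squares):", kmeans.inertia_)
```

Try changing n_clusters to a value other than 3 and watch how the inertia changes.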

EX-2: Clustering cars dataset (★★☆)

We are going to cluster the cars dataset.

Here is the cars dataset

The data looks like this:

            model   mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  
   Ford Pantera L  15.8    8  351.0  264  4.22  3.170  14.50   0   1     5   
        Merc 280C  17.8    6  167.6  123  3.92  3.440  18.90   1   0     4   
       Volvo 142E  21.4    4  121.0  109  4.11  2.780  18.60   1   1     4   
         Merc 230  22.8    4  140.8   95  3.92  3.150  22.90   1   0     4   

Use only the mpg and cyl columns to cluster the cars.

You can start with this notebook: kmeans-2-mtcars
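
As a rough sketch of the idea, the snippet below clusters just the four sample rows shown above on mpg and cyl. It is only an illustration: the real exercise uses the full dataset, and both the choice of 2 clusters and the scaling step are assumptions made here, not requirements of the exercise.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Only the four sample rows shown above; the full dataset has more cars.
cars = pd.DataFrame(
    {
        "model": ["Ford Pantera L", "Merc 280C", "Volvo 142E", "Merc 230"],
        "mpg": [15.8, 17.8, 21.4, 22.8],
        "cyl": [8, 6, 4, 4],
    }
)

# Keep only the two columns the exercise asks for.
features = cars[["mpg", "cyl"]]

# Assumption: standardize so mpg and cyl contribute on a comparable scale.
scaled = StandardScaler().fit_transform(features)

# Assumption: 2 clusters, chosen only because the sample is tiny.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cars["cluster"] = kmeans.fit_predict(scaled)

print(cars)
```

K-means relies on Euclidean distance, so a column with a larger numeric range can dominate the result. That is why this sketch scales the features first; try it with and without scaling and compare the clusters.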

EX-3: Clustering Uber Trips (★★☆)

This is a fun lab. We will cluster Uber pickup locations and figure out where the demand hot-spot is.

Here is the Uber dataset

You can start with this notebook: kmeans-3-uber-pickups
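
To show one possible way of reading a "hot-spot" out of k-means, here is a sketch on synthetic coordinates. Nothing in it comes from the real Uber data: the column names Lat and Lon, the two made-up centers, and the point counts are all hypothetical. It treats the cluster with the most pickups as the hot-spot.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for pickup coordinates: NOT the real Uber data.
# Two made-up areas, one with three times as many pickups as the other.
busy = rng.normal(loc=[40.75, -73.98], scale=0.01, size=(300, 2))
quiet = rng.normal(loc=[40.65, -73.80], scale=0.01, size=(100, 2))

# "Lat" and "Lon" are hypothetical column names; check the real dataset.
pickups = pd.DataFrame(np.vstack([busy, quiet]), columns=["Lat", "Lon"])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
pickups["cluster"] = kmeans.fit_predict(pickups[["Lat", "Lon"]])

# Treat the cluster with the most pickups as the demand hot-spot.
counts = pickups["cluster"].value_counts()
hot_cluster = counts.idxmax()
hot_center = kmeans.cluster_centers_[hot_cluster]

print("Pickups per cluster:\n", counts)
print("Hot-spot center (lat, lon):", hot_center)
```

On the real data you would load the file in place of the synthetic points and will likely need more than 2 clusters.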
