Discover hidden patterns without labels — clustering with K-Means and hierarchical methods, and
association rule mining with the Apriori algorithm.
🔍 5.1.1 — Introduction to Unsupervised Learning
Unsupervised learning works with unlabeled data — the
algorithm finds hidden patterns, structures, or groupings without being told what to look
for.
📊 Clustering
Group
similar data points together based on feature similarity.
Customer segmentation
Image compression
Anomaly detection
Document grouping
🔗 Association
Find
relationships and patterns between items in large datasets.
Market basket analysis
"Customers who bought X also bought Y"
Recommendation systems
Cross-selling strategies
💡
Key difference from Supervised Learning: No target variable, no "correct
answers." The algorithm discovers structure on its own. Evaluation is harder — no
ground-truth labels to check against.
🔵 5.1.2 — Clustering Concepts
Clustering aims to partition data into groups (clusters) such that:
Points within the same cluster are as similar as possible (high
intra-cluster similarity)
Points in different clusters are as different as possible (high
inter-cluster distance)
Types of Clustering
Partitional: divides data into non-overlapping clusters. Examples: K-Means, K-Medoids
Hierarchical: creates a tree of nested clusters (dendrogram). Examples: Agglomerative, Divisive
Density-Based: clusters based on dense regions of data. Example: DBSCAN
Evaluation Metrics
Silhouette Score: Measures how well each point fits its cluster vs
other clusters. Range: [-1, 1]. Higher = better.
Inertia (WCSS): Sum of squared distances from each point to its cluster
centroid. Lower = tighter clusters.
Elbow Method: Plot inertia vs K and look for the "elbow" where adding
more clusters stops giving significant improvement.
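A minimal sketch of both metrics using scikit-learn, assuming a synthetic make_blobs dataset and an illustrative K range of 2 to 8:

```python
# Sketch: elbow method (inertia/WCSS) and silhouette score with scikit-learn.
# The make_blobs toy data and the K range 2-8 are illustrative assumptions.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=42)

ks = range(2, 9)
inertias, silhouettes = [], []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)                         # WCSS: lower = tighter clusters
    silhouettes.append(silhouette_score(X, km.labels_))  # in [-1, 1]: higher = better

print("best K by silhouette:", ks[silhouettes.index(max(silhouettes))])

plt.plot(list(ks), inertias, marker="o")
plt.xlabel("K")
plt.ylabel("Inertia (WCSS)")
plt.title("Elbow method: look for the bend")
plt.show()
```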
🎯 5.2.1 — K-Means Clustering
K-Means is the most popular clustering algorithm. It partitions data into
exactly K clusters by iteratively assigning points to the nearest centroid
and updating centroids.
Algorithm Steps
Choose K — the number of clusters
Initialize K centroids randomly
Assignment step: Assign each data point to the nearest centroid
Update step: Recalculate each centroid as the mean of all points in its
cluster
Repeat steps 3-4 until centroids no longer change (convergence)
Objective: Minimize WCSS = Σᵢ Σⱼ∈Cᵢ ||xⱼ - μᵢ||²
Where μᵢ is the centroid of cluster Cᵢ,
and xⱼ is a data point.
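A bare-bones NumPy sketch of these steps (random initialization, a fixed iteration cap, and no empty-cluster handling are simplifying assumptions):

```python
# Minimal K-Means sketch in NumPy, mirroring the steps above.
# Illustrative only: random initialization, no k-means++ or empty-cluster handling.
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # step 2: random init
    for _ in range(n_iters):
        # Step 3 (assignment): distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4 (update): each centroid becomes the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):            # convergence check
            break
        centroids = new_centroids
    wcss = ((X - centroids[labels]) ** 2).sum()              # objective being minimized
    return labels, centroids, wcss

if __name__ == "__main__":
    # Three illustrative Gaussian blobs
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(loc=c, scale=0.5, size=(60, 2))
                   for c in ([0, 0], [4, 4], [0, 4])])
    labels, centroids, wcss = kmeans(X, k=3)
    print("WCSS:", round(float(wcss), 2))
```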
Pros and Cons
✅ Advantages
Simple and easy to implement
Fast and scalable for large datasets
Guaranteed to converge (though possibly only to a local optimum)
Works well with spherical clusters
❌ Disadvantages
Must specify K in advance
Sensitive to initial centroid placement
Assumes spherical, equal-sized clusters
Sensitive to outliers
🎮 Interactive: K-Means Clustering Demo
Watch K-Means
iterate! Adjust K and click "Run K-Means" to see the algorithm in action.
🌳 5.2.2 — Hierarchical Clustering
Hierarchical clustering builds a tree of clusters (dendrogram) rather than
producing a flat partition. No need to specify K in advance.
Two Approaches
⬆️ Agglomerative (Bottom-Up)
Most
common approach.
Start: each point is its own cluster
Find the two closest clusters
Merge them into one cluster
Repeat until only one cluster remains
⬇️ Divisive (Top-Down)
Less
common, more complex.
Start: all points in one large cluster
Split the cluster into two sub-clusters
Recursively split each sub-cluster
Repeat until each point is its own cluster
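As a quick sketch, scikit-learn's AgglomerativeClustering runs the bottom-up approach directly when the desired number of clusters is known; the toy dataset and n_clusters=3 are illustrative assumptions:

```python
# Sketch: bottom-up (agglomerative) clustering with scikit-learn.
# The blob dataset and n_clusters=3 are illustrative choices.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=42)

# linkage can be "ward", "complete", "average", or "single" (see the next section)
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)   # cluster label for each of the 150 points
```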
Linkage Methods
How do we measure distance between clusters (not just individual points)?
Single Linkage: minimum distance between any pair of points across the two clusters; can produce elongated "chaining" clusters
Complete Linkage: maximum distance between any pair of points across the two clusters; produces compact, spherical clusters
Average Linkage: average distance between all pairs of points across the two clusters; a balanced approach
Ward's Method: merges the pair of clusters that gives the smallest increase in total within-cluster variance; tends to create equal-sized clusters
Reading a Dendrogram
Height (y-axis)
represents the distance at which clusters were merged. Cut the dendrogram at a chosen
height to get a desired number of clusters.
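A minimal sketch with SciPy's hierarchy module, assuming a toy blob dataset; Ward linkage and the three-cluster cut are illustrative choices:

```python
# Sketch: agglomerative clustering and dendrogram with SciPy.
# Ward linkage and the 3-cluster cut are illustrative assumptions.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=42)

Z = linkage(X, method="ward")    # also: "single", "complete", "average"
dendrogram(Z)                    # y-axis height = distance at which clusters merge
plt.title("Dendrogram (Ward linkage)")
plt.show()

# "Cut" the tree to get a flat clustering with 3 clusters
labels = fcluster(Z, t=3, criterion="maxclust")
```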
🔗 5.3 — Association Rule Mining
Association rule mining finds interesting relationships (rules) between
items in transactional datasets — "If a customer buys X, they are also likely to buy Y."
Key Metrics
Support(A→B) = P(A ∩ B) = (Transactions containing both A and B) / (Total transactions): how frequently A and B appear together in the data.
Confidence(A→B) = Support(A ∩ B) / Support(A): given that A is purchased, how likely is B also purchased?
Lift(A→B) = Confidence(A→B) / Support(B): how much more likely are A and B bought together than if they were independent?
Lift > 1: Positive association, items are bought together more than expected
Lift = 1: Independent, no association
Lift < 1: Negative association, buying one makes the other less likely
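A small, self-contained sketch of all three metrics on toy transactions (the item names and the rule {bread} → {butter} are purely illustrative):

```python
# Sketch: support, confidence, and lift computed on toy transactions.
# Item names and the rule {bread} -> {butter} are purely illustrative.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk"},
    {"bread", "butter", "eggs"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

A, B = {"bread"}, {"butter"}
supp_ab = support(A | B)        # P(A and B)
conf = supp_ab / support(A)     # P(B | A)
lift = conf / support(B)        # >1 positive, =1 independent, <1 negative

# Expected: support=0.60, confidence=0.75, lift=1.25 (positive association)
print(f"support={supp_ab:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```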
⛏️ 5.3.1 — Apriori Algorithm
The Apriori algorithm is the classic approach for mining frequent itemsets
and generating association rules.
Key Principle
📐
Apriori Property: If an itemset is infrequent, all its supersets must
also be infrequent. This allows us to prune the search space efficiently.
Algorithm Steps
Set minimum support threshold
Generate C₁: Count support of each individual item
Prune to L₁: Keep only items meeting minimum support
Generate C₂: Create all 2-item combinations from L₁
Prune to L₂: Keep only pairs meeting minimum support
Continue generating larger itemsets until no more frequent sets found
Generate rules from frequent itemsets that meet minimum confidence
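A compact, pure-Python sketch of this generate-and-prune loop, assuming the same toy transactions and an illustrative minimum support of 0.4 (rule generation is omitted for brevity):

```python
# Sketch: Apriori frequent-itemset mining in pure Python.
# Toy transactions and min_support=0.4 are illustrative; rule generation omitted.
from itertools import combinations

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk"},
    {"bread", "butter", "eggs"},
]
min_support = 0.4
n = len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / n

# L1: frequent single items
items = sorted({i for t in transactions for i in t})
frequent = [{frozenset([i]) for i in items if support(frozenset([i])) >= min_support}]

k = 2
while frequent[-1]:
    # Generate C_k by joining frequent (k-1)-itemsets
    candidates = {a | b for a in frequent[-1] for b in frequent[-1] if len(a | b) == k}
    # Apriori property: keep a candidate only if all its (k-1)-subsets are frequent
    candidates = {c for c in candidates
                  if all(frozenset(s) in frequent[-1] for s in combinations(c, k - 1))}
    # Prune to L_k by minimum support
    frequent.append({c for c in candidates if support(c) >= min_support})
    k += 1

for level, sets in enumerate(frequent, start=1):
    for s in sets:
        print(level, set(s), round(support(s), 2))
```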
🛒 Interactive: Market Basket Analysis
Enter
transactions (comma-separated items per row). Adjust minimum support and click "Analyze"
to find frequent itemsets and association rules.
🃏 Quick Revision — Flashcards
What are the two main steps in K-Means that repeat?
1. Assignment Step: Assign each point to the
nearest centroid. 2. Update Step: Recalculate centroids as the
mean of all points in each cluster. Repeat until convergence (centroids stop
moving).
What is the Elbow Method?
A technique to find the optimal K in K-Means. Plot WCSS
(inertia) vs K. The "elbow" point — where the rate of decrease sharply
slows — suggests the best K value.
What is the Apriori Property?
If an itemset is infrequent, all of its
supersets must also be infrequent. This allows the algorithm to prune large portions
of the search space, making it computationally feasible.
What does Lift measure in association rules?
Lift(A→B) = Confidence(A→B) / Support(B). It
measures how much more likely items are bought together vs independently. Lift >
1 = positive association, Lift = 1 = independent, Lift < 1 = negative
association.
Name the two types of hierarchical clustering.
Agglomerative (Bottom-Up): Start with individual
points, merge closest pairs until one cluster remains. Divisive
(Top-Down): Start with one big cluster, recursively split until each
point is its own cluster.