Unit V

Unsupervised Learning

Discover hidden patterns without labels — clustering with K-Means and hierarchical methods, and association rule mining with the Apriori algorithm.

🔍 5.1.1 — Introduction to Unsupervised Learning

Unsupervised learning works with unlabeled data — the algorithm finds hidden patterns, structures, or groupings without being told what to look for.

📊 Clustering

Group similar data points together based on feature similarity.

  • Customer segmentation
  • Image compression
  • Anomaly detection
  • Document grouping

🔗 Association

Find relationships and patterns between items in large datasets.

  • Market basket analysis
  • "Customers who bought X also bought Y"
  • Recommendation systems
  • Cross-selling strategies

💡 Key difference from Supervised Learning: No target variable, no "correct answers." The algorithm discovers structure on its own. Evaluation is harder — no ground-truth labels to check against.

🔵 5.1.2 — Clustering Concepts

Clustering aims to partition data into groups (clusters) such that:

  • Points within the same cluster are as similar as possible (high intra-cluster similarity)
  • Points in different clusters are as different as possible (high inter-cluster distance)

Types of Clustering

  • Partitional: divides data into non-overlapping clusters. Example algorithms: K-Means, K-Medoids
  • Hierarchical: creates a tree of nested clusters (dendrogram). Example algorithms: Agglomerative, Divisive
  • Density-Based: clusters based on dense regions of data. Example algorithm: DBSCAN

Evaluation Metrics

  • Silhouette Score: Measures how well each point fits its cluster vs other clusters. Range: [-1, 1]. Higher = better.
  • Inertia (WCSS): Sum of squared distances from each point to its cluster centroid. Lower = tighter clusters.
  • Elbow Method: Plot inertia vs K and look for the "elbow" where adding more clusters stops giving significant improvement.
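
A minimal sketch of how these metrics are computed in practice, assuming scikit-learn is installed (the make_blobs dataset and the range of K values are illustrative choices): the loop prints inertia (WCSS) and the silhouette score for several K, which is exactly the data an elbow plot is drawn from.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 4 "true" clusters, purely for illustration
X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    sil = silhouette_score(X, km.labels_)
    print(f"K={k}  inertia (WCSS)={km.inertia_:.1f}  silhouette={sil:.3f}")

# Plotting inertia vs K and looking for the "elbow", or picking the K with the
# highest silhouette score, are both common ways to choose K.
```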

🎯 5.2.1 — K-Means Clustering

K-Means is the most popular clustering algorithm. It partitions data into exactly K clusters by iteratively assigning points to the nearest centroid and updating centroids.

Algorithm Steps

  1. Choose K — the number of clusters
  2. Initialize K centroids randomly
  3. Assignment step: Assign each data point to the nearest centroid
  4. Update step: Recalculate each centroid as the mean of all points in its cluster
  5. Repeat steps 3-4 until centroids no longer change (convergence)

Objective: Minimize WCSS = Σᵢ Σ_{xⱼ ∈ Cᵢ} ||xⱼ - μᵢ||²

Where μᵢ is the centroid of cluster Cᵢ, and xⱼ is a data point.
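
The steps above map almost line for line onto a short NumPy implementation. This is an illustrative sketch under simplifying assumptions (purely random initialization, no handling of empty clusters), not a production version:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: initialize centroids as k randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3 (assignment): distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4 (update): each centroid becomes the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    wcss = ((X - centroids[labels]) ** 2).sum()  # the objective being minimized
    return labels, centroids, wcss
```

Calling kmeans(X, k=3) on a 2-D feature array returns the cluster labels, the final centroids, and the WCSS value from the objective above.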

Pros and Cons

✅ Advantages

  • Simple and easy to implement
  • Fast and scalable for large datasets
  • Guaranteed to converge (though possibly only to a local optimum)
  • Works well with spherical clusters

❌ Disadvantages

  • Must specify K in advance
  • Sensitive to initial centroid placement
  • Assumes spherical, equal-sized clusters
  • Sensitive to outliers
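
In practice, the initialization problem is partly mitigated by smarter seeding: scikit-learn's KMeans defaults to k-means++ initialization and supports multiple restarts via n_init, keeping the run with the lowest inertia. A hedged usage sketch (the random X below is only a placeholder feature matrix):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(300, 2))  # placeholder feature matrix

# k-means++ spreads the initial centroids apart; n_init=10 reruns the whole
# algorithm ten times and keeps the solution with the lowest inertia.
km = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)
print(km.inertia_)
```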

🎮 Interactive: K-Means Clustering Demo

Watch K-Means iterate! Adjust K and click "Run K-Means" to see the algorithm in action.

🌳 5.2.2 — Hierarchical Clustering

Hierarchical clustering builds a tree of clusters (dendrogram) rather than producing a flat partition. No need to specify K in advance.

Two Approaches

⬆️ Agglomerative (Bottom-Up)

Most common approach.

  1. Start: each point is its own cluster
  2. Find the two closest clusters
  3. Merge them into one cluster
  4. Repeat until only one cluster remains

⬇️ Divisive (Top-Down)

Less common, more complex.

  1. Start: all points in one large cluster
  2. Split the cluster into two sub-clusters
  3. Recursively split each sub-cluster
  4. Repeat until each point is its own cluster

Linkage Methods

How do we measure distance between clusters (not just individual points)?

  • Single Linkage: minimum distance between any pair of points across clusters. Can produce elongated, "chaining" clusters
  • Complete Linkage: maximum distance between any pair of points across clusters. Produces compact, spherical clusters
  • Average Linkage: average distance over all pairs of points across clusters. A balanced approach
  • Ward's Method: merges the pair of clusters that gives the smallest increase in total within-cluster variance. Tends to create equal-sized clusters

Reading a Dendrogram

Height (y-axis) represents the distance at which clusters were merged. Cut the dendrogram at a chosen height to get a desired number of clusters.
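
A sketch using SciPy (assuming scipy and numpy are installed, on a made-up two-blob dataset): linkage() performs the bottom-up merging, its method argument selects one of the linkage criteria listed above, and fcluster() cuts the resulting dendrogram into flat cluster labels.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two well-separated blobs of 20 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(20, 2)), rng.normal(6, 1, size=(20, 2))])

Z = linkage(X, method="ward")  # also try "single", "complete", "average"
# Cut the tree so that at most 2 flat clusters remain
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
# scipy.cluster.hierarchy.dendrogram(Z) would draw the tree; the y-axis
# height of each merge is the distance at which the clusters were joined.
```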

🔗 5.3 — Association Rule Mining

Association rule mining finds interesting relationships (rules) between items in transactional datasets — "If a customer buys X, they are also likely to buy Y."

Key Metrics

Support(A→B) = P(A ∩ B) = (Transactions containing both A and B) / Total transactions

How frequently items A and B appear together.

Confidence(A→B) = P(B|A) = Support(A→B) / Support(A)

Given that A is purchased, how likely is B also purchased?

Lift(A→B) = Confidence(A→B) / Support(B)

How much more likely are A and B bought together than independently?

  • Lift > 1: Positive association — items are bought together more than expected
  • Lift = 1: Independent — no association
  • Lift < 1: Negative association — buying one makes the other less likely
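
All three metrics are simple counts over the transaction list. A minimal sketch with a hypothetical five-transaction dataset (item names are made up for illustration):

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions containing every item in the itemset
    return sum(itemset <= t for t in transactions) / n

# Rule: {bread} -> {milk}
sup_ab = support({"bread", "milk"})   # how often A and B occur together
conf = sup_ab / support({"bread"})    # P(B | A)
lift = conf / support({"milk"})       # confidence relative to B's baseline rate
print(f"support={sup_ab:.2f}  confidence={conf:.2f}  lift={lift:.2f}")
```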

⛏️ 5.3.1 — Apriori Algorithm

The Apriori algorithm is the classic approach for mining frequent itemsets and generating association rules.

Key Principle

📐 Apriori Property: If an itemset is infrequent, all its supersets must also be infrequent. This allows us to prune the search space efficiently.

Algorithm Steps

  1. Set minimum support threshold
  2. Generate C₁: Count support of each individual item
  3. Prune to L₁: Keep only items meeting minimum support
  4. Generate C₂: Create all 2-item combinations from L₁
  5. Prune to L₂: Keep only pairs meeting minimum support
  6. Continue generating larger itemsets until no more frequent itemsets are found
  7. Generate rules from frequent itemsets that meet minimum confidence
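
A compact sketch of the level-wise loop described above, on the same style of toy transaction set as the earlier example. It only mines frequent itemsets; subset-based candidate pruning and the rule-generation step (step 7) are omitted for brevity.

```python
def apriori(transactions, min_support=0.4):
    """Illustrative level-wise mining of frequent itemsets (steps 1-6)."""
    n = len(transactions)

    def support(items):
        return sum(items <= t for t in transactions) / n

    # Steps 2-3: count individual items and keep those meeting min_support (L1)
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = set(level)

    k = 2
    while level:
        # Steps 4 and 6: build k-item candidates only from frequent (k-1)-itemsets.
        # A full implementation would also discard candidates containing any
        # infrequent (k-1)-subset before counting (the Apriori property).
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Step 5: prune candidates that fall below minimum support
        level = {c for c in candidates if support(c) >= min_support}
        frequent |= level
        k += 1
    return frequent

example = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(apriori(example, min_support=0.4))
```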

🛒 Interactive: Market Basket Analysis

Enter transactions (comma-separated items per row). Adjust minimum support and click "Analyze" to find frequent itemsets and association rules.

🃏 Quick Revision — Flashcards

What are the two main steps in K-Means that repeat?
1. Assignment Step: Assign each point to the nearest centroid.
2. Update Step: Recalculate centroids as the mean of all points in each cluster.
Repeat until convergence (centroids stop moving).

What is the Elbow Method?
A technique to find the optimal K in K-Means. Plot WCSS (inertia) vs K. The "elbow" point — where the rate of decrease sharply slows — suggests the best K value.

What is the Apriori Property?
If an itemset is infrequent, all of its supersets must also be infrequent. This allows the algorithm to prune large portions of the search space, making it computationally feasible.

What does Lift measure in association rules?
Lift(A→B) = Confidence(A→B) / Support(B). It measures how much more likely items are bought together vs independently. Lift > 1 = positive association, Lift = 1 = independent, Lift < 1 = negative association.

Name the two types of hierarchical clustering.
Agglomerative (Bottom-Up): Start with individual points, merge closest pairs until one cluster remains.
Divisive (Top-Down): Start with one big cluster, recursively split until each point is its own cluster.

🧠 Unit 5 Quiz