Unit III

Modeling & Evaluation

Learn how to select the right model, train it using proper techniques, evaluate its performance with key metrics, and improve accuracy.

🎯 3.1.1 — Selecting a Model

The first step in modeling is choosing the right type of model for your problem. Models are broadly categorized as Predictive or Descriptive.

🔮 Predictive Models

Use supervised learning — learn from labeled data to predict outcomes for new, unseen data.

  • Has a target variable (what we're predicting)
  • Classification: Predict a category (spam/not spam, disease/healthy)
  • Regression: Predict a continuous value (price, temperature)
  • Evaluated using accuracy, precision, recall, RMSE

šŸ” Descriptive Models

Use unsupervised learning — discover hidden patterns and structures in data without any target.

  • No target variable — just input features
  • Clustering: Group similar data points (customer segments)
  • Association: Find co-occurrence patterns (market basket)
  • Evaluated using silhouette score, inertia, lift
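
To make the contrast concrete, here is a minimal sketch (assuming scikit-learn and a synthetic toy dataset, neither of which is specified by this unit) that fits one predictive and one descriptive model on the same features:

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=200, centers=3, random_state=42)  # toy data with 3 groups

# Predictive (supervised): learns from the labels y, then predicts for new points
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("Predicted class for first point:", clf.predict(X[:1]))

# Descriptive (unsupervised): never sees y, just groups similar points together
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("Cluster assignments:", km.labels_[:5])
```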

🧩 Interactive: Model Selection Flow

🚀 3.2.1 — Training a Model (Supervised Learning)

Training means feeding data to an algorithm so it can learn patterns. The critical question is: how do we split data for training and testing?

Holdout Method

The simplest approach: split the dataset into two parts.

  • Training set (70–80%): Used to train (fit) the model
  • Test set (20–30%): Used to evaluate model performance on unseen data

Holdout Split Example (80/20): Training (80%) | Test (20%)

⚠️ Limitation: Results depend heavily on which data ends up in training vs test. A single unlucky split can give misleading results — high variance in performance estimates.
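
A minimal holdout sketch, assuming scikit-learn and the Iris toy dataset as stand-ins for your own data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                      # toy data standing in for your own
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)              # 80% train / 20% test

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)  # fit on training data only
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```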

K-Fold Cross-Validation

A more robust technique that uses all data for both training and testing across multiple rounds:

  1. Divide the dataset into K equal-sized folds (subsets)
  2. For each fold i (from 1 to K):
    • Use fold i as the test set
    • Use the remaining K-1 folds as the training set
    • Train the model and record the test score
  3. The final performance = average of all K test scores

Final Score = (1/K) × Σ Score_i   (for i = 1 to K)

Aspect           | Holdout                              | K-Fold Cross-Validation
Number of splits | 1                                    | K
Data usage       | Part of data never used for training | All data used for both training and testing
Variance         | High (depends on the split)          | Low (averaged over K runs)
Computation      | Fast                                 | K times slower
Typical K values | N/A                                  | 5 or 10
Best for         | Large datasets, quick prototyping    | Small datasets, robust evaluation
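
The same procedure as a sketch, assuming scikit-learn; cross_val_score runs the K train/test rounds and returns one score per fold:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)  # K = 5 folds
print("Fold scores:", scores)
print("Final score:", scores.mean())   # (1/K) × Σ Score_i
```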

🔬 K-Fold Cross-Validation Visualizer

🔎 3.3.1 — Model Representation & Interpretability

Interpretability refers to how easily humans can understand and explain a model's predictions. This is crucial in high-stakes domains like healthcare, finance, and criminal justice.

🟢 Interpretable (White-Box) Models

  • Decisions are transparent and explainable
  • Easy to understand why a prediction was made
  • Examples: Decision Trees, Linear Regression, Logistic Regression
  • Preferred in regulated industries

🔴 Non-Interpretable (Black-Box) Models

  • Internal workings are opaque and complex
  • Difficult to explain individual predictions
  • Examples: Neural Networks, Random Forest, SVM (non-linear kernels)
  • Often more accurate on complex tasks

⚖️ Accuracy vs Interpretability Trade-off: More complex models (neural networks) often achieve higher accuracy but are harder to interpret. Simpler models (linear regression) are easily explainable but may miss complex patterns. Choose based on your domain requirements.
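
To illustrate what "white-box" means in practice, here is a sketch (assuming scikit-learn and the Iris toy dataset) that prints a shallow decision tree's learned rules as plain if/else statements:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)
print(export_text(tree, feature_names=list(data.feature_names)))  # human-readable if/else rules
```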

📊 3.3.2 — Evaluating Performance: Confusion Matrix

A Confusion Matrix is a table that summarizes the performance of a classification model by comparing actual vs predicted labels.

                | Predicted Positive           | Predicted Negative
Actual Positive | TP (True Positive)           | FN (False Negative)
                | Correctly predicted positive | Missed — Type II Error
Actual Negative | FP (False Positive)          | TN (True Negative)
                | False alarm — Type I Error   | Correctly predicted negative
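
A small sketch, assuming scikit-learn and invented labels, showing how the four cells come out of a list of actual vs predicted values:

```python
from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # 1 = positive, 0 = negative (made-up labels)
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted).ravel()  # cells of the 2x2 table
print("TP:", tp, "FN:", fn, "FP:", fp, "TN:", tn)
```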

Key Performance Metrics

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Overall correctness — what fraction of all predictions were right.

Precision = TP / (TP + FP)

Of all predicted positives, how many were actually positive? High precision = low false alarm rate.

Recall (Sensitivity) = TP / (TP + FN)

Of all actual positives, how many did the model catch? High recall = missing few positives.

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

Harmonic mean of precision and recall — a balanced measure when you need both.
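
A worked example of the four formulas, using invented counts purely for illustration:

```python
# Invented counts, chosen only to exercise the formulas above
TP, FN, FP, TN = 80, 20, 10, 90

accuracy  = (TP + TN) / (TP + TN + FP + FN)                 # 170/200 = 0.85
precision = TP / (TP + FP)                                  # 80/90  ≈ 0.889
recall    = TP / (TP + FN)                                  # 80/100 = 0.80
f1        = 2 * precision * recall / (precision + recall)   # ≈ 0.842
print(accuracy, precision, recall, f1)
```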

🎯 When to prioritize which metric?
• Precision when false positives are costly (spam filter — don't want to block real emails).
• Recall when false negatives are costly (disease detection — don't want to miss a sick patient).
• F1 when you need a balance between the two.

🧮 Interactive: Confusion Matrix Calculator

Enter your confusion matrix values and see all metrics computed in real-time:

                | Predicted Positive | Predicted Negative
Actual Positive | TP                 | FN
Actual Negative | FP                 | TN

📈 3.3.3 — Improving Model Performance

If your model doesn't perform well enough, here are proven strategies to improve it:

Understanding Overfitting vs Underfitting

📈 Overfitting (High Variance)

  • Model memorizes training data (including noise)
  • High training accuracy, low test accuracy
  • Too complex for the data
  • Solution: simplify model, add regularization, get more data

📉 Underfitting (High Bias)

  • Model is too simple to capture patterns
  • Low training accuracy, low test accuracy
  • Not enough capacity
  • Solution: more complex model, add features, train longer
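
A sketch of the symptom, assuming scikit-learn and the breast-cancer toy dataset: sweeping tree depth shows the train/test gap that separates underfitting from overfitting:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in (1, 3, None):   # too simple → moderate → unconstrained (free to memorize)
    m = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={m.score(X_tr, y_tr):.2f}  test={m.score(X_te, y_te):.2f}")
```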

Improvement Techniques

🔧 Feature Engineering

Create informative features, combine existing ones, or apply domain knowledge to extract better inputs for the model.
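
For instance, a tiny sketch (assuming pandas; the column names are hypothetical) deriving a ratio that is often more informative than its raw parts:

```python
import pandas as pd

df = pd.DataFrame({"price": [300_000, 450_000], "area_sqm": [100, 120]})
df["price_per_sqm"] = df["price"] / df["area_sqm"]   # engineered feature from two raw columns
print(df)
```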

⚔ Hyperparameter Tuning

Adjust model settings (learning rate, tree depth, K value) using Grid Search or Random Search.
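
A Grid Search sketch, assuming scikit-learn and an illustrative parameter grid:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid={"max_depth": [2, 4, 8]},   # settings to try
                      cv=5)                                  # each setting scored by 5-fold CV
search.fit(X, y)
print(search.best_params_, search.best_score_)
```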

šŸ›”ļø Regularization

L1 (Lasso) or L2 (Ridge) regularization penalizes overly complex models to prevent overfitting.
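
A sketch, assuming scikit-learn, where alpha sets the penalty strength for L2 (Ridge) and L1 (Lasso):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

X, y = make_regression(n_samples=100, n_features=20, noise=10, random_state=0)
print("L2 (Ridge) coefficients:", Ridge(alpha=1.0).fit(X, y).coef_[:3])
print("L1 (Lasso) coefficients:", Lasso(alpha=1.0).fit(X, y).coef_[:3])  # L1 can shrink weak features to 0
```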

🔄 Cross-Validation

Use K-fold cross-validation to get a reliable estimate and avoid misleading results from a single split.

āš–ļø Handle Imbalanced Data

Use SMOTE (oversampling), undersampling, or class weights to address class imbalance.
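
A sketch of the class-weight option, assuming scikit-learn; SMOTE itself comes from the separate imbalanced-learn package and is not shown here:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05], random_state=0)  # 95/5 imbalance
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)  # up-weights the rare class
print("Minority-class predictions:", int(clf.predict(X).sum()))
```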

📊 More Data

More training data helps the model learn more diverse patterns and improve generalization.

šŸŽ›ļø Interactive: Overfitting Simulator

Adjust model complexity and dataset size to see how they affect training vs testing accuracy.

šŸƒ Quick Revision — Flashcards

What is the difference between Predictive and Descriptive models?
Predictive: Supervised learning — uses labeled data, has a target variable, predicts new outcomes (classification/regression).
Descriptive: Unsupervised learning — no target variable, discovers hidden patterns (clustering/association).

Why is K-Fold Cross-Validation better than a single Holdout split?
K-Fold uses all data for both training and testing (across K iterations). This gives a more reliable, lower-variance performance estimate. A single holdout can be misleading due to an unlucky split.

What does the F1 Score measure?
F1 = 2 × (Precision × Recall) / (Precision + Recall). It's the harmonic mean of precision and recall, useful when you need a balance between both — especially with imbalanced classes.

What is the difference between overfitting and underfitting?
Overfitting: Model is too complex, memorizes training data. High train accuracy, low test accuracy.
Underfitting: Model is too simple, can't capture patterns. Low accuracy on both train and test data.

🧠 Unit 3 Quiz