Hyperparameter Tuning: Optimizing ML Models
Hyperparameter tuning is the process of finding the configuration settings, chosen before training rather than learned from data, that give a machine learning model its best validation performance. It is often the difference between a mediocre model and a state-of-the-art one.
What You’ll Learn
By the end of this tutorial, you’ll know how to use grid search, random search, Bayesian optimization (Optuna, Hyperopt), learning rate scheduling, batch size tuning, regularization, cross-validation, and early stopping. Prerequisites: Python and Machine Learning basics.
Why It Matters
Default hyperparameters rarely give the best results. Proper tuning can improve model accuracy by 10–30% or more, depending on the model and dataset, which is often the difference between a failing model and a production-ready one.
Real-World Use
Google’s AutoML tunes thousands of hyperparameters simultaneously, finding model configurations that outperform hand-tuned models by 15% on average across benchmark datasets.
Tuning Workflow
flowchart LR
A[Define Search Space] --> B[Select Method]
B --> C[Grid/Random/Bayesian]
C --> D[Train with Config]
D --> E[Evaluate]
E --> F{Best So Far?}
F -->|Yes| G[Save Config]
F -->|No| C
G --> H{Stopping Condition}
H -->|No| C
H -->|Yes| I[Best Model]
Grid Search
Grid search tries every combination in a predefined set of hyperparameters. It’s exhaustive and guaranteed to find the best combination in your grid — but it’s slow.
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, None],
    'min_samples_split': [2, 5, 10],
}
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring='accuracy',
    verbose=1
)
grid_search.fit(X, y)
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-val accuracy: {grid_search.best_score_:.3f}")
print(f"Total combinations tried: {len(grid_search.cv_results_['params'])}")
Expected output:
Fitting 3 folds for each of 27 candidates, totalling 81 fits
Best parameters: {'max_depth': 10, 'min_samples_split': 2, 'n_estimators': 200}
Best cross-val accuracy: 0.932
Total combinations tried: 27
Grid count: 3 × 3 × 3 = 27 combinations × 3 CV folds = 81 model fits.
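The grid count can be checked without training anything. The following is a minimal standard-library sketch, not how scikit-learn implements it internally; the helper name `expand_grid` is made up for illustration. It enumerates the same grid with `itertools.product` and multiplies by the number of CV folds.

```python
from itertools import product

# Same grid as in the scikit-learn example above.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [5, 10, None],
    "min_samples_split": [2, 5, 10],
}

def expand_grid(grid):
    """Return every combination in the grid as a list of dicts (hypothetical helper)."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

combinations = expand_grid(param_grid)
cv_folds = 3
total_fits = len(combinations) * cv_folds

print(f"Combinations: {len(combinations)}")  # 27
print(f"Model fits:   {total_fits}")         # 81
```

Adding a fourth parameter with three values would triple both numbers, which is why grid search becomes expensive so quickly.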
Random Search
Random search samples random combinations from the search space. Surprisingly, it often finds better configurations than grid search because it explores more values per parameter.
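To make the "more values per parameter" point concrete, here is a small standard-library sketch. It assumes two hypothetical parameters that each range over [0, 1] and a budget of 9 trials; no model is trained, it only counts how many distinct values of one parameter each strategy ends up testing.

```python
import random
from itertools import product

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Budget: 9 trials over two hypothetical parameters, each in [0, 1].
grid_values = [0.0, 0.5, 1.0]
grid_trials = list(product(grid_values, grid_values))  # 3 x 3 = 9 trials
random_trials = [(rng.random(), rng.random()) for _ in range(9)]

def distinct_values(trials, axis):
    """Count how many different values of one parameter were tried."""
    return len({trial[axis] for trial in trials})

print("Grid search:  ", distinct_values(grid_trials, 0), "distinct values of parameter 0")
print("Random search:", distinct_values(random_trials, 0), "distinct values of parameter 0")
```

With the same budget, the grid tests only 3 values of each parameter while random sampling tests 9. If one parameter matters much more than the other, random search therefore probes it far more finely.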
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint, uniform
param_dist = {
    'n_estimators': randint(50, 500),     # integers 50..499 (upper bound excluded)
    'max_depth': randint(3, 20),          # integers 3..19
    'min_samples_split': randint(2, 20),  # integers 2..19
    'max_features': uniform(0.1, 0.9),    # uniform(loc, scale): floats in [0.1, 1.0]
}
random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_dist,
    n_iter=20,  # Only 20 random combinations
    cv=3,
    scoring='accuracy',
    random_state=42
)
random_search.fit(X, y)
print(f"Best parameters: {random_search.best_params_}")
print(f"Best cross-val accuracy: {random_search.best_score_:.3f}")
print("Combinations tried: 20 (out of infinite space)")
Expected output:
Best parameters: {'max_depth': 19, 'max_features': 0.75, 'min_samples_split': 3, 'n_estimators': 378}
Best cross-val accuracy: 0.936
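Combinations tried: 20 (out of infinite space)
Bayesian Optimization with Optuna
Bayesian optimization builds a probabilistic model of the objective function from the trials evaluated so far and uses it to select the most promising hyperparameters to evaluate next. Optuna's default sampler, TPE (Tree-structured Parzen Estimator), is one such model-based method.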
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 50, 500)
    max_depth = trial.suggest_int('max_depth', 3, 20)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 20)
    max_features = trial.suggest_float('max_features', 0.1, 1.0)
    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        max_features=max_features,
        random_state=42
    )
    scores = cross_val_score(model, X, y, cv=3, scoring='accuracy')
    return scores.mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=30)
print(f"Best trial: {study.best_trial.number}")
print(f"Best accuracy: {study.best_trial.value:.3f}")
print(f"Best params: {study.best_trial.params}")
Expected output:
Best trial: 27
Best accuracy: 0.938
Best params: {'n_estimators': 423, 'max_depth': 18, 'min_samples_split': 4, 'max_features': 0.67}
Learning Rate Scheduling
Learning rate is often the most important hyperparameter for neural networks. A schedule that lowers the learning rate during training can significantly improve results.
import numpy as np

def cosine_decay(initial_lr, step, total_steps):
    # Half-cosine curve: initial_lr at step 0, falling to 0 at total_steps
    return initial_lr * 0.5 * (1 + np.cos(np.pi * step / total_steps))

def step_decay(initial_lr, step, drop_every=10, factor=0.5):
    # Multiply the rate by `factor` once every `drop_every` steps
    return initial_lr * (factor ** (step // drop_every))

total = 50
initial_lr = 0.01
print("Epoch | Cosine LR | Step LR")
for epoch in range(0, total + 1, 5):
    cosine_lr = cosine_decay(initial_lr, epoch, total)
    step_lr = step_decay(initial_lr, epoch)
    print(f"{epoch:5d} | {cosine_lr:.5f} | {step_lr:.5f}")
Expected output:
Epoch | Cosine LR | Step LR
    0 | 0.01000 | 0.01000
    5 | 0.00976 | 0.01000
   10 | 0.00905 | 0.00500
   15 | 0.00794 | 0.00500
   20 | 0.00655 | 0.00250
   25 | 0.00500 | 0.00250
   30 | 0.00345 | 0.00125
   35 | 0.00206 | 0.00125
   40 | 0.00095 | 0.00063
   45 | 0.00024 | 0.00063
   50 | 0.00000 | 0.00031
Early Stopping
Early stopping halts training when validation performance stops improving, preventing overfitting and saving computation.
class EarlyStopping:
    def __init__(self, patience=5, min_delta=0.001):
        self.patience = patience    # epochs to wait without improvement
        self.min_delta = min_delta  # smallest gain that counts as improvement
        self.best_score = None
        self.counter = 0

    def check(self, current_score):
        """Return True when training should stop."""
        if self.best_score is None:
            self.best_score = current_score
            return False
        if current_score > self.best_score + self.min_delta:
            self.best_score = current_score
            self.counter = 0
            return False
        else:
            self.counter += 1
            if self.counter >= self.patience:
                print(f"Early stopping triggered after {self.counter} epochs without improvement")
                return True
            return False

# Simulate training with validation scores that peak and then decline
stopper = EarlyStopping(patience=3)
scores = [0.85, 0.87, 0.86, 0.86, 0.85, 0.84, 0.83]
for epoch, score in enumerate(scores):
    print(f"Epoch {epoch}: val_score = {score}")
    if stopper.check(score):
        break
Expected output:
Epoch 0: val_score = 0.85
Epoch 1: val_score = 0.87
Epoch 2: val_score = 0.86
Epoch 3: val_score = 0.86
Epoch 4: val_score = 0.85
Early stopping triggered after 3 epochs without improvement
Cross-Validation in Tuning
Always use cross-validation when tuning to avoid overfitting to the validation set. k-fold CV splits data into k folds, training on k-1 and validating on the held-out fold.
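The splitting itself is simple enough to sketch by hand. The version below is an illustrative standard-library sketch with a made-up helper name, `k_fold_indices`. Unlike the `StratifiedKFold` used in this tutorial, it neither shuffles nor balances classes, so it only shows how every sample lands in the validation fold exactly once.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds (hypothetical helper)."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(n_samples=10, k=5))
for fold_number, (train_idx, val_idx) in enumerate(folds):
    print(f"Fold {fold_number}: train={train_idx} val={val_idx}")
```

Each of the 5 folds trains on 8 samples and validates on the remaining 2; averaging the 5 validation scores gives the cross-validated estimate.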
from sklearn.model_selection import cross_val_score, StratifiedKFold

# Shuffle once with a fixed seed so the folds are reproducible
cv_strategy = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# Fix the model seed too, otherwise scores change from run to run
model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
scores = cross_val_score(model, X, y, cv=cv_strategy, scoring='accuracy')
print(f"Per-fold scores: {[f'{s:.3f}' for s in scores]}")
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std() * 2:.3f})")
Expected output:
Per-fold scores: ['0.926', '0.918', '0.934', '0.912', '0.928']
Mean accuracy: 0.924 (+/- 0.016)
Common Tuning Errors
1. Tuning Too Many Parameters at Once
Start with the most impactful parameters (learning rate, tree depth). Adding too many dimensions makes the search space exponentially larger.
2. Overfitting to the Validation Set
Tuning on the same data repeatedly leads to overfitting. Use nested cross-validation or hold out a final test set that’s never used in tuning.
3. Using Too Few Trials
A Bayesian search limited to 5 trials rarely beats one given 50. More trials generally give better results, with diminishing returns, so budget your compute accordingly.
4. Ignoring Hyperparameter Interactions
Learning rate and batch size interact. A large batch often needs a higher learning rate. Tune them together, not independently.
5. Not Scaling Search Spaces
Some parameters (like learning rate) work on log scale. Always use log-uniform distributions for positive parameters that span orders of magnitude.
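The last point is easy to see by counting samples per order of magnitude. This is a standard-library sketch under assumed bounds of 1e-5 to 1e-1 for a learning rate; the helper names are made up for illustration. In Optuna, the equivalent is passing `log=True` to `trial.suggest_float`.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def sample_log_uniform(low, high, rng):
    """Sample so that every order of magnitude in [low, high] is equally likely."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))

def decade_counts(samples):
    """Count samples per power of ten, e.g. key -3 covers [0.001, 0.01)."""
    counts = {}
    for value in samples:
        decade = math.floor(math.log10(value))
        counts[decade] = counts.get(decade, 0) + 1
    return counts

low, high = 1e-5, 1e-1  # assumed search range for a learning rate
linear = [rng.uniform(low, high) for _ in range(10_000)]
log_scaled = [sample_log_uniform(low, high, rng) for _ in range(10_000)]

print("Linear sampling:     ", dict(sorted(decade_counts(linear).items())))
print("Log-uniform sampling:", dict(sorted(decade_counts(log_scaled).items())))
```

Linear sampling puts roughly 90% of its draws between 0.01 and 0.1 and almost none below 0.0001, while log-uniform sampling spreads them about evenly across the four decades.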
Practice Questions
1. What’s the difference between grid search and random search?
Grid search tries every combination in a fixed grid. Random search samples combinations randomly and often finds better results in fewer iterations.
2. How does Bayesian optimization differ from random search?
Bayesian optimization builds a model of the objective function and uses it to select promising hyperparameters, rather than sampling randomly.
3. Why use learning rate scheduling?
A high LR makes progress early, but a lower LR is needed later for fine-tuning. Scheduling decreases the LR during training to get both benefits.
4. What is early stopping and why use it?
Early stopping halts training when validation performance plateaus. It prevents overfitting and saves computation.
5. Challenge: Hyperparameter tuning competition
Pick a dataset from Kaggle. Train 3 models (Random Forest, XGBoost, Neural Network). Tune each using Optuna with 100 trials. Which model performs best after tuning?
Try It Yourself
Mini Project: Automated Hyperparameter Tuner
Build a script that takes a dataset and model type as input, runs Bayesian optimization with Optuna, logs results to MLflow, and outputs the best configuration. Security angle: Durga Antivirus Pro uses automated hyperparameter tuning to optimize its threat detection models — finding the configuration that maximizes detection rate while minimizing false positives.
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
What’s Next
Congratulations on completing this Hyperparameter Tuning tutorial! Here’s where to go from here:
- Practice daily — Apply Bayesian optimization to your next model
- Build a project — Create an auto-tuning pipeline with Optuna + MLflow
- Explore related topics — Check out Model Evaluation to measure the impact of your tuning
Remember: every expert was once a beginner. Keep coding!