Hyperparameter Tuning: Optimizing ML Models


DodaTech Updated Jun 20, 2026 8 min read

Hyperparameter tuning is the search for the best values of a model's hyperparameters: settings such as tree depth or learning rate that are chosen before training rather than learned from the data. Good tuning is often what separates a mediocre model from a strong one.

What You’ll Learn

By the end of this tutorial, you’ll know how to use grid search, random search, Bayesian optimization (Optuna, Hyperopt), learning rate scheduling, batch size tuning, regularization, cross-validation, and early stopping. Prerequisites: Python and Machine Learning basics.

Why It Matters

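Default hyperparameters are rarely the best choice for a specific dataset. Proper tuning can improve model accuracy noticeably, in some cases by 10–30% or more, although the gain depends heavily on the model, the data, and how poor the defaults were. That gain can be the difference between a model that fails its requirements and one that is ready for production.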

Real-World Use

Google’s AutoML automates this kind of search at scale, exploring many model architectures and hyperparameter settings to find configurations that can match or outperform hand-tuned models.

Tuning Workflow


flowchart LR
  A[Define Search Space] --> B[Select Method]
  B --> C[Grid/Random/Bayesian]
  C --> D[Train with Config]
  D --> E[Evaluate]
  E --> F{Best So Far?}
  F -->|Yes| G[Save Config]
  F -->|No| H{Stopping Condition}
  G --> H
  H -->|No| C
  H -->|Yes| I[Best Model]

Prerequisites: Python basics, Machine Learning fundamentals, Model Evaluation concepts.

Grid Search

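Grid search tries every combination in a predefined set of hyperparameter values. It is exhaustive, so it is guaranteed to find the best-scoring combination within your grid, but it is slow: the number of fits is the product of the number of values per parameter, multiplied by the number of CV folds.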

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, None],
    'min_samples_split': [2, 5, 10],
}

grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring='accuracy',
    verbose=1
)
grid_search.fit(X, y)

print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-val accuracy: {grid_search.best_score_:.3f}")
print(f"Total combinations tried: {len(grid_search.cv_results_['params'])}")

Expected output:

Fitting 3 folds for each of 27 candidates, totalling 81 fits
Best parameters: {'max_depth': 10, 'min_samples_split': 2, 'n_estimators': 200}
Best cross-val accuracy: 0.932
Total combinations tried: 27

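Grid count: 3 × 3 × 3 = 27 combinations × 3 CV folds = 81 model fits, plus one final refit of the best combination on the full dataset (GridSearchCV uses refit=True by default).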
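Because the cost multiplies with every parameter you add, it can help to count the grid before launching a search. The sketch below is plain Python built on itertools.product, not a scikit-learn API; the count_fits helper is hypothetical and assumes one extra fit for the default refit step.

```python
from itertools import product

# Same grid as the GridSearchCV example above
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [5, 10, None],
    'min_samples_split': [2, 5, 10],
}

def count_fits(grid, n_folds, refit=True):
    """Return (candidates, fits) for an exhaustive grid search."""
    # product(*lists) yields one tuple per combination of parameter values
    candidates = sum(1 for _ in product(*grid.values()))
    # Every candidate is fit once per fold, plus an optional final refit
    fits = candidates * n_folds + (1 if refit else 0)
    return candidates, fits

candidates, fits = count_fits(param_grid, n_folds=3)
print(candidates, fits)  # 27 82

# Adding one more 3-valued parameter triples the work
bigger = dict(param_grid, min_samples_leaf=[1, 2, 4])
print(count_fits(bigger, n_folds=3))  # (81, 244)
```

The verbose log line "totalling 81 fits" counts only the cross-validation fits, which is why the refit is tracked separately here.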

Random Search

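Random search samples combinations at random from the search space. It often matches or beats grid search on the same budget because every trial tries a new value of every parameter, whereas a grid keeps repeating the same few values. When only a few parameters really matter, that extra coverage is what counts.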

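from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from scipy.stats import randint, uniform

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

param_dist = {
    'n_estimators': randint(50, 500),       # integers in [50, 500)
    'max_depth': randint(3, 20),            # integers in [3, 20)
    'min_samples_split': randint(2, 20),    # integers in [2, 20)
    'max_features': uniform(0.1, 0.9),      # uniform(loc, scale): floats in [0.1, 1.0]
}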

random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_dist,
    n_iter=20,  # Only 20 random combinations
    cv=3,
    scoring='accuracy',
    random_state=42
)
random_search.fit(X, y)

print(f"Best parameters: {random_search.best_params_}")
print(f"Best cross-val accuracy: {random_search.best_score_:.3f}")
print(f"Combinations tried: 20 (out of infinite space)")

Expected output:

Best parameters: {'max_depth': 19, 'max_features': 0.75, 'min_samples_split': 3, 'n_estimators': 378}
Best cross-val accuracy: 0.936
Combinations tried: 20 (out of infinite space)
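The claim that random search explores more values per parameter can be checked by counting. This sketch spends the same budget of 9 trials both ways on two RandomForest-style parameters; the ranges are illustrative assumptions, the distinct_values helper is our own, and no model is trained.

```python
import itertools
import random

budget = 9  # total number of configurations we can afford to train

# Grid: 3 values for each of 2 parameters -> 3 x 3 = 9 configurations
grid_max_features = [0.3, 0.6, 0.9]
grid_max_depth = [5, 10, 15]
grid_trials = list(itertools.product(grid_max_features, grid_max_depth))

# Random search: 9 independent draws from comparable ranges
rng = random.Random(0)
random_trials = [(rng.uniform(0.1, 1.0), rng.randint(3, 19)) for _ in range(budget)]

def distinct_values(trials, position):
    """Number of distinct values tried for the parameter at `position`."""
    return len({trial[position] for trial in trials})

print("grid   max_features values tried:", distinct_values(grid_trials, 0))    # 3
print("random max_features values tried:", distinct_values(random_trials, 0))  # 9
```

If max_features turns out to be the parameter that actually drives accuracy, the grid only ever learns about 3 of its values, while random search probes 9 for the same cost.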

Bayesian Optimization with Optuna

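Bayesian optimization builds a probabilistic model (a surrogate) of how hyperparameters relate to the score and uses it to choose the most promising configuration to evaluate next, so each trial is informed by the ones before it. Optuna's default sampler, TPE (Tree-structured Parzen Estimator), is one such method: rather than modeling the score directly, it models which parameter values tend to appear in good trials versus poor ones.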

import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 50, 500)
    max_depth = trial.suggest_int('max_depth', 3, 20)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 20)
    max_features = trial.suggest_float('max_features', 0.1, 1.0)

    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        max_features=max_features,
        random_state=42
    )

    scores = cross_val_score(model, X, y, cv=3, scoring='accuracy')
    return scores.mean()

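optuna.logging.set_verbosity(optuna.logging.WARNING)  # hide per-trial log lines
study = optuna.create_study(
    direction='maximize',
    sampler=optuna.samplers.TPESampler(seed=42),  # seeded for reproducibility
)
study.optimize(objective, n_trials=30)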

print(f"Best trial: {study.best_trial.number}")
print(f"Best accuracy: {study.best_trial.value:.3f}")
print(f"Best params: {study.best_trial.params}")

Expected output:

Best trial: 27
Best accuracy: 0.938
Best params: {'n_estimators': 423, 'max_depth': 18, 'min_samples_split': 4, 'max_features': 0.67}

Learning Rate Scheduling

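Learning rate is often the most important hyperparameter when training neural networks. A schedule that lowers the learning rate as training progresses can significantly improve results: large steps make fast progress early, and small steps let the model settle into a good minimum later.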

import numpy as np

def cosine_decay(initial_lr, step, total_steps):
    # Smoothly anneal from initial_lr at step 0 down to 0 at total_steps
    return initial_lr * 0.5 * (1 + np.cos(np.pi * step / total_steps))

def step_decay(initial_lr, step, drop_every=10, factor=0.5):
    # Multiply the rate by `factor` once every `drop_every` steps
    return initial_lr * (factor ** (step // drop_every))

total = 50
initial_lr = 0.01

print("Epoch | Cosine LR | Step LR")
for epoch in range(0, total + 1, 5):
    cosine_lr = cosine_decay(initial_lr, epoch, total)
    step_lr = step_decay(initial_lr, epoch)
    print(f"{epoch:5d} | {cosine_lr:.5f} | {step_lr:.5f}")

Expected output:

Epoch | Cosine LR | Step LR
    0 | 0.01000 | 0.01000
    5 | 0.00976 | 0.01000
   10 | 0.00905 | 0.00500
   15 | 0.00794 | 0.00500
   20 | 0.00655 | 0.00250
   25 | 0.00500 | 0.00250
   30 | 0.00345 | 0.00125
   35 | 0.00206 | 0.00125
   40 | 0.00095 | 0.00063
   45 | 0.00024 | 0.00063
   50 | 0.00000 | 0.00031
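A schedule only matters once a training loop consumes it. The sketch below assumes a toy one-parameter loss, (w - 3)**2, and plain gradient descent instead of a real network; the train_toy helper is hypothetical. It recomputes the learning rate at the start of every epoch, which is the same idea framework schedulers such as PyTorch's torch.optim.lr_scheduler.CosineAnnealingLR automate.

```python
import math

def cosine_decay(initial_lr, step, total_steps):
    # Same formula as above, written with the math module
    return initial_lr * 0.5 * (1 + math.cos(math.pi * step / total_steps))

def train_toy(initial_lr=0.1, epochs=50, start=0.0, target=3.0):
    """Gradient descent on the toy loss (w - target)**2 with a cosine schedule."""
    w = start
    history = []
    for epoch in range(epochs):
        lr = cosine_decay(initial_lr, epoch, epochs)  # recompute the rate each epoch
        grad = 2 * (w - target)                       # derivative of (w - target)**2
        w -= lr * grad
        history.append((epoch, lr, w))
    return w, history

final_w, history = train_toy()
print(f"final w = {final_w:.4f}")
for epoch, lr, w in history[::10]:
    print(f"epoch {epoch:2d}  lr {lr:.4f}  w {w:.4f}")
```

Early epochs take large steps toward the target, and the shrinking rate makes the last updates small, which mirrors the "fast progress, then fine-tuning" behavior described above.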

Early Stopping

Early stopping halts training once validation performance has not improved for a set number of epochs (the patience), which helps prevent overfitting and saves computation.

class EarlyStopping:
    def __init__(self, patience=5, min_delta=0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.best_score = None
        self.counter = 0

    def check(self, current_score):
        if self.best_score is None:
            self.best_score = current_score
            return False

        if current_score > self.best_score + self.min_delta:
            self.best_score = current_score
            self.counter = 0
            return False
        else:
            self.counter += 1
            if self.counter >= self.patience:
                print(f"Early stopping triggered after {self.counter} epochs without improvement")
                return True
            return False

# Simulated validation scores (higher is better)
stopper = EarlyStopping(patience=3)
scores = [0.85, 0.87, 0.86, 0.86, 0.85, 0.84, 0.83]

for epoch, score in enumerate(scores):
    print(f"Epoch {epoch}: val_score = {score}")
    if stopper.check(score):
        break

Expected output:

Epoch 0: val_score = 0.85
Epoch 1: val_score = 0.87
Epoch 2: val_score = 0.86
Epoch 3: val_score = 0.86
Epoch 4: val_score = 0.85
Early stopping triggered after 3 epochs without improvement
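In practice you usually also want to know which epoch was best, so you can restore that checkpoint, and you often monitor a loss where lower is better. The variant below is a sketch that extends the class above with a best_epoch attribute and a mode switch, similar in spirit to the mode and restore_best_weights options of Keras's EarlyStopping callback. The class name and the run helper are hypothetical, and checkpoint saving is only marked with a comment.

```python
class EarlyStoppingWithBest:
    """Early stopping that also remembers the best epoch.

    mode='max' for metrics like accuracy, mode='min' for losses.
    """

    def __init__(self, patience=5, min_delta=0.001, mode='max'):
        if mode not in ('max', 'min'):
            raise ValueError("mode must be 'max' or 'min'")
        self.patience = patience
        self.min_delta = min_delta
        self.sign = 1 if mode == 'max' else -1  # flip losses so "bigger is better"
        self.best_score = None
        self.best_epoch = None
        self.counter = 0

    def check(self, epoch, score):
        """Return True when training should stop."""
        value = self.sign * score
        if self.best_score is None or value > self.best_score + self.min_delta:
            self.best_score = value
            self.best_epoch = epoch  # in real training, save a checkpoint here
            self.counter = 0
            return False
        self.counter += 1
        return self.counter >= self.patience

def run(scores, **kwargs):
    """Return (epoch training stopped at, best epoch)."""
    stopper = EarlyStoppingWithBest(**kwargs)
    for epoch, score in enumerate(scores):
        if stopper.check(epoch, score):
            return epoch, stopper.best_epoch
    return len(scores) - 1, stopper.best_epoch

print(run([0.85, 0.87, 0.86, 0.86, 0.85, 0.84, 0.83], patience=3))     # (4, 1)
print(run([1.0, 0.8, 0.7, 0.71, 0.72, 0.73], patience=3, mode='min'))  # (5, 2)
```

For the accuracy sequence, training stops at epoch 4, but the weights worth keeping are the ones from epoch 1.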

Cross-Validation in Tuning

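Use cross-validation when tuning so that each configuration is scored on several validation splits instead of one. This makes the score less noisy and less likely to reward a configuration that just happens to suit one particular split. k-fold CV splits the data into k folds, trains on k-1 of them, validates on the held-out fold, and repeats this k times so that every fold serves as the validation set once.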

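from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, StratifiedKFold

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Stratified folds keep the class balance of y roughly equal in every fold
cv_strategy = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
scores = cross_val_score(model, X, y, cv=cv_strategy, scoring='accuracy')

print(f"Per-fold scores: {[f'{s:.3f}' for s in scores]}")
# +/- two standard deviations gives a rough spread of the fold scores
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std() * 2:.3f})")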

Expected output:

Per-fold scores: ['0.926', '0.918', '0.934', '0.912', '0.928']
Mean accuracy: 0.924 (+/- 0.016)
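The splitting itself is simple. This standard-library sketch shows plain k-fold index generation; unlike StratifiedKFold, it does not balance classes across folds, and k_fold_splits is our own illustrative function, not a scikit-learn API.

```python
import random

def k_fold_splits(n_samples, k, seed=42):
    """Yield (train_idx, val_idx) pairs for plain (unstratified) k-fold CV."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # shuffle once, like shuffle=True
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        val_idx = folds[i]
        # Training set = every fold except the held-out one
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx

for train_idx, val_idx in k_fold_splits(n_samples=10, k=5):
    print(len(train_idx), len(val_idx))  # 8 2 on every line
```

Each sample lands in exactly one validation fold, which is what lets the mean of the k scores use all of the data for validation without any sample being validated twice.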

Common Tuning Errors

1. Tuning Too Many Parameters at Once

Start with the most impactful parameters (learning rate, tree depth). Adding too many dimensions makes the search space exponentially larger.

2. Overfitting to the Validation Set

Tuning on the same data repeatedly leads to overfitting. Use nested cross-validation or hold out a final test set that’s never used in tuning.
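To make nested cross-validation concrete without any library, this sketch uses a deliberately tiny setup: a one-dimensional k-nearest-neighbour classifier whose only hyperparameter is the number of neighbours, and synthetic Gaussian data. Both are assumptions for illustration, and the helper names are hypothetical. The inner loop picks the hyperparameter using only the outer training split; the outer loop then scores that choice on a fold the inner loop never saw.

```python
import random
import statistics

def k_fold(indices, k, seed=0):
    """Yield (train, val) index lists over the given indices."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        yield [j for f, fold in enumerate(folds) if f != i for j in fold], folds[i]

def knn_accuracy(k_neighbors, xs, ys, train, val):
    """Accuracy on `val` of a 1-D k-nearest-neighbour classifier fit on `train`."""
    correct = 0
    for i in val:
        nearest = sorted(train, key=lambda j: abs(xs[j] - xs[i]))[:k_neighbors]
        vote = sum(ys[j] for j in nearest) * 2 > len(nearest)  # majority of 0/1 labels
        correct += int(vote) == ys[i]
    return correct / len(val)

def nested_cv(xs, ys, candidates=(1, 5, 15), outer_k=5, inner_k=3):
    outer_scores, chosen = [], []
    for outer_train, test in k_fold(range(len(xs)), outer_k, seed=1):
        # Inner loop: tune k_neighbors on outer_train only
        def inner_score(c):
            return statistics.mean(
                knn_accuracy(c, xs, ys, tr, va)
                for tr, va in k_fold(outer_train, inner_k, seed=2))
        best = max(candidates, key=inner_score)
        # Outer loop: score the chosen setting on data the tuning never touched
        outer_scores.append(knn_accuracy(best, xs, ys, outer_train, test))
        chosen.append(best)
    return outer_scores, chosen

# Toy data: class 0 centred at 0, class 1 centred at 2
rng = random.Random(42)
ys = [i % 2 for i in range(200)]
xs = [rng.gauss(2.0 * y, 1.0) for y in ys]

scores, chosen = nested_cv(xs, ys)
print(f"outer accuracy: {statistics.mean(scores):.3f}, chosen k per fold: {chosen}")
```

The mean outer score estimates how well the whole tuning procedure generalizes, not just one configuration. With scikit-learn, the same pattern is usually written by passing a GridSearchCV object as the estimator to cross_val_score.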

3. Using Too Few Trials

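A Bayesian optimization run with 5 trials rarely beats one with 50, because the early trials are mostly exploration. In Optuna, for example, the default TPE sampler draws its first 10 trials at random before its model starts guiding the search, so a 5-trial study is essentially random search. More trials usually help, with diminishing returns, so budget your compute accordingly.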

4. Ignoring Hyperparameter Interactions

Learning rate and batch size interact. A large batch often needs a higher learning rate. Tune them together, not independently.
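One common starting point for tuning them together is the linear scaling heuristic: when the batch size is multiplied by some factor, multiply the learning rate by the same factor. It is a rule of thumb, not a guarantee (it tends to break down at very large batches and is often paired with a warmup period), and the baseline values below are assumptions; scale_lr_linear is a hypothetical helper.

```python
def scale_lr_linear(base_lr, base_batch, new_batch):
    """Linear scaling heuristic: keep learning_rate / batch_size constant."""
    return base_lr * new_batch / base_batch

# Hypothetical baseline: lr 0.1 found to work well at batch size 256
for batch in (128, 256, 512, 1024):
    print(batch, scale_lr_linear(0.1, 256, batch))
```

Treat the scaled value as the center of a search range for the learning rate, not as the final answer.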

5. Not Scaling Search Spaces

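Some parameters, such as the learning rate, are best searched on a log scale. Use log-uniform distributions for positive parameters that span several orders of magnitude, so that each decade (1e-5 to 1e-4, 1e-4 to 1e-3, and so on) gets a fair share of trials.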
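The effect is easy to see by sampling. This sketch draws 10,000 learning-rate candidates from [1e-5, 1e-1] both ways using only the standard library; the helper names are our own. In practice you would not write this yourself: Optuna's trial.suggest_float(..., log=True) and scipy.stats.loguniform provide log-scale sampling directly.

```python
import math
import random

def sample_uniform(low, high, rng):
    return rng.uniform(low, high)

def sample_log_uniform(low, high, rng):
    # Draw the exponent uniformly, then map back: each decade gets equal probability
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
low, high, n = 1e-5, 1e-1, 10_000
uniform_draws = [sample_uniform(low, high, rng) for _ in range(n)]
log_draws = [sample_log_uniform(low, high, rng) for _ in range(n)]

def share_below(draws, threshold):
    return sum(d < threshold for d in draws) / len(draws)

# Fraction of samples in the lower two decades, [1e-5, 1e-3)
print(f"uniform:     {share_below(uniform_draws, 1e-3):.3f}")  # about 0.01
print(f"log-uniform: {share_below(log_draws, 1e-3):.3f}")      # about 0.5
```

With plain uniform sampling, roughly 99% of trials land above 1e-3, so small learning rates are barely explored even though they are half of the range in orders of magnitude.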

Practice Questions

1. What’s the difference between grid search and random search? Grid search tries every combination in a fixed grid. Random search samples combinations randomly and often finds better results in fewer iterations.

2. How does Bayesian optimization differ from random search? Bayesian optimization builds a model of the objective function and uses it to select promising hyperparameters, rather than sampling randomly.

3. Why use learning rate scheduling? A high LR makes progress early, but a lower LR is needed later for fine-tuning. Scheduling decreases the LR during training to get both benefits.

4. What is early stopping and why use it? Early stopping halts training when validation performance plateaus. It prevents overfitting and saves computation.

5. Challenge: hyperparameter tuning competition. Pick a dataset from Kaggle. Train 3 models (Random Forest, XGBoost, Neural Network) and tune each using Optuna with 100 trials. Which model performs best after tuning, as measured on a held-out test set?

FAQ

How many tuning trials do I need?
Start with 20–50 for random search, 50–100 for Bayesian optimization. More trials generally yield better results but with diminishing returns.
Should I tune all hyperparameters at once?
No. Tune the most important ones first. For neural networks: learning rate → batch size → architecture. For gradient-boosted tree models (such as XGBoost): tree depth → learning rate → regularization.
Can tuning make a model worse?
Yes, if you overfit to the validation set. Always evaluate the final model on a completely held-out test set that was never used during tuning.
What's the best tuning method overall?
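Bayesian optimization (for example, with Optuna) generally finds good configurations in fewer trials than grid or random search, which matters most when each trial is expensive or the space has more than a handful of parameters. For 1–2 parameters, grid search is fine.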


Mini Project: Automated Hyperparameter Tuner

Build a script that takes a dataset and model type as input, runs Bayesian optimization with Optuna, logs results to MLflow, and outputs the best configuration. Security angle: Durga Antivirus Pro uses automated hyperparameter tuning to optimize its threat detection models, searching for the configuration that maximizes detection rate while minimizing false positives.

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.

What’s Next

Congratulations on completing this Hyperparameter Tuning tutorial! Here’s where to go from here:

  • Practice daily — Apply Bayesian optimization to your next model
  • Build a project — Create an auto-tuning pipeline with Optuna + MLflow
  • Explore related topics — Check out Model Evaluation to measure the impact of your tuning

Remember: every expert was once a beginner. Keep coding!
