MLOps: Machine Learning Operations Guide
MLOps (Machine Learning Operations) is the practice of applying DevOps principles to machine learning — automating the pipeline from data preparation through model deployment and monitoring.
What You’ll Learn
By the end of this tutorial, you’ll understand the complete ML lifecycle, experiment tracking with MLflow and Weights & Biases, feature stores, model versioning, CI/CD pipelines for ML, data/model validation, and monitoring strategies. Prerequisites: Python, Machine Learning basics, and familiarity with Model Deployment.
Why It Matters
Without MLOps, ML projects are chaotic — untracked experiments, manual deployments, broken pipelines, and models that silently degrade in production. MLOps brings engineering rigor to ML.
Real-World Use
Spotify runs thousands of ML models in production — recommendation, search, playlist generation. MLOps ensures each model is versioned, monitored, and automatically retrained when performance drops.
MLOps Pipeline
flowchart LR
    A[Data Ingestion] --> B[Data Validation]
    B --> C[Feature Engineering]
    C --> D[Model Training]
    D --> E[Model Evaluation]
    E --> F[Model Registry]
    F --> G[Deployment]
    G --> H[Monitoring]
    H -->|Drift| A
    H -->|Performance Drop| D
    D -->|Experiment Tracking| I[MLflow/W&B]
    C -->|Feature Store| J[Feast/Tecton]
Experiment Tracking
Experiment tracking logs every training run so you can compare results and reproduce the best model.
MLflow
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import numpy as np

# Generate random data (the labels are random, so accuracy hovers around 0.5)
X_train = np.random.rand(100, 5)
y_train = np.random.randint(0, 2, 100)
X_test = np.random.rand(20, 5)
y_test = np.random.randint(0, 2, 20)

mlflow.set_experiment("classification-demo")

with mlflow.start_run():
    # Log parameters
    n_estimators = 100
    max_depth = 10
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_param("max_depth", max_depth)

    # Train
    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        random_state=42
    )
    model.fit(X_train, y_train)

    # Log metrics
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)
    mlflow.log_metric("accuracy", acc)

    # Log model
    mlflow.sklearn.log_model(model, "model")

    # The run is only active inside the with block
    print(f"Run ID: {mlflow.active_run().info.run_id}")
    print(f"Accuracy: {acc:.3f}")
Expected output (the run ID and accuracy differ on every run):
Run ID: a1b2c3d4e5f6g7h8
Accuracy: 0.550
Weights & Biases
import wandb

wandb.init(project="classification-demo", config={
    "n_estimators": 100,
    "max_depth": 10,
    "learning_rate": 0.01
})

# Log metrics during training
for epoch in range(10):
    loss = 1.0 / (epoch + 1)
    wandb.log({"epoch": epoch, "loss": loss, "accuracy": 0.5 + epoch * 0.05})

wandb.finish()
print("Run logged to Weights & Biases")
Feature Stores
A feature store is a centralized repository for ML features. Instead of each team re-engineering the same features, they share and reuse them.
# Conceptual example modeled on Feast (open-source feature store)
from datetime import datetime
import pandas as pd

# Pre-computed feature values, keyed by user_id
feature_data = pd.DataFrame({
    "user_id": [1, 2, 3],
    "avg_session_duration": [120.5, 45.2, 300.1],
    "num_logins_7d": [15, 3, 42],
    "event_timestamp": [datetime.now()] * 3
})

# Stand-in for an online lookup; in production, the Feast API serves
# pre-computed features in real time instead of this DataFrame
def get_online_features(user_id):
    row = feature_data.loc[feature_data["user_id"] == user_id].iloc[0]
    return {
        "avg_session_duration": float(row["avg_session_duration"]),
        "num_logins_7d": int(row["num_logins_7d"])
    }

features = get_online_features(1)
print(f"Online features for user 1: {features}")
Expected output:
Online features for user 1: {'avg_session_duration': 120.5, 'num_logins_7d': 15}
Model Versioning and Registry
import mlflow

# Register a model version
client = mlflow.tracking.MlflowClient()
model_uri = "runs:/a1b2c3d4e5f6g7h8/model"  # placeholder: use the run ID printed by your own run
model_name = "classification-model"
result = mlflow.register_model(model_uri, model_name)
print(f"Registered model: {result.name} version {result.version}")

# List the latest version in each stage (this does not return every version)
versions = client.get_latest_versions(model_name)
for v in versions:
    print(f" Version {v.version}: stage={v.current_stage}, run_id={v.run_id}")
Expected output:
Registered model: classification-model version 1
 Version 1: stage=None, run_id=a1b2c3d4e5f6g7h8
CI/CD for ML
# .github/workflows/ml-pipeline.yml (conceptual)
name: ML Pipeline
on: [push]
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - run: pip install -r requirements.txt
      - run: python train.py     # Trains and logs to MLflow
      - run: python evaluate.py  # Evaluates, fails if metrics below threshold
      - run: python deploy.py    # Promotes to staging/production
Data and Model Validation
# Note: great_expectations.dataset is the legacy Dataset API; recent releases
# no longer ship it, so this snippet assumes an older Great Expectations version
from great_expectations.dataset import PandasDataset
import pandas as pd

# Validate incoming data
df = pd.DataFrame({
    "age": [25, -5, 30, 200],
    "income": [50000, 60000, None, 70000]
})
ds = PandasDataset(df)

# Define expectations
expectations = {
    "age in [0, 120]": ds.expect_column_values_to_be_between("age", 0, 120),
    "income not null": ds.expect_column_values_to_not_be_null("income"),
}

for check, result in expectations.items():
    status = "PASS" if result["success"] else "FAIL"
    print(f"{check}: {status}")
Expected output:
age in [0, 120]: FAIL
income not null: FAIL
Monitoring
import time
import random

class ModelMonitor:
    def __init__(self, threshold=0.7):
        self.threshold = threshold
        self.predictions = []

    def log_prediction(self, features, pred, actual=None):
        self.predictions.append({
            "timestamp": time.time(),
            "features": features,
            "prediction": pred,
            "actual": actual
        })

    def calculate_accuracy(self):
        # Only predictions with a known ground-truth label can be scored
        labeled = [p for p in self.predictions if p["actual"] is not None]
        if not labeled:
            return None
        correct = sum(1 for p in labeled if p["prediction"] == p["actual"])
        return correct / len(labeled)

    def alert_if_needed(self):
        acc = self.calculate_accuracy()
        if acc is not None and acc < self.threshold:
            print(f"ALERT: Accuracy dropped to {acc:.3f} (threshold: {self.threshold})")

monitor = ModelMonitor()

# Simulate 100 random predictions with random labels
for i in range(100):
    monitor.log_prediction(
        features=[random.random() for _ in range(5)],
        pred=random.randint(0, 1),
        actual=random.randint(0, 1)
    )

monitor.alert_if_needed()
print(f"Current accuracy: {monitor.calculate_accuracy():.3f}")
Expected output (the accuracy varies from run to run but stays near 0.5):
ALERT: Accuracy dropped to 0.510 (threshold: 0.7)
Current accuracy: 0.510
Common MLOps Errors
1. No Experiment Tracking
Running 50 experiments without logging parameters or metrics. You won’t know which model to deploy. Always use MLflow, W&B, or similar.
2. Manual Model Promotion
Copying model files to production servers manually. Use a model registry with versioning, staging, and approval workflows.
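A staged promotion with an approval step can be sketched in a few lines. This is a toy in-memory model, not the API of any particular registry: the `ModelVersion` class, the `promote` function, the stage names, and the approver name are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical stage order: a version must pass through staging first
STAGES = ["none", "staging", "production"]

@dataclass
class ModelVersion:
    version: int
    stage: str = "none"
    approvals: list = field(default_factory=list)

def promote(mv, approved_by=None):
    """Move a version one stage forward; production requires a named approver."""
    idx = STAGES.index(mv.stage)
    if idx == len(STAGES) - 1:
        raise ValueError("already in production")
    target = STAGES[idx + 1]
    if target == "production" and not approved_by:
        raise PermissionError("promotion to production needs an approver")
    if approved_by:
        mv.approvals.append((target, approved_by))
    mv.stage = target
    return mv

candidate = ModelVersion(version=2)
promote(candidate)                                  # none -> staging
promote(candidate, approved_by="release-manager")   # staging -> production
print(f"Version {candidate.version} is in {candidate.stage}")
```

Because a version can only move one stage at a time, nothing reaches production without first sitting in staging and collecting a recorded approval.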
3. Training-Serving Skew
The preprocessing in training differs from serving. Package your preprocessor with the model or use a feature store.
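One way to package the preprocessor with the model is a scikit-learn `Pipeline` saved as a single artifact. The sketch below assumes synthetic data, a `StandardScaler` plus `LogisticRegression` as stand-ins for your own steps, and a hypothetical file name `model_pipeline.joblib`.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data (illustrative only)
rng = np.random.default_rng(42)
X_train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y_train = (X_train[:, 0] > 50.0).astype(int)

# The scaler is fitted inside the pipeline, so its statistics travel with the model
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipeline.fit(X_train, y_train)

# Persist preprocessing and model together as one artifact
path = os.path.join(tempfile.mkdtemp(), "model_pipeline.joblib")
joblib.dump(pipeline, path)

# At serving time, load the artifact and pass raw (unscaled) features
served = joblib.load(path)
X_raw = rng.normal(loc=50.0, scale=10.0, size=(5, 3))
same = np.array_equal(pipeline.predict(X_raw), served.predict(X_raw))
print(f"Training and serving predictions match: {same}")
```

The serving code never re-implements the scaling logic, which removes one common source of skew.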
4. No Data Validation
Bad data enters the pipeline silently — null values, out-of-range values, schema changes. Validate data at every stage.
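A validation gate does not need a dedicated library to be useful. The sketch below uses plain pandas and covers the three failure types named above; the `validate` function, the expected column set, and the `[0, 120]` age range are assumptions chosen for illustration.

```python
import pandas as pd

EXPECTED_COLUMNS = {"age", "income"}  # hypothetical schema

def validate(df: pd.DataFrame) -> list:
    """Return human-readable validation failures (an empty list means clean)."""
    problems = []

    # Schema check: stop early, since later checks need these columns
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
        return problems

    # Null check
    null_count = int(df["income"].isna().sum())
    if null_count:
        problems.append(f"income has {null_count} null value(s)")

    # Range check
    out_of_range = int((~df["age"].between(0, 120)).sum())
    if out_of_range:
        problems.append(f"age has {out_of_range} value(s) outside [0, 120]")

    return problems

df = pd.DataFrame({
    "age": [25, -5, 30, 200],
    "income": [50000, 60000, None, 70000],
})
problems = validate(df)
for problem in problems:
    print("FAIL:", problem)
```

Calling a function like this at the start of each pipeline stage, and stopping when the list is non-empty, keeps bad batches from propagating.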
5. Ignoring Model Degradation
Models are deployed and then forgotten. Set up automated monitoring for drift and performance metrics, with alerting when either crosses a threshold.
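Drift on a single numeric feature can be scored with the Population Stability Index (PSI), one of several possible drift metrics. The sketch below is a minimal NumPy version; the function name, the choice of 10 quantile bins, and the 0.25 alert level (a common rule of thumb, not something this guide prescribes) are assumptions.

```python
import numpy as np

def population_stability_index(reference, current, bins=10):
    """PSI between two 1-D samples, using quantile bins from the reference."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    # Interior bin edges from reference quantiles; outer bins are open-ended
    interior = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))[1:-1]
    ref_counts = np.bincount(np.searchsorted(interior, reference), minlength=bins)
    cur_counts = np.bincount(np.searchsorted(interior, current), minlength=bins)

    # Clip fractions away from zero so the logarithm stays finite
    eps = 1e-6
    ref_frac = np.clip(ref_counts / len(reference), eps, None)
    cur_frac = np.clip(cur_counts / len(current), eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)   # feature values seen at training time
stable = rng.normal(0.0, 1.0, 5000)      # production sample, same distribution
drifted = rng.normal(1.0, 1.0, 5000)     # production sample, mean shifted by 1

psi_stable = population_stability_index(reference, stable)
psi_drifted = population_stability_index(reference, drifted)
print(f"PSI stable:  {psi_stable:.3f}")
print(f"PSI drifted: {psi_drifted:.3f}")
if psi_drifted > 0.25:
    print("ALERT: feature distribution has drifted")
```

Running a check like this on a schedule for each input feature gives an early warning before ground-truth labels, and therefore accuracy numbers, are available.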
6. No Rollback Plan
A new model may perform worse than the one it replaced. Always keep the previous model version available and have a rollback script ready.
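The core of a rollback script is remembering which version served production before the current one. The following toy registry illustrates the idea; `ModelRegistry` and its methods are hypothetical and stand in for whatever registry you actually use.

```python
class ModelRegistry:
    """Toy in-memory registry that remembers every version that served production."""

    def __init__(self):
        self._history = []  # production versions, oldest first

    @property
    def production(self):
        # The most recently promoted version is the live one
        return self._history[-1] if self._history else None

    def promote(self, version):
        self._history.append(version)

    def rollback(self):
        """Drop the current production version and restore the previous one."""
        if len(self._history) < 2:
            raise RuntimeError("no previous version to roll back to")
        bad = self._history.pop()
        return bad, self.production

registry = ModelRegistry()
registry.promote("v1")
registry.promote("v2")

bad, restored = registry.rollback()
print(f"Rolled back {bad} -> {restored}")
```

Keeping the history as a list rather than a single pointer means a second rollback is possible if the restored version also misbehaves.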
Practice Questions
1. What is MLOps and why is it important? MLOps applies DevOps principles to ML: automated pipelines, experiment tracking, model versioning, monitoring. It prevents chaos in production ML systems.
2. What does an experiment tracker log? Parameters (hyperparameters), metrics (accuracy, loss), artifacts (model files, plots), and metadata (code version, dataset hash).
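To make the four categories concrete, here is a sketch of a run record built with the standard library only. The record layout, the `dataset_hash` helper, the metric value, and the artifact name are illustrative assumptions; a real tracker stores the same kinds of information in its own format, and the code version would typically be a Git commit hash.

```python
import hashlib
import json
import platform

def dataset_hash(rows):
    """SHA-256 over a canonical JSON encoding of the rows (key order ignored)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

rows = [
    {"age": 25, "income": 50000},
    {"age": 30, "income": 60000},
]

run_record = {
    "params": {"n_estimators": 100, "max_depth": 10},
    "metrics": {"accuracy": 0.91},          # placeholder value
    "artifacts": ["model.pkl"],             # hypothetical artifact name
    "metadata": {
        "dataset_sha256": dataset_hash(rows),
        "python_version": platform.python_version(),
    },
}
print(json.dumps(run_record, indent=2))
```

Storing the dataset hash alongside the parameters lets you tell later whether two runs differ because of the code, the hyperparameters, or the data.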
3. What’s the purpose of a feature store? Centralized repository for ML features that enables reuse, consistency, and online/offline serving. Prevents teams from re-engineering the same features.
4. How do you detect training-serving skew? Compare statistics of training data vs serving data. Use data validation libraries (Great Expectations, TensorFlow Data Validation) to catch discrepancies.
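A simple version of that comparison measures how far each feature's serving mean sits from its training mean, in units of the training standard deviation. The function name, the feature names, the unit-mismatch scenario, and the cutoff of 3.0 are assumptions for illustration.

```python
import numpy as np

def skewed_features(train, serving, names, max_shift=3.0):
    """Return {feature: shift} for features whose serving mean moved too far."""
    train_mean = train.mean(axis=0)
    train_std = train.std(axis=0) + 1e-12  # guard against zero variance
    shift = np.abs(serving.mean(axis=0) - train_mean) / train_std
    return {
        name: round(float(s), 2)
        for name, s in zip(names, shift)
        if s > max_shift
    }

rng = np.random.default_rng(1)
names = ["avg_session_duration", "num_logins_7d"]

# Training data: session duration in seconds, logins as counts
train = np.column_stack([
    rng.normal(120.0, 30.0, 1000),
    rng.normal(10.0, 3.0, 1000),
])

# Serving data: the duration is accidentally sent in milliseconds
serving = np.column_stack([
    rng.normal(120.0, 30.0, 1000) * 1000.0,
    rng.normal(10.0, 3.0, 1000),
])

flagged = skewed_features(train, serving, names)
print(f"Skewed features: {list(flagged)}")
```

A check this coarse mostly catches gross mismatches such as unit or encoding errors; subtler distribution changes need a statistical test or a validation library.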
5. Challenge: Build an end-to-end MLOps pipeline. Create a GitHub repo with experiment tracking (MLflow), automated training (GitHub Actions), a model registry, and a monitoring dashboard. Train, deploy, and monitor a model.
Try It Yourself
Mini Project: ML Pipeline Automation
Build a GitHub Actions workflow that automatically trains a model when new data is pushed, evaluates it against a threshold, and promotes it to a model registry if it passes. Security angle: Durga Antivirus Pro uses MLOps pipelines to continuously retrain threat detection models as new malware samples are discovered — ensuring detection rates stay above 99%.
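The evaluate-and-promote step of this mini project could start from a gate like the one below. It is a sketch under assumptions: the `gate` and `main` functions, the metric names, and the threshold values are hypothetical, and the metrics are hard-coded where a real script would load them from the training run.

```python
def gate(metrics, thresholds):
    """Return the failed checks; an empty list means the model may be promoted."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        # A missing metric counts as a failure rather than a silent pass
        if value is None or value < minimum:
            failures.append(f"{name}={value} is below the required {minimum}")
    return failures

def main(metrics):
    """Return a process exit code: 0 to promote, 1 to block the pipeline."""
    failures = gate(metrics, {"accuracy": 0.85, "recall": 0.80})
    for failure in failures:
        print("FAIL:", failure)
    return 1 if failures else 0

# In a CI job, pass this value to sys.exit() so a non-zero code fails the step
exit_code = main({"accuracy": 0.90, "recall": 0.75})
print(f"Exit code: {exit_code}")
```

GitHub Actions stops a job when a step exits with a non-zero code, so the promotion step that follows only runs for models that pass every threshold.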
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
What’s Next
Congratulations on completing this MLOps tutorial! Here’s where to go from here:
- Practice daily — Set up MLflow tracking for your next ML project
- Build a project — Create a full CI/CD pipeline for a model
- Explore related topics — Check out Hyperparameter Tuning and Model Evaluation
Remember: every expert was once a beginner. Keep coding!