MLOps Basics — Versioning, Pipelines and Monitoring

DodaTech 4 min read

In this tutorial, you'll learn the basics of MLOps: versioning data and models, tracking experiments, building reproducible pipelines, and monitoring models in production, with short Python examples for each.

MLOps applies DevOps principles to Machine Learning workflows, enabling teams to version datasets, automate training pipelines, track experiments, and monitor models in production reliably and at scale.

What You'll Learn

How to version-control data and models with DVC, track experiments with MLflow, build reproducible training pipelines, and monitor production models for data drift and performance degradation.

Why It Matters

Without MLOps, ML projects often fail in production: models degrade silently, experiments can't be reproduced, and deployments become manual, fragile processes. Teams that adopt MLOps practices can ship models faster and with fewer production incidents.

Real-World Use

Durga Antivirus Pro uses an MLflow-backed MLOps pipeline where each model version is tracked from training data checksum to deployment timestamp, enabling instant rollback if a new model version causes false positive surges.
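The rollback behavior described above can be sketched with a tiny in-memory registry. This is a hypothetical illustration, not Durga Antivirus Pro's implementation and not MLflow's Model Registry API: ModelRegistry, ModelVersion, and their fields are names invented for this example. Each deployed version records the checksum of its training data and its deployment timestamp, and a rollback simply makes the previous version live again.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


# Hypothetical registry for illustration only; not MLflow's Model Registry API.
@dataclass(frozen=True)
class ModelVersion:
    version: int
    data_checksum: str      # e.g. the MD5 of the training data file
    deployed_at: datetime   # UTC time this version went live


class ModelRegistry:
    def __init__(self):
        self._history = []       # deployed versions, oldest first; the last one is live
        self._next_version = 1   # never reused, even after a rollback

    def deploy(self, data_checksum):
        """Record a new model version as live and return it."""
        mv = ModelVersion(self._next_version, data_checksum, datetime.now(timezone.utc))
        self._next_version += 1
        self._history.append(mv)
        return mv

    @property
    def live(self):
        """The version currently serving traffic, or None before the first deploy."""
        return self._history[-1] if self._history else None

    def rollback(self):
        """Retire the live version and make the previous one live again."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self._history[-1]


registry = ModelRegistry()
registry.deploy("checksum-of-training-data-v1")
registry.deploy("checksum-of-training-data-v2")
print(registry.live.version)   # 2
registry.rollback()            # e.g. after a false-positive surge
print(registry.live.version)   # 1
```

Version numbers come from a separate counter so that a version deployed after a rollback never reuses the number of the one that was retired.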

MLOps Workflow

flowchart LR
    A[Data Versioning] --> B[Experiment Tracking]
    B --> C[Model Registry]
    C --> D[CI/CD Pipeline]
    D --> E[Canary Deployment]
    E --> F[Monitoring]
    F --> G[Drift Detection]
    G --> H[Retrain Trigger]
    H --> A
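The Canary Deployment stage sends a small share of live traffic to the new model before it replaces the old one. Below is a minimal sketch of the routing decision, assuming each request carries a stable ID; route_to_canary and canary_fraction are hypothetical names, not part of any particular serving framework.

```python
import hashlib


def route_to_canary(request_id: str, canary_fraction: float) -> bool:
    """Return True if this request should be served by the canary (new) model.

    Hashing the request ID instead of drawing a random number keeps routing
    deterministic: a given ID always reaches the same model version.
    """
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # roughly uniform integer in [0, 2**64)
    return bucket < canary_fraction * 2**64     # exact int/float comparison in Python


# Send about 5% of traffic to the canary and count how many requests it receives
requests = [f"req-{i}" for i in range(10_000)]
canary_hits = sum(route_to_canary(r, 0.05) for r in requests)
print(f"{canary_hits} of {len(requests)} requests routed to the canary")
```

Because each ID maps to a fixed bucket, raising canary_fraction only adds requests to the canary; nobody who was already on the new model flips back. Setting it to 1.0 completes the rollout, and setting it to 0.0 is an instant rollback.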

Experiment Tracking with MLflow

import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Synthetic binary classification data; fixed seeds make the split reproducible
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Runs are grouped under this experiment name (created if it does not exist)
mlflow.set_experiment("model-comparison")

with mlflow.start_run(run_name="random-forest-v1"):
    params = {'n_estimators': 100, 'max_depth': 10}
    model = RandomForestClassifier(**params, random_state=42)
    model.fit(X_train, y_train)

    preds = model.predict(X_test)
    metrics = {
        'accuracy': accuracy_score(y_test, preds),
        'precision': precision_score(y_test, preds),
        'recall': recall_score(y_test, preds)
    }

    # Record hyperparameters, evaluation metrics, and the fitted model artifact
    mlflow.log_params(params)
    mlflow.log_metrics(metrics)
    mlflow.sklearn.log_model(model, "model")

    print(f"Run ID: {mlflow.active_run().info.run_id}")
    print(f"Metrics logged: {metrics}")

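Example output (the run ID is generated for each run, and exact metric values may vary with library versions):

Run ID: <a 32-character hexadecimal ID, different for every run>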
Metrics logged: {'accuracy': 0.945, 'precision': 0.938, 'recall': 0.952}

MLflow records the parameters, metrics, and model artifact you log for each run. You can compare runs in the MLflow UI (launched with the mlflow ui command) or query them programmatically with mlflow.search_runs().

Data Versioning with DVC

pip install dvc
git init
dvc init

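import hashlib
import os

import numpy as np
import pandas as pd

def create_dataset_v1():
    os.makedirs('data', exist_ok=True)  # the script writes into data/, so make sure it exists
    np.random.seed(42)                  # seeded so the file, and its checksum, are reproducible
    df = pd.DataFrame({
        'feature_1': np.random.randn(1000),
        'feature_2': np.random.randn(1000),
        'target': np.random.randint(0, 2, 1000)
    })
    df.to_csv('data/dataset.csv', index=False)
    # MD5 of the file's bytes: the same kind of hash DVC records for the file
    with open('data/dataset.csv', 'rb') as f:
        checksum = hashlib.md5(f.read()).hexdigest()
    print("Dataset v1 saved to data/dataset.csv")
    print(f"MD5 checksum: {checksum}")
    print(f"Shape: {df.shape}")

create_dataset_v1()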

Example output (the checksum shown is illustrative):

Dataset v1 saved to data/dataset.csv
MD5 checksum: 4e8b2f1a9c3d7e6f5b0a2c8d1e3f4a5b
Shape: (1000, 3)

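After running the script, run dvc add data/dataset.csv. DVC records the file's MD5 checksum in a small pointer file (data/dataset.csv.dvc) that you commit to Git, and keeps the data itself in its local cache; once a remote is configured with dvc remote add, dvc push uploads the data to remote storage such as S3 or GCS. The Git repo only holds pointers, not the data itself.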

Reproducible Pipeline

import hashlib
import json

# Declarative description of the pipeline: each step's input and output paths
PIPELINE_CONFIG = {
    'steps': [
        {'name': 'ingest', 'input': 'data/dataset.csv', 'output': 'data/processed.csv'},
        {'name': 'train', 'input': 'data/processed.csv', 'output': 'models/model.pkl'},
        {'name': 'evaluate', 'input': 'models/model.pkl', 'output': 'metrics.json'}
    ]
}

def run_pipeline():
    print("Running ML pipeline...")
    # The steps are only announced here; a real pipeline would execute each one
    print("Step 1/3: Data ingestion")
    print("Step 2/3: Model training")
    print("Step 3/3: Evaluation")

    # Hash the config with sorted keys so identical configs always give the same version
    version_hash = hashlib.md5(json.dumps(PIPELINE_CONFIG, sort_keys=True).encode()).hexdigest()
    print(f"\nPipeline version: {version_hash}")

    # Placeholder metrics; the evaluate step would normally compute these
    metrics = {'accuracy': 0.945, 'f1': 0.941, 'pipeline_version': version_hash}
    print(f"\nPipeline complete. Metrics: {metrics}")
    return metrics

result = run_pipeline()

Example output (the hash value shown is illustrative):

Running ML pipeline...
Step 1/3: Data ingestion
Step 2/3: Model training
Step 3/3: Evaluation

Pipeline version: a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2

Pipeline complete. Metrics: {'accuracy': 0.945, 'f1': 0.941, 'pipeline_version': 'a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2'}

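Each pipeline run is tagged with a version hash. In this sketch the hash covers only PIPELINE_CONFIG, so it changes when the configuration changes; for every result to be traceable, the hash must also cover the input data and the code. DVC does this for real pipelines by recording the checksum of every dependency and output of each stage in dvc.lock.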
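The hash in run_pipeline is computed from PIPELINE_CONFIG alone. For the version to change when the data or code changes as well, the fingerprint can include file contents. Below is a minimal sketch; pipeline_version and file_md5 are hypothetical helpers, and which data and code files to list is an assumption you would fill in for your own project.

```python
import hashlib
import json


def file_md5(path):
    """MD5 of a file's bytes, read in 1 MiB chunks so large files don't fill memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def pipeline_version(config, data_files, code_files):
    """Hash the config together with the contents of the data and code it depends on."""
    fingerprint = {
        "config": config,
        "data": {p: file_md5(p) for p in sorted(data_files)},
        "code": {p: file_md5(p) for p in sorted(code_files)},
    }
    # sort_keys makes the JSON, and therefore the hash, independent of dict ordering
    return hashlib.md5(json.dumps(fingerprint, sort_keys=True).encode()).hexdigest()
```

File paths are part of the fingerprint, so pass them relative to the project root; otherwise the same project checked out in a different directory would get a different version.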

Model Monitoring for Drift

import numpy as np
from scipy.stats import ks_2samp

def detect_data_drift(reference_data, production_data, threshold=0.05):
    print(f"Reference mean: {np.mean(reference_data):.4f}, std: {np.std(reference_data):.4f}")
    print(f"Production mean: {np.mean(production_data):.4f}, std: {np.std(production_data):.4f}")

    statistic, p_value = ks_2samp(reference_data, production_data)
    drift_detected = p_value < threshold

    print(f"\nKS statistic: {statistic:.4f}")
    print(f"P-value: {p_value:.6f}")
    print(f"Drift detected: {drift_detected}")

    return drift_detected

np.random.seed(42)
reference = np.random.normal(0, 1, 1000)
production_normal = np.random.normal(0.1, 1.1, 1000)
production_drifted = np.random.normal(2.0, 1.5, 1000)

print("=== Normal conditions ===")
detect_data_drift(reference, production_normal)

print("\n=== Drifted conditions ===")
detect_data_drift(reference, production_drifted)

Example output (the statistics shown are illustrative and may differ on your machine):

=== Normal conditions ===
Reference mean: -0.0307, std: 0.9831
Production mean: 0.0821, std: 1.0947

KS statistic: 0.0420
P-value: 0.328415
Drift detected: False

=== Drifted conditions ===
Reference mean: -0.0307, std: 0.9831
Production mean: 2.0218, std: 1.4812

KS statistic: 0.7120
P-value: 0.000000
Drift detected: True

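The two-sample Kolmogorov-Smirnov test checks whether the production values of a feature plausibly come from the same distribution as the reference (training) data. A p-value below the threshold means the difference is statistically significant, and in the workflow above that drift signal triggers model retraining.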
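Under the hood, the KS statistic is the largest vertical gap between the two samples' empirical cumulative distribution functions. Below is a dependency-free sketch of that computation, plus a hypothetical should_retrain check that compares the statistic with the standard large-sample critical value c(alpha) * sqrt((n + m) / (n * m)) instead of computing a p-value. ks_statistic and should_retrain are names invented here, and the critical-value approximation is only meant for reasonably large samples.

```python
import math


def ks_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of samples a and b."""
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both ECDFs past every copy of x, so ties are handled correctly
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def should_retrain(reference, production, alpha=0.05):
    """Hypothetical trigger: flag drift when D exceeds the large-sample critical value."""
    n, m = len(reference), len(production)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)   # about 1.358 for alpha = 0.05
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, production) > critical


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

The loop can stop once either sample is exhausted: from then on one ECDF sits at 1 while the other only rises toward 1, so the gap can only shrink. With thousands of rows per monitoring window, even tiny, harmless shifts become statistically significant, so in practice a minimum value of D itself is often required before retraining is triggered.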

Practice Questions

What is the difference between model versioning in MLflow and data versioning in DVC?
Why is experiment tracking important for ML teams?
How would you set up an automated retraining trigger based on drift detection?

Frequently Asked Questions

Do I need MLOps for a small team or solo project?

Yes, scaled down. Even a solo project benefits from experiment tracking (MLflow) and data versioning (DVC). You never know when you will need to reproduce a result from three months ago, and memory is unreliable.

What is the difference between MLflow and Kubeflow?

MLflow focuses on experiment tracking, model packaging, and a model registry. Kubeflow is a full MLOps platform for Kubernetes that includes pipeline orchestration, notebook servers, and model serving with autoscaling. The two are complementary, and many teams use both together.
