Learn DevOps: Spot & Preemptible Instances: 80-90% Discount on Compute

DodaTech Updated Jun 20, 2026 7 min read

Spot and preemptible instances offer discounts of up to 90% on cloud compute in exchange for the risk of interruption. They suit batch processing, CI/CD, stateless web workers, and any other fault-tolerant workload that can be terminated with little notice.

What You’ll Learn

AWS Spot Instances and Spot Fleet
Azure Spot VMs and eviction policies
GCP Preemptible and Spot VMs
Spot pricing mechanisms and how to set a maximum price
Interruption handling and graceful shutdowns
Checkpointing for long-running batch jobs
Designing fault-tolerant spot workloads
Spot orchestration with Spot.io/NetApp

Why It Matters

On-demand compute is the most expensive option. For workloads that can tolerate interruption, such as batch processing, testing, CI/CD pipelines, and stateless microservices, spot instances can cut compute costs by up to 90%. DodaTech runs all DodaZIP build agents and Durga Antivirus Pro’s malware analysis sandbox on spot instances, saving $14k/month.

    flowchart LR
    A[Workload Type] --> B{Fault-Tolerant?}
    B -->|Yes| C[Spot / Preemptible]
    B -->|No| D[On-Demand / Reserved]
    C --> E[Spot Fleet / Node Group]
    C --> F[Checkpointing]
    C --> G[Interruption Handling]
    E --> H[80-90% Savings]
    style H fill:#22c55e,color:#fff

1. AWS Spot Instances

AWS Spot Instances use spare EC2 capacity at up to 90% discount. Pricing varies by instance type, region, and availability.

# Request five one-time Spot Instances with a maximum price of $0.05/hr
aws ec2 request-spot-instances \
  --spot-price "0.05" \
  --instance-count 5 \
  --type "one-time" \
  --launch-specification '{
    "ImageId": "ami-0c55b159cbfafe1f0",
    "InstanceType": "m5.large",
    "Placement": {"AvailabilityZone": "us-east-1a"}
  }'

# Describe spot price history
aws ec2 describe-spot-price-history \
  --instance-types m5.large \
  --product-description "Linux/UNIX" \
  --start-time 2026-06-01T00:00:00Z

# Check spot instance status
aws ec2 describe-spot-instance-requests \
  --filters "Name=state,Values=active"

Spot price history output:

Time                     InstanceType  ProductDesc    SpotPrice
2026-06-19T12:00:00Z     m5.large      Linux/UNIX      $0.0284
2026-06-19T11:00:00Z     m5.large      Linux/UNIX      $0.0250
2026-06-19T10:00:00Z     m5.large      Linux/UNIX      $0.0312
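
To put the history above in perspective, the discount can be computed against the on-demand rate. This is a minimal sketch: the on-demand price of $0.096/hr for m5.large is an assumed, illustrative figure (check current pricing for your region), and `savings_pct` is a hypothetical helper, not part of any AWS tool.

```python
# Spot prices copied from the sample history above (USD per hour).
spot_prices = [0.0284, 0.0250, 0.0312]

# ASSUMPTION: illustrative on-demand rate for m5.large; verify for your region.
ON_DEMAND_PRICE = 0.096


def savings_pct(spot_price, on_demand_price):
    """Return the discount of spot versus on-demand as a percentage."""
    return (1 - spot_price / on_demand_price) * 100


average_spot = sum(spot_prices) / len(spot_prices)
average_savings = savings_pct(average_spot, ON_DEMAND_PRICE)

print(f"Average spot price: ${average_spot:.4f}/hr")
print(f"Average savings vs on-demand: {average_savings:.1f}%")
```

Running the same calculation over a longer window, and per Availability Zone, gives a more honest picture than a single hourly sample.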

Spot Fleet

Spot Fleet automatically launches and maintains a mix of spot instances across capacity pools to meet a target capacity. The example uses the newer EC2 Fleet API (create-fleet), which covers the same use case.

# Create an EC2 Fleet of 20 spot instances; type "maintain" replaces interrupted capacity
aws ec2 create-fleet \
  --target-capacity-specification '{"TotalTargetCapacity": 20, "DefaultTargetCapacityType": "spot"}' \
  --launch-template-configs '[{"LaunchTemplateSpecification": {"LaunchTemplateName": "worker-template", "Version": "1"}}]' \
  --type "maintain"
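
The idea of spreading a target capacity across several spot pools can be illustrated with a toy allocator. This is a simplified model under stated assumptions, not the algorithm AWS uses: the pool names and the `spread_capacity` helper are hypothetical.

```python
def spread_capacity(target, pools):
    """Split `target` instances as evenly as possible across `pools`.

    Toy model of a diversified allocation: losing one pool then costs
    only a fraction of the fleet instead of all of it.
    """
    if not pools:
        raise ValueError("at least one pool is required")
    base, remainder = divmod(target, len(pools))
    # The first `remainder` pools receive one extra instance.
    return {
        pool: base + (1 if index < remainder else 0)
        for index, pool in enumerate(pools)
    }


# Hypothetical pools: each is an (instance type, Availability Zone) pair.
pools = ["m5.large/us-east-1a", "m5a.large/us-east-1b", "m4.large/us-east-1c"]
plan = spread_capacity(20, pools)

for pool, count in plan.items():
    print(f"{pool}: {count}")
```

With three pools, an interruption wave in a single pool removes roughly a third of the workers rather than the whole fleet.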

2. Azure Spot VMs

Azure Spot VMs offer up to 90% discount with eviction policies: Deallocate (stop VM but keep disk) or Delete (remove VM and disk).

# Create an Azure Spot VM
az vm create \
  --resource-group batch-rg \
  --name spot-worker-1 \
  --image Ubuntu2204 \
  --size Standard_D4s_v3 \
  --priority Spot \
  --eviction-policy Delete \
  --max-price -1

# Create a VMSS with Spot priority
az vmss create \
  --resource-group batch-rg \
  --name spot-vmss \
  --image Ubuntu2204 \
  --instance-count 5 \
  --vm-sku Standard_D4s_v3 \
  --priority Spot \
  --eviction-policy Delete \
  --max-price -1 \
  --single-placement-group false

Azure eviction policy choices:

Deallocate: VM stops, disk persists, restart later (preserves state)
Delete: VM and disks removed (best for stateless, lowest cost)
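
The effect of the `--max-price` setting used in the commands above can be sketched as a small predicate. This is an illustrative model only: `evicted_for_price` is a hypothetical helper and the prices are made up. It assumes the documented meaning of `-1`, namely that no price cap is set and the VM is not evicted for price reasons (capacity evictions can still occur).

```python
def evicted_for_price(current_price, max_price):
    """Return True if a Spot VM would be evicted because of price.

    max_price == -1 mirrors `--max-price -1`: no cap is set, so the VM is
    never evicted for price (capacity evictions are a separate matter).
    """
    if max_price == -1:
        return False
    return current_price > max_price


# Hypothetical hourly prices in USD.
print(evicted_for_price(current_price=0.045, max_price=-1))
print(evicted_for_price(current_price=0.045, max_price=0.04))
print(evicted_for_price(current_price=0.045, max_price=0.05))
```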

3. GCP Preemptible and Spot VMs

GCP offers two types of interruptible VMs:

Type	Max Runtime	Discount	Termination Notice
Preemptible	24 hours	60-91%	30 seconds
Spot	None	60-91%	30 seconds

# Create a GCP Spot VM
gcloud compute instances create spot-worker-1 \
  --zone us-central1-a \
  --machine-type e2-standard-4 \
  --provisioning-model=SPOT \
  --instance-termination-action=STOP

# Create a preemptible VM
gcloud compute instances create preemptible-worker-1 \
  --zone us-central1-a \
  --machine-type e2-standard-4 \
  --preemptible

# Delete the Spot VM on preemption and cap its run time at 4 hours
gcloud compute instances create resilient-worker \
  --zone us-central1-a \
  --machine-type e2-standard-4 \
  --provisioning-model=SPOT \
  --instance-termination-action=DELETE \
  --max-run-duration=4h

4. Interruption Handling and Graceful Shutdown

Spot instances receive a short termination notice before they are reclaimed. Handle it to save work and exit cleanly.

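# AWS: Poll instance metadata for the spot termination notice
import time

import requests

METADATA = "http://169.254.169.254/latest"

def imds_token():
    # IMDSv2 requires a session token; instances that enforce it reject plain GETs
    resp = requests.put(
        f"{METADATA}/api/token",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
        timeout=2,
    )
    resp.raise_for_status()
    return resp.text

def save_checkpoint():
    print("Saving checkpoint before termination...")
    # Save state, upload results, drain connections

def check_termination():
    url = f"{METADATA}/meta-data/spot/termination-time"
    while True:
        try:
            headers = {"X-aws-ec2-metadata-token": imds_token()}
            resp = requests.get(url, headers=headers, timeout=2)
            # 404 means no interruption is scheduled; 200 carries the termination time
            if resp.status_code == 200:
                print(f"Termination at: {resp.text}")
                save_checkpoint()
                return
        except requests.exceptions.RequestException:
            pass  # Metadata service briefly unreachable; retry on the next poll
        time.sleep(5)

if __name__ == "__main__":
    check_termination()

# GCP: the metadata server exposes a comparable flag that reads TRUE once the VM is preempted
# curl -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/preempted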

# Kubernetes: store a cordon-and-drain script in a ConfigMap (a node-level handler, e.g. a DaemonSet, must run it when a notice arrives)
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: spot-handler
  namespace: kube-system
data:
  spot-handler.sh: |
    #!/bin/bash
    kubectl cordon \$NODE_NAME
    kubectl drain \$NODE_NAME --ignore-daemonsets --delete-emptydir-data
EOF
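
When a node is drained, Kubernetes sends SIGTERM to the containers in each evicted pod, so the worker process itself also needs to stop cleanly. The sketch below shows one way to do that in Python. The names `run_worker` and `demo_process` are hypothetical, and the demo sends SIGTERM to its own process purely to simulate a drain; it assumes a POSIX system.

```python
import os
import signal
import threading
import time

# Set by the signal handler; checked by the worker between items.
shutdown = threading.Event()


def handle_sigterm(signum, frame):
    """Request a graceful stop instead of dying mid-item."""
    shutdown.set()


signal.signal(signal.SIGTERM, handle_sigterm)


def run_worker(items, process):
    """Process items in order; stop taking new ones once SIGTERM arrives."""
    done = []
    for item in items:
        if shutdown.is_set():
            break  # Remaining items are left for another worker.
        process(item)
        done.append(item)
    return done


def demo_process(item):
    """Stand-in for real work; simulates a drain arriving during job-2."""
    if item == "job-2":
        os.kill(os.getpid(), signal.SIGTERM)
        time.sleep(0.05)


finished = run_worker(["job-1", "job-2", "job-3"], demo_process)
print("finished before shutdown:", finished)
```

The item in flight is allowed to finish, and nothing new is started afterwards. For this to work in a pod, the grace period (`terminationGracePeriodSeconds`) has to be longer than the slowest single item.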

5. Checkpointing for Long-Running Jobs

Batch processing workloads must save progress periodically so they resume from the last checkpoint after interruption.

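# checkpoint_worker.py
import json
import os

# NOTE: local disk disappears with a reclaimed instance. In production, point this
# at durable storage (a network volume, or sync the file to object storage).
CHECKPOINT_FILE = "/tmp/checkpoint.json"

def process(task):
    # Placeholder for the real work
    print(f"Processing {task['data']}")

class BatchProcessor:
    def __init__(self, tasks):
        self.tasks = tasks
        self.completed = self.load_checkpoint()

    def load_checkpoint(self):
        # Resume from the IDs recorded by a previous run, if any
        if os.path.exists(CHECKPOINT_FILE):
            with open(CHECKPOINT_FILE) as f:
                return set(json.load(f))
        return set()

    def save_checkpoint(self, task_id):
        self.completed.add(task_id)
        # Write to a temp file and rename so an interruption mid-write
        # cannot leave a truncated checkpoint behind
        tmp_file = CHECKPOINT_FILE + ".tmp"
        with open(tmp_file, "w") as f:
            json.dump(sorted(self.completed), f)
        os.replace(tmp_file, CHECKPOINT_FILE)

    def run(self):
        for task in self.tasks:
            if task["id"] in self.completed:
                continue  # Already done in an earlier run
            process(task)
            self.save_checkpoint(task["id"])

processor = BatchProcessor([{"id": i, "data": f"item-{i}"} for i in range(1000)])
processor.run()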

6. Fault-Tolerant Workload Design

Architectures that work well on spot:

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Load         │────▶│ Spot Worker  │────▶│ Queue/Storage│
│ Balancer     │     │ Pool (Auto   │     │ (Persistent) │
│              │     │ Scaled)      │     │              │
└──────────────┘     └──────────────┘     └──────────────┘
     │                      │                     │
     │                      ▼                     │
     │              ┌──────────────┐              │
     │              │ Termination  │              │
     └──────────────▶ Handler      │──────────────┘
                     └──────────────┘
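
The property that makes this architecture safe is that a task is only removed from the queue after its result has been stored. The following sketch simulates that in memory. It is a toy model rather than a real queue client: `run_pool` and its `interrupted_attempts` parameter are hypothetical and stand in for a managed queue with acknowledgements or visibility timeouts.

```python
from collections import deque


def run_pool(tasks, interrupted_attempts):
    """Drain a task queue with workers that may be interrupted mid-task.

    `interrupted_attempts` holds the 1-based attempt numbers on which the
    simulated worker is reclaimed before finishing. An interrupted task
    goes back on the queue, so work is repeated but never lost.
    """
    queue = deque(tasks)
    results = []
    attempt = 0
    while queue:
        task = queue.popleft()
        attempt += 1
        if attempt in interrupted_attempts:
            queue.append(task)  # Never acknowledged, so it becomes visible again.
            continue
        results.append(task)  # Acknowledge only after the result is stored.
    return results, attempt


results, attempts = run_pool(["t1", "t2", "t3"], interrupted_attempts={2})
print("completed:", results)
print("attempts:", attempts)
```

Because a task can run more than once, the processing step should be idempotent, for example by writing results under a key derived from the task ID.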

7. Spot Orchestration Tools

Spot.io (NetApp) automates spot instance management across AWS, Azure, and GCP:

# Spot.io Elastigroup example (schematic; confirm command and flag names against the Spot CLI documentation)
spotinst elastigroup create \
  --name "batch-workers" \
  --provider "aws" \
  --region "us-east-1" \
  --strategy '{"risk": 100, "fallbackToOd": true}' \
  --capacity '{"minimum": 2, "maximum": 20, "target": 5}'

Common Mistakes

No interruption handling: Applications crash and lose data when instances are terminated. Always implement termination notice listeners and checkpoints.
Using spot for stateful workloads: Databases, message queues, and stateful apps should never run on spot. They lose data on termination.
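Setting the maximum price too low: With AWS, you pay the current spot price, not your maximum. A low maximum saves nothing and only causes more interruptions. Leaving it at the on-demand price is safe, because you will never pay more than on-demand.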
Single instance type / AZ: If that instance type has no spot capacity, your workload won’t run. Diversify across instance families and availability zones.
No fallback to on-demand: Always configure a fallback strategy — when spot is unavailable, launch on-demand to maintain availability.
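
The last two points, diversification and on-demand fallback, can be combined in one planning step. This is a minimal sketch under stated assumptions: `plan_capacity`, the pool names, and the available capacities are all hypothetical, and real providers do not expose spot availability as a simple number.

```python
def plan_capacity(target, pool_capacity):
    """Fill `target` from spot pools in order, then fall back to on-demand.

    `pool_capacity` maps a pool name to the number of spot instances
    assumed to be obtainable from it right now.
    """
    plan = {}
    remaining = target
    for pool, available in pool_capacity.items():
        take = min(remaining, available)
        if take > 0:
            plan[pool] = take
        remaining -= take
    if remaining > 0:
        plan["on-demand"] = remaining  # Shortfall covered at full price.
    return plan


# Hypothetical availability: the second pool currently has no spare capacity.
available = {
    "m5.large/us-east-1a": 8,
    "m5a.large/us-east-1b": 0,
    "c5.large/us-east-1c": 5,
}
capacity_plan = plan_capacity(20, available)
print(capacity_plan)
```

With a single pool and no fallback, the same request would have produced either 8 instances or none at all.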

Practice Questions

What is the difference between GCP Preemptible and GCP Spot VMs? Answer: Preemptible VMs have a 24-hour max runtime. Spot VMs have no max runtime and are the recommended option. Both offer similar discounts.
How do you handle spot instance termination gracefully? Answer: Listen to the termination notice endpoint (metadata) and execute a shutdown script: save checkpoints, drain connections, upload results, and terminate.
What workloads are best suited for spot instances? Answer: Stateless, fault-tolerant workloads: batch processing, CI/CD agents, render farms, web workers, testing environments, and data analysis pipelines.
Can you mix spot and on-demand in a Spot Fleet? Answer: Yes. Set an on-demand target capacity alongside the spot target, or use a percentage split in an Auto Scaling group (e.g., 70% spot, 30% on-demand), to maintain availability while maximizing savings.

Challenge

Design a CI/CD build cluster on spot instances: AWS Spot Fleet with 5 instance types across 3 AZs, termination handler that drains jobs gracefully, checkpoints build artifacts to S3 every 5 minutes, fallback to on-demand when spot price exceeds $0.10/hr, and autoscaling from 0 to 50 workers based on queue depth.
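
As a starting point for the autoscaling part of the challenge, the worker count can be derived from queue depth and clamped to the 0 to 50 range. This is only a sketch: `desired_workers` is a hypothetical helper, and the figure of 4 queued jobs per worker is an assumption to tune for your builds.

```python
import math


def desired_workers(queue_depth, jobs_per_worker=4, min_workers=0, max_workers=50):
    """Return the worker count for a given queue depth, clamped to limits.

    ASSUMPTION: each worker should have at most `jobs_per_worker` jobs
    waiting for it; the default of 4 is illustrative.
    """
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    wanted = math.ceil(queue_depth / jobs_per_worker)
    return max(min_workers, min(max_workers, wanted))


for depth in (0, 1, 9, 1000):
    print(f"queue depth {depth}: {desired_workers(depth)} workers")
```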

FAQ

How much can I save with spot instances?

: 60-91% compared to on-demand. Most users pay 70-80% less.

Can spot instances be used for web servers?

: Only if they’re stateless, behind a load balancer, and can handle being replaced. Session data must be externalized to Redis or a database.

What happens when a spot instance is terminated?

: AWS sends a 2-minute termination notice. Azure sends a 30-second notice. GCP sends a 30-second notice. Use these to drain traffic and save state.

How do I get notified before spot termination?

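: AWS: poll 169.254.169.254/latest/meta-data/spot/termination-time (or spot/instance-action). Azure: poll the Scheduled Events endpoint, 169.254.169.254/metadata/scheduledevents, for Preempt events. GCP: read the preempted value from the metadata server at metadata.google.internal/computeMetadata/v1/instance/preempted.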
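
On Azure, eviction notices arrive through the Scheduled Events metadata service as events of type `Preempt`. The sketch below filters such events out of a response document. The sample payload is hand-written to resemble a Scheduled Events response and every value in it is made up; `preempt_events` is a hypothetical helper. Fetching the document from the endpoint (with the `Metadata: true` header) is left out so the example runs anywhere.

```python
import json


def preempt_events(document, vm_name):
    """Return the Preempt events in a Scheduled Events document for `vm_name`."""
    return [
        event
        for event in document.get("Events", [])
        if event.get("EventType") == "Preempt"
        and vm_name in event.get("Resources", [])
    ]


# Sample payload shaped like a Scheduled Events response; all values are made up.
sample = json.loads("""
{
  "DocumentIncarnation": 2,
  "Events": [
    {"EventId": "example-1", "EventType": "Preempt",
     "ResourceType": "VirtualMachine", "Resources": ["spot-worker-1"],
     "EventStatus": "Scheduled", "NotBefore": "Fri, 19 Jun 2026 12:00:00 GMT"},
    {"EventId": "example-2", "EventType": "Reboot",
     "ResourceType": "VirtualMachine", "Resources": ["spot-worker-2"],
     "EventStatus": "Scheduled", "NotBefore": "Fri, 19 Jun 2026 13:00:00 GMT"}
  ]
}
""")

pending = preempt_events(sample, "spot-worker-1")
for event in pending:
    print(f"Preemption scheduled, not before {event['NotBefore']}")
```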

Is spot pricing stable?

: No. Spot prices fluctuate with supply and demand. Diversify across instance types, and leave the AWS maximum price at its default (the on-demand price) so you never pay more than on-demand.

What’s Next

Topic	Description
Reserved Instances & Savings Plans	Save 40-70% with commitment-based discounts
Kubernetes Cost Optimization	Reduce K8s infrastructure spend

Related topics: Cloud Cost Optimization, AWS, Azure, GCP

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
