Spot & Preemptible Instances: 80-90% Discount on Compute
Spot and preemptible instances offer 80-90% discounts on cloud compute in exchange for the risk of interruption — ideal for batch processing, CI/CD, stateless web workers, and any fault-tolerant workload that can handle being terminated with little notice.
What You’ll Learn
- AWS Spot Instances and Spot Fleet
- Azure Spot VMs and eviction policies
- GCP Preemptible and Spot VMs
- Spot pricing mechanisms and how to set a maximum price
- Interruption handling and graceful shutdowns
- Checkpointing for long-running batch jobs
- Designing fault-tolerant spot workloads
- Spot orchestration with Spot.io/NetApp
Why It Matters
On-demand compute is the most expensive option. For workloads that can tolerate interruption — batch processing, testing, CI/CD pipelines, stateless microservices — spot instances reduce compute costs by 80-90%. DodaTech runs all DodaZIP build agents and Durga Antivirus Pro’s malware analysis sandbox on spot instances, saving $14k/month.
flowchart LR
A[Workload Type] --> B{Fault-Tolerant?}
B -->|Yes| C[Spot / Preemptible]
B -->|No| D[On-Demand / Reserved]
C --> E[Spot Fleet / Node Group]
C --> F[Checkpointing]
C --> G[Interruption Handling]
E --> H[80-90% Savings]
style H fill:#22c55e,color:#fff
1. AWS Spot Instances
AWS Spot Instances use spare EC2 capacity at up to 90% discount. Pricing varies by instance type, region, and availability.
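To make the discount concrete, here is a minimal sketch that compares observed spot prices with an on-demand rate. The function name `spot_discount` is hypothetical, and the on-demand rate of $0.096/hour for m5.large is an assumption for illustration only; check current pricing for your region. The spot samples are the three m5.large values from the price history output in this section.

```python
from statistics import mean

def spot_discount(spot_prices, on_demand_price):
    """Return the average discount (0.0-1.0) of spot relative to on-demand."""
    if on_demand_price <= 0:
        raise ValueError("on_demand_price must be positive")
    if not spot_prices:
        raise ValueError("spot_prices must not be empty")
    return 1 - mean(spot_prices) / on_demand_price

# Spot samples from the price history output; the on-demand rate is an assumed value
samples = [0.0284, 0.0250, 0.0312]
ASSUMED_ON_DEMAND = 0.096  # USD/hour, illustrative only

discount = spot_discount(samples, ASSUMED_ON_DEMAND)
print(f"Average discount: {discount:.1%}")
```

Running the same calculation over a longer price history, per instance type and availability zone, gives a more realistic picture than a single reading.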
# Request a Spot Instance
aws ec2 request-spot-instances \
--spot-price "0.05" \
--instance-count 5 \
--type "one-time" \
--launch-specification '{
"ImageId": "ami-0c55b159cbfafe1f0",
"InstanceType": "m5.large",
"Placement": {"AvailabilityZone": "us-east-1a"}
}'
# Describe spot price history
aws ec2 describe-spot-price-history \
--instance-types m5.large \
--product-description "Linux/UNIX" \
--start-time 2026-06-01T00:00:00Z
# Check spot instance status
aws ec2 describe-spot-instance-requests \
--filters "Name=state,Values=active"
Spot price history output:
Time InstanceType ProductDesc SpotPrice
2026-06-19T12:00:00Z m5.large Linux/UNIX $0.0284
2026-06-19T11:00:00Z m5.large Linux/UNIX $0.0250
2026-06-19T10:00:00Z m5.large Linux/UNIX $0.0312
Spot Fleet
Spot Fleet automatically launches and maintains the optimal mix of spot instances across pools to meet target capacity.
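The idea of spreading target capacity across pools can be illustrated with a toy allocator. This is a simplified sketch, not AWS's actual allocation logic: the `Pool` type, the `allocate` function, the pool names, the prices, and the available counts are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Pool:
    name: str        # instance type plus availability zone
    price: float     # current spot price in USD/hour (made-up values below)
    available: int   # instances the pool can supply (assumed known for the demo)

def allocate(pools, target, max_per_pool):
    """Fill `target` from the cheapest pools, capping each pool's share."""
    plan = {}
    remaining = target
    for pool in sorted(pools, key=lambda p: p.price):
        if remaining == 0:
            break
        take = min(pool.available, max_per_pool, remaining)
        if take > 0:
            plan[pool.name] = take
            remaining -= take
    # remaining > 0 means spot alone cannot meet the target
    return plan, remaining

pools = [
    Pool("m5.large/us-east-1a", 0.028, 10),
    Pool("m5a.large/us-east-1b", 0.025, 4),
    Pool("c5.large/us-east-1c", 0.031, 12),
]
plan, unmet = allocate(pools, target=20, max_per_pool=8)
print(plan, unmet)
```

The per-pool cap is the design choice that matters here: it trades a slightly higher average price for less exposure when a single pool is reclaimed.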
# Create an EC2 Fleet that requests spot capacity (the separate Spot Fleet API is `aws ec2 request-spot-fleet`)
aws ec2 create-fleet \
--target-capacity-specification '{"TotalTargetCapacity": 20, "DefaultTargetCapacityType": "spot"}' \
--launch-template-configs '{"LaunchTemplateSpecification": {"LaunchTemplateName": "worker-template", "Version": "1"}}' \
--type "instant"
2. Azure Spot VMs
Azure Spot VMs offer up to 90% discount with eviction policies: Deallocate (stop VM but keep disk) or Delete (remove VM and disk).
# Create an Azure Spot VM
az vm create \
--resource-group batch-rg \
--name spot-worker-1 \
--image Ubuntu2204 \
--size Standard_D4s_v3 \
--priority Spot \
--eviction-policy Delete \
--max-price -1
# Create a VMSS with Spot priority
az vmss create \
--resource-group batch-rg \
--name spot-vmss \
--image Ubuntu2204 \
--instance-count 5 \
--vm-sku Standard_D4s_v3 \
--priority Spot \
--eviction-policy Delete \
--max-price -1 \
--single-placement-group false
Azure eviction policy choices:
- Deallocate: VM stops, disk persists, restart later (preserves state)
- Delete: VM and disks removed (best for stateless, lowest cost)
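Under either policy, Azure announces an upcoming eviction through the Scheduled Events metadata endpoint as an event of type `Preempt`. The sketch below shows one way to detect it. The helper names `fetch_scheduled_events` and `find_preempt_events` are hypothetical, and the sample document is hand-written to mirror the documented response shape rather than captured from a real VM.

```python
import json
import urllib.request

SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)

def fetch_scheduled_events(timeout=5):
    """Query the metadata service; only works from inside an Azure VM."""
    req = urllib.request.Request(SCHEDULED_EVENTS_URL, headers={"Metadata": "true"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

def find_preempt_events(document):
    """Return the events in a Scheduled Events document that signal eviction."""
    return [e for e in document.get("Events", []) if e.get("EventType") == "Preempt"]

# Hand-written sample for illustration
sample = {
    "DocumentIncarnation": 1,
    "Events": [
        {"EventId": "example-id", "EventType": "Preempt",
         "ResourceType": "VirtualMachine", "Resources": ["spot-worker-1"],
         "EventStatus": "Scheduled", "NotBefore": "Fri, 19 Jun 2026 12:00:30 GMT"},
    ],
}
for event in find_preempt_events(sample):
    print(f"Eviction scheduled for {event['Resources']} not before {event['NotBefore']}")
```

On a real VM, a worker would call `fetch_scheduled_events()` on a short polling interval and start its shutdown routine as soon as `find_preempt_events` returns a non-empty list.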
3. GCP Preemptible and Spot VMs
GCP offers two types of interruptible VMs:
| Type | Max Runtime | Discount | Termination Notice |
|---|---|---|---|
| Preemptible | 24 hours | 60-91% | 30 seconds |
| Spot | None | 60-91% | 30 seconds |
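From inside a GCP VM, preemption can be detected through the metadata server's `instance/preempted` value, which reads `TRUE` once the VM has been preempted. A minimal sketch, with hypothetical helper names `parse_preempted` and `is_preempted`:

```python
import urllib.request

PREEMPTED_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
)

def parse_preempted(body):
    """Interpret the metadata value: 'TRUE' means the VM has been preempted."""
    return body.strip().upper() == "TRUE"

def is_preempted(timeout=5):
    """Ask the metadata server; only works from inside a GCP VM."""
    req = urllib.request.Request(PREEMPTED_URL, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_preempted(resp.read().decode("utf-8"))

# The parsing step can be exercised anywhere; is_preempted() needs a GCP VM
print(parse_preempted("TRUE"), parse_preempted("FALSE"))
```

The same check applies to both rows of the table, since Preemptible and Spot VMs share the 30-second notice.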
# Create a GCP Spot VM
gcloud compute instances create spot-worker-1 \
--zone us-central1-a \
--machine-type e2-standard-4 \
--provisioning-model=SPOT \
--instance-termination-action=STOP
# Create a preemptible VM
gcloud compute instances create preemptible-worker-1 \
--zone us-central1-a \
--machine-type e2-standard-4 \
--preemptible
# Delete the Spot VM on preemption and cap its run time at 4 hours
gcloud compute instances create resilient-worker \
--zone us-central1-a \
--machine-type e2-standard-4 \
--provisioning-model=SPOT \
--instance-termination-action=DELETE \
--max-run-duration=4h
4. Interruption Handling and Graceful Shutdown
Spot instances receive a short termination notice before they are reclaimed: two minutes on AWS, and about 30 seconds on GCP and Azure. Handle it to save work and exit cleanly.
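Besides polling metadata, a worker can react to the SIGTERM that the operating system or container runtime typically delivers during shutdown. The sketch below is one possible pattern, assuming Python 3.8+ and a process that runs its work loop in the main thread; the `Worker` class, the `demo_work` function, and the task names are hypothetical.

```python
import signal

class Worker:
    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.done = []
        self.stop_requested = False
        # The handler only sets a flag so the loop can exit between tasks
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        self.stop_requested = True

    def run(self, work):
        for task in self.tasks:
            if self.stop_requested:
                break  # Stop between tasks so none is left half-finished
            work(task)
            self.done.append(task)
        # Tasks that were never started should go back to the queue
        return [t for t in self.tasks if t not in self.done]

def demo_work(task):
    if task == "b":
        # Simulate the platform delivering SIGTERM while task "b" runs
        signal.raise_signal(signal.SIGTERM)

worker = Worker(["a", "b", "c", "d"])
leftover = worker.run(demo_work)
print("completed:", worker.done, "to requeue:", leftover)
```

Keeping the handler down to a flag assignment avoids doing I/O inside a signal handler; the checkpoint and upload steps then run in normal code once the loop exits.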
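# AWS: Poll the instance metadata service for a spot termination notice
import requests
import time

TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"

def save_checkpoint():
    print("Saving checkpoint before termination...")
    # Save state, upload results, drain connections

def check_termination():
    # The endpoint returns 404 until an interruption is scheduled.
    # Instances that enforce IMDSv2 also require a session token header.
    while True:
        try:
            resp = requests.get(TERMINATION_URL, timeout=5)
            if resp.status_code == 200:
                print(f"Termination at: {resp.text}")
                save_checkpoint()
                return
        except requests.exceptions.RequestException:
            pass  # Metadata service briefly unreachable; retry on the next poll
        time.sleep(5)

# GCP: a similar metadata value is available at
# http://metadata.google.internal/computeMetadata/v1/instance/preempted
# (requires the "Metadata-Flavor: Google" header); from outside the VM, inspect it with:
# gcloud compute instances describe --zone us-central1-a spot-worker-1
# Kubernetes: Handle spot interruption with node lifecycle handler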
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: spot-handler
  namespace: kube-system
data:
  spot-handler.sh: |
    #!/bin/bash
    kubectl cordon \$NODE_NAME
    kubectl drain \$NODE_NAME --ignore-daemonsets --delete-emptydir-data
EOF
5. Checkpointing for Long-Running Jobs
Batch processing workloads must save progress periodically so they resume from the last checkpoint after interruption.
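One detail worth sketching: an interruption can arrive while a checkpoint is being written, leaving a truncated file. A common mitigation is to write to a temporary file and rename it into place. The example below is a minimal sketch using JSON and the standard library; the function names and the temporary directory are illustrative, and on a real spot instance the file would also need to be copied to durable storage.

```python
import json
import os
import tempfile

def save_checkpoint_atomic(path, completed_ids):
    """Write the checkpoint to a temp file, then atomically rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(sorted(completed_ids), f)
            f.flush()
            os.fsync(f.fileno())  # Ensure the bytes reach disk before the rename
        os.replace(tmp_path, path)  # Atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

def load_checkpoint(path):
    """Return the set of completed task IDs, or an empty set if none was saved."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))

with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "checkpoint.json")
    save_checkpoint_atomic(checkpoint, {1, 2, 3})
    restored = load_checkpoint(checkpoint)
    print(restored)
```

Because the rename either happens completely or not at all, a reader always sees the previous checkpoint or the new one, never a partial file.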
# checkpoint_worker.py
import os

import dill  # third-party: pip install dill

# Local path for illustration; on spot, use storage that outlives the instance
CHECKPOINT_FILE = "/tmp/checkpoint.pkl"

def process(task):
    # Placeholder for the real per-task work
    print(f"Processing {task['data']}")

class BatchProcessor:
    def __init__(self, tasks):
        self.tasks = tasks
        self.completed = self.load_checkpoint()

    def load_checkpoint(self):
        # Resume from the saved list of completed task IDs, if any
        if os.path.exists(CHECKPOINT_FILE):
            with open(CHECKPOINT_FILE, "rb") as f:
                return dill.load(f)
        return []

    def save_checkpoint(self, task_id):
        self.completed.append(task_id)
        with open(CHECKPOINT_FILE, "wb") as f:
            dill.dump(self.completed, f)

    def run(self):
        for task in self.tasks:
            if task["id"] in self.completed:
                continue  # Already finished before the last interruption
            process(task)
            self.save_checkpoint(task["id"])

processor = BatchProcessor([{"id": i, "data": f"item-{i}"} for i in range(1000)])
processor.run()
6. Fault-Tolerant Workload Design
Architectures that work well on spot:
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Load         │────▶│ Spot Worker  │────▶│ Queue/Storage│
│ Balancer     │     │ Pool (Auto   │     │ (Persistent) │
│              │     │ Scaled)      │     │              │
└──────────────┘     └──────────────┘     └──────────────┘
       │                    │                    │
       │                    ▼                    │
       │             ┌──────────────┐            │
       │             │ Termination  │            │
       └────────────▶│ Handler      │────────────┘
                     └──────────────┘
7. Spot Orchestration Tools
Spot.io (NetApp) automates spot instance management across AWS, Azure, and GCP:
# Spot.io ELX (Elastigroup) example
spotinst elastigroup create \
--name "batch-workers" \
--provider "aws" \
--region "us-east-1" \
--strategy '{"risk": 100, "fallbackToOd": true}' \
--capacity '{"minimum": 2, "maximum": 20, "target": 5}'
Common Mistakes
No interruption handling: Applications crash and lose data when instances are terminated. Always implement termination notice listeners and checkpoints.
Using spot for stateful workloads: Databases, message queues, and other stateful apps should not run on spot unless their state is replicated or stored off the instance. Otherwise they lose data on termination.
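Misjudging the maximum price: With AWS, you pay the current spot price, not your maximum. Setting the maximum at the on-demand price (the default) is safe, because you'll never pay more than on-demand. Setting it too low only makes interruptions more likely.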
Single instance type / AZ: If that instance type has no spot capacity, your workload won’t run. Diversify across instance families and availability zones.
No fallback to on-demand: Always configure a fallback strategy — when spot is unavailable, launch on-demand to maintain availability.
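The fallback rule in the last item can be expressed as a small decision function. This is a sketch under stated assumptions: `choose_capacity`, its parameters, and the price threshold are hypothetical, and a real deployment would usually rely on the provider's fleet or autoscaling configuration instead of hand-written logic.

```python
def choose_capacity(target, spot_available, spot_price, max_spot_price):
    """Split `target` instances between spot and on-demand.

    Spot is used only while its price is at or below `max_spot_price`;
    whatever spot cannot supply falls back to on-demand.
    """
    if target < 0 or spot_available < 0:
        raise ValueError("counts must be non-negative")
    spot = min(target, spot_available) if spot_price <= max_spot_price else 0
    return {"spot": spot, "on_demand": target - spot}

# Illustrative values only
cheap = choose_capacity(target=10, spot_available=7, spot_price=0.03, max_spot_price=0.10)
pricey = choose_capacity(target=10, spot_available=7, spot_price=0.12, max_spot_price=0.10)
print(cheap)
print(pricey)
```

In the first call spot covers seven instances and on-demand fills the remaining three; in the second the spot price is above the threshold, so the whole target falls back to on-demand.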
Practice Questions
What is the difference between GCP Preemptible and GCP Spot VMs? Answer: Preemptible VMs have a 24-hour max runtime. Spot VMs have no max runtime and are the recommended option. Both offer similar discounts.
How do you handle spot instance termination gracefully? Answer: Listen to the termination notice endpoint (metadata) and execute a shutdown script: save checkpoints, drain connections, upload results, and terminate.
What workloads are best suited for spot instances? Answer: Stateless, fault-tolerant workloads: batch processing, CI/CD agents, render farms, web workers, testing environments, and data analysis pipelines.
Can you mix spot and on-demand in a Spot Fleet? Answer: Yes. Specify how much of the fleet's target capacity should be on-demand (for example, 14 spot and 6 on-demand out of 20, a 70/30 split) to maintain availability while maximizing savings.
Challenge
Design a CI/CD build cluster on spot instances: AWS Spot Fleet with 5 instance types across 3 AZs, termination handler that drains jobs gracefully, checkpoints build artifacts to S3 every 5 minutes, fallback to on-demand when spot price exceeds $0.10/hr, and autoscaling from 0 to 50 workers based on queue depth.
What’s Next
| Topic | Description |
|---|---|
| Save 40-70% with commitment-based discounts | |
| Reduce K8s infrastructure spend | |
Related topics: Cloud Cost Optimization, AWS, Azure, GCP
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.