Kubernetes Cost Optimization: Reduce K8s Infrastructure Spend
Kubernetes cost optimization is the practice of reducing K8s infrastructure spend by right-sizing clusters, setting pod resource limits, using node autoscaling, scheduling on spot instances, and monitoring costs with tools like Kubecost — without sacrificing reliability.
What You’ll Learn
- Cluster right-sizing and node selection
- Cluster Autoscaler vs Karpenter for node autoscaling
- Pod resource requests and limits
- Vertical Pod Autoscaler (VPA)
- Spot instances for Kubernetes workloads
- Namespace resource quotas
- Cost monitoring with Kubecost and OpenCost
- Garbage collection for unused resources
Why It Matters
Kubernetes clusters are notoriously over-provisioned. Teams set generous requests “to be safe,” leave nodes running 24/7, and run workloads that could be scheduled on spot instances. The result: 40-60% of K8s spend is waste. DodaTech reduced EKS costs for DodaZIP’s backend by 45% using Karpenter spot instances and VPA recommendations.
flowchart LR
A[Cluster Metrics] --> B[Right-Size Nodes]
A --> C[Pod Requests/Limits]
B --> D[Cluster Autoscaler]
B --> E[Karpenter]
C --> F[VPA]
A --> G[Spot Instances]
A --> H[Namespace Quotas]
D --> I[30-50% Savings]
style I fill:#326ce5,color:#fff
1. Cluster Right-Sizing
The first step is choosing the right instance type and size for your nodes. Use tools like Kubecost or kube-ops-view to visualize resource utilization.
# Check node resource utilization
kubectl top nodes
# Install kube-ops-view for visualization
kubectl apply -f https://raw.githubusercontent.com/hjacobs/kube-ops-view/master/deploy/kubernetes/deploy.yaml
# Check which node types you're using
# (node.kubernetes.io/instance-type replaces the deprecated beta.kubernetes.io/instance-type label)
kubectl get nodes -o json | jq -r '.items[].metadata.labels["node.kubernetes.io/instance-type"]' | sort | uniq -c

Expected output example (from kubectl top nodes):
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
node-m5-2x-1 850m 10% 12Gi 37%
node-m5-2x-2 1200m 15% 18Gi 56%
node-m5-4x-1 900m 5% 20Gi 31%

If all nodes show <50% resource usage, you’re over-provisioned. Downsize to smaller instance types.
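The 50% rule of thumb above can be checked mechanically. The following is a minimal sketch, assuming the five-column layout shown in the example output; the function name `underutilized_nodes` and the sample variable are illustrative and not part of kubectl.

```shell
#!/usr/bin/env bash
# Print nodes whose CPU% and MEMORY% are both below a threshold (default 50).
# Expects `kubectl top nodes` style output on stdin.
underutilized_nodes() {
  local threshold="${1:-50}"
  awk -v t="$threshold" '
    NR == 1 { next }                      # skip the header row
    {
      cpu = $3; mem = $5
      sub(/%/, "", cpu); sub(/%/, "", mem)
      if (cpu + 0 < t && mem + 0 < t) print $1
    }'
}

# Sample input copied from the example output in this section.
sample='NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
node-m5-2x-1 850m 10% 12Gi 37%
node-m5-2x-2 1200m 15% 18Gi 56%
node-m5-4x-1 900m 5% 20Gi 31%'

printf '%s\n' "$sample" | underutilized_nodes 50
```

Against a live cluster, the same function could be fed with `kubectl top nodes | underutilized_nodes 50`. A single snapshot can mislead, so it is probably worth sampling over a representative period before downsizing.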
2. Node Autoscaling
Cluster Autoscaler scales node groups up and down based on pending pods. Karpenter (AWS) is a next-gen autoscaler that provisions the optimal instance type directly.
# Cluster Autoscaler deployment (AWS EKS)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
        # registry.k8s.io replaces the frozen k8s.gcr.io registry
        - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0
          name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-system-pods=false
            - --balance-similar-node-groups=true
            - --scale-down-unneeded-time=10m

# Karpenter NodePool (faster, cheaper)
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand", "spot"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
      nodeClassRef:
        name: default
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h

Karpenter advantage: It can launch any instance type that fits the pod requirements, achieving higher density than fixed node groups.
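To make the "any instance type that fits" idea concrete, here is a deliberately simplified sketch. It is not Karpenter's algorithm, which also weighs price, availability, and per-node overhead. The catalogue variable and the `smallest_fit` function are hypothetical, and the sizes are raw instance capacity rather than allocatable capacity.

```shell
#!/usr/bin/env bash
# Hypothetical catalogue: name, CPU in millicores, memory in MiB, ordered small to large.
catalogue='m5.large 2000 8192
m5.xlarge 4000 16384
m5.2xlarge 8000 32768
m5.4xlarge 16000 65536'

# Print the first (smallest) instance type that fits the summed pending requests,
# or "none" if nothing in the catalogue is large enough.
smallest_fit() {
  local cpu_m="$1" mem_mi="$2"
  printf '%s\n' "$catalogue" | awk -v c="$cpu_m" -v m="$mem_mi" '
    $2 >= c && $3 >= m { print $1; found = 1; exit }
    END { if (!found) print "none" }'
}

# Three pending pods of 1000m CPU and 2048Mi memory each.
smallest_fit 3000 6144
```

A fixed node group of m5.4xlarge nodes would have to add a 16-vCPU node for the same three pods, which is where the density difference comes from.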
3. Pod Resource Requests and Limits
Setting accurate requests and limits is the single highest-impact K8s cost optimization.
# BAD: no requests/limits (unbounded, noisy neighbors)
apiVersion: v1
kind: Pod
metadata:
  name: web-1
spec:
  containers:
    - name: app
      image: nginx:latest

# GOOD: set requests and limits based on profiling
apiVersion: v1
kind: Pod
metadata:
  name: web-1
spec:
  containers:
    - name: app
      image: nginx:latest
      resources:
        requests:
          cpu: 250m
          memory: 512Mi
        limits:
          cpu: 500m
          memory: 1Gi

Recommendation: Set requests at the P99 of observed usage and limits at 2x requests. Use VPA to get these numbers.
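The P99 and 2x rule can be worked through by hand. The sketch below assumes you already have usage samples as integers (for example CPU in millicores exported from your metrics system); the `percentile` helper and the sample values are hypothetical, and it uses the nearest-rank method, which is only one of several percentile definitions.

```shell
#!/usr/bin/env bash
# Nearest-rank percentile of whitespace-separated integer samples read from stdin.
percentile() {
  local p="$1"
  tr -s ' \t' '\n' | grep -v '^$' | sort -n | awk -v p="$p" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # integer form of ceil(p / 100 * NR)
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Hypothetical CPU usage samples in millicores.
samples="120 135 140 150 160 180 190 210 230 250"

request="$(echo "$samples" | percentile 99)"
limit=$((request * 2))
echo "requests.cpu=${request}m limits.cpu=${limit}m"
```

With these samples the result lines up with the 250m request and 500m limit used in the GOOD pod spec. With only a handful of samples the P99 is simply the maximum, so a longer observation window gives a more meaningful number.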
4. Vertical Pod Autoscaler (VPA)
VPA analyzes historical pod usage and recommends optimal CPU/memory requests.
# VPA recommender for a deployment
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off" # recommend only; switch to "Auto" after reviewing
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi

# Check VPA recommendations
kubectl describe vpa web-app-vpa
# Expected output:
# Lower: 150m CPU, 256Mi RAM
# Target: 300m CPU, 512Mi RAM
# Upper: 800m CPU, 1.5Gi RAM

5. Spot Instances for K8s
Use spot instances for worker nodes running stateless, fault-tolerant workloads.
# Node pool with spot instances (AKS example)
az aks nodepool add \
  --resource-group prod-rg \
  --cluster-name prod-cluster \
  --name spotpool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --node-count 3 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10

# EKS managed node group with spot
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod
  region: us-east-1
managedNodeGroups:
  - name: spot-workers
    instanceTypes:
      - m5.large
      - m5a.large
      - m5d.large
    spot: true
    minSize: 2
    maxSize: 20
    desiredCapacity: 2

6. Namespace Quotas
Prevent one team from consuming all cluster resources.
# Resource quota per namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-backend-quota
  namespace: team-backend
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 40Gi
    limits.cpu: "20"
    limits.memory: 80Gi
    persistentvolumeclaims: "10"
    pods: "50"

# Check quota usage
kubectl describe quota team-backend-quota -n team-backend

7. Cost Monitoring with Kubecost
Kubecost provides per-namespace, per-deployment, and per-label cost breakdowns.
# Install Kubecost
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="your-token"
# Port-forward to dashboard
kubectl port-forward --namespace kubecost svc/kubecost-cost-analyzer 9090:9090

Kubecost key metrics:
- Cluster cost rate (per hour/day/month)
- Namespace cost breakdown
- Idle resource cost
- Spot vs on-demand savings
- Rightsizing recommendations
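For intuition about the idle resource cost metric, the arithmetic below treats idle cost as the share of allocatable CPU that no pod has requested. This is a simplified approximation and not Kubecost's actual formula, which also accounts for memory, storage, and measured usage. The `idle_cost` function and all input figures are hypothetical.

```shell
#!/usr/bin/env bash
# Rough idle cost: monthly spend scaled by the fraction of allocatable CPU
# that is not covered by pod requests. All arguments are integers.
idle_cost() {
  local monthly_cost="$1" allocatable_m="$2" requested_m="$3"
  echo $(( monthly_cost * (allocatable_m - requested_m) / allocatable_m ))
}

# Hypothetical cluster: $25,000/month, 400 vCPU allocatable, 220 vCPU requested.
idle_cost 25000 400000 220000
```

With these inputs 45% of the CPU is unrequested, so roughly $11,250 of the monthly bill pays for capacity nobody has reserved.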
8. Garbage Collection
Unused resources accumulate: terminated pods, old replicasets, unused PVs, container images.
# Clean up completed pods (current namespace only; add --all-namespaces for the whole cluster)
kubectl delete pods --field-selector=status.phase==Succeeded
# Delete old ReplicaSets that have 0 replicas
# (this selector does not filter by age, and deleting them removes the Deployment's rollback history)
kubectl delete replicasets --all-namespaces \
  --field-selector=status.replicas==0
# Unused container images are garbage-collected by the kubelet on each node
# Configure imagePullPolicy: IfNotPresent and use tag-based retention

Common Mistakes
No resource requests or limits: Pods can consume unlimited cluster resources, causing noisy neighbors and unpredictable costs.
Ignoring VPA recommendations: Setting requests by guessing leads to massive over-provisioning. VPA provides data-driven recommendations.
Fixed-size node groups: Without Cluster Autoscaler or Karpenter, nodes run 24/7 even when idle. Autoscaling is non-negotiable for cost optimization.
No spot instances: Stateless workloads (CI/CD, batch, web workers) can run on spot at 60-90% discount. Only databases and stateful services need on-demand.
No namespace quotas: One team’s over-provisioned pods inflate the entire cluster’s cost. Quotas enforce fairness and accountability.
Practice Questions
What is the difference between Cluster Autoscaler and Karpenter? Answer: Cluster Autoscaler works with node groups; Karpenter provisions individual optimal instance types. Karpenter achieves higher density and faster scaling.
How do requests and limits affect cost? Answer: Requests determine the minimum resources reserved for a pod on a node, so they drive how many nodes the cluster needs and therefore what you pay. Limits cap resource usage. Over-provisioned requests waste money; limits set too low cause CPU throttling or out-of-memory kills.
What workloads should not run on spot instances? Answer: Stateful workloads (databases), long-running batch jobs without checkpointing, and workloads that cannot tolerate abrupt termination.
How does Kubecost help reduce costs? Answer: It shows exact cost per namespace, deployment, label, and pod. It identifies idle resources, rightsizing opportunities, and savings from spot adoption.
Challenge
Given a 50-node EKS cluster with $25k/month spend: install VPA in recommendation mode for all deployments, implement requests/limits recommendations, switch 70% of nodes to spot using Karpenter, set namespace quotas for all teams, install Kubecost to track savings, and configure Cluster Autoscaler with a 10-minute scale-down unneeded time.
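Before starting the challenge, it can help to estimate what the spot migration alone might save. The sketch below is back-of-the-envelope arithmetic: it assumes the whole $25k is node compute spread evenly across nodes, and it borrows the 60-90% spot discount range quoted under Common Mistakes. The `spot_estimate` function is hypothetical, and real savings depend on instance mix and interruption rates.

```shell
#!/usr/bin/env bash
# Estimated monthly bill after moving a share of spend to spot.
# Arguments are integers: spend in dollars, spot share and discount in percent.
spot_estimate() {
  local spend="$1" spot_share="$2" discount="$3"
  local moved=$(( spend * spot_share / 100 ))   # spend that shifts to spot
  local saved=$(( moved * discount / 100 ))     # discount applied to that share
  echo $(( spend - saved ))
}

spot_estimate 25000 70 60   # conservative end of the discount range
spot_estimate 25000 70 90   # optimistic end of the discount range
```

Under these assumptions the bill lands somewhere between $9,250 and $14,500 a month, before any savings from right-sized requests or autoscaling.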
What’s Next
| Topic | Description |
|---|---|
| | 80-90% discount on compute with spot/preemptible |
| | Cost monitoring tools for multi-cloud |
Related topics: Cloud Cost Optimization, Docker, AWS
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.