Kubernetes Cost Optimization: Reduce K8s Infrastructure Spend

DodaTech Updated Jun 20, 2026 7 min read

Kubernetes cost optimization is the practice of reducing K8s infrastructure spend by right-sizing clusters, setting pod resource limits, using node autoscaling, scheduling on spot instances, and monitoring costs with tools like Kubecost, all without sacrificing reliability.

What You’ll Learn

  • Cluster right-sizing and node selection
  • Cluster Autoscaler vs Karpenter for node autoscaling
  • Pod resource requests and limits
  • Vertical Pod Autoscaler (VPA)
  • Spot instances for Kubernetes workloads
  • Namespace resource quotas
  • Cost monitoring with Kubecost and OpenCost
  • Garbage collection for unused resources

Why It Matters

Kubernetes clusters are notoriously over-provisioned. Teams set generous requests “to be safe,” leave nodes running 24/7, and keep interruption-tolerant workloads on on-demand capacity when they could run on spot instances. The result: an estimated 40-60% of K8s spend is waste. DodaTech reduced EKS costs for DodaZIP’s backend by 45% using Karpenter spot instances and VPA recommendations.

    flowchart LR
    A[Cluster Metrics] --> B[Right-Size Nodes]
    A --> C[Pod Requests/Limits]
    B --> D[Cluster Autoscaler]
    B --> E[Karpenter]
    C --> F[VPA]
    A --> G[Spot Instances]
    A --> H[Namespace Quotas]
    D --> I[30-50% Savings]
    style I fill:#326ce5,color:#fff
  

1. Cluster Right-Sizing

The first step is choosing the right instance type and size for your nodes. Use tools like Kubecost or kube-ops-view to visualize resource utilization.

# Check node resource utilization
kubectl top nodes

# Install kube-ops-view for visualization
kubectl apply -f https://raw.githubusercontent.com/hjacobs/kube-ops-view/master/deploy/kubernetes/deploy.yaml

# Check which node types you're using
kubectl get nodes -o json | jq -r '.items[].metadata.labels["node.kubernetes.io/instance-type"]' | sort | uniq -c

Example output of kubectl top nodes:

NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-m5-2x-1   850m         10%    12Gi            37%
node-m5-2x-2   1200m        15%    18Gi            56%
node-m5-4x-1   900m         5%     20Gi            31%

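If every node sits well below 50% CPU and memory usage, as in this example, the cluster is likely over-provisioned. Consider smaller instance types or fewer nodes, and compare against pod requests first, because the scheduler packs nodes by requests rather than by live usage.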
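
The 50% rule can be checked mechanically. The following is a minimal Python sketch, assuming the output of kubectl top nodes has been captured as text; the function name `underutilized_nodes` and the default threshold are illustrative choices, not part of any Kubernetes tooling.

```python
# Sample `kubectl top nodes` output, copied from the example in this section.
SAMPLE = """\
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-m5-2x-1   850m         10%    12Gi            37%
node-m5-2x-2   1200m        15%    18Gi            56%
node-m5-4x-1   900m         5%     20Gi            31%
"""


def underutilized_nodes(top_output: str, threshold_pct: int = 50) -> list[str]:
    """Return node names whose CPU% and MEMORY% are both below threshold_pct."""
    flagged = []
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        name, _cpu, cpu_pct, _mem, mem_pct = line.split()
        cpu_ok = int(cpu_pct.rstrip("%")) < threshold_pct
        mem_ok = int(mem_pct.rstrip("%")) < threshold_pct
        if cpu_ok and mem_ok:
            flagged.append(name)
    return flagged


print(underutilized_nodes(SAMPLE))  # ['node-m5-2x-1', 'node-m5-4x-1']
```

Here node-m5-2x-2 is not flagged because its memory usage is 56%, which is a reminder that a node can be CPU-idle and still memory-bound.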

2. Node Autoscaling

Cluster Autoscaler scales node groups up when pods are pending and down when nodes are underused. Karpenter, which originated at AWS, is a newer autoscaler that skips node groups and provisions a fitting instance type directly.

# Cluster Autoscaler deployment (AWS EKS)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0
        name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-system-pods=false
        - --balance-similar-node-groups=true
        - --scale-down-unneeded-time=10m

# Karpenter NodePool (v1beta1 API), often faster and cheaper than fixed node groups
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: "karpenter.sh/capacity-type"
          operator: In
          values: ["on-demand", "spot"]
        - key: "kubernetes.io/arch"
          operator: In
          values: ["amd64"]
      nodeClassRef:
        name: default
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h

Karpenter advantage: It can launch any instance type that fits the pod requirements, achieving higher density than fixed node groups.
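
To make the scale-down setting in the Cluster Autoscaler manifest concrete, here is a simplified Python sketch of the decision it controls. It assumes request-based utilization, a 0.5 utilization threshold, and the 10-minute unneeded time from the manifest. The `Node` class and the sample nodes are hypothetical, and the real autoscaler applies further checks (for example, whether the pods can be rescheduled elsewhere) that are omitted here.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_cpu_m: int    # allocatable CPU, millicores
    allocatable_mem_mi: int   # allocatable memory, MiB
    requested_cpu_m: int      # sum of pod CPU requests on the node
    requested_mem_mi: int     # sum of pod memory requests on the node
    unneeded_minutes: float   # time spent below the utilization threshold


def utilization(node: Node) -> float:
    """Request-based utilization: the larger of the CPU and memory ratios."""
    cpu = node.requested_cpu_m / node.allocatable_cpu_m
    mem = node.requested_mem_mi / node.allocatable_mem_mi
    return max(cpu, mem)


def is_scale_down_candidate(node: Node, threshold: float = 0.5,
                            unneeded_time_min: float = 10.0) -> bool:
    """True when the node is below the threshold and has been for long enough."""
    return utilization(node) < threshold and node.unneeded_minutes >= unneeded_time_min


# Hypothetical nodes for illustration only.
nodes = [
    Node("node-a", 8000, 32768, 2000, 8192, unneeded_minutes=12),
    Node("node-b", 8000, 32768, 6000, 8192, unneeded_minutes=12),
    Node("node-c", 8000, 32768, 2000, 8192, unneeded_minutes=3),
]
candidates = [n.name for n in nodes if is_scale_down_candidate(n)]
print(candidates)  # ['node-a']
```

node-b is kept because 75% of its CPU is requested, and node-c is kept because it has been underused for only 3 minutes. Note that the decision uses requests, not live usage, so inflated requests also block scale-down.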

3. Pod Resource Requests and Limits

Setting accurate requests and limits is the single highest-impact K8s cost optimization.

# BAD: no requests/limits (unbounded, noisy neighbors)
apiVersion: v1
kind: Pod
metadata:
  name: web-1
spec:
  containers:
  - name: app
    image: nginx:latest

# GOOD: set requests and limits based on profiling
apiVersion: v1
kind: Pod
metadata:
  name: web-1
spec:
  containers:
  - name: app
    image: nginx:latest
    resources:
      requests:
        cpu: 250m
        memory: 512Mi
      limits:
        cpu: 500m
        memory: 1Gi

Recommendation: Set requests at the P99 of observed usage and limits at 2x requests. Use VPA to get these numbers.
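
The P99 rule can be sketched in a few lines of Python. This is a minimal illustration, assuming CPU usage samples in millicores have already been exported from your metrics system. It uses the nearest-rank percentile method, and the sample data is made up for the example.

```python
import math


def percentile(samples: list[int], pct: int) -> int:
    """Nearest-rank percentile: the smallest sample with pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)  # 1-based rank
    return ordered[rank - 1]


def recommend(cpu_samples_m: list[int], limit_factor: int = 2) -> dict[str, int]:
    """Request = P99 of observed usage, limit = limit_factor x request."""
    request = percentile(cpu_samples_m, 99)
    return {"request_m": request, "limit_m": request * limit_factor}


# Hypothetical usage: mostly 120m, some 200m bursts, and one 900m spike.
cpu_samples_m = [120] * 90 + [200] * 9 + [900]
print(recommend(cpu_samples_m))  # {'request_m': 200, 'limit_m': 400}
```

The single 900m spike does not inflate the request, which is the point of choosing P99 over the maximum. The 2x limit still leaves headroom for short bursts.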

4. Vertical Pod Autoscaler (VPA)

VPA analyzes historical pod usage and recommends optimal CPU/memory requests.

# VPA recommender for a deployment
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"  # recommend only; switch to "Auto" after reviewing
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi

# Check VPA recommendations
kubectl describe vpa web-app-vpa

# Expected output (abridged):
# Lower Bound: 150m CPU, 256Mi RAM
# Target: 300m CPU, 512Mi RAM
# Upper Bound: 800m CPU, 1.5Gi RAM
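
The minAllowed and maxAllowed fields in the manifest bound whatever VPA recommends. Here is a small Python sketch of that clamping, assuming only the quantity suffixes used in this article (m for CPU, Mi and Gi for memory). The helper names are illustrative and the parser is not a full Kubernetes quantity parser.

```python
def parse_cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '300m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_mem_mib(quantity: str) -> float:
    """Convert a memory quantity with a Mi or Gi suffix to MiB."""
    if quantity.endswith("Gi"):
        return float(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return float(quantity[:-2])
    raise ValueError(f"unsupported suffix in {quantity!r}")


def clamp(value, low, high):
    """Keep value inside the [low, high] range."""
    return max(low, min(value, high))


# Bounds taken from the VPA manifest above: 100m to 2 CPUs.
low_m = parse_cpu_millicores("100m")
high_m = parse_cpu_millicores("2")

# Hypothetical raw recommendations before the policy is applied.
raw_cpu = ["50m", "300m", "3"]
bounded = [clamp(parse_cpu_millicores(q), low_m, high_m) for q in raw_cpu]
print(bounded)  # [100, 300, 2000]
```

A recommendation inside the range passes through unchanged, so overly tight bounds can hide what VPA actually observed.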

5. Spot Instances for K8s

Use spot instances for worker nodes running stateless, fault-tolerant workloads.

# Node pool with spot instances (AKS example)
az aks nodepool add \
  --resource-group prod-rg \
  --cluster-name prod-cluster \
  --name spotpool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --node-count 3 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 10

# EKS managed node group with spot
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod
  region: us-east-1
managedNodeGroups:
- name: spot-workers
  instanceTypes:
  - m5.large
  - m5a.large
  - m5d.large
  spot: true
  minSize: 2
  maxSize: 20
  desiredCapacity: 2
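
To estimate what a spot migration is worth, a back-of-the-envelope calculation helps. The Python sketch below assumes the whole bill is compute, and it ignores interruption overhead and price changes. The inputs ($25k per month, 70% of capacity on spot, a 60% discount) are illustrative values, not measurements.

```python
def blended_monthly_cost(on_demand_cost: float, spot_fraction: float,
                         spot_discount: float) -> float:
    """Monthly cost after moving spot_fraction of spend to spot at spot_discount off."""
    if not (0 <= spot_fraction <= 1 and 0 <= spot_discount <= 1):
        raise ValueError("spot_fraction and spot_discount must be between 0 and 1")
    on_demand_part = on_demand_cost * (1 - spot_fraction)
    spot_part = on_demand_cost * spot_fraction * (1 - spot_discount)
    return on_demand_part + spot_part


# Illustrative inputs only.
before = 25_000
after = blended_monthly_cost(before, spot_fraction=0.7, spot_discount=0.6)
print(f"before: ${before:,.0f}  after: ${after:,.0f}  saved: {1 - after / before:.0%}")
# before: $25,000  after: $14,500  saved: 42%
```

Because only part of the fleet moves to spot, the overall saving (42% here) is lower than the headline spot discount.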

6. Namespace Quotas

Prevent one team from consuming all cluster resources.

# Resource quota per namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-backend-quota
  namespace: team-backend
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 40Gi
    limits.cpu: "20"
    limits.memory: 80Gi
    persistentvolumeclaims: "10"
    pods: "50"

# Check quota usage
kubectl describe quota team-backend-quota -n team-backend
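
Conceptually, a quota check at pod creation is a simple comparison: current usage plus the new pod must stay within every hard limit. The Python sketch below illustrates that logic with the request and pod limits from the quota above. The dictionary keys and usage numbers are hypothetical, and the actual check is performed by the Kubernetes API server, not by user code.

```python
# Hard limits mirroring the ResourceQuota above (CPU in millicores, memory in MiB).
QUOTA_HARD = {"requests_cpu_m": 10_000, "requests_memory_mi": 40 * 1024, "pods": 50}


def fits_quota(used: dict[str, int], new_pod: dict[str, int],
               hard: dict[str, int]) -> bool:
    """True when used + new_pod stays within every hard limit."""
    return all(used.get(key, 0) + new_pod.get(key, 0) <= limit
               for key, limit in hard.items())


# Hypothetical current usage in the namespace.
used = {"requests_cpu_m": 9_800, "requests_memory_mi": 20_480, "pods": 30}

small_pod = {"requests_cpu_m": 200, "requests_memory_mi": 512, "pods": 1}
large_pod = {"requests_cpu_m": 250, "requests_memory_mi": 512, "pods": 1}

print(fits_quota(used, small_pod, QUOTA_HARD))  # True  (lands exactly on 10 CPUs)
print(fits_quota(used, large_pod, QUOTA_HARD))  # False (would need 10.05 CPUs)
```

This is also why a namespace with CPU or memory quotas needs every pod to declare requests and limits: without them there is nothing to count against the quota.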

7. Cost Monitoring with Kubecost

Kubecost provides per-namespace, per-deployment, and per-label cost breakdowns.

# Install Kubecost
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="your-token"

# Port-forward to dashboard
kubectl port-forward --namespace kubecost svc/kubecost-cost-analyzer 9090:9090

Kubecost key metrics:

  • Cluster cost rate (per hour/day/month)
  • Namespace cost breakdown
  • Idle resource cost
  • Spot vs on-demand savings
  • Rightsizing recommendations
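
To build intuition for the namespace breakdown and idle cost metrics, here is a deliberately simple Python sketch that splits a cluster's hourly cost in proportion to requested CPU. This is an assumption for illustration only and not Kubecost's actual allocation model, which also accounts for memory, storage, and other resources. All numbers are hypothetical.

```python
def allocate_hourly_cost(cluster_cost: float, allocatable_cores: float,
                         requested_cores: dict[str, float]) -> dict[str, float]:
    """Split cluster_cost by each namespace's share of allocatable CPU.

    Whatever is not requested by any namespace is reported as idle cost.
    """
    costs = {ns: cluster_cost * cores / allocatable_cores
             for ns, cores in requested_cores.items()}
    costs["__idle__"] = cluster_cost - sum(costs.values())
    return costs


# Hypothetical cluster: $10/hour, 100 allocatable cores.
breakdown = allocate_hourly_cost(
    cluster_cost=10.0,
    allocatable_cores=100,
    requested_cores={"team-backend": 30, "team-frontend": 20},
)
for namespace, cost in breakdown.items():
    print(f"{namespace}: ${cost:.2f}/hour")
```

In this example half of the hourly cost is idle, which is the kind of signal that points back to right-sizing and autoscaling.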

8. Garbage Collection

Unused resources accumulate: terminated pods, old replicasets, unused PVs, container images.

# Clean up completed pods
kubectl delete pods --field-selector=status.phase==Succeeded

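# Delete ReplicaSets that have 0 replicas, in every namespace.
# Note: the field selector cannot filter by age, and this removes the rollout
# history that `kubectl rollout undo` relies on. Setting spec.revisionHistoryLimit
# on Deployments is the safer long-term fix.
kubectl delete replicasets --all-namespaces \
  --field-selector=status.replicas==0

# Unused container images are garbage-collected by the kubelet.
# Tune imageGCHighThresholdPercent / imageGCLowThresholdPercent in the kubelet config.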
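
The field selector in the ReplicaSet command matches on replica count only, so an age filter has to be applied separately. Here is a Python sketch that does so over the JSON shape returned by kubectl get replicasets --all-namespaces -o json. The sample items and the fixed clock value are hypothetical, and the script only lists candidates rather than deleting anything.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical, trimmed-down `kubectl get rs -A -o json` output.
SAMPLE = json.loads("""
{"items": [
  {"metadata": {"namespace": "team-backend", "name": "web-app-6d4f",
                "creationTimestamp": "2026-06-20T09:00:00Z"},
   "status": {"replicas": 0}},
  {"metadata": {"namespace": "team-backend", "name": "web-app-7c9b",
                "creationTimestamp": "2026-06-20T11:45:00Z"},
   "status": {"replicas": 0}},
  {"metadata": {"namespace": "team-backend", "name": "web-app-85fd",
                "creationTimestamp": "2026-06-20T08:00:00Z"},
   "status": {"replicas": 3}}
]}
""")


def stale_replicasets(rs_list: dict, now: datetime,
                      min_age: timedelta = timedelta(hours=1)) -> list[str]:
    """Return namespace/name for ReplicaSets with 0 replicas older than min_age."""
    stale = []
    for item in rs_list["items"]:
        meta = item["metadata"]
        created = datetime.strptime(
            meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        replicas = item.get("status", {}).get("replicas", 0)
        if replicas == 0 and now - created >= min_age:
            stale.append(f'{meta["namespace"]}/{meta["name"]}')
    return stale


now = datetime(2026, 6, 20, 12, 0, tzinfo=timezone.utc)  # fixed clock for the example
print(stale_replicasets(SAMPLE, now))  # ['team-backend/web-app-6d4f']
```

web-app-7c9b is skipped because it is only 15 minutes old, and web-app-85fd is skipped because it still runs 3 replicas.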

Common Mistakes

  1. No resource requests or limits: Pods can consume unlimited cluster resources, causing noisy neighbors and unpredictable costs.

  2. Ignoring VPA recommendations: Setting requests by guessing leads to massive over-provisioning. VPA provides data-driven recommendations.

  3. Fixed-size node groups: Without Cluster Autoscaler or Karpenter, nodes run 24/7 even when idle. Autoscaling is non-negotiable for cost optimization.

  4. No spot instances: Stateless workloads (CI/CD, batch, web workers) can run on spot at a 60-90% discount. Reserve on-demand capacity for databases, other stateful services, and jobs that cannot tolerate abrupt termination.

  5. No namespace quotas: One team’s over-provisioned pods inflate the entire cluster’s cost. Quotas enforce fairness and accountability.

Practice Questions

  1. What is the difference between Cluster Autoscaler and Karpenter? Answer: Cluster Autoscaler works with node groups; Karpenter provisions individual optimal instance types. Karpenter achieves higher density and faster scaling.

  2. How do requests and limits affect cost? Answer: Requests determine the resources reserved for a pod on a node, and therefore how many nodes you pay for. Limits cap resource usage. Over-provisioned requests waste money; limits set too low cause CPU throttling and out-of-memory kills.

  3. What workloads should not run on spot instances? Answer: Stateful workloads (databases), long-running batch jobs without checkpointing, and workloads that cannot tolerate abrupt termination.

  4. How does Kubecost help reduce costs? Answer: It shows exact cost per namespace, deployment, label, and pod. It identifies idle resources, rightsizing opportunities, and savings from spot adoption.

Challenge

Given a 50-node EKS cluster with $25k/month spend: install VPA in recommendation mode for all deployments, implement requests/limits recommendations, switch 70% of nodes to spot using Karpenter, set namespace quotas for all teams, install Kubecost to track savings, and configure Cluster Autoscaler with a 10-minute scale-down unneeded time.

FAQ

How much can I save on K8s cost optimization?
: Most teams save 30-50% in the first quarter through rightsizing, spot adoption, and autoscaling.
Should I use VPA or HPA?
: Use VPA to right-size the CPU and memory requests of each pod. Use HPA to scale the pod count based on load. They complement each other, but avoid letting both act on the same CPU or memory metric for one workload.
What is the best K8s cost monitoring tool?
: Kubecost is the most popular. OpenCost is the CNCF open-source alternative. Both integrate with AWS, Azure, and GCP billing.
How do I handle spot instance interruptions for K8s?
: Use pod disruption budgets, multi-zone deployments, and a node termination handler that drains nodes when an interruption notice arrives. Karpenter automatically replaces terminated spot nodes.
What is the biggest waste in K8s?
: Over-provisioned pod requests (setting 2 CPU when the app uses 200m). Always use VPA recommendations.

What’s Next

Topic: Description
  • Spot & Preemptible Instances: Up to 90% discount on compute with spot/preemptible capacity
  • Cloud Cost Management Tools: Cost monitoring tools for multi-cloud

Related topics: Cloud Cost Optimization, Docker, AWS

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
