Kubernetes Cost Optimization: Cutting Our AWS Bill by 67%

How we reduced our Kubernetes costs from $145K/month to $48K/month through strategic resource optimization, spot instances, and cluster right-sizing—without sacrificing performance or reliability.

The Wake-Up Call: $145K Per Month

Our CFO walked into the engineering standup holding a printed AWS bill.

“Why are we spending $145,000 per month on Kubernetes? That’s more than our entire sales team’s salaries.”

We didn’t have a good answer.

This is the story of how we cut our Kubernetes costs by 67% in 4 months—from $145K/month to $48K/month—while actually improving performance and reliability.

The Starting Point: Cloud Waste at Scale

Our Kubernetes setup was a classic example of “it works, don’t touch it”:

Kubernetes Clusters: 3 (prod, staging, dev)
Total Nodes: 187 EC2 instances
Instance Types: 
  - 42x m5.8xlarge (production)
  - 28x m5.4xlarge (production)
  - 45x m5.2xlarge (staging/dev)
  - 72x m5.xlarge (mixed)

Monthly Costs:
  - EC2 instances: $89,000
  - EBS volumes: $23,000
  - Data transfer: $18,000
  - Load balancers: $9,000
  - NAT gateways: $6,000
  Total: $145,000/month

Cluster Utilization:
  - Average CPU: 23%
  - Average Memory: 31%
  - Peak CPU: 68%
  - Peak Memory: 72%

We were paying for 187 instances but using resources equivalent to ~50 instances.

Phase 1: The Visibility Problem

Before optimizing, we needed to understand what we were actually using.

Installing Kubecost

# Deploy Kubecost for cost visibility
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update

helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set kubecostToken=$KUBECOST_TOKEN \
  --set prometheus.server.global.external_labels.cluster_id=production

# Access Kubecost UI
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090
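
Beyond the UI, allocation data can be pulled programmatically through the same port-forward. A minimal sketch, assuming Kubecost's standard /model/allocation endpoint and its totalCost field (verify both against your Kubecost version):

import requests

# Query Kubecost's allocation API through the port-forward above.
# The /model/allocation path and the totalCost field are assumptions
# based on recent Kubecost versions; adjust for yours.
KUBECOST_URL = "http://localhost:9090/model/allocation"

resp = requests.get(KUBECOST_URL, params={
    "window": "7d",            # look back one week
    "aggregate": "namespace",  # roll costs up per namespace
})
resp.raise_for_status()

for window in resp.json()["data"]:
    for namespace, alloc in sorted(window.items()):
        print(f"{namespace}: ${alloc['totalCost']:,.2f}")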

The Shocking Discovery

After 48 hours of data collection, Kubecost revealed the truth:

cost_analysis = {
    'total_monthly_cost': 145000,
    'waste_breakdown': {
        'over_provisioned_pods': 47000,  # 32% of total
        'idle_resources': 38000,          # 26% of total
        'unused_pvs': 12000,              # 8% of total
        'oversized_nodes': 28000,         # 19% of total
        'inefficient_networking': 8000    # 6% of total
    },
    'top_cost_drivers': {
        'ml-training-jobs': 28000,        # 19% of total
        'data-processing': 23000,         # 16% of total
        'api-services': 19000,            # 13% of total
        'background-workers': 15000,      # 10% of total
        'dev-environments': 14000         # 10% of total
    }
}

print(f"Total identified waste: ${sum(cost_analysis['waste_breakdown'].values()):,}")
# Output: Total identified waste: $133,000

91% of our spending was waste or inefficiency.

Phase 2: Right-Sizing Resources

The Pod Resource Audit

import kubernetes.client


class PodResourceAnalyzer:
    """
    Analyze actual pod resource usage vs requests/limits.

    Helper methods not shown here (get_pod_requests, prometheus_query,
    calculate_waste_cost) wrap the Kubernetes and Prometheus APIs.
    """
    
    def __init__(self):
        # Assumes kubeconfig is already loaded (kubernetes.config.load_kube_config)
        self.k8s_client = kubernetes.client.CoreV1Api()
        self.metrics_api = kubernetes.client.CustomObjectsApi()
    
    def analyze_pod_waste(self, namespace='default', days=30):
        """
        Compare requested vs actual resource usage.
        """
        pods = self.k8s_client.list_namespaced_pod(namespace)
        
        waste_report = []
        
        for pod in pods.items:
            # Get resource requests
            requests = self.get_pod_requests(pod)
            
            # Get actual usage over time
            actual_usage = self.get_actual_usage(
                pod.metadata.name,
                namespace,
                days=days
            )
            
            # Calculate waste
            cpu_waste = requests['cpu'] - actual_usage['cpu_p95']
            memory_waste = requests['memory'] - actual_usage['memory_p95']
            
            if cpu_waste > requests['cpu'] * 0.5 or memory_waste > requests['memory'] * 0.5:
                waste_report.append({
                    'pod': pod.metadata.name,
                    'namespace': namespace,
                    'cpu_requested': requests['cpu'],
                    'cpu_actual_p95': actual_usage['cpu_p95'],
                    'cpu_waste_percent': (cpu_waste / requests['cpu']) * 100,
                    'memory_requested': requests['memory'],
                    'memory_actual_p95': actual_usage['memory_p95'],
                    'memory_waste_percent': (memory_waste / requests['memory']) * 100,
                    'monthly_waste_cost': self.calculate_waste_cost(cpu_waste, memory_waste)
                })
        
        return sorted(waste_report, key=lambda x: x['monthly_waste_cost'], reverse=True)
    
    def get_actual_usage(self, pod_name, namespace, days=30):
        """
        Query Prometheus for actual resource usage.
        """
        # Query for CPU usage (P95)
        cpu_query = f"""
        quantile_over_time(0.95, 
          rate(container_cpu_usage_seconds_total{{
            pod="{pod_name}",
            namespace="{namespace}"
          }}[5m])[{days}d:5m])
        """
        
        # Query for memory usage (P95)
        memory_query = f"""
        quantile_over_time(0.95,
          container_memory_working_set_bytes{{
            pod="{pod_name}",
            namespace="{namespace}"
          }}[{days}d:5m])
        """
        
        cpu_p95 = self.prometheus_query(cpu_query)
        memory_p95 = self.prometheus_query(memory_query)
        
        return {
            'cpu_p95': cpu_p95,
            'memory_p95': memory_p95
        }
    
    def generate_recommendations(self, waste_report):
        """
        Generate resource request recommendations.
        """
        recommendations = []
        
        for pod in waste_report:
            # Recommend 20% headroom above P95 usage
            recommended_cpu = pod['cpu_actual_p95'] * 1.2
            recommended_memory = pod['memory_actual_p95'] * 1.2
            
            recommendations.append({
                'pod': pod['pod'],
                'current_requests': {
                    'cpu': pod['cpu_requested'],
                    'memory': pod['memory_requested']
                },
                'recommended_requests': {
                    'cpu': recommended_cpu,
                    'memory': recommended_memory
                },
                'potential_savings': pod['monthly_waste_cost']
            })
        
        return recommendations
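
A sketch of how the analyzer is driven, assuming the elided helpers above are implemented, CPU is reported in cores, and memory in bytes:

# Audit the production namespace over the last 30 days and print the
# ten most wasteful pods with their recommended requests.
analyzer = PodResourceAnalyzer()

waste = analyzer.analyze_pod_waste(namespace='production', days=30)
recs = analyzer.generate_recommendations(waste)

for rec in recs[:10]:
    print(f"{rec['pod']}: ~${rec['potential_savings']:,.0f}/month wasted, "
          f"recommend cpu={rec['recommended_requests']['cpu']:.2f} cores, "
          f"memory={rec['recommended_requests']['memory'] / 2**30:.1f}Gi")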

The Top 10 Worst Offenders

# Our actual findings
worst_offenders = [
    {
        'service': 'ml-training-worker',
        'cpu_requested': '8000m',
        'cpu_actual_p95': '450m',
        'memory_requested': '32Gi',
        'memory_actual_p95': '4.2Gi',
        'monthly_waste': '$8,400'
    },
    {
        'service': 'data-processor',
        'cpu_requested': '4000m',
        'cpu_actual_p95': '280m',
        'memory_requested': '16Gi',
        'memory_actual_p95': '2.1Gi',
        'monthly_waste': '$4,200'
    },
    # ... 8 more
]

total_waste_top_10 = sum([float(w['monthly_waste'].replace('$', '').replace(',', '')) 
                          for w in worst_offenders])

print(f"Top 10 services waste: ${total_waste_top_10:,.0f}/month")
# Output: Top 10 services waste: $38,000/month

The Right-Sizing Implementation

# BEFORE: Typical over-provisioned pod
apiVersion: v1
kind: Pod
metadata:
  name: api-service
spec:
  containers:
  - name: api
    image: api-service:v1.2.3
    resources:
      requests:
        memory: "4Gi"    # Way too much!
        cpu: "2000m"     # Way too much!
      limits:
        memory: "8Gi"    # Even worse!
        cpu: "4000m"     # Even worse!

---
# AFTER: Right-sized based on actual usage
apiVersion: v1
kind: Pod
metadata:
  name: api-service
spec:
  containers:
  - name: api
    image: api-service:v1.2.3
    resources:
      requests:
        memory: "512Mi"  # Based on P95 + 20% headroom
        cpu: "250m"      # Based on P95 + 20% headroom
      limits:
        memory: "1Gi"    # 2x requests for burst capacity
        cpu: "500m"      # 2x requests for burst capacity
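
Applying new values across hundreds of Deployments by hand doesn't scale, so changes like this are worth scripting. A minimal sketch that renders the analyzer's recommendations as kubectl patches; the container name "app", CPU in cores, memory in bytes, and pod-name-equals-Deployment-name are all simplifying assumptions for illustration:

import json

def to_patch_command(rec, namespace):
    # Strategic merge patch keyed on container name; assumes the pod's
    # name maps to its owning Deployment (a simplification). In practice,
    # updating the manifests in git is the safer path.
    cpu = f"{int(rec['recommended_requests']['cpu'] * 1000)}m"
    memory = f"{int(rec['recommended_requests']['memory'] / 2**20)}Mi"
    patch = json.dumps({"spec": {"template": {"spec": {"containers": [
        {"name": "app", "resources": {"requests": {"cpu": cpu, "memory": memory}}}
    ]}}}})
    return (f"kubectl -n {namespace} patch deployment {rec['pod']} "
            f"--type=strategic -p '{patch}'")

for rec in recs:
    print(to_patch_command(rec, 'production'))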

Results from right-sizing:

  • Monthly savings: $47,000
  • Pods affected: 847
  • Performance impact: None (actually improved due to better scheduling)

Phase 3: Spot Instances for Stateless Workloads

The Spot Instance Strategy

class SpotInstanceManager:
    """
    Manage spot instances for cost-effective Kubernetes workloads.
    """
    
    def __init__(self):
        self.spot_strategies = {
            'diversified': {
                'instance_types': [
                    'm5.2xlarge',
                    'm5a.2xlarge',
                    'm5n.2xlarge',
                    'm4.2xlarge'
                ],
                'availability_zones': ['us-east-1a', 'us-east-1b', 'us-east-1c'],
                'strategy': 'capacity-optimized'
            }
        }
    
    def create_spot_node_group(self, cluster_name, workload_type):
        """
        Create spot instance node group with interruption handling.
        """
        if workload_type == 'stateless':
            # Stateless workloads can tolerate interruptions
            config = {
                'name': f'{cluster_name}-spot-stateless',
                'instance_types': self.spot_strategies['diversified']['instance_types'],
                'capacity_type': 'SPOT',
                'desired_capacity': 10,
                'min_size': 5,
                'max_size': 50,
                'labels': {
                    'node-type': 'spot',
                    'workload': 'stateless'
                },
                'taints': [{
                    'key': 'spot',
                    'value': 'true',
                    'effect': 'NoSchedule'
                }]
            }
        elif workload_type == 'stateful':
            # Stateful workloads need on-demand fallback
            config = {
                'name': f'{cluster_name}-spot-stateful',
                'instance_types': ['r5.xlarge', 'r5a.xlarge'],
                'capacity_type': 'MIXED',  # Mix of spot and on-demand
                'spot_percentage': 70,     # 70% spot, 30% on-demand
                'desired_capacity': 5,
                'min_size': 2,
                'max_size': 20
            }
        else:
            raise ValueError(f"Unknown workload type: {workload_type}")
        
        return config
    
    def handle_spot_interruption(self):
        """
        Handle spot instance interruptions gracefully.
        """
        interruption_handler = """
        apiVersion: v1
        kind: ServiceAccount
        metadata:
          name: aws-node-termination-handler
          namespace: kube-system
        ---
        apiVersion: apps/v1
        kind: DaemonSet
        metadata:
          name: aws-node-termination-handler
          namespace: kube-system
        spec:
          selector:
            matchLabels:
              app: aws-node-termination-handler
          template:
            metadata:
              labels:
                app: aws-node-termination-handler
            spec:
              serviceAccountName: aws-node-termination-handler
              containers:
              - name: handler
                image: public.ecr.aws/aws-ec2/aws-node-termination-handler:v1.19.0
                env:
                - name: NODE_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: spec.nodeName
                - name: POD_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
                - name: ENABLE_SPOT_INTERRUPTION_DRAINING
                  value: "true"
                - name: ENABLE_SCHEDULED_EVENT_DRAINING
                  value: "true"
        """
        
        return interruption_handler
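
Wiring the pieces together: a sketch that generates a node group config for the provisioning layer and applies the termination handler manifest. It assumes kubectl is configured against the target cluster; textwrap.dedent strips the indentation the triple-quoted string carries so the YAML parses:

import subprocess
import tempfile
import textwrap

manager = SpotInstanceManager()

# The config dict is an input to whatever actually creates the node
# group (eksctl, Terraform, the EKS API, etc.).
config = manager.create_spot_node_group('production', 'stateless')
print(f"Creating {config['name']} with {config['instance_types']}")

# Apply the interruption handler DaemonSet.
manifest = textwrap.dedent(manager.handle_spot_interruption())
with tempfile.NamedTemporaryFile('w', suffix='.yaml', delete=False) as f:
    f.write(manifest)

subprocess.run(['kubectl', 'apply', '-f', f.name], check=True)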

Configuring Workloads for Spot

# Deployment optimized for spot instances
apiVersion: apps/v1
kind: Deployment
metadata:
  name: background-worker
spec:
  replicas: 20
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 30%  # Tolerate more unavailability for cost savings
      maxSurge: 50%        # Faster replacement during interruptions
  template:
    spec:
      # Tolerate spot instance taints
      tolerations:
      - key: "spot"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      
      # Prefer spot nodes
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: node-type
                operator: In
                values:
                - spot
      
      # Handle disruptions gracefully
      terminationGracePeriodSeconds: 120  # Allow cleanup before termination
      
      containers:
      - name: worker
        image: background-worker:v2.1.0
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 1000m
            memory: 2Gi
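
A guardrail worth pairing with this setup: a PodDisruptionBudget, so a wave of spot reclaims can never drain too many replicas at once. A minimal sketch, assuming the worker pods carry the app=background-worker label:

import subprocess

# Keep at least 60% of replicas available during node drains. The label
# selector is assumed to match the Deployment above.
pdb_manifest = """
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: background-worker-pdb
spec:
  minAvailable: 60%
  selector:
    matchLabels:
      app: background-worker
"""

subprocess.run(['kubectl', 'apply', '-f', '-'],
               input=pdb_manifest, text=True, check=True)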

Results from spot instances:

  • Monthly savings: $38,000
  • Spot percentage: 72% of stateless workloads
  • Average interruption rate: 3.2% (well within tolerance)
  • Spot savings rate: Average 68% discount vs on-demand

Phase 4: Cluster Autoscaling and Bin Packing

Implementing Cluster Autoscaler

# Cluster Autoscaler deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler  # assumes the RBAC from the official manifest
      containers:
      - name: cluster-autoscaler
        image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.24.0
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste  # Optimize for cost
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false
        - --scale-down-enabled=true
        - --scale-down-unneeded-time=5m  # Aggressive scale-down
        - --scale-down-utilization-threshold=0.5
        - --max-node-provision-time=15m
        env:
        - name: AWS_REGION
          value: us-east-1

Improving Pod Bin Packing

class PodSchedulingOptimizer:
    """
    Optimize pod scheduling for better bin packing.
    """
    
    def configure_priority_classes(self):
        """
        Create priority classes for workload prioritization.
        """
        priority_classes = {
            'critical': {
                'value': 1000000,
                'description': 'Critical production workloads',
                'preemptionPolicy': 'Never'
            },
            'high': {
                'value': 100000,
                'description': 'High priority production workloads',
                'preemptionPolicy': 'PreemptLowerPriority'
            },
            'normal': {
                'value': 10000,
                'description': 'Normal production workloads',
                'preemptionPolicy': 'PreemptLowerPriority'
            },
            'low': {
                'value': 1000,
                'description': 'Best-effort workloads',
                'preemptionPolicy': 'PreemptLowerPriority'
            }
        }
        
        return priority_classes
    
    def optimize_pod_affinity(self, deployment_name):
        """
        Soft (preferred) anti-affinity: spread replicas across nodes for
        resilience without blocking the scheduler from packing nodes when
        capacity is tight. A hard (required) rule would fight bin packing.
        """
        affinity = {
            'podAntiAffinity': {
                'preferredDuringSchedulingIgnoredDuringExecution': [{
                    'weight': 100,
                    'podAffinityTerm': {
                        'labelSelector': {
                            'matchExpressions': [{
                                'key': 'app',
                                'operator': 'In',
                                'values': [deployment_name]
                            }]
                        },
                        'topologyKey': 'kubernetes.io/hostname'
                    }
                }]
            }
        }
        
        return affinity
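
Rendering the priority class definitions into actual PriorityClass objects is straightforward. A sketch using PyYAML; the output can be piped into kubectl apply -f -:

import yaml

optimizer = PodSchedulingOptimizer()

# Turn the priority class definitions into PriorityClass manifests.
manifests = []
for name, spec in optimizer.configure_priority_classes().items():
    manifests.append({
        'apiVersion': 'scheduling.k8s.io/v1',
        'kind': 'PriorityClass',
        'metadata': {'name': name},
        'value': spec['value'],
        'description': spec['description'],
        'preemptionPolicy': spec['preemptionPolicy'],
    })

print(yaml.dump_all(manifests))  # pipe into `kubectl apply -f -`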

Results from autoscaling:

  • Monthly savings: $18,000
  • Average cluster utilization: Increased from 23% to 71%
  • Node count reduction: From 187 to 67 nodes (64% reduction)
  • Scale-down time: Average 8 minutes

Phase 5: Storage Optimization

The PVC Audit

# Find unused Persistent Volumes
kubectl get pv | grep Released | wc -l
# Output: 347 released PVs

# Calculate storage waste (jq's `add` concatenates strings like "100Gi",
# so strip the unit first; assumes all capacities are expressed in Gi)
kubectl get pv -o json | jq '[.items[] | select(.status.phase=="Released") | .spec.capacity.storage | rtrimstr("Gi") | tonumber] | add'
# Output: 43008  (Gi, i.e. ~42 TiB)

# Monthly cost of unused storage
# 42 TB × $0.10/GB-month = $4,200/month wasted

Automated PVC Cleanup

from datetime import datetime

import kubernetes.client


class PVCCleanupManager:
    """
    Automated cleanup of unused persistent volumes.

    Helper methods not shown here (get_last_used_time, calculate_pvc_cost,
    backup_pvc) wrap volume metrics and backup tooling.
    """
    
    def __init__(self):
        self.k8s_client = kubernetes.client.CoreV1Api()
        self.storage_api = kubernetes.client.StorageV1Api()
    
    def find_orphaned_pvcs(self, days_unused=30):
        """
        Find PVCs not attached to any pods.
        """
        all_pvcs = self.k8s_client.list_persistent_volume_claim_for_all_namespaces()
        all_pods = self.k8s_client.list_pod_for_all_namespaces()
        
        # Get all PVCs in use, keyed by (namespace, name) since PVC names
        # are only unique within a namespace
        pvcs_in_use = set()
        for pod in all_pods.items:
            if pod.spec.volumes:
                for volume in pod.spec.volumes:
                    if volume.persistent_volume_claim:
                        pvcs_in_use.add((
                            pod.metadata.namespace,
                            volume.persistent_volume_claim.claim_name
                        ))
        
        # Find orphaned PVCs
        orphaned_pvcs = []
        for pvc in all_pvcs.items:
            if (pvc.metadata.namespace, pvc.metadata.name) not in pvcs_in_use:
                # Check if unused for specified duration
                last_used = self.get_last_used_time(pvc)
                if (datetime.now() - last_used).days > days_unused:
                    orphaned_pvcs.append({
                        'name': pvc.metadata.name,
                        'namespace': pvc.metadata.namespace,
                        'size': pvc.spec.resources.requests['storage'],
                        'last_used': last_used,
                        'estimated_cost': self.calculate_pvc_cost(pvc)
                    })
        
        return orphaned_pvcs
    
    def cleanup_orphaned_pvcs(self, dry_run=True):
        """
        Clean up orphaned PVCs with safety checks.
        """
        orphaned = self.find_orphaned_pvcs()
        
        print(f"Found {len(orphaned)} orphaned PVCs")
        print(f"Total monthly cost: ${sum(p['estimated_cost'] for p in orphaned):,.2f}")
        
        if not dry_run:
            for pvc in orphaned:
                # Backup before deletion
                self.backup_pvc(pvc)
                
                # Delete PVC
                self.k8s_client.delete_namespaced_persistent_volume_claim(
                    name=pvc['name'],
                    namespace=pvc['namespace']
                )
                
                print(f"✅ Deleted PVC: {pvc['name']}")
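
We always ran it dry first. A sketch of the rollout, again assuming the elided helpers exist:

manager = PVCCleanupManager()

# Always start with a dry run and review the report before deleting.
manager.cleanup_orphaned_pvcs(dry_run=True)

# After review (and with backups verified), run for real:
# manager.cleanup_orphaned_pvcs(dry_run=False)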

Results from storage optimization:

  • Monthly savings: $6,800
  • Deleted PVCs: 347
  • Freed storage: 42 TB
  • Storage utilization: Improved from 48% to 89%

The Final Results: 67% Cost Reduction

Complete Transformation

cost_transformation = {
    'month_0_baseline': {
        'total_cost': 145000,
        'ec2_instances': 89000,
        'ebs_volumes': 23000,
        'data_transfer': 18000,
        'load_balancers': 9000,
        'nat_gateways': 6000
    },
    'month_4_optimized': {
        'total_cost': 48000,
        'ec2_instances': 28000,  # 69% reduction
        'ebs_volumes': 7000,     # 70% reduction
        'data_transfer': 7000,   # 61% reduction
        'load_balancers': 4000,  # 56% reduction
        'nat_gateways': 2000     # 67% reduction
    },
    'optimizations': {
        'right_sizing': {
            'monthly_savings': 47000,
            'percentage': 48
        },
        'spot_instances': {
            'monthly_savings': 38000,
            'percentage': 39
        },
        'autoscaling': {
            'monthly_savings': 18000,
            'percentage': 19
        },
        'storage_cleanup': {
            'monthly_savings': 6800,
            'percentage': 7
        }
    },
    'performance_impact': {
        'avg_response_time': '-8%',  # Actually improved!
        'p95_response_time': '-12%',
        'availability': 'no change',
        'deployment_frequency': '+15%'
    }
}

total_savings = sum([opt['monthly_savings'] 
                     for opt in cost_transformation['optimizations'].values()])

print(f"Total monthly savings: ${total_savings:,}")
print(f"Annual savings: ${total_savings * 12:,}")
print(f"Cost reduction: {((145000 - 48000) / 145000) * 100:.1f}%")

Output:

Total monthly savings: $109,800
Annual savings: $1,317,600
Cost reduction: 67.0%

(Note: the per-category savings overlap, since right-sizing shrank the footprint that spot pricing and autoscaling then discounted, so they sum to more than the net $97K/month drop in the bill.)

The Lessons: What Actually Moved the Needle

1. Visibility First

You can’t optimize what you can’t measure. Installing Kubecost was the best $500/month we ever spent.

2. Right-Sizing Has the Biggest Impact

48% of our savings came from right-sizing resource requests. Most pods request 4-8x more resources than they use.

3. Spot Instances Are Production-Ready

With proper interruption handling, spot instances are reliable enough for 70%+ of workloads.

4. Automation is Non-Negotiable

Manual optimization doesn’t scale. Implement:

  • Cluster Autoscaler for node scaling
  • Vertical Pod Autoscaler for pod right-sizing (see the sketch after this list)
  • Automated PVC cleanup
  • Cost anomaly detection
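
For the VPA piece, a minimal sketch in recommendation-only mode, which surfaces right-sizing suggestions without evicting pods. It assumes the VPA CRDs and controller are installed and that the target Deployment is named api-service:

import subprocess

# "Off" mode reports recommendations without acting on them; switch to
# "Auto" only after you trust the numbers.
vpa_manifest = """
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Off"
"""

subprocess.run(['kubectl', 'apply', '-f', '-'],
               input=vpa_manifest, text=True, check=True)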

5. Start with Low-Hanging Fruit

Don’t try to optimize everything at once:

  1. Week 1-2: Install monitoring, gather data
  2. Week 3-4: Right-size obvious waste
  3. Week 5-6: Implement spot instances
  4. Week 7-8: Enable autoscaling
  5. Week 9-12: Fine-tune and optimize

Key Takeaways

  • Install Kubecost or similar for visibility
  • Right-size based on actual P95 usage + 20% headroom
  • Use spot instances for 70%+ of stateless workloads
  • Enable Cluster Autoscaler with aggressive scale-down
  • Clean up unused PVCs monthly
  • Implement pod priority classes
  • Use mixed instance types for availability
  • Monitor cost trends weekly
  • Set up cost anomaly alerts
  • Review and optimize quarterly

Conclusion: FinOps as a Practice

Cost optimization isn’t a one-time project—it’s an ongoing practice. We now review costs weekly, have automated alerts for anomalies, and treat cost efficiency as a core engineering metric.

Our 67% cost reduction came from (shares overlap, so they sum to slightly more than 100%):

  • 48% from right-sizing
  • 39% from spot instances
  • 19% from autoscaling
  • 7% from storage cleanup

The best part? Performance actually improved due to better resource utilization and scheduling.

For more on Kubernetes advanced patterns, container orchestration strategies, and Kubernetes operators, check out CrashBytes.

Additional Resources

These tools and resources were critical to our cost optimization success:

  • Kubecost for cost visibility and allocation
  • Prometheus for resource usage metrics
  • Kubernetes Cluster Autoscaler for node scaling
  • AWS Node Termination Handler for spot interruption draining
  • Vertical Pod Autoscaler for right-sizing recommendations

This post is part of my implementation series, where I share real-world lessons from production cost optimization. For more on FinOps strategies and cloud cost management, visit CrashBytes.