The Wake-Up Call: $145K Per Month
Our CFO walked into the engineering standup holding a printed AWS bill.
“Why are we spending $145,000 per month on Kubernetes? That’s more than our entire sales team’s salaries.”
We didn’t have a good answer.
This is the story of how we cut our Kubernetes costs by 67% in 4 months—from $145K/month to $48K/month—while actually improving performance and reliability.
The Starting Point: Cloud Waste at Scale
Our Kubernetes setup was a classic example of “it works, don’t touch it”:
Kubernetes Clusters: 3 (prod, staging, dev)
Total Nodes: 187 EC2 instances
Instance Types:
- 42x m5.8xlarge (production)
- 28x m5.4xlarge (production)
- 45x m5.2xlarge (staging/dev)
- 72x m5.xlarge (mixed)
Monthly Costs:
- EC2 instances: $89,000
- EBS volumes: $23,000
- Data transfer: $18,000
- Load balancers: $9,000
- NAT gateways: $6,000
Total: $145,000/month
Cluster Utilization:
- Average CPU: 23%
- Average Memory: 31%
- Peak CPU: 68%
- Peak Memory: 72%
We were paying for 187 instances but, at roughly 23% average CPU and 31% average memory utilization, doing work that would have fit on about 50.
Phase 1: The Visibility Problem
Before optimizing, we needed to understand what we were actually using.
Installing Kubecost
# Deploy Kubecost for cost visibility
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update
helm install kubecost kubecost/cost-analyzer \
--namespace kubecost --create-namespace \
--set kubecostToken=$KUBECOST_TOKEN \
--set prometheus.server.global.external_labels.cluster_id=production
# Access Kubecost UI
kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090
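With the port-forward running, you can also pull raw allocation data instead of clicking through the UI. A minimal sketch against Kubecost's Allocation API (endpoint and parameters per the Kubecost docs; adjust the window and aggregation to your environment):
# Seven days of cost, aggregated by namespace
curl -s "http://localhost:9090/model/allocation?window=7d&aggregate=namespace" | jq '.data'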
The Shocking Discovery
After 48 hours of data collection, Kubecost revealed the truth:
cost_analysis = {
'total_monthly_cost': 145000,
'waste_breakdown': {
'over_provisioned_pods': 47000, # 32% of total
'idle_resources': 38000, # 26% of total
'unused_pvs': 12000, # 8% of total
'oversized_nodes': 28000, # 19% of total
'inefficient_networking': 8000 # 6% of total
},
'top_cost_drivers': {
'ml-training-jobs': 28000, # 19% of total
'data-processing': 23000, # 16% of total
'api-services': 19000, # 13% of total
'background-workers': 15000, # 10% of total
'dev-environments': 14000 # 10% of total
}
}
print(f"Total identified waste: ${sum(cost_analysis['waste_breakdown'].values()):,}")
# Output: Total identified waste: $133,000
91% of our spending was waste or inefficiency.
Phase 2: Right-Sizing Resources
The Pod Resource Audit
import kubernetes

class PodResourceAnalyzer:
    """
    Analyze actual pod resource usage vs. requests/limits.

    get_pod_requests, calculate_waste_cost, and prometheus_query are thin
    helpers (Kubernetes API, our pricing data, and the Prometheus HTTP API)
    omitted here for brevity.
    """
def __init__(self):
self.k8s_client = kubernetes.client.CoreV1Api()
self.metrics_api = kubernetes.client.CustomObjectsApi()
def analyze_pod_waste(self, namespace='default', days=30):
"""
Compare requested vs actual resource usage.
"""
pods = self.k8s_client.list_namespaced_pod(namespace)
waste_report = []
for pod in pods.items:
# Get resource requests
requests = self.get_pod_requests(pod)
# Get actual usage over time
actual_usage = self.get_actual_usage(
pod.metadata.name,
namespace,
days=days
)
# Calculate waste
cpu_waste = requests['cpu'] - actual_usage['cpu_p95']
memory_waste = requests['memory'] - actual_usage['memory_p95']
if cpu_waste > requests['cpu'] * 0.5 or memory_waste > requests['memory'] * 0.5:
waste_report.append({
'pod': pod.metadata.name,
'namespace': namespace,
'cpu_requested': requests['cpu'],
'cpu_actual_p95': actual_usage['cpu_p95'],
'cpu_waste_percent': (cpu_waste / requests['cpu']) * 100,
'memory_requested': requests['memory'],
'memory_actual_p95': actual_usage['memory_p95'],
'memory_waste_percent': (memory_waste / requests['memory']) * 100,
'monthly_waste_cost': self.calculate_waste_cost(cpu_waste, memory_waste)
})
return sorted(waste_report, key=lambda x: x['monthly_waste_cost'], reverse=True)
def get_actual_usage(self, pod_name, namespace, days=30):
"""
Query Prometheus for actual resource usage.
"""
# Query for CPU usage (P95)
cpu_query = f"""
quantile_over_time(0.95,
rate(container_cpu_usage_seconds_total{{
pod="{pod_name}",
namespace="{namespace}"
}}[5m])[{days}d:5m])
"""
# Query for memory usage (P95)
memory_query = f"""
quantile_over_time(0.95,
container_memory_working_set_bytes{{
pod="{pod_name}",
namespace="{namespace}"
}}[{days}d:5m])
"""
cpu_p95 = self.prometheus_query(cpu_query)
memory_p95 = self.prometheus_query(memory_query)
return {
'cpu_p95': cpu_p95,
'memory_p95': memory_p95
}
def generate_recommendations(self, waste_report):
"""
Generate resource request recommendations.
"""
recommendations = []
for pod in waste_report:
# Recommend 20% headroom above P95 usage
recommended_cpu = pod['cpu_actual_p95'] * 1.2
recommended_memory = pod['memory_actual_p95'] * 1.2
recommendations.append({
'pod': pod['pod'],
'current_requests': {
'cpu': pod['cpu_requested'],
'memory': pod['memory_requested']
},
'recommended_requests': {
'cpu': recommended_cpu,
'memory': recommended_memory
},
'potential_savings': pod['monthly_waste_cost']
})
return recommendations
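Here is roughly how we drove the audit. The namespace is illustrative, and the get_pod_requests, calculate_waste_cost, and prometheus_query helpers mentioned above still need to be wired to your own Prometheus and pricing data:
# Run the audit and print the ten biggest offenders
analyzer = PodResourceAnalyzer()
waste = analyzer.analyze_pod_waste(namespace='production', days=30)
for rec in analyzer.generate_recommendations(waste)[:10]:
    print(rec['pod'], rec['recommended_requests'], f"${rec['potential_savings']:,.0f}/month")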
The Top 10 Worst Offenders
# Our actual findings
worst_offenders = [
{
'service': 'ml-training-worker',
'cpu_requested': '8000m',
'cpu_actual_p95': '450m',
'memory_requested': '32Gi',
'memory_actual_p95': '4.2Gi',
'monthly_waste': '$8,400'
},
{
'service': 'data-processor',
'cpu_requested': '4000m',
'cpu_actual_p95': '280m',
'memory_requested': '16Gi',
'memory_actual_p95': '2.1Gi',
'monthly_waste': '$4,200'
},
# ... 8 more
]
total_waste_top_10 = sum([float(w['monthly_waste'].replace('$', '').replace(',', ''))
for w in worst_offenders])
print(f"Top 10 services waste: ${total_waste_top_10:,.0f}/month")
# Output: Top 10 services waste: $38,000/month
The Right-Sizing Implementation
# BEFORE: Typical over-provisioned pod
apiVersion: v1
kind: Pod
metadata:
name: api-service
spec:
containers:
- name: api
image: api-service:v1.2.3
resources:
requests:
memory: "4Gi" # Way too much!
cpu: "2000m" # Way too much!
limits:
memory: "8Gi" # Even worse!
cpu: "4000m" # Even worse!
---
# AFTER: Right-sized based on actual usage
apiVersion: v1
kind: Pod
metadata:
name: api-service
spec:
containers:
- name: api
image: api-service:v1.2.3
resources:
requests:
memory: "512Mi" # Based on P95 + 20% headroom
cpu: "250m" # Based on P95 + 20% headroom
limits:
memory: "1Gi" # 2x requests for burst capacity
cpu: "500m" # 2x requests for burst capacity
Results from right-sizing:
- Monthly savings: $47,000
- Pods affected: 847
- Performance impact: None (actually improved due to better scheduling)
Phase 3: Spot Instances for Stateless Workloads
The Spot Instance Strategy
class SpotInstanceManager:
"""
Manage spot instances for cost-effective Kubernetes workloads.
"""
def __init__(self):
self.spot_strategies = {
'diversified': {
'instance_types': [
'm5.2xlarge',
'm5a.2xlarge',
'm5n.2xlarge',
'm4.2xlarge'
],
'availability_zones': ['us-east-1a', 'us-east-1b', 'us-east-1c'],
'strategy': 'capacity-optimized'
}
}
def create_spot_node_group(self, cluster_name, workload_type):
"""
Create spot instance node group with interruption handling.
"""
if workload_type == 'stateless':
# Stateless workloads can tolerate interruptions
config = {
'name': f'{cluster_name}-spot-stateless',
'instance_types': self.spot_strategies['diversified']['instance_types'],
'capacity_type': 'SPOT',
'desired_capacity': 10,
'min_size': 5,
'max_size': 50,
'labels': {
'node-type': 'spot',
'workload': 'stateless'
},
'taints': [{
'key': 'spot',
'value': 'true',
'effect': 'NoSchedule'
}]
}
elif workload_type == 'stateful':
# Stateful workloads need on-demand fallback
config = {
'name': f'{cluster_name}-spot-stateful',
'instance_types': ['r5.xlarge', 'r5a.xlarge'],
'capacity_type': 'MIXED', # Mix of spot and on-demand
'spot_percentage': 70, # 70% spot, 30% on-demand
'desired_capacity': 5,
'min_size': 2,
'max_size': 20
}
return config
def handle_spot_interruption(self):
"""
Handle spot instance interruptions gracefully.
"""
interruption_handler = """
apiVersion: v1
kind: ServiceAccount
metadata:
name: aws-node-termination-handler
namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: aws-node-termination-handler
namespace: kube-system
spec:
selector:
matchLabels:
app: aws-node-termination-handler
template:
spec:
serviceAccountName: aws-node-termination-handler
containers:
- name: handler
image: public.ecr.aws/aws-ec2/aws-node-termination-handler:v1.19.0
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: ENABLE_SPOT_INTERRUPTION_DRAINING
value: "true"
- name: ENABLE_SCHEDULED_EVENT_DRAINING
value: "true"
"""
return interruption_handler
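For reference, the stateless config above maps almost one-to-one onto an eksctl managed node group. A sketch assuming an EKS cluster named production (field names follow eksctl's managed nodegroup schema; the values mirror the Python config):
# eksctl ClusterConfig fragment for the stateless spot pool
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production
  region: us-east-1
managedNodeGroups:
  - name: spot-stateless
    instanceTypes: ["m5.2xlarge", "m5a.2xlarge", "m5n.2xlarge", "m4.2xlarge"]
    spot: true
    minSize: 5
    maxSize: 50
    desiredCapacity: 10
    labels:
      node-type: spot
      workload: stateless
    taints:
      - key: spot
        value: "true"
        effect: NoSchedule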
Configuring Workloads for Spot
# Deployment optimized for spot instances
apiVersion: apps/v1
kind: Deployment
metadata:
  name: background-worker
spec:
  replicas: 20
  selector:                    # apps/v1 Deployments require an explicit selector
    matchLabels:
      app: background-worker
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 30%      # Tolerate more unavailability for cost savings
      maxSurge: 50%            # Faster replacement during interruptions
  template:
    metadata:
      labels:
        app: background-worker
    spec:
# Tolerate spot instance taints
tolerations:
- key: "spot"
operator: "Equal"
value: "true"
effect: "NoSchedule"
# Prefer spot nodes
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
- key: node-type
operator: In
values:
- spot
# Handle disruptions gracefully
terminationGracePeriodSeconds: 120 # Allow cleanup before termination
containers:
- name: worker
image: background-worker:v2.1.0
resources:
requests:
cpu: 500m
memory: 1Gi
limits:
cpu: 1000m
memory: 2Gi
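Worth pairing with any spot-heavy deployment: a PodDisruptionBudget, so voluntary drains (including spot interruption draining) never take out too many workers at once. A minimal sketch matching the app: background-worker label on the deployment above:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: background-worker-pdb
spec:
  minAvailable: 70%    # keep at least 70% of workers running during drains
  selector:
    matchLabels:
      app: background-worker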
Results from spot instances:
- Monthly savings: $38,000
- Spot percentage: 72% of stateless workloads
- Average interruption rate: 3.2% (well within tolerance)
- Spot savings rate: Average 68% discount vs on-demand
Phase 4: Cluster Autoscaling and Bin Packing
Implementing Cluster Autoscaler
# Cluster Autoscaler deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:                    # apps/v1 Deployments require an explicit selector
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
containers:
- name: cluster-autoscaler
image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.24.0
command:
- ./cluster-autoscaler
- --v=4
- --stderrthreshold=info
- --cloud-provider=aws
- --skip-nodes-with-local-storage=false
- --expander=least-waste # Optimize for cost
- --balance-similar-node-groups
- --skip-nodes-with-system-pods=false
- --scale-down-enabled=true
- --scale-down-unneeded-time=5m # Aggressive scale-down
- --scale-down-utilization-threshold=0.5
- --max-node-provision-time=15m
env:
- name: AWS_REGION
value: us-east-1
Improving Pod Bin Packing
class PodSchedulingOptimizer:
"""
Optimize pod scheduling for better bin packing.
"""
def configure_priority_classes(self):
"""
Create priority classes for workload prioritization.
"""
priority_classes = {
'critical': {
'value': 1000000,
'description': 'Critical production workloads',
'preemptionPolicy': 'Never'
},
'high': {
'value': 100000,
'description': 'High priority production workloads',
'preemptionPolicy': 'PreemptLowerPriority'
},
'normal': {
'value': 10000,
'description': 'Normal production workloads',
'preemptionPolicy': 'PreemptLowerPriority'
},
'low': {
'value': 1000,
'description': 'Best-effort workloads',
'preemptionPolicy': 'PreemptLowerPriority'
}
}
return priority_classes
def optimize_pod_affinity(self, deployment_name):
"""
Configure pod affinity for better bin packing.
"""
affinity = {
'podAntiAffinity': {
'preferredDuringSchedulingIgnoredDuringExecution': [{
'weight': 100,
'podAffinityTerm': {
'labelSelector': {
'matchExpressions': [{
'key': 'app',
'operator': 'In',
'values': [deployment_name]
}]
},
'topologyKey': 'kubernetes.io/hostname'
}
}]
}
}
return affinity
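For completeness, here is what the 'critical' entry above looks like as an actual manifest (scheduling.k8s.io/v1; the class name is ours):
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical
value: 1000000
preemptionPolicy: Never
globalDefault: false
description: "Critical production workloads"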
Results from autoscaling:
- Monthly savings: $18,000
- Average cluster utilization: Increased from 23% to 71%
- Node count reduction: From 187 to 67 nodes (64% reduction)
- Scale-down time: Average 8 minutes
Phase 5: Storage Optimization
The PVC Audit
# Find unused Persistent Volumes
kubectl get pv | grep Released | wc -l
# Output: 347 released PVs
# Calculate storage tied up in released PVs (assumes capacities are expressed in Gi)
kubectl get pv -o json | jq '[.items[] | select(.status.phase=="Released") | .spec.capacity.storage | rtrimstr("Gi") | tonumber] | add'
# Output: 43008 (~42 TiB)
# Monthly cost of unused storage
# 42 TB × $0.10/GB-month = $4,200/month wasted
Automated PVC Cleanup
from datetime import datetime

import kubernetes

class PVCCleanupManager:
    """
    Automated cleanup of unused persistent volumes.

    get_last_used_time, calculate_pvc_cost, and backup_pvc are helpers
    (metrics lookup, EBS pricing, and snapshot-before-delete) omitted here.
    """
def __init__(self):
self.k8s_client = kubernetes.client.CoreV1Api()
self.storage_api = kubernetes.client.StorageV1Api()
def find_orphaned_pvcs(self, days_unused=30):
"""
Find PVCs not attached to any pods.
"""
all_pvcs = self.k8s_client.list_persistent_volume_claim_for_all_namespaces()
all_pods = self.k8s_client.list_pod_for_all_namespaces()
# Get all PVCs in use
pvcs_in_use = set()
for pod in all_pods.items:
if pod.spec.volumes:
for volume in pod.spec.volumes:
if volume.persistent_volume_claim:
pvcs_in_use.add(volume.persistent_volume_claim.claim_name)
# Find orphaned PVCs
orphaned_pvcs = []
for pvc in all_pvcs.items:
if pvc.metadata.name not in pvcs_in_use:
# Check if unused for specified duration
last_used = self.get_last_used_time(pvc)
if (datetime.now() - last_used).days > days_unused:
orphaned_pvcs.append({
'name': pvc.metadata.name,
'namespace': pvc.metadata.namespace,
'size': pvc.spec.resources.requests['storage'],
'last_used': last_used,
'estimated_cost': self.calculate_pvc_cost(pvc)
})
return orphaned_pvcs
def cleanup_orphaned_pvcs(self, dry_run=True):
"""
Clean up orphaned PVCs with safety checks.
"""
orphaned = self.find_orphaned_pvcs()
print(f"Found {len(orphaned)} orphaned PVCs")
print(f"Total monthly cost: ${sum(p['estimated_cost'] for p in orphaned):,.2f}")
if not dry_run:
for pvc in orphaned:
# Backup before deletion
self.backup_pvc(pvc)
# Delete PVC
self.k8s_client.delete_namespaced_persistent_volume_claim(
name=pvc['name'],
namespace=pvc['namespace']
)
print(f"✅ Deleted PVC: {pvc['name']}")
Results from storage optimization:
- Monthly savings: $6,800
- Deleted PVCs: 347
- Freed storage: 42 TB
- Storage utilization: Improved from 48% to 89%
The Final Results: 67% Cost Reduction
Complete Transformation
cost_transformation = {
'month_0_baseline': {
'total_cost': 145000,
'ec2_instances': 89000,
'ebs_volumes': 23000,
'data_transfer': 18000,
'load_balancers': 9000,
'nat_gateways': 6000
},
'month_4_optimized': {
'total_cost': 48000,
'ec2_instances': 28000, # 69% reduction
'ebs_volumes': 7000, # 70% reduction
'data_transfer': 7000, # 61% reduction
'load_balancers': 4000, # 56% reduction
'nat_gateways': 2000 # 67% reduction
},
'optimizations': {
'right_sizing': {
'monthly_savings': 47000,
'percentage': 48
},
'spot_instances': {
'monthly_savings': 38000,
'percentage': 39
},
'autoscaling': {
'monthly_savings': 18000,
'percentage': 19
},
'storage_cleanup': {
'monthly_savings': 6800,
'percentage': 7
}
},
'performance_impact': {
'avg_response_time': '-8%', # Actually improved!
'p95_response_time': '-12%',
'availability': 'no change',
'deployment_frequency': '+15%'
}
}
total_savings = sum([opt['monthly_savings']
for opt in cost_transformation['optimizations'].values()])
print(f"Total monthly savings: ${total_savings:,}")
print(f"Annual savings: ${total_savings * 12:,}")
print(f"Cost reduction: {((145000 - 48000) / 145000) * 100:.1f}%")
Output:
Total monthly savings: $109,800
Annual savings: $1,317,600
Cost reduction: 66.9%
The Lessons: What Actually Moved the Needle
1. Visibility First
You can’t optimize what you can’t measure. Installing Kubecost was the best $500/month we ever spent.
2. Right-Sizing Has the Biggest Impact
48% of our savings came from right-sizing resource requests. Most pods request 4-8x more resources than they use.
3. Spot Instances Are Production-Ready
With proper interruption handling, spot instances are reliable enough for 70%+ of workloads.
4. Automation is Non-Negotiable
Manual optimization doesn’t scale. Implement:
- Cluster Autoscaler for node scaling
- Vertical Pod Autoscaler for pod right-sizing (see the recommendation-mode sketch after this list)
- Automated PVC cleanup
- Cost anomaly detection
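A minimal sketch of running VPA in recommendation-only mode against a service like api-service (assuming it runs as a Deployment; the object name here is illustrative). With updateMode set to "Off", VPA surfaces suggested requests without ever evicting pods, which pairs well with the P95-plus-headroom approach from Phase 2:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Off"    # recommendations only, no automatic evictions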
5. Start with Low-Hanging Fruit
Don’t try to optimize everything at once:
- Week 1-2: Install monitoring, gather data
- Week 3-4: Right-size obvious waste
- Week 5-6: Implement spot instances
- Week 7-8: Enable autoscaling
- Week 9-12: Fine-tune and optimize
Key Takeaways
✅ Install Kubecost or similar for visibility
✅ Right-size based on actual P95 usage + 20% headroom
✅ Use spot instances for 70%+ of stateless workloads
✅ Enable Cluster Autoscaler with aggressive scale-down
✅ Clean up unused PVCs monthly
✅ Implement pod priority classes
✅ Use mixed instance types for availability
✅ Monitor cost trends weekly
✅ Set up cost anomaly alerts (example alert rule after this list)
✅ Review and optimize quarterly
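As a concrete example of an anomaly alert, here is a Prometheus alerting rule sketch built on node_total_hourly_cost, a metric Kubecost exports; the 30% threshold and two-hour window are illustrative, not a recommendation:
groups:
  - name: cost-anomalies
    rules:
      - alert: ClusterHourlyCostSpike
        expr: sum(node_total_hourly_cost) > 1.3 * sum(avg_over_time(node_total_hourly_cost[7d]))
        for: 2h
        labels:
          severity: warning
        annotations:
          summary: "Cluster compute cost is more than 30% above its 7-day average"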
Conclusion: FinOps as a Practice
Cost optimization isn’t a one-time project—it’s an ongoing practice. We now review costs weekly, have automated alerts for anomalies, and treat cost efficiency as a core engineering metric.
Our 67% cost reduction came from:
- 48% from right-sizing
- 39% from spot instances
- 19% from autoscaling
- 7% from storage cleanup
(These buckets overlap: right-sizing shrank pods, which in turn let the autoscaler drain nodes, so the per-initiative savings total $109.8K against a net reduction of $97K/month.)
The best part? Performance actually improved due to better resource utilization and scheduling.
For more on advanced Kubernetes patterns, container orchestration strategies, and Kubernetes operators, check out CrashBytes.
Additional Resources
These tools and resources were critical to our cost optimization success:
- Kubecost - Kubernetes Cost Monitoring
- AWS Node Termination Handler
- Kubernetes Cluster Autoscaler
- Vertical Pod Autoscaler
- Goldilocks - VPA Recommendations
- AWS Savings Plans Calculator
- Spot by NetApp (formerly Spot.io)
- Karpenter - Node Autoscaling
- FinOps Foundation Best Practices
- AWS Cost Optimization Guide
- Google Cloud Cost Optimization
- CNCF FinOps for Kubernetes
This post is part of my implementation series, where I share real-world lessons from production cost optimization. For more on FinOps strategies and cloud cost management, visit CrashBytes.