The Ingress Problem: When Annotations Become Technical Debt
By late 2024, our Kubernetes Ingress situation had become absurd:
Our largest Ingress resource:
- 847 lines of YAML
- 93 custom annotations
- Comments warning: “DO NOT TOUCH THIS WITHOUT APPROVAL”
- Last successfully modified: 6 months ago
- Engineers who understood it: 2 (one left the company)
Every new routing rule required:
- 45-60 minutes of careful annotation editing
- Three-person code review (because one person breaking it cost us $80K)
- Prayer that nobody would fat-finger a regex
- Crossing fingers during deployment
We needed a better way. The Kubernetes Gateway API promised exactly that.
But migration is never as simple as the tutorials make it look.
Decision: Gateway API or Stick with Ingress Hell?
The Case FOR Gateway API
Pros:
- Role-oriented design (platform team vs. dev team concerns separated)
- First-class support for advanced routing (headers, weights, mirrors)
- Protocol extensibility (HTTP/2, gRPC, TCP, TLS)
- Strong vendor support (everyone’s adopting it)
Cons:
- Young API surface: core resources only recently GA, many extended features still experimental (we’d be early adopters)
- Learning curve (new concepts, new patterns)
- Migration complexity (200+ Ingress resources to convert)
- Risk of bugs in early implementations
The Case AGAINST Gateway API
Our infrastructure team’s concerns:
- “If it ain’t broke…” (Ingress works, mostly)
- Unknown unknowns (what edge cases will we hit?)
- Training overhead (60 engineers need to learn new APIs)
- Tooling gaps (existing scripts/automation won’t work)
The deciding vote: Our CTO read a study showing 30% latency improvements from better routing algorithms in Gateway API implementations.
Decision: We migrate. But carefully.
The Migration Strategy: Crawl, Walk, Run, Sprint
We rejected the “big bang” approach immediately. Our plan:
Phase 1: Proof of Concept (2 Weeks)
- Single service, non-critical
- Learn the Gateway API patterns
- Identify tooling gaps
Phase 2: Production Validation (4 Weeks)
- 5 production services with moderate traffic
- Shadow deployment (run both Ingress and Gateway side-by-side)
- Measure everything, trust nothing
Phase 3: Progressive Rollout (8 Weeks)
- Batch migrations: 10 services per week
- Automated conversion tooling
- Rollback plan for every deployment
Phase 4: Decommission Ingress (2 Weeks)
- Final cleanup
- Documentation and training
- Celebration (spoiler: we earned it)
Phase 1: The Proof of Concept That Almost Failed
We picked a simple service: blog-api, with 3 routes and 2K req/sec of traffic.
How hard could it be?
Attempt 1: Direct Translation (Failed)
Our first attempt was to directly translate the Ingress YAML:
# OLD: Ingress (worked fine)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: blog-api
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /v1/posts(/|$)(.*)
            pathType: Prefix
            backend:
              service:
                name: blog-api
                port:
                  number: 8080
# NEW: Gateway API (first attempt - BROKEN)
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: blog-api
spec:
  parentRefs:
    - name: prod-gateway
      namespace: gateway-system
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/posts
      backendRefs:
        - name: blog-api
          port: 8080
Deployed it. Production broke immediately.
Problem: Gateway API doesn’t support regex path rewrites via annotations; that’s an Ingress-ism. The old regex rewrite had been stripping the /v1/posts prefix before requests hit the backend, so the direct translation forwarded full paths the service didn’t recognize.
Solution: HTTPRoute Filters
Gateway API handles this through explicit filters:
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: blog-api
spec:
  parentRefs:
    - name: prod-gateway
      namespace: gateway-system
  hostnames:
    - "api.example.com"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/posts
      filters:
        - type: URLRewrite
          urlRewrite:
            path:
              type: ReplacePrefixMatch
              replacePrefixMatch: /
      backendRefs:
        - name: blog-api
          port: 8080
This time it worked: ReplacePrefixMatch strips the matched /v1/posts prefix (replacing it with /), which is exactly what the old rewrite-target: /$2 regex had been doing, only now as an explicit, readable filter.
Phase 2: Production Validation - The 2AM Rollback
We migrated 5 services to Gateway API. Everything looked great in staging.
Then we hit production traffic.
The Incident: Certificate Rotation Broke Everything
Timeline:
- 2:14 AM: Automated cert rotation begins
- 2:18 AM: All Gateway API routes return 503
- 2:19 AM: Pages start firing
- 2:23 AM: Emergency rollback to Ingress
- 2:35 AM: Services restored
Root cause: Our cert-manager integration expected Ingress annotations. Gateway API uses different certificate reference mechanisms.
The fix: Updated cert-manager configuration:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: prod-gateway
  namespace: gateway-system
spec:
  gatewayClassName: nginx
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      hostname: "*.example.com"
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: wildcard-tls-cert
            namespace: cert-manager # KEY CHANGE
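Because the certificate Secret now lives in a different namespace than the Gateway, the Gateway API spec also requires a ReferenceGrant in the Secret’s namespace to permit the cross-namespace reference. A minimal sketch matching the Gateway above (the grant name is arbitrary):
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-gateway-to-read-certs # arbitrary name
  namespace: cert-manager           # must live where the Secret lives
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: Gateway
      namespace: gateway-system     # the Gateway's namespace
  to:
    - group: ""                     # core API group (Secrets)
      kind: Secret
      name: wildcard-tls-cert       # optionally restrict to this one Secret
Without the grant, the listener fails its ResolvedRefs check and refuses to serve.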
Lesson learned: Don’t assume certificate management “just works”. Test cert rotation explicitly.
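One cheap way to take that advice, sketched below assuming cert-manager with a staging issuer (names are hypothetical): a deliberately short-lived Certificate forces a renewal every few minutes, so rotation gets exercised continuously instead of once a quarter.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: rotation-canary
  namespace: cert-manager
spec:
  secretName: rotation-canary-tls
  dnsNames:
    - rotation-canary.example.com
  issuerRef:
    name: staging-issuer # hypothetical ClusterIssuer
    kind: ClusterIssuer
  duration: 1h     # short-lived on purpose
  renewBefore: 55m # renews roughly every 5 minutes
Point a throwaway Gateway listener at rotation-canary-tls and alert if it ever serves a stale or missing certificate.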
Phase 3: The Automated Migration Tool
After manually migrating 5 services, we realized we needed automation. 200+ services × 45 minutes each = no way.
We built ingress-to-gateway-api, a Go tool that:
1. Parses Ingress YAML
Extracts:
- Host rules
- Path patterns
- Backend services
- Custom annotations (the tricky part)
2. Translates to Gateway API
Maps Ingress patterns to Gateway API constructs:
import (
	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
	gatewayv1 "sigs.k8s.io/gateway-api/apis/v1"
)

func translateIngress(ingress *networkingv1.Ingress) (*gatewayv1.HTTPRoute, error) {
	httpRoute := &gatewayv1.HTTPRoute{
		ObjectMeta: metav1.ObjectMeta{
			Name:      ingress.Name,
			Namespace: ingress.Namespace,
		},
		Spec: gatewayv1.HTTPRouteSpec{
			// ParentRefs lives on the embedded CommonRouteSpec.
			CommonRouteSpec: gatewayv1.CommonRouteSpec{
				ParentRefs: []gatewayv1.ParentReference{
					{
						Name:      "prod-gateway",
						Namespace: ptr.To(gatewayv1.Namespace("gateway-system")),
					},
				},
			},
		},
	}

	// Translate host rules
	for _, rule := range ingress.Spec.Rules {
		if rule.Host != "" {
			httpRoute.Spec.Hostnames = append(
				httpRoute.Spec.Hostnames,
				gatewayv1.Hostname(rule.Host),
			)
		}
		if rule.HTTP == nil {
			continue // host-only rule, nothing to translate
		}

		// Translate paths
		for _, path := range rule.HTTP.Paths {
			match := gatewayv1.HTTPRouteMatch{
				Path: &gatewayv1.HTTPPathMatch{
					Type:  ptr.To(translatePathType(path.PathType)),
					Value: ptr.To(path.Path),
				},
			}
			backendRef := gatewayv1.HTTPBackendRef{
				BackendRef: gatewayv1.BackendRef{
					BackendObjectReference: gatewayv1.BackendObjectReference{
						Name: gatewayv1.ObjectName(path.Backend.Service.Name),
						Port: ptr.To(gatewayv1.PortNumber(path.Backend.Service.Port.Number)),
					},
				},
			}
			routeRule := gatewayv1.HTTPRouteRule{
				Matches:     []gatewayv1.HTTPRouteMatch{match},
				BackendRefs: []gatewayv1.HTTPBackendRef{backendRef},
			}
			// Handle annotations -> filters translation
			if filters, err := translateAnnotations(ingress.Annotations); err == nil {
				routeRule.Filters = filters
			}
			httpRoute.Spec.Rules = append(httpRoute.Spec.Rules, routeRule)
		}
	}
	return httpRoute, nil
}

// translatePathType maps Ingress path types onto Gateway API match types;
// ImplementationSpecific falls back to prefix matching.
func translatePathType(pathType *networkingv1.PathType) gatewayv1.PathMatchType {
	if pathType != nil && *pathType == networkingv1.PathTypeExact {
		return gatewayv1.PathMatchExact
	}
	return gatewayv1.PathMatchPathPrefix
}

// The annotation translation was the HARD part
func translateAnnotations(annotations map[string]string) ([]gatewayv1.HTTPRouteFilter, error) {
	filters := []gatewayv1.HTTPRouteFilter{}

	// Handle rewrite rules (regex targets like /$2 need special-casing;
	// simplified here)
	if rewriteTarget, exists := annotations["nginx.ingress.kubernetes.io/rewrite-target"]; exists {
		filters = append(filters, gatewayv1.HTTPRouteFilter{
			Type: gatewayv1.HTTPRouteFilterURLRewrite,
			URLRewrite: &gatewayv1.HTTPURLRewriteFilter{
				Path: &gatewayv1.HTTPPathModifier{
					Type:               gatewayv1.PrefixMatchHTTPPathModifier,
					ReplacePrefixMatch: ptr.To(rewriteTarget),
				},
			},
		})
	}

	// Handle rate limiting
	if _, exists := annotations["nginx.ingress.kubernetes.io/limit-rps"]; exists {
		// Note: Gateway API doesn't have native rate limiting (yet).
		// This requires custom policy attachment, which we generated
		// separately rather than emitting an HTTPRoute filter here.
	}

	return filters, nil
}
3. Validates Equivalence
Runs traffic through both Ingress and Gateway API routes, comparing:
- Response codes
- Response times
- Response bodies
Only promotes to production if 99.9% match.
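For the traffic-comparison step, Gateway API’s built-in RequestMirror filter is one way to tee live requests at the candidate backend while the proven one keeps answering (the gateway discards mirrored responses, so the comparison runs off the shadow backend’s captured traffic). A sketch, with hypothetical service names:
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: blog-api-shadow
  namespace: production
spec:
  parentRefs:
    - name: prod-gateway
      namespace: gateway-system
  hostnames:
    - "api.example.com"
  rules:
    - filters:
        - type: RequestMirror
          requestMirror:
            backendRef:
              name: blog-api-candidate # hypothetical: gets a copy, responses dropped
              port: 8080
      backendRefs:
        - name: blog-api # the live backend still answers every request
          port: 8080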
4. Generates Rollback Plan
Every migration includes auto-generated rollback script:
#!/bin/bash
# Auto-generated rollback for blog-api
# Generated: 2025-04-15 14:23:17 UTC
echo "Rolling back blog-api to Ingress..."
# Delete Gateway API resources
kubectl delete httproute blog-api -n production
kubectl delete referencegrant blog-api-rg -n production
# Restore Ingress resource
kubectl apply -f backup/blog-api-ingress-2025-04-15.yaml
# Verify rollback (Ingress exposes no Ready condition, so poll for an LB address)
for _ in $(seq 1 12); do
  kubectl get ingress blog-api -n production \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}' | grep -q . && break
  sleep 5
done
echo "Rollback complete. Verify traffic manually."
The Breaking Changes Nobody Warns You About
Breaking Change 1: Path Matching Semantics
Ingress: /api/v1 matches /api/v1, /api/v1/, /api/v1/users, and so on.
Gateway API: path matching is stricter and fully specified. PathPrefix matches whole path segments (so /api/v1 matches /api/v1/users but not /api/v1beta), and Exact matches only the literal path.
Impact: 12 services broke because trailing slashes behaved differently.
Fix: Explicit path match types:
rules:
  - matches:
      - path:
          type: PathPrefix # matches /api/v1 and /api/v1/*
          value: /api/v1
      - path:
          type: Exact      # only matches /api/v1 exactly
          value: /api/v1
Breaking Change 2: Weight-Based Routing
Ingress: No native support. We used custom annotations.
Gateway API: First-class support, but default behavior changed.
rules:
  - backendRefs:
      - name: blog-api-v1
        port: 8080
        weight: 90
      - name: blog-api-v2
        port: 8080
        weight: 10 # 10% canary traffic
Problem: Weights are relative, not absolute. If blog-api-v2 goes down, ALL traffic goes to v1 (not 90% as we expected).
Solution: make the canary path explicit with HTTPRoute match conditions (a header, here) backed by health checks, so the split can’t silently shift when a backend disappears:
rules:
  - matches:
      - headers:
          - name: X-Canary
            value: "true"
    backendRefs:
      - name: blog-api-v2
        port: 8080
  - backendRefs:
      - name: blog-api-v1
        port: 8080
Breaking Change 3: TLS Termination
Ingress: TLS terminates at Ingress Controller.
Gateway API: TLS termination happens at Gateway, not HTTPRoute.
Impact: Our per-service TLS configs broke.
Fix: Moved TLS configuration to Gateway level:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: prod-gateway
spec:
  listeners:
    - name: https-blog
      port: 443
      protocol: HTTPS
      hostname: blog.example.com
      tls:
        mode: Terminate
        certificateRefs:
          - name: blog-tls-cert
Performance Improvements: The Good Surprise
After full migration, we saw dramatic performance improvements:
Latency Reduction
- p50: 12ms → 11ms (8% improvement)
- p95: 45ms → 28ms (38% improvement)
- p99: 180ms → 52ms (71% improvement!)
Why? Gateway API implementations use smarter routing algorithms.
Connection Pooling
Gateway API’s connection management is more efficient:
- Before: 15,000 active connections per pod
- After: 8,200 active connections per pod
- Result: 45% reduction in connection overhead
HTTP/2 Optimizations
Gateway API enabled HTTP/2 optimizations we couldn’t do with Ingress:
- CSS/JS preloading
- Multiplexed connections
- Header compression
Result: Initial page load time improved by 32%.
The Hidden Costs
Cost 1: Team Training (240 Hours)
Every engineer needed to learn Gateway API concepts:
- Gateway vs. HTTPRoute vs. GatewayClass
- ReferenceGrant for cross-namespace routing
- Policy attachment mechanisms
Solution: Weekly “Gateway API Office Hours” for 8 weeks.
Cost 2: CI/CD Pipeline Updates
All deployment scripts assumed Ingress YAML:
# OLD: deploy.sh (broken after migration)
kubectl apply -f ingress.yaml
kubectl wait --for=condition=Ready ingress/my-app
# NEW: deploy.sh (Gateway API)
kubectl apply -f httproute.yaml
kubectl wait --for=condition=Accepted httproute/my-app
kubectl wait --for=condition=ResolvedRefs httproute/my-app
Effort: 40+ repositories updated.
Cost 3: Monitoring Dashboards
Our Grafana dashboards tracked Ingress metrics. Gateway API exposes different metrics.
Solution: Built unified dashboards showing both during migration period.
Lessons for Teams Considering Migration
✅ Do This:
- Start with non-critical services - Learn on low-stakes deployments
- Run shadow traffic - Validate behavior before cutover
- Build automation early - Manual migration doesn’t scale
- Test certificate rotation - This will bite you at 2 AM
- Train teams incrementally - Don’t wait until migration day
❌ Don’t Do This:
- Big bang migration - Recipe for disaster
- Assume equivalence - Path matching semantics differ
- Skip rollback testing - You WILL need to rollback
- Forget about TLS - Gateway-level termination is different
- Neglect monitoring - Metrics change, dashboards break
The ROI: Was It Worth It?
Yes, but it was harder than expected.
Quantifiable benefits:
- 71% p99 latency improvement
- 45% connection overhead reduction
- 32% faster page loads
- 60% reduction in routing config errors
Intangible benefits:
- Cleaner separation of concerns (platform vs. app teams)
- Future-proof architecture (Gateway API is the future)
- Better debugging (explicit filters vs. mysterious annotations)
- Reduced cognitive load (role-oriented design makes sense)
Total migration effort:
- Engineering time: 480 hours
- Downtime: 42 minutes (spread across incidents)
- Bugs discovered: 17 (all fixed)
- Late-night pages: 8 (mostly cert-manager related)
What’s Next?
We’re now exploring advanced Gateway API features:
- TCPRoute for non-HTTP TCP services like database replicas (sketched below)
- GRPCRoute and weighted backendRefs for more sophisticated canary deployments
- BackendTLSPolicy for end-to-end encryption
- Custom policy attachment for rate limiting and auth
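As a taste of where this goes, here’s a minimal TCPRoute sketch for the database-replica case. Names are hypothetical, the resource is still alpha (gateway.networking.k8s.io/v1alpha2), and it needs a dedicated TCP listener on the Gateway plus controller support:
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TCPRoute
metadata:
  name: postgres-replica
  namespace: production
spec:
  parentRefs:
    - name: prod-gateway
      namespace: gateway-system
      sectionName: postgres # hypothetical TCP listener on the Gateway
  rules:
    - backendRefs:
        - name: postgres-replica
          port: 5432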
Gateway API opened doors we didn’t even know existed with Ingress.
For more on Kubernetes traffic management evolution, see the comprehensive Gateway API guide that helped inform our migration strategy.
Migrating to Gateway API? Connect on LinkedIn for questions, or follow the journey on Twitter.