The Problem: Our Users Were Tired of Waiting
Q3 2024. Our analytics dashboard showed a harsh reality: users in Asia-Pacific were experiencing 2.3-second page loads while US users enjoyed 340ms loads.
The root cause was embarrassingly simple: all our serverless functions ran in us-east-1.
Every API request from Sydney:
- Traveled 13,000+ kilometers
- Through 23 network hops
- Added 380ms of latency (before processing)
- Made our app feel “slow and laggy” (user feedback)
After reading about serverless edge computing, I pitched a radical plan: move compute to the edge, everywhere.
My manager’s response: “That sounds expensive and complicated.”
She was half right.
The Architecture Decision: Where to Run Edge Functions?
We evaluated three platforms:
Option 1: AWS Lambda@Edge
Pros: Integrated with CloudFront, good DynamoDB integration, familiar AWS tooling
Cons: 128MB memory limit, 5-second timeout, CloudFormation deployment complexity
Cost: ~$0.60 per 1M requests
Option 2: Cloudflare Workers
Pros: 190+ locations, V8 isolates (fast cold starts), great DX, generous free tier
Cons: 128MB memory limit, no long-running tasks, KV eventual consistency
Cost: ~$0.50 per 1M requests
Option 3: Fastly Compute@Edge
Pros: WebAssembly runtime, predictable performance, strong streaming support
Cons: Smaller edge network (70 locations), steeper learning curve, higher costs
Cost: ~$0.75 per 1M requests
We chose a hybrid: Cloudflare Workers for 90% of requests, Lambda@Edge for heavy compute.
Phase 1: The “Simple” Migration (Weeks 1-3)
Our first function to migrate: GET /api/user/profile, a simple endpoint that fetched user data from DynamoDB.
How hard could it be?
Attempt 1: Direct Port (Complete Failure)
We took our existing Lambda function and deployed it to Cloudflare Workers:
// Original Lambda code (worked in us-east-1)
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();

exports.handler = async (event) => {
  const userId = event.pathParameters.userId;

  const result = await dynamodb.get({
    TableName: 'Users',
    Key: { userId }
  }).promise();

  return {
    statusCode: 200,
    body: JSON.stringify(result.Item)
  };
};
Deployed to Workers. Immediate failure.
Problem 1: No AWS SDK in Workers (it’s a V8 isolate, not Node.js)
Problem 2: No native DynamoDB access from Workers
Problem 3: Cold starts were actually slower than us-east-1 Lambda
The Solution: Rearchitect for Edge
We completely rewrote the function:
// Cloudflare Workers version (optimized for edge)
export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const userId = url.pathname.split('/').pop();

    // Check Workers KV cache first (< 1ms)
    const cached = await env.USER_CACHE.get(userId);
    if (cached) {
      return new Response(cached, {
        headers: {
          'Content-Type': 'application/json',
          'X-Cache': 'HIT'
        }
      });
    }

    // Cache miss: fetch from origin API
    const originResponse = await fetch(
      `https://api-origin.example.com/users/${userId}`,
      {
        headers: { 'Authorization': env.API_SECRET },
        cf: { cacheTtl: 300 } // Cloudflare edge cache
      }
    );
    const data = await originResponse.text();

    // Store in KV for next time (eventually consistent)
    await env.USER_CACHE.put(userId, data, { expirationTtl: 600 });

    return new Response(data, {
      headers: {
        'Content-Type': 'application/json',
        'X-Cache': 'MISS'
      }
    });
  }
};
Key changes:
- Multi-layer caching: Workers KV + Cloudflare CDN
- Origin fallback: Heavy queries still hit our DynamoDB API
- Eventual consistency: Accepted trade-off for 95% cache hit rate
Results After Rewrite
Latency improvements:
- US users: 340ms → 145ms (57% faster)
- EU users: 890ms → 167ms (81% faster)
- APAC users: 2,300ms → 198ms (91% faster!)
Cache hit rate: 94.7% (KV + CDN combined)
But we had a new problem…
The Cold Start Nightmare (Week 4)
After migrating 12 endpoints, we noticed something disturbing: cold starts were killing us.
The Problem: Workers Weren’t Staying Warm
Despite Cloudflare’s promise of “sub-millisecond cold starts,” we were seeing:
- p50 cold start: 47ms (acceptable)
- p95 cold start: 340ms (bad)
- p99 cold start: 1,200ms (terrible!)
Root cause: Our Workers were being evicted from edge caches due to:
- Too many KV reads (KV access triggers eviction)
- Large Worker bundle sizes (342KB after bundling)
- Low traffic to some edge locations
Solution 1: Keep Workers Warm
We implemented a “warmer” function:
// Cron trigger: every 5 minutes
export default {
  async scheduled(event, env, ctx) {
    const endpoints = [
      '/api/user/profile',
      '/api/posts/feed',
      '/api/comments/recent'
    ];

    // Ping each endpoint once per target location, tagging the request with a
    // location hint (the cron itself runs from a single colo)
    const locations = ['sfo', 'fra', 'sin', 'syd'];

    await Promise.all(
      locations.flatMap(loc =>
        endpoints.map(endpoint =>
          fetch(`https://api.example.com${endpoint}`, {
            headers: { 'X-Warmer': 'true', 'X-Warm-Location': loc }
          })
        )
      )
    );
  }
};
Result: p99 cold starts dropped from 1,200ms to 180ms.
Solution 2: Reduce Bundle Size
We split our monolithic Worker into micro-Workers:
// Before: One 342KB Worker for all endpoints
export default {
  async fetch(request) {
    const url = new URL(request.url);
    if (url.pathname.startsWith('/api/user')) {
      return handleUser(request);
    } else if (url.pathname.startsWith('/api/posts')) {
      return handlePosts(request);
    }
    // ... 15 more handlers
  }
};

// After: Separate Workers (25-40KB each)

// worker-user.js
export default {
  async fetch(request, env) {
    return handleUser(request, env);
  }
};

// worker-posts.js
export default {
  async fetch(request, env) {
    return handlePosts(request, env);
  }
};
Result: Average Worker size dropped to 38KB, cold starts improved 40%.
The $12K Debugging Bill (Week 6)
One morning, I woke up to a panicked Slack message: “Cloudflare bill is $12,400 this month!”
Our normal bill: $1,200/month.
Root Cause: Logging Gone Wild
We had enabled comprehensive logging for debugging:
// The expensive mistake
export default {
  async fetch(request, env) {
    const start = Date.now();

    console.log('Request started', {
      url: request.url,
      headers: Object.fromEntries(request.headers),
      timestamp: start
    });

    const response = await handleRequest(request, env);

    console.log('Request completed', {
      status: response.status,
      responseHeaders: Object.fromEntries(response.headers),
      duration: Date.now() - start
    });

    return response;
  }
};
The problem: Cloudflare charges $0.50 per million log lines.
At 127 million requests/day, we were generating 254 million log lines/day.
Cost: 254M × $0.50/1M = $127/day = $3,810/month just for logs!
The Solution: Smart Sampling
// Intelligent sampling strategy
const shouldLog = (request, duration) => {
  // Always log requests explicitly flagged as errors
  if (request.headers.get('X-Error')) return true;

  // Log 100% of slow requests (duration measured in the handler below)
  if (duration > 500) return true;

  // Log 1% of everything else
  return Math.random() < 0.01;
};

export default {
  async fetch(request, env) {
    const start = Date.now();
    const response = await handleRequest(request, env);
    const duration = Date.now() - start;

    if (shouldLog(request, duration) || response.status >= 400) {
      console.log(JSON.stringify({
        url: request.url,
        status: response.status,
        duration,
        location: request.cf?.colo
      }));
    }

    return response;
  }
};
Result: Log volume dropped 97%, bill returned to $1,400/month.
Lambda@Edge for Heavy Lifting (Week 8)
Some operations couldn’t run on Cloudflare Workers due to memory/CPU constraints:
- Image resizing (JPEG encoding requires 40MB+ memory)
- PDF generation (puppeteer needs full Node.js)
- Video thumbnail extraction (ffmpeg)
For these, we used Lambda@Edge:
// Lambda@Edge for image resizing
const AWS = require('aws-sdk');
const sharp = require('sharp');

const s3 = new AWS.S3();

exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const queryParams = new URLSearchParams(request.querystring);
  const width = parseInt(queryParams.get('w')) || 800;
  const quality = parseInt(queryParams.get('q')) || 80;

  // Fetch original image from S3 (CloudFront URIs start with "/", S3 keys don't)
  const s3Response = await s3.getObject({
    Bucket: 'images-origin',
    Key: request.uri.replace(/^\//, '')
  }).promise();

  // Resize with sharp
  const resized = await sharp(s3Response.Body)
    .resize(width, null, { withoutEnlargement: true })
    .jpeg({ quality })
    .toBuffer();

  return {
    status: '200',
    headers: {
      'content-type': [{ value: 'image/jpeg' }],
      'cache-control': [{ value: 'public, max-age=31536000' }]
    },
    body: resized.toString('base64'),
    bodyEncoding: 'base64'
  };
};
Deployment strategy:
- Cloudflare Workers: 90% of traffic (lightweight operations)
- Lambda@Edge: 10% of traffic (heavy compute)
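In routing terms, the split is path-based rather than a literal percentage: a thin front Worker keeps lightweight endpoints on Workers and proxies heavy paths to the CloudFront distribution that runs Lambda@Edge. A minimal sketch; HEAVY_PATHS, heavy.example.com, and handleLightweightRequest are illustrative names, not taken from our production config:

// Illustrative front Worker: lightweight paths stay on Workers,
// heavy paths are proxied to the Lambda@Edge-backed CloudFront origin.
const HEAVY_PATHS = ['/api/images/resize', '/api/pdf', '/api/video/thumbnail'];

export default {
  async fetch(request, env) {
    const url = new URL(request.url);

    if (HEAVY_PATHS.some(prefix => url.pathname.startsWith(prefix))) {
      // Preserve method, headers, and body while swapping the hostname
      const proxied = new Request(`https://heavy.example.com${url.pathname}${url.search}`, request);
      return fetch(proxied);
    }

    // Everything else (90% of traffic) is handled at the edge
    return handleLightweightRequest(request, env);
  }
};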
The Multi-Region Data Challenge
Edge functions are fast, but data consistency is hard.
Problem: Writes Don’t Work at the Edge
User updates their profile → Worker writes to KV → Eventual consistency nightmare.
User in Singapore updates bio → Worker in Singapore writes to KV → User in US still sees old bio for 60+ seconds.
Solution: Write-Through to Origin
// Hybrid approach: Reads from edge, writes to origin
export default {
  async fetch(request, env) {
    const url = new URL(request.url);

    if (request.method === 'GET') {
      // Read from edge (KV + cache)
      return handleReadFromEdge(request, env);
    } else {
      // POST/PUT/DELETE: proxy to origin
      const response = await fetch('https://api-origin.example.com' + url.pathname, {
        method: request.method,
        headers: request.headers,
        body: request.body
      });

      // Invalidate cache on successful write
      if (response.ok) {
        await invalidateCache(url.pathname, env);
      }

      return response;
    }
  }
};
Trade-off: Writes are slow (origin latency), but consistent. Reads are fast (edge cache).
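The invalidateCache helper referenced above isn't shown in the snippet. A minimal sketch, assuming the KV key is the trailing path segment (as in the profile Worker) and the edge cache entry is keyed by the origin URL; both assumptions are illustrative:

// Illustrative helper: drop both cache layers for a written path
async function invalidateCache(pathname, env) {
  const key = pathname.split('/').pop();   // e.g. the userId for a profile path
  await env.USER_CACHE.delete(key);        // remove the KV copy (propagates eventually)

  // Drop the CDN copy; note caches.default only affects the current edge location
  await caches.default.delete(
    new Request(`https://api-origin.example.com${pathname}`)
  );
}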
Performance Numbers: Before vs. After
After 3 months of production deployment:
Global Latency (p99)
- US: 340ms → 145ms (57% improvement)
- EU: 890ms → 167ms (81% improvement)
- APAC: 2,300ms → 198ms (91% improvement)
- South America: 1,850ms → 214ms (88% improvement)
Cache Hit Rates
- Cloudflare CDN: 76% (static assets)
- Workers KV: 89% (API responses)
- Combined: 94.7% cache hit rate
Cost Efficiency
- Requests/month: 3.8 billion
- Cloudflare Workers cost: $1,900/month
- Lambda@Edge cost: $420/month
- Total edge compute: $2,320/month
- Previous (Lambda us-east-1): $4,100/month
- Savings: 43% cost reduction
Reliability
- Uptime: 99.98% (was 99.91%)
- Failed requests: 0.012% (was 0.089%)
- Origin load: Reduced by 94% (cache absorption)
The Hidden Costs
Cost 1: Debugging Complexity
Challenge: Distributed tracing across 190 locations is HARD.
Solution: We built custom tracing:
export default {
  async fetch(request, env, ctx) {
    const start = Date.now();
    const traceId = crypto.randomUUID();
    const location = request.cf?.colo || 'unknown';

    // Distributed tracing context
    ctx.passThroughOnException();

    try {
      const upstream = await handleRequest(request, env);
      // Re-wrap the response so we can attach trace headers
      // (headers on a response returned by fetch() are immutable)
      const response = new Response(upstream.body, upstream);

      // Sample 1% for detailed traces
      if (Math.random() < 0.01) {
        ctx.waitUntil(sendTrace({
          traceId,
          location,
          status: response.status,
          duration: Date.now() - start,
          cacheHit: response.headers.get('X-Cache') === 'HIT'
        }));
      }

      response.headers.set('X-Trace-Id', traceId);
      response.headers.set('X-Served-By', location);
      return response;
    } catch (error) {
      // Always log errors
      ctx.waitUntil(sendTrace({
        traceId,
        location,
        error: error.message
      }));
      throw error;
    }
  }
};
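The sendTrace helper isn't defined in the snippet above. A minimal sketch, assuming trace events are POSTed to an internal collector; the URL and payload shape are illustrative:

// Illustrative trace shipper; collector endpoint and payload shape are assumptions
async function sendTrace(event) {
  await fetch('https://traces.example.com/v1/ingest', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ ...event, timestamp: Date.now() })
  });
}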
Cost 2: Deployment Complexity
Challenge: Deploying to 190 locations takes 8-12 minutes (vs. 30 seconds for Lambda).
Solution: Blue-green deployments with gradual rollout:
# Deploy to 1% of edge locations first
wrangler publish --percentage 1
# Monitor for 10 minutes
sleep 600
# If error rate < 0.1%, deploy to 10%
wrangler publish --percentage 10
# Gradual rollout: 1% → 10% → 50% → 100%
Cost 3: State Management
Challenge: No persistent connections, no WebSockets (yet), no long-running tasks.
Solution: Use Durable Objects for stateful operations:
// Durable Object for WebSocket-like behavior
export class ChatRoom {
  constructor(state, env) {
    this.state = state;
    this.sessions = [];
  }

  async fetch(request) {
    if (request.headers.get('Upgrade') === 'websocket') {
      const pair = new WebSocketPair();
      await this.handleSession(pair[1]);
      return new Response(null, { status: 101, webSocket: pair[0] });
    }
    return new Response('Expected WebSocket', { status: 400 });
  }

  async handleSession(webSocket) {
    webSocket.accept();
    this.sessions.push(webSocket);

    webSocket.addEventListener('message', (msg) => {
      this.broadcast(msg.data);
    });

    // Drop closed sockets so broadcast() doesn't send to dead sessions
    webSocket.addEventListener('close', () => {
      this.sessions = this.sessions.filter(s => s !== webSocket);
    });
  }

  broadcast(message) {
    this.sessions.forEach(session => session.send(message));
  }
}
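The class alone isn't enough: a plain Worker has to route each room to exactly one Durable Object instance. A minimal sketch, assuming a CHAT_ROOM binding and a /rooms/<name> URL scheme (both names are illustrative):

// Illustrative front Worker: the same room name always resolves to the same instance
export default {
  async fetch(request, env) {
    const roomName = new URL(request.url).pathname.split('/').pop();
    const id = env.CHAT_ROOM.idFromName(roomName);  // deterministic ID per room
    const stub = env.CHAT_ROOM.get(id);
    return stub.fetch(request);                     // forwards the WebSocket upgrade to ChatRoom
  }
};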
Lessons for Teams Considering Edge
✅ Do This:
- Start with read-heavy workloads - Writes are complex at the edge
- Embrace caching - 95%+ cache hit rates make edge economics work
- Keep Workers small - Bundle size directly impacts cold starts
- Use multi-layer caching - KV + CDN + origin
- Monitor per-location - Performance varies wildly across edge locations
❌ Don’t Do This:
- Port Lambda code directly - Edge runtimes are fundamentally different
- Assume consistency - Eventual consistency is the default
- Log everything - Logs are expensive at scale
- Skip load testing - Edge behavior under load is unpredictable
- Ignore cold starts - They matter more at the edge
What’s Next?
We’re now exploring:
- Durable Objects for stateful edge compute
- R2 storage for edge-native object storage
- WebGPU for AI inference at the edge
- TCP/UDP Workers for non-HTTP protocols
Serverless edge computing transformed our global performance, but it required rethinking every assumption about serverless architecture.
For more on edge computing architecture patterns, see the comprehensive serverless edge guide that helped shape our migration strategy.
Running serverless at the edge? Connect on LinkedIn or share your edge stories on Twitter.