Running Serverless at the Edge: 127M Requests/Day from 190+ Locations

What we learned deploying serverless functions to 190 edge locations, including the cold start nightmare, the $12K debugging bill, and why our p99 latency dropped 83%.

The Problem: Our Users Were Tired of Waiting

Q3 2024. Our analytics dashboard showed a harsh reality: users in Asia-Pacific were experiencing 2.3-second page loads while US users enjoyed 340ms loads.

The root cause was embarrassingly simple: all our serverless functions ran in us-east-1.

Every API request from Sydney:

  • Traveled 13,000+ kilometers
  • Through 23 network hops
  • Added 380ms of latency (before processing)
  • Made our app feel “slow and laggy” (user feedback)

After reading about serverless edge computing, I pitched a radical plan: move compute to the edge, everywhere.

My manager’s response: “That sounds expensive and complicated.”

She was half right.

The Architecture Decision: Where to Run Edge Functions?

We evaluated three platforms:

Option 1: AWS Lambda@Edge

Pros: Integrated with CloudFront, good DynamoDB integration, familiar AWS tooling
Cons: 128MB memory limit, 5-second timeout, CloudFormation deployment complexity
Cost: ~$0.60 per 1M requests

Option 2: Cloudflare Workers

Pros: 190+ locations, V8 isolates (fast cold starts), great DX, generous free tier
Cons: 128MB memory limit, no long-running tasks, KV eventual consistency
Cost: ~$0.50 per 1M requests

Option 3: Fastly Compute@Edge

Pros: WebAssembly runtime, predictable performance, strong streaming support
Cons: Smaller edge network (70 locations), steeper learning curve, higher costs
Cost: ~$0.75 per 1M requests
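To put those rates in context: at our volume (127M requests/day, roughly 3.8 billion/month), the per-request fees alone work out to roughly $1,900/month on Workers, $2,280/month on Lambda@Edge, and $2,850/month on Compute@Edge, before any duration or data-transfer charges.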

We chose a hybrid: Cloudflare Workers for 90% of requests, Lambda@Edge for heavy compute.

Phase 1: The “Simple” Migration (Weeks 1-3)

Our first function to migrate: GET /api/user/profile - a simple endpoint that fetched user data from DynamoDB.

How hard could it be?

Attempt 1: Direct Port (Complete Failure)

We took our existing Lambda function and deployed it to Cloudflare Workers:

// Original Lambda code (worked in us-east-1)
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();

exports.handler = async (event) => {
  const userId = event.pathParameters.userId;
  
  const result = await dynamodb.get({
    TableName: 'Users',
    Key: { userId }
  }).promise();
  
  return {
    statusCode: 200,
    body: JSON.stringify(result.Item)
  };
};

Deployed to Workers. Immediate failure.

Problem 1: No AWS SDK in Workers (it’s a V8 isolate, not Node.js)
Problem 2: No native DynamoDB access from Workers
Problem 3: Cold starts were actually slower than our us-east-1 Lambda

The Solution: Rearchitect for Edge

We completely rewrote the function:

// Cloudflare Workers version (optimized for edge)
export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const userId = url.pathname.split('/').pop();
    
    // Check Workers KV cache first (< 1ms)
    const cached = await env.USER_CACHE.get(userId);
    if (cached) {
      return new Response(cached, {
        headers: { 
          'Content-Type': 'application/json',
          'X-Cache': 'HIT'
        }
      });
    }
    
    // Cache miss: fetch from origin API
    const originResponse = await fetch(
      `https://api-origin.example.com/users/${userId}`,
      {
        headers: { 'Authorization': env.API_SECRET },
        cf: { cacheTtl: 300 } // Cloudflare edge cache
      }
    );
    
    const data = await originResponse.text();
    
    // Store in KV for next time (eventually consistent)
    await env.USER_CACHE.put(userId, data, { expirationTtl: 600 });
    
    return new Response(data, {
      headers: { 
        'Content-Type': 'application/json',
        'X-Cache': 'MISS'
      }
    });
  }
};

Key changes:

  1. Multi-layer caching: Workers KV + Cloudflare CDN
  2. Origin fallback: Heavy queries still hit our DynamoDB API
  3. Eventual consistency: Accepted trade-off for 95% cache hit rate

Results After Rewrite

Latency improvements:

  • US users: 340ms → 145ms (57% faster)
  • EU users: 890ms → 167ms (81% faster)
  • APAC users: 2,300ms → 198ms (91% faster!)

Cache hit rate: 94.7% (KV + CDN combined)

But we had a new problem…

The Cold Start Nightmare (Week 4)

After migrating 12 endpoints, we noticed something disturbing: cold starts were killing us.

The Problem: Workers Weren’t Staying Warm

Despite Cloudflare’s promise of “sub-millisecond cold starts,” we were seeing:

  • p50 cold start: 47ms (acceptable)
  • p95 cold start: 340ms (bad)
  • p99 cold start: 1,200ms (terrible!)
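A simple way to spot cold starts from inside a Worker is a module-scope flag: it persists across requests within one isolate but resets whenever a fresh isolate spins up. A minimal sketch (not our exact instrumentation; handleRequest is a placeholder):

// Module scope: survives across requests in the same isolate,
// so the first request a fresh isolate serves finds it unset.
let isolateWarm = false;

export default {
  async fetch(request, env) {
    const coldStart = !isolateWarm;
    isolateWarm = true;

    const upstream = await handleRequest(request, env); // placeholder handler

    // Re-wrap so the response can be tagged and correlated per colo
    const response = new Response(upstream.body, upstream);
    response.headers.set('X-Cold-Start', String(coldStart));
    response.headers.set('X-Colo', request.cf?.colo || 'unknown');
    return response;
  }
};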

Root cause: Our Workers were being evicted from edge caches due to:

  1. Too many KV reads (KV access triggers eviction)
  2. Large Worker bundle sizes (342KB after bundling)
  3. Low traffic to some edge locations

Solution 1: Keep Workers Warm

We implemented a “warmer” function:

// Cron trigger: every 5 minutes
export default {
  async scheduled(event, env, ctx) {
    const endpoints = [
      '/api/user/profile',
      '/api/posts/feed',
      '/api/comments/recent'
    ];
    
    // Fan out one ping per endpoint per region tag. A Worker cannot
    // target a specific colo directly (anycast routing decides), so the
    // tag mainly keeps requests distinct and traceable in logs.
    const locations = ['sfo', 'fra', 'sin', 'syd'];
    
    await Promise.all(
      locations.flatMap(loc =>
        endpoints.map(endpoint =>
          fetch(`https://api.example.com${endpoint}?warm=${loc}`, {
            headers: { 'X-Warmer': 'true' }
          })
        )
      )
    );
  }
};

Result: p99 cold starts dropped from 1,200ms to 180ms.
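On the receiving side, the regular Workers can short-circuit these pings so warming traffic never touches KV or the origin. A small sketch of how that check might look (the X-Warmer handling is an assumption, not shown in the handlers above):

// Early in the regular fetch handler: a warming ping only needs to
// instantiate the isolate, so answer it without doing real work.
if (request.headers.get('X-Warmer') === 'true') {
  return new Response(null, { status: 204 });
}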

Solution 2: Reduce Bundle Size

We split our monolithic Worker into micro-Workers:

// Before: One 342KB Worker for all endpoints
export default {
  async fetch(request) {
    const url = new URL(request.url);
    
    if (url.pathname.startsWith('/api/user')) {
      return handleUser(request);
    } else if (url.pathname.startsWith('/api/posts')) {
      return handlePosts(request);
    }
    // ... 15 more handlers
  }
};

// After: Separate Workers (25-40KB each)
// worker-user.js
export default {
  async fetch(request, env) {
    return handleUser(request, env);
  }
};

// worker-posts.js
export default {
  async fetch(request, env) {
    return handlePosts(request, env);
  }
};

Result: Average Worker size dropped to 38KB, cold starts improved 40%.

The $12K Debugging Bill (Week 6)

One morning, I woke up to a panicked Slack message: “Cloudflare bill is $12,400 this month!”

Our normal bill: $1,200/month.

Root Cause: Logging Gone Wild

We had enabled comprehensive logging for debugging:

// The expensive mistake
export default {
  async fetch(request, env) {
    const start = Date.now();
    console.log('Request started', {
      url: request.url,
      headers: Object.fromEntries(request.headers),
      timestamp: Date.now()
    });
    
    const response = await handleRequest(request, env);
    
    console.log('Request completed', {
      status: response.status,
      responseHeaders: Object.fromEntries(response.headers),
      duration: Date.now() - start
    });
    
    return response;
  }
};

The problem: Cloudflare charges $0.50 per million log lines.

At 127 million requests/day, we were generating 254 million log lines/day.

Cost: 254M × $0.50/1M = $127/day = $3,810/month just for logs!

The Solution: Smart Sampling

// Intelligent sampling strategy
const shouldLog = (request, duration) => {
  // Always log requests explicitly flagged as errors
  if (request.headers.get('X-Error')) return true;
  
  // Log 100% of slow requests (duration measured by the caller)
  if (duration > 500) return true;
  
  // Log 1% of everything else
  return Math.random() < 0.01;
};

export default {
  async fetch(request, env) {
    const start = Date.now();
    const response = await handleRequest(request, env);
    const duration = Date.now() - start;
    
    if (shouldLog(request, duration) || response.status >= 400) {
      console.log(JSON.stringify({
        url: request.url,
        status: response.status,
        duration,
        location: request.cf?.colo
      }));
    }
    
    return response;
  }
};

Result: Log volume dropped 97%, bill returned to $1,400/month.
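Rough check on the new numbers: keeping about 3% of 254 million lines/day is roughly 7.6 million lines/day, which at the same $0.50 per million comes to about $3.80/day, or on the order of $115/month of logging, in the same ballpark as the bill settling around $1,400/month.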

Lambda@Edge for Heavy Lifting (Week 8)

Some operations couldn’t run on Cloudflare Workers due to memory/CPU constraints:

  1. Image resizing (JPEG encoding requires 40MB+ memory)
  2. PDF generation (puppeteer needs full Node.js)
  3. Video thumbnail extraction (ffmpeg)

For these, we used Lambda@Edge:

// Lambda@Edge for image resizing
const AWS = require('aws-sdk');
const sharp = require('sharp');

const s3 = new AWS.S3();

exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const queryParams = new URLSearchParams(request.querystring);
  
  const width = parseInt(queryParams.get('w')) || 800;
  const quality = parseInt(queryParams.get('q')) || 80;
  
  // Fetch original image from S3
  const s3Response = await s3.getObject({
    Bucket: 'images-origin',
    Key: request.uri.replace(/^\//, '') // S3 keys have no leading slash
  }).promise();
  
  // Resize with sharp
  const resized = await sharp(s3Response.Body)
    .resize(width, null, { withoutEnlargement: true })
    .jpeg({ quality })
    .toBuffer();
  
  return {
    status: '200',
    headers: {
      'content-type': [{ value: 'image/jpeg' }],
      'cache-control': [{ value: 'public, max-age=31536000' }]
    },
    body: resized.toString('base64'),
    bodyEncoding: 'base64'
  };
};

Deployment strategy:

  • Cloudflare Workers: 90% of traffic (lightweight operations)
  • Lambda@Edge: 10% of traffic (heavy compute)
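In practice the split is just path-based routing at the edge: the Worker serves the lightweight endpoints itself and proxies heavy-compute paths to the CloudFront distribution that fronts Lambda@Edge. A minimal sketch, with hypothetical hostnames and path prefixes:

// Heavy paths go to the Lambda@Edge-backed CloudFront distribution
// (hypothetical hostname); everything else is handled by the Worker.
const HEAVY_PREFIXES = ['/images/', '/pdf/', '/thumbnails/'];

export default {
  async fetch(request, env) {
    const url = new URL(request.url);

    if (HEAVY_PREFIXES.some(prefix => url.pathname.startsWith(prefix))) {
      // Proxy to the heavy-compute tier, preserving method, headers, body
      const proxied = new Request(
        `https://heavy.example.com${url.pathname}${url.search}`,
        request
      );
      return fetch(proxied);
    }

    // Lightweight API traffic stays on Workers (cache + KV + origin fallback)
    return handleLightweight(request, env); // placeholder for the handlers above
  }
};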

The Multi-Region Data Challenge

Edge functions are fast, but data consistency is hard.

Problem: Writes Don’t Work at the Edge

User updates their profile → Worker writes to KV → Eventual consistency nightmare.

User in Singapore updates bio → Worker in Singapore writes to KV → User in US still sees old bio for 60+ seconds.

Solution: Write-Through to Origin

// Hybrid approach: Reads from edge, writes to origin
export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    
    if (request.method === 'GET') {
      // Read from edge (KV + cache)
      return handleReadFromEdge(request, env);
    } else {
      // POST/PUT/DELETE: proxy to origin
      const response = await fetch('https://api-origin.example.com' + url.pathname, {
        method: request.method,
        headers: request.headers,
        body: request.body
      });
      
      // Invalidate cache on successful write
      if (response.ok) {
        await invalidateCache(url.pathname, env);
      }
      
      return response;
    }
  }
};

Trade-off: Writes are slow (origin latency), but consistent. Reads are fast (edge cache).
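For reference, invalidateCache above is essentially a KV delete plus an edge-cache eviction. A minimal sketch under the same assumptions as the profile Worker (KV keyed by userId, edge cache keyed by request URL); note that caches.default.delete only evicts from the local data center, so a truly global purge would go through the Cloudflare cache-purge API instead:

// Sketch of the invalidateCache helper used above (assumes KV entries
// are keyed by userId and edge-cache entries by request URL).
async function invalidateCache(pathname, env) {
  const userId = pathname.split('/').pop();

  // Drop the KV copy so the next read falls through to the origin
  await env.USER_CACHE.delete(userId);

  // Evict the matching entry from this colo's cache (local only;
  // a global purge would call the Cloudflare cache-purge API)
  await caches.default.delete(`https://api.example.com${pathname}`);
}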

Performance Numbers: Before vs. After

After 3 months of production deployment:

Global Latency (p99)

  • US: 340ms → 145ms (57% improvement)
  • EU: 890ms → 167ms (81% improvement)
  • APAC: 2,300ms → 198ms (91% improvement)
  • South America: 1,850ms → 214ms (88% improvement)

Cache Hit Rates

  • Cloudflare CDN: 76% (static assets)
  • Workers KV: 89% (API responses)
  • Combined: 94.7% cache hit rate

Cost Efficiency

  • Requests/month: 3.8 billion
  • Cloudflare Workers cost: $1,900/month
  • Lambda@Edge cost: $420/month
  • Total edge compute: $2,320/month
  • Previous (Lambda us-east-1): $4,100/month
  • Savings: 43% cost reduction

Reliability

  • Uptime: 99.98% (was 99.91%)
  • Failed requests: 0.012% (was 0.089%)
  • Origin load: Reduced by 94% (cache absorption)

The Hidden Costs

Cost 1: Debugging Complexity

Challenge: Distributed tracing across 190 locations is HARD.

Solution: We built custom tracing:

export default {
  async fetch(request, env, ctx) {
    const traceId = crypto.randomUUID();
    const location = request.cf?.colo || 'unknown';
    const start = Date.now();
    
    // Fail open: pass the request through to the origin if this Worker throws
    ctx.passThroughOnException();
    
    try {
      const upstream = await handleRequest(request, env);
      // Re-wrap so headers are mutable even for pass-through fetch responses
      const response = new Response(upstream.body, upstream);
      
      // Sample 1% for detailed traces
      if (Math.random() < 0.01) {
        ctx.waitUntil(sendTrace({
          traceId,
          location,
          status: response.status,
          duration: Date.now() - start,
          cacheHit: response.headers.get('X-Cache') === 'HIT'
        }));
      }
      
      response.headers.set('X-Trace-Id', traceId);
      response.headers.set('X-Served-By', location);
      
      return response;
    } catch (error) {
      // Always log errors
      ctx.waitUntil(sendTrace({
        traceId,
        location,
        error: error.message
      }));
      throw error;
    }
  }
};
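The sendTrace call above is just an HTTP post to whatever collects the spans. A minimal sketch, with a hypothetical collector endpoint (any JSON ingest, whether a logging SaaS or an internal service, works the same way):

// Minimal sendTrace sketch: ship one span as JSON to a collector.
// The endpoint is hypothetical; swap in your tracing/logging backend.
async function sendTrace(span) {
  await fetch('https://trace-collector.example.com/v1/spans', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(span)
  });
}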

Cost 2: Deployment Complexity

Challenge: Deploying to 190 locations takes 8-12 minutes (vs. 30 seconds for Lambda).

Solution: Blue-green deployments with gradual rollout (illustrative script below; the exact Wrangler commands and flags depend on your version and gradual-deployment setup):

# Deploy to 1% of edge locations first
wrangler publish --percentage 1

# Monitor for 10 minutes
sleep 600

# If error rate < 0.1%, deploy to 10%
wrangler publish --percentage 10

# Gradual rollout: 1% → 10% → 50% → 100%

Cost 3: State Management

Challenge: Stateless Workers have no persistent connections, no long-running tasks, and no shared state between requests.

Solution: Use Durable Objects for stateful operations:

// Durable Object for WebSocket-like behavior
export class ChatRoom {
  constructor(state, env) {
    this.state = state;
    this.sessions = [];
  }
  
  async fetch(request) {
    if (request.headers.get('Upgrade') === 'websocket') {
      const pair = new WebSocketPair();
      await this.handleSession(pair[1]);
      return new Response(null, { status: 101, webSocket: pair[0] });
    }
    
    return new Response('Expected WebSocket', { status: 400 });
  }
  
  async handleSession(webSocket) {
    webSocket.accept();
    this.sessions.push(webSocket);
    
    webSocket.addEventListener('message', (msg) => {
      this.broadcast(msg.data);
    });
    
    // Drop closed sockets so broadcast() doesn't hit dead sessions
    webSocket.addEventListener('close', () => {
      this.sessions = this.sessions.filter(s => s !== webSocket);
    });
  }
  
  broadcast(message) {
    this.sessions.forEach(session => session.send(message));
  }
}
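A Durable Object needs a regular Worker in front of it to route requests to a specific instance. A minimal sketch, assuming a CHAT_ROOM binding configured for the class above and a room name taken from the URL:

// Fronting Worker: routes each /rooms/<name> request to the Durable
// Object instance for that room (CHAT_ROOM is an assumed binding name).
export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const roomName = url.pathname.split('/').pop() || 'lobby';

    // The same name always maps to the same Durable Object instance
    const id = env.CHAT_ROOM.idFromName(roomName);
    const stub = env.CHAT_ROOM.get(id);

    // Hand the request (including the WebSocket upgrade) to the object
    return stub.fetch(request);
  }
};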

Lessons for Teams Considering Edge

✅ Do This:

  1. Start with read-heavy workloads - Writes are complex at the edge
  2. Embrace caching - 95%+ cache hit rates make edge economics work
  3. Keep Workers small - Bundle size directly impacts cold starts
  4. Use multi-layer caching - KV + CDN + origin
  5. Monitor per-location - Performance varies wildly across edge locations

❌ Don’t Do This:

  1. Port Lambda code directly - Edge runtimes are fundamentally different
  2. Assume consistency - Eventual consistency is the default
  3. Log everything - Logs are expensive at scale
  4. Skip load testing - Edge behavior under load is unpredictable
  5. Ignore cold starts - They matter more at the edge

What’s Next?

We’re now exploring:

  1. Durable Objects for stateful edge compute
  2. R2 storage for edge-native object storage
  3. WebGPU for AI inference at the edge
  4. TCP/UDP Workers for non-HTTP protocols

Serverless edge computing transformed our global performance, but it required rethinking every assumption about serverless architecture.

For more on edge computing architecture patterns, see the comprehensive serverless edge guide that helped shape our migration strategy.


Running serverless at the edge? Connect on LinkedIn or share your edge stories on Twitter.