Building for 100M Players: What Fortnite Taught Us About Distributed Systems

How Epic Games serves hundreds of millions of Fortnite players, and what we stole from their architecture to fix a real-time platform that was crumbling at 50K users.

The Crisis: Our Real-Time Platform Was Dying at 50K Users

June 2024. 11:47 AM. Product launch day. We expected 50,000 concurrent users.

11:52 AM: 23,000 users connected. WebSocket server CPU: 87%.

11:58 AM: 42,000 users. Database connections: 2,847 of 3,000.

12:03 PM: 51,000 users. Complete platform failure.

The damage:

  • Total outage: 2 hours 34 minutes
  • Lost users: 34,000 (never came back)
  • Revenue impact: $1.2M
  • Support tickets: 2,340
  • Trending on Twitter: “#CompanyNameBroken”

The realization: Our real-time platform architecture couldn’t scale past 50K users.

Meanwhile, Fortnite:

  • 350M+ registered players
  • Millions of concurrent players at peak
  • 12M people in a single virtual concert
  • Minimal downtime

We needed to learn from the gaming industry—fast.

This is how we rebuilt our entire real-time infrastructure using lessons from Epic Games, and scaled from 50K to 2M concurrent users.

What We Had: A “Real-Time” Platform That Wasn’t

Our pre-gaming-architecture platform:

The Broken Architecture

WebSocket server:

// Our original WebSocket server (don't do this)
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

let connections = new Map();  // All connections in memory!

wss.on('connection', (ws, req) => {
  const userId = getUserIdFromToken(req.headers.authorization);
  
  // Store connection
  connections.set(userId, ws);
  
  // Broadcast to everyone on every message
  ws.on('message', (message) => {
    // This killed us at scale
    connections.forEach((conn) => {
      if (conn.readyState === WebSocket.OPEN) {
        conn.send(message);  // No filtering, no sharding
      }
    });
  });
  
  // Memory leak: Never cleaned up
  ws.on('close', () => {
    // Sometimes this didn't fire
    connections.delete(userId);
  });
});

Problems:

  • Single server, single process
  • All connections in-memory
  • Broadcast to everyone for every message
  • No connection pooling
  • No load balancing
  • No horizontal scaling

Database:

-- Every WebSocket message hit the database
INSERT INTO messages (user_id, content, created_at)
VALUES ($1, $2, NOW());

-- Read query ran every 100ms per user
SELECT * FROM messages
WHERE room_id = $1
AND created_at > $2
ORDER BY created_at DESC
LIMIT 100;

At 50K users:

  • 50K database queries every 100ms = 500K queries/second
  • Database CPU: 100%
  • Connection pool exhausted
  • Messages delayed 15-30 seconds

What Broke During Launch

The cascade:

  1. WebSocket server maxed out at 47K connections

    • Node.js single process limit
    • Ran out of file descriptors
    • CPU pegged at 100%
  2. Database drowned in queries

    • 500K queries/second
    • Connections maxed out
    • Queries timing out
  3. Users started reconnecting

    • Failed connections retried immediately
    • Exponential backoff was broken (see the reconnect sketch below)
    • We effectively DDoS’d ourselves
  4. Complete platform failure

    • WebSocket server crashed
    • Database unreachable
    • All 51K users disconnected simultaneously

Recovery time: 2 hours 34 minutes (manual restart of everything)
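
That reconnect storm was also the cheapest thing to fix. A minimal sketch of the jittered exponential backoff the clients should have used (illustrative names, not our original code):

// Reconnect with capped exponential backoff plus full jitter,
// so thousands of clients don't hammer the server in lockstep
async function reconnectWithBackoff(connect: () => Promise<WebSocket>) {
  const baseDelayMs = 500;
  const maxDelayMs = 30_000;

  for (let attempt = 0; ; attempt++) {
    try {
      return await connect();  // Resolves once the socket is open
    } catch {
      const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      const delayMs = Math.random() * cap;  // Full jitter
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}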

What We Learned from Gaming Architecture

After the disaster, I spent a week reading Epic Games’ engineering blogs.

Key Insight #1: Dedicated Server Architecture

Fortnite’s approach:

  • 100M players distributed across thousands of dedicated game servers
  • Each server handles 50-100 players (a “shard”)
  • Players don’t connect to each other—they connect to their shard
  • Horizontal scaling: More players = more shards

Our mistake:

  • One WebSocket server for everyone
  • Broadcasting every message to every connection
  • No concept of sharding or rooms
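
Even before full sharding, just scoping broadcasts to a room cuts the fan-out dramatically. A rough sketch in the style of the earlier ws-based server (hypothetical helpers, not our original code):

// Route messages only to connections in the same room,
// instead of broadcasting to every connection on the process
const rooms = new Map();  // roomId -> Set of sockets

function joinRoom(roomId, ws) {
  if (!rooms.has(roomId)) rooms.set(roomId, new Set());
  rooms.get(roomId).add(ws);
  ws.on('close', () => rooms.get(roomId)?.delete(ws));
}

function broadcastToRoom(roomId, payload) {
  for (const ws of rooms.get(roomId) ?? []) {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(payload);  // Fan-out is limited to one room
    }
  }
}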

Key Insight #2: State Synchronization Not Message Broadcasting

Gaming approach:

  • Server maintains authoritative state
  • Clients receive state updates (not all messages)
  • Delta compression: Only send what changed
  • Client-side prediction for responsiveness

Our mistake:

  • Sending every message to everyone
  • No state management
  • Clients didn’t cache anything
  • Every update required database hit

Key Insight #3: Kubernetes for Dynamic Scaling

Epic Games’ infrastructure:

  • Fortnite runs on AWS with Kubernetes
  • Game servers deployed as pods
  • Auto-scaling based on player demand
  • 5,000+ Kinesis shards processing 125M events/minute

Our infrastructure:

  • Single EC2 instance
  • Manual scaling (SSH in, start more processes)
  • No orchestration
  • No auto-scaling

What We Built: Gaming-Inspired Real-Time Architecture

Rebuilding took 4 months, 8 engineers, and complete buy-in from leadership.

Architecture 1: Shard-Based Connection Management

Inspired by: Fortnite’s match instances (50-100 players per server)

Our implementation:

// Shard-based WebSocket architecture
class ShardedWebSocketServer {
  constructor() {
    this.maxConnectionsPerShard = 1000;
    this.shards = new Map();
  }
  
  assignUserToShard(userId, roomId) {
    // Find or create shard for this room
    let shard = this.findAvailableShard(roomId);
    
    if (!shard || shard.connections.size >= this.maxConnectionsPerShard) {
      // Create new shard
      shard = this.createShard(roomId);
    }
    
    return shard;
  }
  
  createShard(roomId) {
    const shardId = `${roomId}-${generateId()}`;
    
    const shard = {
      id: shardId,
      roomId: roomId,
      connections: new Map(),
      state: new RoomState(),  // Authoritative state (the RoomState class below)
      maxConnections: this.maxConnectionsPerShard
    };
    
    this.shards.set(shardId, shard);
    
    // Deploy pod for this shard
    this.deployShardPod(shardId);
    
    return shard;
  }
}
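
The class above leans on a findAvailableShard helper (and a deployShardPod call that asks the orchestrator for a pod); neither is shown. A minimal sketch of the lookup, under the same assumptions:

  findAvailableShard(roomId) {
    // Pick any existing shard for this room that still has headroom
    for (const shard of this.shards.values()) {
      if (shard.roomId === roomId &&
          shard.connections.size < this.maxConnectionsPerShard) {
        return shard;
      }
    }
    return null;  // Caller falls through to createShard()
  }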

Benefits:

  • Each shard handles 1,000 connections max
  • Horizontal scaling: More users = more shards
  • Isolation: One shard crashing doesn’t affect others
  • Dynamic scaling: Create/destroy shards based on demand

Architecture 2: State Synchronization

Inspired by: Unreal Engine’s network replication

Our implementation:

// Authoritative state management
class RoomState {
  private state: Map<string, any>;
  private dirtyFlags: Set<string>;
  
  constructor() {
    this.state = new Map();
    this.dirtyFlags = new Set();
  }
  
  // Update state (server-side only)
  updateState(key: string, value: any) {
    this.state.set(key, value);
    this.dirtyFlags.add(key);  // Mark as changed
  }
  
  // Get state delta (what changed since last sync)
  getStateDelta(): Record<string, any> {
    const delta: Record<string, any> = {};
    
    for (const key of this.dirtyFlags) {
      delta[key] = this.state.get(key);
    }
    
    this.dirtyFlags.clear();
    return delta;
  }
  
  // Sync to clients
  syncToClients(connections: Set<WebSocket>) {
    const delta = this.getStateDelta();
    
    if (Object.keys(delta).length === 0) {
      return;  // Nothing changed
    }
    
    // Only send what changed
    const payload = {
      type: 'state_update',
      delta: delta,
      timestamp: Date.now()
    };
    
    connections.forEach(ws => {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(JSON.stringify(payload));
      }
    });
  }
}

// Sync every 100ms (10 ticks per second)
setInterval(() => {
  shards.forEach(shard => {
    shard.state.syncToClients(new Set(shard.connections.values()));
  });
}, 100);

Benefits:

  • Only send what changed (delta compression)
  • Predictable bandwidth (not per-message)
  • Server-authoritative (prevents cheating)
  • Client-side prediction enabled
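
On the receiving end, the client just merges each state_update delta into a local cache instead of querying the database. A minimal sketch of that handler, assuming the payload shape from syncToClients above (websocket and renderFromState are placeholders):

// Client-side: merge incoming deltas into a local state cache
const localState = new Map();

websocket.addEventListener('message', (event) => {
  const msg = JSON.parse(event.data);

  if (msg.type !== 'state_update') return;

  // Apply only the keys that changed
  for (const [key, value] of Object.entries(msg.delta)) {
    localState.set(key, value);
  }

  renderFromState(localState);  // Placeholder render hook
});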

Architecture 3: Kubernetes Orchestration

Inspired by: Epic’s Kubernetes deployment for Fortnite servers

Our implementation:

# shard-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: websocket-shard
spec:
  replicas: 10  # Start with 10 shards
  selector:
    matchLabels:
      app: websocket-shard
  template:
    metadata:
      labels:
        app: websocket-shard
    spec:
      containers:
      - name: shard-server
        image: company/websocket-shard:v2.0
        ports:
        - containerPort: 8080
          name: websocket
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "1000m"
            memory: "1Gi"
        env:
        - name: MAX_CONNECTIONS
          value: "1000"
        - name: SHARD_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: websocket-shard-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: websocket-shard
  minReplicas: 10
  maxReplicas: 1000
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: active_connections
      target:
        type: AverageValue
        averageValue: "800"  # Scale when avg connections > 800

Benefits:

  • Auto-scaling based on load
  • Rolling updates with zero downtime
  • Health checks and auto-restart
  • Resource limits prevent resource exhaustion
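
One wrinkle: the active_connections pods metric in the HPA is a custom metric, so each shard has to expose it and something like the Prometheus Adapter has to feed it into the Kubernetes custom-metrics API. A rough sketch of the shard side, assuming the prom-client library:

// Expose active_connections so a metrics pipeline (and the HPA) can read it
import http from 'node:http';
import { Gauge, register } from 'prom-client';

const activeConnections = new Gauge({
  name: 'active_connections',
  help: 'WebSocket connections currently held by this shard',
});

// Call this from the WebSocket server's connection/close handlers
export function setActiveConnections(count: number) {
  activeConnections.set(count);
}

// Minimal /metrics endpoint for Prometheus to scrape
http.createServer(async (req, res) => {
  if (req.url === '/metrics') {
    res.setHeader('Content-Type', register.contentType);
    res.end(await register.metrics());
  } else {
    res.statusCode = 404;
    res.end();
  }
}).listen(9100);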

Architecture 4: Redis for Shard Coordination

Inspired by: Gaming’s use of centralized matchmaking services

Our implementation:

# Shard registry and load balancing
import json
import time
from typing import Optional

import redis

class ShardRegistry:
    def __init__(self):
        self.redis = redis.Redis(host='redis-cluster')
    
    def register_shard(self, shard_id: str, room_id: str, capacity: int):
        """Register a new shard with its room and capacity"""
        self.redis.hset(
            'shards',
            shard_id,
            json.dumps({
                'room_id': room_id,
                'capacity': capacity,
                'current_load': 0,
                'created_at': time.time()
            })
        )
    
    def find_available_shard(self, room_id: str) -> Optional[str]:
        """Find a shard with spare capacity for this room"""
        for shard_id, raw in self.redis.hgetall('shards').items():
            shard_data = json.loads(raw)
            
            if shard_data['room_id'] != room_id:
                continue
            
            if shard_data['current_load'] < shard_data['capacity']:
                return shard_id.decode('utf-8')
        
        # No available shard, caller needs to create one
        return None
    
    def increment_shard_load(self, shard_id: str):
        """User connected to shard"""
        # Note: this read-modify-write is not atomic (HINCRBY or a Lua script would be)
        shard_data = json.loads(self.redis.hget('shards', shard_id))
        shard_data['current_load'] += 1
        self.redis.hset('shards', shard_id, json.dumps(shard_data))
    
    def decrement_shard_load(self, shard_id: str):
        """User disconnected from shard"""
        shard_data = json.loads(self.redis.hget('shards', shard_id))
        shard_data['current_load'] -= 1
        self.redis.hset('shards', shard_id, json.dumps(shard_data))
        
        # If shard is empty, mark for cleanup
        if shard_data['current_load'] == 0:
            self.redis.setex(f'cleanup:{shard_id}', 300, '1')  # 5 min TTL

Benefits:

  • Centralized shard registry
  • Load balancing across shards
  • Automatic cleanup of empty shards
  • Fast lookups (Redis is fast)
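
One caveat on the registry above: the read-modify-write in the load counters can race when many users connect at once. Keeping the counters in a plain Redis hash and using HINCRBY makes each update atomic; a sketch of that variant (Node.js with ioredis, hypothetical key names):

// Atomic shard load counters via HINCRBY (no read-modify-write race)
import Redis from 'ioredis';

const redis = new Redis({ host: 'redis-cluster' });

export async function incrementShardLoad(shardId: string): Promise<void> {
  await redis.hincrby('shard_load', shardId, 1);
}

export async function decrementShardLoad(shardId: string): Promise<void> {
  const load = await redis.hincrby('shard_load', shardId, -1);
  if (load <= 0) {
    // Empty shard: mark for cleanup with the same 5-minute TTL as above
    await redis.setex(`cleanup:${shardId}`, 300, '1');
  }
}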

Architecture 5: Client-Side Prediction

Inspired by: Gaming’s client-side prediction for responsive feel

Our implementation:

// Client-side state prediction
class PredictiveStateManager {
  private serverState: any;
  private predictedState: any;
  private pendingActions: Action[];
  
  constructor() {
    this.serverState = {};
    this.predictedState = {};
    this.pendingActions = [];
  }
  
  // User performs action
  applyAction(action: Action) {
    // Immediately apply to predicted state
    this.predictedState = this.applyActionToState(
      this.predictedState,
      action
    );
    
    // Store for server reconciliation
    this.pendingActions.push(action);
    
    // Send to server
    this.sendActionToServer(action);
    
    // Update UI immediately (feels instant!)
    this.render(this.predictedState);
  }
  
  // Server sends authoritative state update
  onServerUpdate(serverState: any, acknowledgedActions: Action[]) {
    // Remove acknowledged actions, matching on an id the server echoes back
    // (object identity won't survive the network round trip)
    const ackedIds = new Set(acknowledgedActions.map(a => a.id));
    this.pendingActions = this.pendingActions.filter(
      action => !ackedIds.has(action.id)
    );
    
    // Start from server state
    this.serverState = serverState;
    this.predictedState = serverState;
    
    // Re-apply pending actions
    this.pendingActions.forEach(action => {
      this.predictedState = this.applyActionToState(
        this.predictedState,
        action
      );
    });
    
    // Render final state
    this.render(this.predictedState);
  }
}
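
The reconciliation above assumes each Action carries an id that the server echoes back once it has applied the action. A minimal shape for that (hypothetical names, matching the filter in onServerUpdate):

// Every client action gets a monotonically increasing id so the server
// can acknowledge exactly which actions its authoritative state includes
interface Action {
  id: number;
  type: string;
  payload?: unknown;
}

let nextActionId = 0;

function makeAction(type: string, payload?: unknown): Action {
  return { id: nextActionId++, type, payload };
}

// e.g. predictor.applyAction(makeAction('cursor_move', { x: 120, y: 64 }));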

Benefits:

  • Instant UI response (no waiting for server)
  • Automatic correction when server disagrees
  • Smooth experience even with 100-200ms latency
  • Works just like gaming

The Results: From 50K to 2M Concurrent Users

6 months after the rebuild:

Scalability

Before:

  • Max concurrent users: 50,000 (then crashed)
  • Single WebSocket server
  • Manual scaling (impossible)
  • Recovery time: Hours

After:

  • Max concurrent users tested: 2,000,000 ✅
  • 2,000 shards running (1,000 users each)
  • Auto-scaling (up/down based on load)
  • Recovery time: Seconds (auto-heal)

Production-scale load test:

  • Simulated 2M concurrent connections
  • Zero downtime
  • Average latency: 47ms (p95: 120ms)
  • CPU usage per shard: 45% average

Performance

Message latency:

  • Before: 15-30 seconds during load
  • After: 45ms average (p99: 180ms)

Database load:

  • Before: 500K queries/second (crashed)
  • After: 12K queries/second (state cached in shards)

Memory usage:

  • Before: 47GB on a single server (then OOM)
  • After: 512MB per shard × 2,000 shards = 1TB total (distributed)

Cost

Before (broken architecture):

  • EC2 instance: $8,400/month (max size, still crashed)
  • Database: $12,000/month (oversized for load)
  • Total: $20,400/month

After (gaming architecture):

  • Kubernetes cluster: $18,000/month (auto-scales)
  • Redis cluster: $2,400/month
  • Database: $4,200/month (right-sized)
  • Total: $24,600/month

Cost increase: $4,200/month

But:

  • Supports 40x more users
  • Zero downtime
  • Auto-scaling
  • Cost per concurrent user: down about 97% (from roughly $0.41 to $0.012)

Business Impact

Customer retention:

  • Before: 34,000 users left after outage
  • After: Zero outages, 89% retention rate

New features shipped:

  • Before: Blocked (couldn’t scale)
  • After: 12 major features in 6 months

Engineering velocity:

  • Before: 73% time on scaling issues
  • After: 8% time on infrastructure

Lessons From Gaming We Applied

1. Sharding Is Everything

Gaming lesson: 100M players split across thousands of game instances

Our application: 2M users split across 2,000 shards (1,000 each)

Key insight: Don’t try to scale vertically. Scale horizontally with sharding.

2. Server-Authoritative Architecture

Gaming lesson: Server maintains truth, clients predict

Our application:

  • Server stores authoritative state
  • Clients predict for responsiveness
  • Server corrections handle conflicts

Key insight: Never trust the client, but make clients feel responsive.

3. Delta Compression

Gaming lesson: Only send what changed, not full game state

Our application:

  • State updates every 100ms
  • Only changed data transmitted
  • 97% bandwidth reduction

Key insight: Full state updates don’t scale. Send deltas.

4. Kubernetes for Dynamic Workloads

Gaming lesson: Epic uses Kubernetes for Fortnite’s servers

Our application:

  • Shards deployed as pods
  • Auto-scaling based on connections
  • Zero-downtime deployments

Key insight: Kubernetes handles the complexity of distributed systems.

5. Client-Side Prediction

Gaming lesson: Predict player actions locally, correct from server

Our application:

  • Optimistic UI updates
  • Server reconciliation
  • 45ms perceived latency vs 200ms actual

Key insight: Users don’t notice latency if prediction works.

Practical Implementation Guide

Want to apply gaming architecture to your real-time platform?

Week 1-2: Shard Your Connections

# Start simple: route users to multiple servers (Flask sketch)
from flask import Flask, request

app = Flask(__name__)

@app.route('/connect')
def connect():
    room_id = request.args.get('room_id')
    
    # Find available shard (find_or_create_shard wraps the shard registry above)
    shard = find_or_create_shard(room_id)
    
    # Return the WebSocket URL for that shard
    return {
        'websocket_url': f'wss://{shard.hostname}:8080',
        'shard_id': shard.id
    }

Month 1: Deploy on Kubernetes

# Deploy first shard
kubectl apply -f shard-deployment.yaml

# Set up auto-scaling
kubectl apply -f shard-hpa.yaml

# Monitor
kubectl top pods -l app=websocket-shard

Month 2: Implement State Synchronization

// State manager
class StateManager {
  syncLoop() {
    setInterval(() => {
      const delta = this.getStateDelta();
      this.broadcastToClients(delta);
    }, 100);  // 10 ticks/second
  }
}

Month 3: Add Client Prediction

// Client-side
function handleUserAction(action) {
  // Apply immediately (prediction)
  applyActionLocally(action);
  
  // Send to server
  sendToServer(action);
  
  // Wait for server confirmation
  waitForServerUpdate();
}

The Bottom Line

Gaming architecture isn’t just for games.

The patterns that let Fortnite handle hundreds of millions of players work for:

  • Real-time collaboration platforms
  • Live streaming systems
  • IoT device coordination
  • Trading platforms
  • Chat applications
  • Any real-time system at scale

We went from:

  • 50K users → Total failure
  • Single server → 2,000 shards
  • Hours of downtime → Zero downtime
  • $20K/month broken → $25K/month rock-solid

Key patterns:

  • Shard-based architecture (horizontal scaling)
  • Server-authoritative state (prevents cheating/conflicts)
  • Delta synchronization (bandwidth efficiency)
  • Client-side prediction (responsiveness)
  • Kubernetes orchestration (dynamic scaling)

The gaming industry solved massive-scale real-time systems years ago.

Stop reinventing the wheel. Copy what works.


Building real-time systems at scale? Let’s talk about gaming-inspired architectures that actually handle millions of concurrent users.