The Crisis: Our Real-Time Platform Was Dying at 50K Users
June 2024. 11:47 AM. Product launch day. We expected 50,000 concurrent users.
11:52 AM: 23,000 users connected. WebSocket server CPU: 87%.
11:58 AM: 42,000 users. Database connections: 2,847 of 3,000.
12:03 PM: 51,000 users. Complete platform failure.
The damage:
- Total outage: 2 hours 34 minutes
- Lost users: 34,000 (never came back)
- Revenue impact: $1.2M
- Support tickets: 2,340
- Trending on Twitter: “#CompanyNameBroken”
The realization: Our real-time platform architecture couldn’t scale past 50K users.
Meanwhile, Fortnite:
- Millions of concurrent players at peak
- 350M+ registered players
- 12M+ people in a single virtual concert
- Minimal downtime
We needed to learn from the gaming industry—fast.
This is how we rebuilt our entire real-time infrastructure using lessons from Epic Games, and scaled from 50K to 2M concurrent users.
What We Had: A “Real-Time” Platform That Wasn’t
Our pre-gaming-architecture platform:
The Broken Architecture
WebSocket servers:
// Our original WebSocket server (don't do this)
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

let connections = new Map(); // All connections in memory!

wss.on('connection', (ws, req) => {
  const userId = getUserIdFromToken(req.headers.authorization);

  // Store connection
  connections.set(userId, ws);

  // Broadcast to everyone on every message
  ws.on('message', (message) => {
    // This killed us at scale
    connections.forEach((conn) => {
      if (conn.readyState === WebSocket.OPEN) {
        conn.send(message); // No filtering, no sharding
      }
    });
  });

  // Memory leak: never cleaned up
  ws.on('close', () => {
    // Sometimes this didn't fire
    connections.delete(userId);
  });
});
Problems:
- Single server, single process
- All connections in-memory
- Broadcast to everyone for every message
- No connection pooling
- No load balancing
- No horizontal scaling
Database:
-- Every WebSocket message hit the database
INSERT INTO messages (user_id, content, created_at)
VALUES ($1, $2, NOW());
-- Read query ran every 100ms per user
SELECT * FROM messages
WHERE room_id = $1
AND created_at > $2
ORDER BY created_at DESC
LIMIT 100;
At 50K users:
- 50K database queries every 100ms = 500K queries/second
- Database CPU: 100%
- Connection pool exhausted
- Messages delayed 15-30 seconds
What Broke During Launch
The cascade:
1. WebSocket server maxed out at 47K connections
   - Node.js single process limit
   - Ran out of file descriptors
   - CPU pegged at 100%
2. Database drowned in queries
   - 500K queries/second
   - Connections maxed out
   - Queries timing out
3. Users started reconnecting
   - Failed connections retried
   - Exponential backoff broken
   - DDoS'd ourselves
4. Complete platform failure
   - WebSocket server crashed
   - Database unreachable
   - All 51K users disconnected simultaneously
Recovery time: 2 hours 34 minutes (manual restart of everything)
What We Learned from Gaming Architecture
After the disaster, I spent a week reading Epic Games’ engineering blogs.
Key Insight #1: Dedicated Server Architecture
Fortnite’s approach:
- Millions of concurrent players distributed across huge fleets of dedicated game servers
- Each server handles 50-100 players (a “shard”)
- Players don’t connect to each other—they connect to their shard
- Horizontal scaling: More players = more shards
Our mistake:
- One WebSocket server for everyone
- Broadcasting every message to every connection
- No concept of sharding or rooms
Key Insight #2: State Synchronization Not Message Broadcasting
Gaming approach:
- Server maintains authoritative state
- Clients receive state updates (not all messages)
- Delta compression: Only send what changed
- Client-side prediction for responsiveness
Our mistake:
- Sending every message to everyone
- No state management
- Clients didn’t cache anything
- Every update required database hit
Key Insight #3: Kubernetes for Dynamic Scaling
Epic Games’ infrastructure:
- Fortnite runs on AWS with Kubernetes
- Game servers deployed as pods
- Auto-scaling based on player demand
- 5,000+ Kinesis shards processing 125M events/minute
Our infrastructure:
- Single EC2 instance
- Manual scaling (SSH in, start more processes)
- No orchestration
- No auto-scaling
What We Built: Gaming-Inspired Real-Time Architecture
Rebuilding took 4 months, 8 engineers, and complete buy-in from leadership.
Architecture 1: Shard-Based Connection Management
Inspired by: Fortnite’s match instances (50-100 players per server)
Our implementation:
// Shard-based WebSocket architecture
class ShardedWebSocketServer {
  constructor() {
    this.maxConnectionsPerShard = 1000;
    this.shards = new Map();
  }

  assignUserToShard(userId, roomId) {
    // Find or create shard for this room
    let shard = this.findAvailableShard(roomId);

    if (!shard || shard.connections.size >= this.maxConnectionsPerShard) {
      // Create new shard
      shard = this.createShard(roomId);
    }

    return shard;
  }

  createShard(roomId) {
    const shardId = `${roomId}-${generateId()}`;

    const shard = {
      id: shardId,
      roomId: roomId,
      connections: new Map(),
      state: new GameState(), // Authoritative state
      maxConnections: this.maxConnectionsPerShard
    };

    this.shards.set(shardId, shard);

    // Deploy pod for this shard
    this.deployShardPod(shardId);

    return shard;
  }
}
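The class above calls two helpers that aren't shown. Here's a minimal sketch of what they might look like, assuming shards are tracked in the same in-process Map; the pod deployment is left as a stub because the actual spec lives in Architecture 3 below:

// Inside ShardedWebSocketServer (sketch only)
findAvailableShard(roomId) {
  // Return the first shard for this room that still has headroom
  for (const shard of this.shards.values()) {
    if (shard.roomId === roomId && shard.connections.size < shard.maxConnections) {
      return shard;
    }
  }
  return null; // No capacity anywhere; caller creates a new shard
}

deployShardPod(shardId) {
  // Stub: in our setup this asked Kubernetes to schedule a pod labeled with
  // the shard ID (see the Deployment spec in Architecture 3). Details depend
  // on your orchestrator, so they're omitted here.
}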
Benefits:
- Each shard handles 1,000 connections max
- Horizontal scaling: More users = more shards
- Isolation: One shard crashing doesn’t affect others
- Dynamic scaling: Create/destroy shards based on demand
Architecture 2: State Synchronization
Inspired by: Unreal Engine’s network replication
Our implementation:
// Authoritative state management
class RoomState {
  private state: Map<string, any>;
  private dirtyFlags: Set<string>;

  constructor() {
    this.state = new Map();
    this.dirtyFlags = new Set();
  }

  // Update state (server-side only)
  updateState(key: string, value: any) {
    this.state.set(key, value);
    this.dirtyFlags.add(key); // Mark as changed
  }

  // Get state delta (what changed since last sync)
  getStateDelta(): Record<string, any> {
    const delta: Record<string, any> = {};
    for (const key of this.dirtyFlags) {
      delta[key] = this.state.get(key);
    }
    this.dirtyFlags.clear();
    return delta;
  }

  // Sync to clients (accepts any iterable, e.g. a Map's values())
  syncToClients(connections: Iterable<WebSocket>) {
    const delta = this.getStateDelta();
    if (Object.keys(delta).length === 0) {
      return; // Nothing changed
    }

    // Only send what changed
    const payload = {
      type: 'state_update',
      delta: delta,
      timestamp: Date.now()
    };

    for (const ws of connections) {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(JSON.stringify(payload));
      }
    }
  }
}

// Sync every 100ms (10 ticks per second)
setInterval(() => {
  shards.forEach(shard => {
    shard.state.syncToClients(shard.connections.values());
  });
}, 100);
Benefits:
- Only send what changed (delta compression)
- Predictable bandwidth (not per-message)
- Server-authoritative (prevents cheating)
- Client-side prediction enabled
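On the receiving end, applying the sync is simple: merge each delta on top of the last known state. A minimal sketch; the message shape matches the state_update payload above, while ws and render() are placeholders for your own socket and UI code:

// Client-side handler for state_update messages (sketch)
const localState: Record<string, any> = {};

ws.addEventListener('message', (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type !== 'state_update') return;

  // Apply only the keys that changed this tick
  Object.assign(localState, msg.delta);
  render(localState); // Placeholder: update your UI from the merged state
});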
Architecture 3: Kubernetes Orchestration
Inspired by: Epic’s Kubernetes deployment for Fortnite servers
Our implementation:
# shard-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: websocket-shard
spec:
  replicas: 10  # Start with 10 shards
  selector:
    matchLabels:
      app: websocket-shard
  template:
    metadata:
      labels:
        app: websocket-shard
    spec:
      containers:
        - name: shard-server
          image: company/websocket-shard:v2.0
          ports:
            - containerPort: 8080
              name: websocket
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
          env:
            - name: MAX_CONNECTIONS
              value: "1000"
            - name: SHARD_ID
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: websocket-shard-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: websocket-shard
  minReplicas: 10
  maxReplicas: 1000
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: active_connections
        target:
          type: AverageValue
          averageValue: "800"  # Scale when avg connections per pod > 800
Benefits:
- Auto-scaling based on load
- Rolling updates with zero downtime
- Health checks and auto-restart
- Resource limits prevent resource exhaustion
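One caveat: the Pods metric above (active_connections) isn't built into Kubernetes. It assumes each shard exposes that gauge and a custom-metrics adapter (we used the Prometheus Adapter) surfaces it to the HPA. A minimal sketch of the shard-side half using the prom-client package; the /metrics port and helper names here are our choices, not anything standard:

// Expose an active_connections gauge for the HPA's custom metric (sketch)
import http from 'http';
import client from 'prom-client';

const activeConnections = new client.Gauge({
  name: 'active_connections',
  help: 'Open WebSocket connections on this shard',
});

// Call these from the shard's connection/close handlers
export const onClientConnect = () => activeConnections.inc();
export const onClientDisconnect = () => activeConnections.dec();

// Scrape endpoint for Prometheus / the custom-metrics adapter
http.createServer(async (req, res) => {
  if (req.url === '/metrics') {
    res.setHeader('Content-Type', client.register.contentType);
    res.end(await client.register.metrics());
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(9100);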
Architecture 4: Redis for Shard Coordination
Inspired by: Gaming’s use of centralized matchmaking services
Our implementation:
# Shard registry and load balancing
import time
from typing import Optional

import redis

class ShardRegistry:
    def __init__(self):
        self.redis = redis.Redis(host='redis-cluster')

    def register_shard(self, room_id: str, shard_id: str, capacity: int):
        """Register a new shard with its capacity"""
        self.redis.hset(
            f'shard:{room_id}:{shard_id}',
            mapping={
                'capacity': capacity,
                'current_load': 0,
                'created_at': time.time()
            }
        )

    def find_available_shard(self, room_id: str) -> Optional[str]:
        """Find shard with capacity for this room"""
        # Scan all shards for this room and find one with capacity
        pattern = f'shard:{room_id}:*'
        for shard_key in self.redis.scan_iter(match=pattern):
            shard_data = self.redis.hgetall(shard_key)
            if int(shard_data[b'current_load']) < int(shard_data[b'capacity']):
                return shard_key.decode('utf-8')
        # No available shard, need to create one
        return None

    def increment_shard_load(self, shard_key: str):
        """User connected to shard"""
        self.redis.hincrby(shard_key, 'current_load', 1)  # Atomic: no read-modify-write race

    def decrement_shard_load(self, shard_key: str):
        """User disconnected from shard"""
        load = self.redis.hincrby(shard_key, 'current_load', -1)
        # If shard is empty, mark for cleanup
        if load == 0:
            self.redis.setex(f'cleanup:{shard_key}', 300, '1')  # 5 min TTL
Benefits:
- Centralized shard registry
- Load balancing across shards
- Automatic cleanup of empty shards
- Fast lookups (Redis is fast)
Architecture 5: Client-Side Prediction
Inspired by: Gaming’s client-side prediction for responsive feel
Our implementation:
// Client-side state prediction
class PredictiveStateManager {
  private serverState: any;
  private predictedState: any;
  private pendingActions: Action[];

  constructor() {
    this.serverState = {};
    this.predictedState = {};
    this.pendingActions = [];
  }

  // User performs action
  applyAction(action: Action) {
    // Immediately apply to predicted state
    this.predictedState = this.applyActionToState(this.predictedState, action);

    // Store for server reconciliation
    this.pendingActions.push(action);

    // Send to server
    this.sendActionToServer(action);

    // Update UI immediately (feels instant!)
    this.render(this.predictedState);
  }

  // Server sends authoritative state update
  onServerUpdate(serverState: any, acknowledgedActions: Action[]) {
    // Drop acknowledged actions. Match by id, not object identity --
    // the server's copies are different objects than the ones we queued.
    const ackedIds = new Set(acknowledgedActions.map(a => a.id));
    this.pendingActions = this.pendingActions.filter(a => !ackedIds.has(a.id));

    // Start from server state
    this.serverState = serverState;
    this.predictedState = serverState;

    // Re-apply pending actions
    this.pendingActions.forEach(action => {
      this.predictedState = this.applyActionToState(this.predictedState, action);
    });

    // Render final state
    this.render(this.predictedState);
  }
}
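For that reconciliation to work, the shard has to tell clients which of their actions it has applied. A minimal sketch of the server half; the Action id field and the acknowledgedActions key are our additions (matching by id is what the filter in onServerUpdate relies on), and ws / roomState refer to a shard connection and the RoomState from Architecture 2:

// Shard side: apply incoming actions, then echo them back as acknowledged (sketch)
interface Action { id: string; key: string; value: unknown; }

const appliedActions: Action[] = [];

ws.on('message', (raw: Buffer) => {
  const action: Action = JSON.parse(raw.toString());
  roomState.updateState(action.key, action.value); // Server stays authoritative
  appliedActions.push(action);
});

// In the 100ms sync tick, piggyback the acks onto the state update payload
const payload = {
  type: 'state_update',
  delta: roomState.getStateDelta(),
  acknowledgedActions: appliedActions.splice(0), // Drain the list each tick
  timestamp: Date.now(),
};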
Benefits:
- Instant UI response (no waiting for server)
- Automatic correction when server disagrees
- Smooth experience even with 100-200ms latency
- Works just like gaming
The Results: From 50K to 2M Concurrent Users
6 months after the rebuild:
Scalability
Before:
- Max concurrent users: 50,000 (then crashed)
- Single WebSocket server
- Manual scaling (impossible)
- Recovery time: Hours
After:
- Max concurrent users tested: 2,000,000 ✅
- 2,000 shards running (1,000 users each)
- Auto-scaling (up/down based on load)
- Recovery time: Seconds (auto-heal)
Actual production test:
- Simulated 2M concurrent connections
- Zero downtime
- Average latency: 47ms (p95: 120ms)
- CPU usage per shard: 45% average
Performance
Message latency:
- Before: 15-30 seconds during load
- After: 45ms average (p99: 180ms)
Database load:
- Before: 500K queries/second (crashed)
- After: 12K queries/second (state cached in shards)
Memory usage:
- Before: 47GB on single server (then OOM)
- After: 512MB per shard × 2,000 shards = 1TB total (distributed)
Cost
Before (broken architecture):
- EC2 instance: $8,400/month (max size, still crashed)
- Database: $12,000/month (oversized for load)
- Total: $20,400/month
After (gaming architecture):
- Kubernetes cluster: $18,000/month (auto-scales)
- Redis cluster: $2,400/month
- Database: $4,200/month (right-sized)
- Total: $24,600/month
Cost increase: $4,200/month
But:
- Supports 40x more users
- Zero downtime
- Auto-scaling
- Cost per user: 95% reduction
Business Impact
Customer retention:
- Before: 34,000 users left after outage
- After: Zero outages, 89% retention rate
New features shipped:
- Before: Blocked (couldn’t scale)
- After: 12 major features in 6 months
Engineering velocity:
- Before: 73% time on scaling issues
- After: 8% time on infrastructure
Lessons From Gaming We Applied
1. Sharding Is Everything
Gaming lesson: Millions of concurrent players split across huge fleets of small game instances
Our application: 2M users split across 2,000 shards (1,000 each)
Key insight: Don’t try to scale vertically. Scale horizontally with sharding.
2. Server-Authoritative Architecture
Gaming lesson: Server maintains truth, clients predict
Our application:
- Server stores authoritative state
- Clients predict for responsiveness
- Server corrections handle conflicts
Key insight: Never trust the client, but make clients feel responsive.
3. Delta Compression
Gaming lesson: Only send what changed, not full game state
Our application:
- State updates every 100ms
- Only changed data transmitted
- 97% bandwidth reduction
Key insight: Full state updates don’t scale. Send deltas.
4. Kubernetes for Dynamic Workloads
Gaming lesson: Epic uses Kubernetes for Fortnite’s servers
Our application:
- Shards deployed as pods
- Auto-scaling based on connections
- Zero-downtime deployments
Key insight: Kubernetes handles the complexity of distributed systems.
5. Client-Side Prediction
Gaming lesson: Predict player actions locally, correct from server
Our application:
- Optimistic UI updates
- Server reconciliation
- 45ms perceived latency vs 200ms actual
Key insight: Users don’t notice latency if prediction works.
Practical Implementation Guide
Want to apply gaming architecture to your real-time platform?
Week 1-2: Shard Your Connections
# Start simple: Route users to multiple servers
from flask import Flask, request

app = Flask(__name__)

@app.route('/connect')
def connect():
    room_id = request.args.get('room_id')

    # Find available shard
    shard = find_or_create_shard(room_id)

    # Return WebSocket URL for that shard
    return {
        'websocket_url': f'wss://{shard.hostname}:8080',
        'shard_id': shard.id
    }
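On the client, hit the router first and then open the socket it hands back. A short sketch; the endpoint and field names match the route above, and the jittered reconnect is our suggestion so a dying shard doesn't trigger the kind of self-DDoS we saw at launch:

// Browser client: ask the router for a shard, then connect to it (sketch)
async function connectToRoom(roomId: string): Promise<WebSocket> {
  const res = await fetch(`/connect?room_id=${encodeURIComponent(roomId)}`);
  const { websocket_url, shard_id } = await res.json();

  const ws = new WebSocket(websocket_url);
  ws.addEventListener('close', () => {
    // Go back through the router with jittered backoff: the old shard may be gone
    setTimeout(() => connectToRoom(roomId), 1000 + Math.random() * 2000);
  });

  console.log(`Assigned to shard ${shard_id}`);
  return ws;
}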
Month 1: Deploy on Kubernetes
# Deploy first shard
kubectl apply -f shard-deployment.yaml
# Set up auto-scaling
kubectl apply -f shard-hpa.yaml
# Monitor
kubectl top pods -l app=websocket-shard
Month 2: Implement State Synchronization
// State manager
class StateManager {
  syncLoop() {
    setInterval(() => {
      const delta = this.getStateDelta();
      this.broadcastToClients(delta);
    }, 100); // 10 ticks/second
  }
}
Month 3: Add Client Prediction
// Client-side
function handleUserAction(action) {
  // Apply immediately (prediction)
  applyActionLocally(action);

  // Send to server
  sendToServer(action);

  // Wait for server confirmation
  waitForServerUpdate();
}
Resources That Saved Us
These resources guided our rebuild:
- Epic Games Fortnite Postmortem - Real incident learnings
- Unreal Engine Networking - Networking patterns
- Agones Kubernetes Game Servers - Gaming on K8s
- AWS GameLift Architecture - Managed game hosting
- Unity Netcode Documentation - Synchronization patterns
- Colyseus Multiplayer Framework - State synchronization
- WebRTC for Games - Peer-to-peer patterns
- Kubernetes HPA Documentation - Auto-scaling
- Redis Cluster Specification - Distributed coordination
- CrashBytes: Gaming Architecture Guide - Enterprise patterns
The Bottom Line
Gaming architecture isn’t just for games.
The patterns that let Fortnite handle millions of concurrent players work for:
- Real-time collaboration platforms
- Live streaming systems
- IoT device coordination
- Trading platforms
- Chat applications
- Any real-time system at scale
We went from:
- 50K users → Total failure
- Single server → 2,000 shards
- Hours of downtime → Zero downtime
- $20K/month broken → $25K/month rock-solid
Key patterns:
- Shard-based architecture (horizontal scaling)
- Server-authoritative state (prevents cheating/conflicts)
- Delta synchronization (bandwidth efficiency)
- Client-side prediction (responsiveness)
- Kubernetes orchestration (dynamic scaling)
The gaming industry solved massive-scale real-time systems years ago.
Stop reinventing the wheel. Copy what works.
Building real-time systems at scale? Let’s talk about gaming-inspired architectures that actually handle millions of concurrent users.