Rewriting Our Core Services in Rust: 64% Faster, 71% Less Memory, Worth the Pain

Why we rewrote 12 critical services from Go to Rust, the migration hell we endured, the memory leak that almost killed us, and why our infrastructure costs dropped $43K/month.

The Problem: Go Was “Fast Enough” Until It Wasn’t

Q4 2023. Our payment processing system was melting down every Monday morning.

The pattern was predictable:

  • 9 AM: Traffic spike as businesses process weekend transactions
  • 9:15 AM: Latency climbs from 45ms to 890ms
  • 9:30 AM: Memory usage hits 95%, pods start OOMing
  • 9:45 AM: Auto-scaler frantically launches 40+ new pods
  • 10:30 AM: Traffic normalizes, we’re left with massive over-provisioning

Our weekly infrastructure dance:

  • Monday AM: 60 pods @ $120/hour
  • Rest of week: 15 pods @ $30/hour
  • Monthly waste: ~$14,400 on peak capacity we only needed 4 hours/week

Our Go services were well-written. But garbage collection pauses and memory overhead were killing us at scale.

After reading about Rust transforming system design, I proposed something radical: rewrite our hottest-path services in Rust.

My VP’s response: “Do you have any idea how long that will take?”

Spoiler: Longer than we thought. But worth every painful moment.

The Business Case: Convincing Leadership

Before writing a single line of Rust, I needed executive buy-in.

The Financial Analysis

Current costs (Go implementation):

  • Infrastructure: $62K/month (excessive memory usage)
  • Developer time: $18K/month (debugging GC issues)
  • Incident response: $12K/month (on-call + lost productivity)
  • Total: $92K/month

Projected costs (Rust rewrite):

  • Migration effort: $180K (3 engineers × 2 months)
  • Infrastructure: $19K/month (70% reduction)
  • Developer time: $8K/month (simpler debugging)
  • Incident response: $3K/month (fewer crashes)
  • Total: $30K/month + $180K one-time

Break-even: ~3 months after migration ($62K/month in savings against the $180K one-time cost)
3-year ROI: ~$2.2M in savings

The VP: “You have 3 months to prove this works. Start with one service.”

Phase 1: The Proof of Concept (Weeks 1-4)

We chose our most problematic service: transaction-validator (2,500 req/sec at peak).

The Go Baseline

// Original Go implementation
import (
    "context"
    "database/sql"
    "encoding/json"
    "sync"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/redis/go-redis/v9"
)

type TransactionValidator struct {
    cache    *redis.Client
    db       *sql.DB
    mu       sync.RWMutex
    metrics  *prometheus.Registry
}

func (v *TransactionValidator) Validate(ctx context.Context, tx Transaction) error {
    // The read lock allows concurrent readers, but it's still held
    // across every network round-trip below
    v.mu.RLock()
    defer v.mu.RUnlock()
    
    // Check cache
    cached, err := v.cache.Get(ctx, tx.ID).Result()
    if err == nil {
        return v.processCached(cached)
    }
    
    // Validate against rules
    if err := v.validateRules(tx); err != nil {
        return err
    }
    
    // Store in cache (go-redis wants a marshaled value, not a raw struct)
    payload, err := json.Marshal(tx)
    if err != nil {
        return err
    }
    v.cache.Set(ctx, tx.ID, payload, 10*time.Minute)
    
    return nil
}

Performance baseline:

  • p50 latency: 42ms
  • p99 latency: 340ms
  • Memory per pod: 850MB steady-state, 2.1GB peak
  • GC pauses: 8-15ms every 2 seconds
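
Numbers like these need a repeatable harness. Below is a minimal, hypothetical sketch of one, assuming the reqwest and hdrhistogram crates; the endpoint, payload, and request count are made up, and a real run would add warm-up and concurrency:

// Hypothetical latency harness: hits the validator endpoint serially
// and reports p50/p99 from an HDR histogram
use hdrhistogram::Histogram;
use std::time::Instant;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();
    // Track latencies from 1µs to 60s with 3 significant digits
    let mut hist = Histogram::<u64>::new_with_bounds(1, 60_000_000, 3)?;

    for i in 0..10_000u32 {
        let start = Instant::now();
        client
            .post("http://localhost:8080/validate")
            .json(&serde_json::json!({ "id": format!("tx-{i}"), "amount": 100 }))
            .send()
            .await?
            .error_for_status()?;
        hist.record(start.elapsed().as_micros() as u64)?;
    }

    println!(
        "p50: {}µs  p99: {}µs",
        hist.value_at_quantile(0.50),
        hist.value_at_quantile(0.99)
    );
    Ok(())
}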

The Rust Rewrite: Attempt 1 (Failed)

My first Rust code was… a disaster.

// My first terrible Rust attempt
use std::sync::Arc;
use tokio::sync::RwLock;

struct TransactionValidator {
    cache: Arc<RwLock<redis::Client>>,  // Wrong!
    db: Arc<RwLock<sqlx::Pool<sqlx::Postgres>>>,  // Also wrong!
}

// This compiles but performs WORSE than Go!
impl TransactionValidator {
    async fn validate(&self, tx: Transaction) -> Result<(), Error> {
        let cache = self.cache.write().await;  // Serial lock!
        // ... rest of implementation
    }
}

Problems:

  1. Over-locking: Every cache access acquired a write lock
  2. Arc everywhere: Fighting the borrow checker the wrong way
  3. Blocking calls in async: Mixed sync and async code poorly (see the sketch below)
  4. No connection pooling: Creating new connections on every request
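
To make mistake #3 concrete, here's a minimal sketch of the anti-pattern and the standard fix (hypothetical function names, with std::thread::sleep standing in for synchronous work). Blocking work belongs on Tokio's dedicated blocking pool, not on async worker threads:

use std::time::Duration;

// BAD: a synchronous call on an async worker thread stalls every task
// scheduled on that thread, which is exactly what our first attempt did
async fn validate_blocking(tx_id: String) -> String {
    std::thread::sleep(Duration::from_millis(5)); // stands in for sync I/O
    format!("validated {tx_id}")
}

// BETTER: hand blocking work to Tokio's blocking thread pool so the
// async workers keep making progress
async fn validate_offloaded(tx_id: String) -> String {
    tokio::task::spawn_blocking(move || {
        std::thread::sleep(Duration::from_millis(5)); // same sync work
        format!("validated {tx_id}")
    })
    .await
    .expect("blocking task panicked")
}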

Performance: Worse than Go

  • p50: 68ms (61% slower!)
  • p99: 520ms
  • Memory: 420MB (better, but code was trash)

The Rust Rewrite: Attempt 2 (Success)

After a week of learning Rust’s ownership model properly:

// Proper Rust implementation
use std::sync::Arc;
use dashmap::DashMap;  // Sharded concurrent HashMap, no global lock
use deadpool_redis::{redis::AsyncCommands, Pool as RedisPool};
use sqlx::postgres::PgPool;

#[derive(Clone)]
struct TransactionValidator {
    // Arc only where needed, no mutex spam
    cache: RedisPool,           // Connection pooled
    db: PgPool,                 // Connection pooled
    // In production this map needs TTL/eviction or it grows without bound
    local_cache: Arc<DashMap<String, CachedTransaction>>,
    metrics: Arc<Metrics>,
}

impl TransactionValidator {
    async fn validate(&self, tx: Transaction) -> Result<(), ValidationError> {
        // Check the in-process cache first (sub-microsecond access)
        if let Some(cached) = self.local_cache.get(&tx.id) {
            return self.process_cached(cached.value());
        }
        
        // Check Redis (millisecond access)
        let mut conn = self.cache.get().await?;
        if let Ok(cached) = conn.get::<_, String>(&tx.id).await {
            let cached_tx: CachedTransaction = serde_json::from_str(&cached)?;
            self.local_cache.insert(tx.id.clone(), cached_tx.clone());
            return self.process_cached(&cached_tx);
        }
        
        // Run independent rule checks concurrently; try_join! fails fast.
        // (join_all over a Vec won't compile here: each async fn returns
        // its own distinct future type.)
        tokio::try_join!(
            self.validate_amount(&tx),
            self.validate_merchant(&tx),
            self.validate_card(&tx),
            self.validate_risk_score(&tx),
        )?;
        
        // Cache result
        let cached = CachedTransaction::from(tx.clone());
        let serialized = serde_json::to_string(&cached)?;
        let _: () = conn.set_ex(&tx.id, serialized, 600).await?;
        self.local_cache.insert(tx.id.clone(), cached);
        
        Ok(())
    }
}

Key improvements:

  1. DashMap: Sharded concurrent HashMap, roughly 10x faster than a single RwLock<HashMap> under our contention levels
  2. Connection pooling: Reuse Redis and Postgres connections instead of re-dialing per request
  3. Concurrent validation: Run independent rule checks at the same time with try_join!
  4. Zero-copy where possible: Minimize allocations (see the sketch below)
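
To illustrate point 4, here's a hypothetical sketch of borrowing from the input buffer during deserialization instead of allocating owned Strings. Note that serde can only borrow like this when the input outlives the value and the strings contain no escape sequences:

use serde::Deserialize;

// Hypothetical view type: id and merchant borrow directly from the
// incoming JSON buffer instead of allocating owned Strings
#[derive(Deserialize)]
struct TxView<'a> {
    id: &'a str,
    merchant: &'a str,
    amount_cents: u64,
}

fn parse(raw: &str) -> Result<TxView<'_>, serde_json::Error> {
    serde_json::from_str(raw) // zero-copy for the &str fields
}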

Performance after rewrite:

  • p50 latency: 15ms (64% faster than Go!)
  • p99 latency: 48ms (86% faster!)
  • Memory per pod: 180MB steady-state, 240MB peak (79% less steady-state memory!)
  • No GC pauses: Deterministic performance

The Migration: Phase 2 (Months 2-3)

After proving Rust worked, we migrated 11 more services.

Services We Migrated

Service                  Go Memory   Rust Memory   Latency Improvement   Status
transaction-validator    850MB       180MB         64% faster            ✅ Production
fraud-detector           1.2GB       290MB         71% faster            ✅ Production
payment-processor        980MB       210MB         58% faster            ✅ Production
account-service          640MB       150MB         52% faster            ✅ Production
notification-service     420MB       95MB          48% faster            ✅ Production
analytics-aggregator     2.1GB       480MB         79% faster            ✅ Production

The Migration Strategy

Strangler Fig Pattern:

┌─────────────────────────────────────┐
│      Load Balancer (50/50 split)    │
└──────────────┬──────────────────────┘
               │
        ┌──────┴──────┐
        │             │
    ┌───▼────┐   ┌────▼───┐
    │   Go   │   │  Rust  │
    │Service │   │Service │
    └────────┘   └────────┘

Traffic migration:

  • Week 1: 5% Rust, 95% Go (canary)
  • Week 2: 25% Rust, 75% Go (if metrics good)
  • Week 3: 50% Rust, 50% Go (split testing)
  • Week 4: 90% Rust, 10% Go (final validation)
  • Week 5: 100% Rust, decommission Go

Rollback plan: Single kubectl command to route 100% to Go.

The Challenges: What Almost Killed Us

Challenge 1: The Memory Leak We Didn’t Expect

Month 2, Week 3: Rust services started slowly leaking memory.

Day 1:  180MB → 185MB (normal variation)
Day 3:  185MB → 205MB (concerning)
Day 7:  205MB → 280MB (WTF?)
Day 10: 280MB → 420MB (PANIC!)

The hunt: We added extensive instrumentation.

// Added memory profiling (assumes jemalloc is the global allocator,
// e.g. via the jemallocator crate, so jemalloc_ctl has stats to read)
use actix_web::{HttpResponse, Responder};
use jemalloc_ctl::{epoch, stats};
use serde_json::json;

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

async fn memory_stats_handler() -> impl Responder {
    // Advance the epoch so the stats below are refreshed
    epoch::mib().unwrap().advance().unwrap();
    
    let allocated = stats::allocated::mib().unwrap().read().unwrap();
    let resident = stats::resident::mib().unwrap().read().unwrap();
    
    HttpResponse::Ok().json(json!({
        "allocated_mb": allocated / 1024 / 1024,
        "resident_mb": resident / 1024 / 1024
    }))
}

The culprit: Connection pool not properly closing idle connections.

// BUG: connections leaked when the Redis server restarted
let pool = RedisPool::builder(manager)  // manager: deadpool_redis::Manager
    .max_size(50)
    .build()
    .unwrap();

// FIX: add recycling and timeouts so dead connections can't linger
use deadpool_redis::Runtime;
use std::time::Duration;

let pool = RedisPool::builder(manager)
    .max_size(50)
    .runtime(Runtime::Tokio1)
    .recycle_timeout(Some(Duration::from_secs(30)))  // KEY: bound connection recycling
    .wait_timeout(Some(Duration::from_secs(5)))      // fail fast instead of queueing forever
    .create_timeout(Some(Duration::from_secs(5)))    // don't hang on connect
    .build()
    .unwrap();

After fix: Memory stable at 180MB for 30+ days.

Challenge 2: The Tokio Runtime Tuning

Default Tokio runtime settings destroyed our performance under load.

// Default: #[tokio::main] already starts a multi-threaded runtime with
// one worker per detected CPU core. In our pods, the detected core count
// didn't match the container CPU limit, so workers sat on throttled CPU.
#[tokio::main]
async fn main() {
    // Worker count chosen for us, based on the node, not our pod limits
}

// Fixed: pin worker_threads to the pod's CPU limit
#[tokio::main(flavor = "multi_thread", worker_threads = 4)]
async fn main() {
    // 4 workers for a pod limited to 4 CPUs
}

// Even better: Custom runtime configuration
fn main() {
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(4)
        .thread_name("validator-worker")
        .enable_all()
        .build()
        .unwrap();
        
    runtime.block_on(async {
        // Application code
    });
}

Impact: 3.2x throughput increase with proper runtime config.

Challenge 3: Error Handling Culture Shift

Go’s if err != nil vs Rust’s Result<T, E> required team mindset change.

Go style:

result, err := doSomething()
if err != nil {
    log.Error(err)  // Often just logged...
    return nil      // ...then swallowed: callers see success
}

Rust style:

let result = do_something()?;  // Propagates error up
// Or handle explicitly
match do_something() {
    Ok(val) => val,
    Err(e) => {
        tracing::error!("Operation failed: {}", e);
        return Err(e.into());  // Must handle
    }
}
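
The ? operator and e.into() above work because of From conversions into the function's error type. Here's a minimal sketch of what a ValidationError like the one in our validator could look like, using the thiserror crate (the variants are illustrative):

use thiserror::Error;

// Illustrative error type: #[from] generates the From impls that let
// the ? operator convert underlying errors automatically
#[derive(Debug, Error)]
enum ValidationError {
    #[error("cache error: {0}")]
    Cache(#[from] redis::RedisError),
    #[error("serialization error: {0}")]
    Serde(#[from] serde_json::Error),
    #[error("rule violation: {0}")]
    Rule(String),
}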

The shift: Rust forces you to handle errors. This found 47 bugs in our Go code we didn’t know existed.

The Results: 6 Months in Production

Performance Improvements

Latency (p99):

  • Before (Go): 340ms average across services
  • After (Rust): 52ms average
  • Improvement: 85% faster

Memory Usage:

  • Before (Go): 850MB average per pod
  • After (Rust): 245MB average per pod
  • Improvement: 71% reduction

Throughput:

  • Before (Go): 2,500 req/sec per pod
  • After (Rust): 8,200 req/sec per pod
  • Improvement: 3.3x increase

Infrastructure Cost Savings

Pod count reduction:

  • Before: 60 pods peak, 15 steady-state
  • After: 12 pods peak, 5 steady-state
  • Reduction: 80% fewer pods

Monthly costs:

  • Before: $62K infrastructure
  • After: $19K infrastructure
  • Savings: $43K/month ($516K/year)

Developer Experience

Pros:

  • Compile-time error checking caught bugs early
  • No more GC-related debugging
  • Performance predictability (no GC surprises)
  • Memory safety prevented entire classes of bugs

Cons:

  • Steeper learning curve (3-4 weeks to proficiency)
  • Longer compile times (2-3 min vs 30 sec for Go)
  • Smaller ecosystem (some crates less mature)
  • Hiring harder (fewer Rust developers)

Incident Reduction

Production incidents (6-month comparison):

Incident Type             Go (6 months)   Rust (6 months)
Memory leaks              12              1
Race conditions           8               0
Null pointer panics       15              0 (impossible in Rust)
Performance degradation   23              3
Total                     58              4

93% reduction in production incidents.

Lessons for Teams Considering Rust

✅ Good Reasons to Use Rust

  1. Performance critical paths - APIs, data processing, real-time systems
  2. Memory-constrained environments - Edge devices, embedded systems
  3. Long-running services - Where GC pauses matter
  4. Safety-critical systems - Financial, healthcare, infrastructure
  5. High-scale systems - Where performance improvements = cost savings

❌ Bad Reasons to Use Rust

  1. “It’s trendy” - Not a reason. Learn it properly first.
  2. CRUD APIs - Go/Node/Python are fine for most APIs
  3. Rapid prototyping - Rust’s compile-time checks slow iteration
  4. Team has no Rust experience - Budget 2-3 months for learning
  5. Small applications - Rust’s benefits don’t materialize at small scale

Migration Advice

If you’re migrating to Rust:

  1. Start small: Pick ONE service, preferably hot-path with clear metrics
  2. Measure everything: Establish baseline before migration
  3. Learn ownership model: Don’t fight the borrow checker
  4. Use mature crates: Tokio, Axum, SQLx, Serde are battle-tested
  5. Profile early: Use cargo-flamegraph and perf from day one
  6. Plan for training: Budget 3-4 weeks per engineer for proficiency

Red flags to abort migration:

  • Team resistance (forcing Rust on unwilling team = disaster)
  • No clear performance problem to solve
  • Lack of time for proper learning
  • Can’t articulate ROI beyond “Rust is cool”

What’s Next?

We’re now exploring:

  1. WebAssembly compilation - Rust → WASM for edge deployment
  2. Async runtime optimization - Custom Tokio executors
  3. Zero-copy deserialization - Using rkyv for ultra-fast parsing
  4. GPU acceleration - CUDA bindings for ML inference

Rust transformed our infrastructure economics. The migration was harder than expected, but $516K annual savings + 93% fewer incidents = worth it.

For more on systems programming and performance optimization, see the comprehensive Rust guide that influenced our architecture decisions.


Considering Rust for your infrastructure? Connect on LinkedIn or share your migration stories on Twitter.