📡 Bimodal Latency Distribution

Live

Real HTTP requests to a live Cloudflare Worker → Azure PostgreSQL + Redis cache. Watch how cache-aside creates two distinct latency peaks — the bimodal shape your average metric completely hides.

💻 Browser → ⚡ CF Worker → ⚡ Redis Cache (hit), or on a miss → 🔗 Hyperdrive → 🐘 Azure PostgreSQL
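The read path in the diagram above follows the cache-aside pattern. Here is a minimal sketch of it, assuming a plain dictionary standing in for Redis and a dictionary lookup standing in for the PostgreSQL query. The names `cache_aside_get`, `fake_db`, and `load_from_db` are hypothetical and are not taken from the demo's Worker code.

```python
from typing import Callable, Dict, Optional, Tuple


def cache_aside_get(
    key: str,
    cache: Dict[str, str],
    load_from_db: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], str]:
    """Return (value, source), where source is 'cache' or 'db'."""
    # Fast path: the value is already cached.
    if key in cache:
        return cache[key], "cache"
    # Slow path: fall through to the database, then populate the cache
    # so the next request for this key takes the fast path.
    value = load_from_db(key)
    if value is not None:
        cache[key] = value
    return value, "db"


# Hypothetical stand-ins for PostgreSQL and Redis.
fake_db = {"user:1": "Ada", "user:2": "Grace"}
cache: Dict[str, str] = {}

first = cache_aside_get("user:1", cache, fake_db.get)   # miss, served by the DB
second = cache_aside_get("user:1", cache, fake_db.get)  # hit, served by the cache
print(first, second)
```

The two return sources correspond to the two latency peaks: every request takes exactly one of the two paths, and nothing in between.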

⚙️ Demo Controls

Cache Mode

80% cached IDs + 20% fresh DB hits — reveals the bimodal distribution
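One way such a mix could be produced is sketched below. It assumes a small pool of IDs that are already cached and treats any ID above that pool as fresh, so it is a guaranteed miss. The function `build_request_ids` and its parameters are hypothetical; the demo may build its workload differently.

```python
import random
from typing import List


def build_request_ids(
    count: int,
    cached_ids: List[int],
    cached_share: float = 0.8,
    seed: int = 0,
) -> List[int]:
    """Build a shuffled list of IDs: cached_share from the warm pool, the rest fresh."""
    rng = random.Random(seed)
    n_cached = round(count * cached_share)
    # Warm IDs are drawn with repetition from the already-cached pool.
    ids = [rng.choice(cached_ids) for _ in range(n_cached)]
    # Fresh IDs are unique and outside the pool, so each one misses the cache.
    next_fresh = max(cached_ids) + 1
    ids += list(range(next_fresh, next_fresh + (count - n_cached)))
    rng.shuffle(ids)
    return ids


warm_pool = [1, 2, 3, 4, 5]
ids = build_request_ids(100, warm_pool)
n_warm = sum(1 for i in ids if i in set(warm_pool))
print(n_warm, len(ids) - n_warm)
```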

Request Count

Concurrency

How many requests are in flight at the same time. 1 = sequential: each request waits for the previous one to finish, so cache and DB times separate cleanly. 5–20 = realistic production load: the Cloudflare Worker handles requests simultaneously, Redis and PostgreSQL serve different requests at the same time, and latencies can overlap. Higher concurrency reveals whether your stack degrades gracefully under pressure or collapses into a single slow blob on the histogram.
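A concurrency limit like this can be sketched with `asyncio.Semaphore`. The example assumes a hypothetical `fake_request` coroutine that sleeps instead of making a real HTTP call; `run_with_concurrency` is likewise an illustrative name, not the demo's code.

```python
import asyncio
import time
from typing import Awaitable, Callable, List


async def run_with_concurrency(
    n_requests: int,
    concurrency: int,
    do_request: Callable[[int], Awaitable[None]],
) -> List[float]:
    """Run n_requests with at most `concurrency` in flight; return latencies in ms."""
    sem = asyncio.Semaphore(concurrency)

    async def timed(i: int) -> float:
        async with sem:
            # The clock starts after the slot is acquired, so time spent
            # waiting for a free slot is not counted as request latency.
            start = time.perf_counter()
            await do_request(i)
            return (time.perf_counter() - start) * 1000.0

    return list(await asyncio.gather(*(timed(i) for i in range(n_requests))))


async def fake_request(i: int) -> None:
    # Hypothetical stand-in: every fifth request is a slow "DB hit".
    await asyncio.sleep(0.03 if i % 5 == 0 else 0.01)


latencies = asyncio.run(run_with_concurrency(20, 5, fake_request))
print(len(latencies))
```

With `concurrency=1` the same function degenerates into the sequential case, which makes it easy to compare the two histograms on identical workloads.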

📊 Live Latency Distribution

Legend: Cache hit · DB hit

Hit 🚀 Fire Requests to watch the bimodal distribution form live

📈 Stats

Requests Fired

0

Cache Hits

—

DB Hits

—

Mean —

P50 (median) —

P95 —

P99 —

Cache speedup —

📋 Run History

No runs yet. Fire some requests to see results here.

🤔 Why Your Average Lies

If cache hits take ~150ms and DB hits take ~450ms at an 80/20 split:

Mean = 0.8 × 150ms + 0.2 × 450ms

= 210ms ← nobody actually sees this

80% of users get ~150ms, 20% get ~450ms. The mean is a phantom — it exists in the gap between the two real experiences.
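The same arithmetic can be checked on a synthetic sample. This sketch uses the idealized figures from the example above (exactly 150 ms and 450 ms, 80 and 20 samples) and a nearest-rank percentile; the helper `percentile` is written here for illustration and is one of several common percentile definitions.

```python
import statistics
from typing import List


def percentile(samples: List[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    # Ceiling of p * n / 100, done in integers to avoid float rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Idealized sample: 80 cache hits at 150 ms, 20 DB hits at 450 ms.
latencies = [150.0] * 80 + [450.0] * 20

mean = statistics.fmean(latencies)
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
p99 = percentile(latencies, 99)

# How many requests landed within 50 ms of the mean?
near_mean = sum(1 for x in latencies if abs(x - mean) < 50)

print(mean, p50, p95, p99, near_mean)  # 210.0 150.0 450.0 450.0 0
```

P50 lands on the cache peak and P95/P99 land on the DB peak, while no request at all falls near the mean.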

🔮 What to Monitor Instead

✓ P50: what a typical user experiences
✓ P95 / P99: what your worst-served users experience
✓ Cache hit rate: leading indicator; drops before latency spikes
✓ Histogram shape: two peaks mean a slow path is hiding behind the fast one; a single tight peak means users get a consistent experience
✕ Mean: collapses the distribution into a single misleading number
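The histogram-shape check from the list above can be approximated in a few lines. This is a rough sketch, assuming fixed-width bins and a simple rule that counts runs of adjacent bins each holding at least 5% of the samples. The functions `histogram` and `count_clusters`, the 50 ms bin width, and the 5% threshold are all illustrative choices, and real traffic is noisy enough that a proper mode-detection method may be needed.

```python
from typing import List


def histogram(samples: List[float], bin_width: float, n_bins: int) -> List[int]:
    """Count samples per fixed-width bin; the last bin absorbs any overflow."""
    counts = [0] * n_bins
    for s in samples:
        idx = min(int(s // bin_width), n_bins - 1)
        counts[idx] += 1
    return counts


def count_clusters(counts: List[int], min_share: float = 0.05) -> int:
    """Count runs of adjacent bins that each hold at least min_share of samples."""
    total = sum(counts)
    clusters = 0
    in_cluster = False
    for c in counts:
        significant = total > 0 and c / total >= min_share
        if significant and not in_cluster:
            clusters += 1
        in_cluster = significant
    return clusters


# Synthetic sample: 80 cache hits around 150 ms, 20 DB hits around 450 ms.
cache_hits = [140.0, 145.0, 150.0, 155.0, 160.0] * 16
db_hits = [430.0, 440.0, 450.0, 460.0, 470.0] * 4
samples = cache_hits + db_hits

counts = histogram(samples, bin_width=50.0, n_bins=12)
hit_rate = len(cache_hits) / len(samples)
print(counts, count_clusters(counts), hit_rate)
```

Tracking the cluster count alongside the cache hit rate gives an early signal when a second peak starts to form.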
