Hedged Requests: Tame P99 Tail Latency at Scale

Kodetra Technologies · May 20, 2026 · 9 min read · Intermediate

Summary

Send a backup when the first call is slow. Cut P99 tail latency without overloading services.

Your P50 looks great. Your P99 is on fire. A handful of slow calls keep dragging the experience down, and you have already squeezed the obvious wins out of caching, indexes, and pool sizes. This is where hedged requests earn their keep. They are a small, focused pattern that targets the long tail of latency directly instead of trying to make every request faster.

The idea was popularized by Google's "The Tail at Scale" paper (Dean and Barroso, 2013) and is now baked into systems like Bigtable, Spanner, gRPC client retries, and Envoy. As of May 2026, it is also one of the most useful tricks for cutting P99 latency in LLM inference fan-outs, multi-region reads, and any RPC graph where a single slow leaf node poisons the whole response. This guide walks through what hedging is, when to use it, how to implement it safely in Go, and the gotchas that bite teams in production.
