Idempotency Keys: Make POST Endpoints Safe to Retry — ContentBuffer guide

Kodetra Technologies · 9 min read · Intermediate

Summary

Build a Redis + Postgres idempotency layer that survives retries, races, and crashes.

Why this matters now

Every backend engineer eventually hits the same nightmare. A user taps "Pay" twice on a flaky 3G connection. A mobile client retries a request after a 30-second timeout. A queue worker re-runs a job because Kubernetes killed the pod between the database write and the ACK. The result is the same: duplicate charges, duplicate orders, duplicate emails, duplicate everything.

GET, PUT, and DELETE are naturally idempotent at the HTTP semantics level — repeat them and the world doesn't change. POST is the troublemaker. POST is supposed to create, and creating twice creates twice. But networks lie. Clients retry. Load balancers retry. Service meshes retry. If your POST endpoints can't survive a duplicate, your system is one bad lunch break away from a Stripe-style refund storm.

The fix is older than REST itself, but Stripe's 2017 design crystallized it for the API era: idempotency keys. The client generates a UUID and sends it as a header, and the server promises that the first request with that key is processed and every retry returns the same response. Stripe runs billions of these a day. So does Adyen. So does Shopify. As of 2026, the IETF has an Internet-Draft (draft-ietf-httpapi-idempotency-key-header) that standardizes the Idempotency-Key header and its behavior. It is a draft, not yet a published RFC. If you're shipping a payments, ordering, or messaging API in 2026, this is table stakes.

This guide builds a production-grade idempotency layer in Node.js with Redis and Postgres. We'll cover the obvious happy path, then spend most of our time on the gotchas that turn naive implementations into outage generators: concurrency races, partial failures, payload fingerprinting, key TTLs, and replay storms.


Prerequisites

Before you start, you should be comfortable with:

  • A web framework that exposes raw request/response (Express, Fastify, Koa, or equivalent in your language).
  • Redis as a cache and atomic primitive store (SET NX, Lua scripting).
  • A relational database with row-level locks (Postgres, MySQL).
  • HTTP semantics: status codes, headers, idempotent vs. safe methods.

Code samples use Node.js 22, Express 5, and ioredis. The patterns translate one-to-one to Go (use redis/go-redis and a sql.DB wrapper), Python (FastAPI plus redis-py), or Rust.


The contract: what "idempotent" actually means

The naive version: "same key, same response." That's necessary but not sufficient. The full contract has four parts.

  1. Same key, same fingerprint, same response. A retry with an identical method, path, and body must return the originally stored response: same status and same body, plus an X-Idempotency-Replay: true header that marks it as a replay.
  2. Same key, different fingerprint, error. If the second request changes the body or critical headers, return 422 (or 409). This catches client bugs where the same key is reused for different operations.
  3. Concurrent requests with the same key, only one processes. Two retries arriving 5ms apart must serialize. The first runs the handler; the second waits or returns "in progress." It must never fan out into two writes.
  4. In-flight crash, request can be retried. If the server starts a request, fails to commit, and crashes mid-flight, the next retry must be allowed to actually run — not stuck forever in "in progress."

Number 4 is the one most homegrown implementations get wrong. Number 3 is the one most tutorials skip. Both are why you don't write idempotency from scratch unless you understand the failure modes.
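
As a rough sketch, the four rules can be collapsed into one pure decision function. The record shape (fingerprint, status, lockExpiresAt) and the outcome names below are assumptions made for illustration, not part of any library or of the middleware that follows.

```javascript
// Hypothetical helper: decide what to do with an incoming request, given the
// stored record for its key (or null if the key has never been seen).
// The record shape and the outcome names are illustrative assumptions.
function decide(record, fingerprint, nowMs) {
  // First time we see this key: run the handler.
  if (record === null) return 'run';

  // Rule 2: same key, different payload.
  if (record.fingerprint !== fingerprint) return 'mismatch';

  // Rule 1: a finished request is replayed, never re-run.
  if (record.status === 'completed') return 'replay';

  // Rule 3: another request holds a lock that has not expired yet.
  if (record.status === 'in_progress' && record.lockExpiresAt > nowMs) {
    return 'in_progress';
  }

  // Rule 4: the earlier attempt failed, or crashed and left a stale lock.
  return 'run';
}
```

A caller would map 'replay' to the stored response, 'mismatch' to a 422, and 'in_progress' to a 409 with a Retry-After header.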


Architecture overview

The high-level flow:

Client                      Server                       Redis             DB
  |                            |                          |                |
  | POST /charges              |                          |                |
  | Idempotency-Key: 9d3f..    |                          |                |
  |--------------------------->|                          |                |
  |                            | SET NX key:9d3f "lock"   |                |
  |                            | EX 60                    |                |
  |                            |------------------------->|                |
  |                            |<------------------------ OK               |
  |                            |                                           |
  |                            | BEGIN TX                                  |
  |                            |------------------------------------------>|
  |                            | INSERT charge                             |
  |                            | INSERT idempotency_record (key, response) |
  |                            | COMMIT                                    |
  |                            |<------------------------------------------|
  |                            |                                           |
  |                            | SET key:9d3f cached_response, EX 86400    |
  |                            |------------------------->|                |
  |<-- 200 OK + body ----------|                          |                |

Two storage layers, on purpose:

  • Redis handles the lock (only one request at a time per key) and the fast path replay cache (memory-speed reads for the next 24h).
  • Postgres stores the durable record, so a retry still gets a deterministic response after Redis evicts the cached entry, restarts, or is flushed.

Redis-only works for low-stakes idempotency. Money requires both.


Step 1 — The data model

One table and a Redis keyspace.

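CREATE TABLE idempotency_records (
  key            text NOT NULL,
  fingerprint    text NOT NULL,           -- sha256 of method+path+body
  user_id        text NOT NULL,           -- tenant scope; text so the middleware's 'anon' fallback fits
  status         text NOT NULL,           -- 'in_progress' | 'completed' | 'failed'
  response_code  int,
  response_body  jsonb,
  created_at     timestamptz NOT NULL DEFAULT now(),
  completed_at   timestamptz,
  expires_at     timestamptz NOT NULL DEFAULT (now() + interval '24 hours'),
  -- Scoped per tenant; the middleware's upsert (ON CONFLICT) must target this key
  CONSTRAINT idempotency_records_pkey PRIMARY KEY (user_id, key)
);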

CREATE INDEX idx_idem_user_created ON idempotency_records (user_id, created_at DESC);
CREATE INDEX idx_idem_expires      ON idempotency_records (expires_at);

Keying by key alone is dangerous in a multi-tenant API — a malicious client could spam keys to discover other tenants' responses. Always namespace by user or tenant: (user_id, key) is safer. Stripe scopes by API key. Your composite primary key should match.

Redis keyspace mirrors the relational layout but is purely advisory:

idem:{user_id}:{key}:lock        SET NX EX 60   value=request_uuid
idem:{user_id}:{key}:response    SET EX 86400   value=JSON(response)

Step 2 — The middleware

Below is the core middleware. Read it once, then we'll dissect the four risk areas.

import crypto from 'node:crypto';
import { Redis } from 'ioredis';
import { pool } from './db.js';

const redis = new Redis(process.env.REDIS_URL);
const LOCK_TTL = 60;           // seconds — generous for slow handlers
const RECORD_TTL_SEC = 86_400; // 24h Redis cache; matches the DB row's default expires_at

function fingerprint(req) {
  const h = crypto.createHash('sha256');
  h.update(req.method);
  h.update('\n');
  h.update(req.originalUrl);
  h.update('\n');
  h.update(JSON.stringify(req.body ?? {}));
  return h.digest('hex');
}

export function idempotency() {
  return async (req, res, next) => {
    const key = req.header('Idempotency-Key');
    if (!key) {
      // Required for unsafe writes? Up to your API contract.
      return next();
    }
    if (!/^[A-Za-z0-9_\-]{8,255}$/.test(key)) {
      return res.status(400).json({ error: 'invalid_idempotency_key' });
    }

    const userId = req.user?.id ?? 'anon';
    const fp = fingerprint(req);
    const lockKey = `idem:${userId}:${key}:lock`;
    const respKey = `idem:${userId}:${key}:response`;

    // 1. Fast path — already-completed cached response in Redis
    const cached = await redis.get(respKey);
    if (cached) {
      const parsed = JSON.parse(cached);
      if (parsed.fingerprint !== fp) {
        return res.status(422).json({ error: 'idempotency_key_reuse_with_different_payload' });
      }
      res.set('X-Idempotency-Replay', 'true');
      return res.status(parsed.status).json(parsed.body);
    }

    // 2. Try to acquire the lock — atomic SET NX
    const requestId = crypto.randomUUID();
    const locked = await redis.set(lockKey, requestId, 'EX', LOCK_TTL, 'NX');

    if (!locked) {
      return res.status(409).set('Retry-After', '2')
        .json({ error: 'idempotency_request_in_progress' });
    }

    // 3. Check Postgres for a completed record (Redis may have evicted)
    const { rows } = await pool.query(
      `SELECT fingerprint, status, response_code, response_body
         FROM idempotency_records
        WHERE key = $1 AND user_id = $2`,
      [key, userId]
    );
    if (rows[0]?.status === 'completed') {
      const r = rows[0];
      if (r.fingerprint !== fp) {
        await redis.del(lockKey);
        return res.status(422).json({ error: 'idempotency_key_reuse_with_different_payload' });
      }
      await redis.set(respKey, JSON.stringify({
        fingerprint: r.fingerprint, status: r.response_code, body: r.response_body
      }), 'EX', RECORD_TTL_SEC);
      await redis.del(lockKey);
      res.set('X-Idempotency-Replay', 'true');
      return res.status(r.response_code).json(r.response_body);
    }

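    // 4. Insert/refresh the in-progress record.
    //    The conflict target is the table's primary-key constraint, and a
    //    refreshed row gets a clean response slot and a new expiry.
    await pool.query(
      `INSERT INTO idempotency_records (key, user_id, fingerprint, status)
       VALUES ($1, $2, $3, 'in_progress')
       ON CONFLICT ON CONSTRAINT idempotency_records_pkey DO UPDATE
         SET fingerprint = EXCLUDED.fingerprint,
             status = 'in_progress',
             response_code = NULL,
             response_body = NULL,
             completed_at = NULL,
             expires_at = now() + interval '24 hours'
       WHERE idempotency_records.status != 'completed'
          OR idempotency_records.expires_at < now()`,
      [key, userId, fp]
    );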

    // 5. Hijack res.json so we can capture the response
    const originalJson = res.json.bind(res);
    res.json = (body) => {
      res.locals.__idem_body = body;
      return originalJson(body);
    };

    res.on('finish', async () => {
      try {
        const body = res.locals.__idem_body ?? null;
        const status = res.statusCode;

        // Don't cache 5xx: those are retryable, not deterministic
        if (status >= 500) {
          await pool.query(
            `UPDATE idempotency_records SET status='failed', completed_at=now()
              WHERE key=$1 AND user_id=$2`, [key, userId]);
          return; // the finally block below releases the lock
        }

        // Serialize explicitly: node-postgres would send a JS array as a
        // Postgres array literal, not as JSON
        await pool.query(
          `UPDATE idempotency_records
              SET status='completed', response_code=$3, response_body=$4, completed_at=now()
            WHERE key=$1 AND user_id=$2`,
          [key, userId, status, JSON.stringify(body)]
        );
        await redis.set(respKey, JSON.stringify({
          fingerprint: fp, status, body
        }), 'EX', RECORD_TTL_SEC);
      } catch (err) {
        // 'finish' listeners run outside Express error handling; an unhandled
        // rejection here would crash the process
        console.error('idempotency: failed to persist outcome', err);
      } finally {
        // Lua compare-and-delete: only release if we still own the lock
        const lua = `if redis.call('get', KEYS[1]) == ARGV[1]
                     then return redis.call('del', KEYS[1]) else return 0 end`;
        // If the release itself fails, the lock simply expires after LOCK_TTL
        await redis.eval(lua, 1, lockKey, requestId).catch(() => {});
      }
    });

    next();
  };
}

Mount it before any route that creates resources:

app.use('/charges', idempotency(), chargesRouter);
app.use('/orders',  idempotency(), ordersRouter);
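
One caveat about fingerprint() above: JSON.stringify keeps keys in insertion order, so two requests that carry the same fields in a different order produce different fingerprints and would get a 422. If your clients can re-serialize a body between retries, one possible approach is to canonicalize the body before hashing. The canonicalJson helper below is a hypothetical sketch, and it assumes the body is plain parsed JSON (no Dates, Maps, or cycles).

```javascript
// Hypothetical helper: serialize parsed JSON with object keys in sorted order,
// so that logically equal bodies always produce the same string.
function canonicalJson(value) {
  if (Array.isArray(value)) {
    // Array order is meaningful, so it is preserved.
    return '[' + value.map(canonicalJson).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .sort()
      .filter((k) => value[k] !== undefined) // JSON has no undefined
      .map((k) => JSON.stringify(k) + ':' + canonicalJson(value[k]));
    return '{' + entries.join(',') + '}';
  }
  // Strings, numbers, booleans, null. Undefined falls back to null.
  return JSON.stringify(value) ?? 'null';
}
```

Inside fingerprint(), the body line would then read h.update(canonicalJson(req.body ?? {})).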

Step 3 — Example I/O

Happy path:

POST /charges HTTP/1.1
Idempotency-Key: 9d3f8c12-aa54-4b8e-8f24-1c7e6d29b021
Content-Type: application/json

{"amount": 5000, "currency": "usd", "customer": "cus_K9"}

→ 201 Created
{"id":"ch_01HW...","amount":5000,"status":"succeeded"}

Same request retried 30 seconds later:

POST /charges HTTP/1.1
Idempotency-Key: 9d3f8c12-aa54-4b8e-8f24-1c7e6d29b021
...same body...

→ 201 Created
X-Idempotency-Replay: true
{"id":"ch_01HW...","amount":5000,"status":"succeeded"}

Same key, different amount (client bug):

→ 422 Unprocessable Entity
{"error":"idempotency_key_reuse_with_different_payload"}

Two retries arriving 10ms apart, while the first is still running:

→ 409 Conflict
Retry-After: 2
{"error":"idempotency_request_in_progress"}

Step 4 — The four gotchas

Gotcha 1 — Caching 5xx responses

Naive code stores every completed response. That's wrong. A 500 from "database unavailable" is not deterministic — the client should be able to retry it. If you cache the 500 with the idempotency key, every retry hits the cached 500 and the operation never recovers.

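The rule in this guide: only cache 2xx and 4xx. Treat 5xx as a transient failure: release the lock, mark the record failed, and let the next retry actually run. This is a design choice rather than a universal one. Stripe's API documentation says it saves the first result for a key and returns it on later requests, 500 errors included.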

Gotcha 2 — The dual-write inside the handler

Your handler probably does:

await db.query('INSERT INTO charges ...');
await stripe.charges.create(...);  // external call

If the DB insert succeeds but the Stripe call fails with a 500, the lock releases, the record is marked failed, and the next retry runs the whole handler again — double DB insert.

Fix: put the DB write and the idempotency record write in the same transaction. Even better: use the transactional outbox pattern for the Stripe call. Idempotency keys are a per-request guard. They are not a substitute for atomicity inside the handler. Combine them.
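
A minimal sketch of that combination follows. It assumes client is a checked-out node-postgres client (from pool.connect()); the charges and outbox tables, their columns, and the 202 "pending" response are hypothetical stand-ins, not the schema of any real system.

```javascript
// Sketch only: the business row, the outbox row, and the idempotency record
// are written in ONE transaction, so they all commit or none do.
async function createChargeAtomically(client, { userId, key, amount, currency }) {
  await client.query('BEGIN');
  try {
    const { rows } = await client.query(
      `INSERT INTO charges (user_id, idempotency_key, amount, currency, status)
       VALUES ($1, $2, $3, $4, 'pending') RETURNING id`,
      [userId, key, amount, currency]
    );
    const chargeId = rows[0].id;

    // Outbox row: a separate worker picks it up and calls the payment
    // provider, so no external call happens inside the transaction.
    await client.query(
      `INSERT INTO outbox (topic, payload) VALUES ('charge.requested', $1)`,
      [JSON.stringify({ chargeId, amount, currency })]
    );

    const body = { id: chargeId, amount, status: 'pending' };
    await client.query(
      `UPDATE idempotency_records
          SET status = 'completed', response_code = 202, response_body = $3,
              completed_at = now()
        WHERE key = $1 AND user_id = $2`,
      [key, userId, JSON.stringify(body)]
    );

    await client.query('COMMIT');
    return { status: 202, body };
  } catch (err) {
    await client.query('ROLLBACK'); // nothing is left behind for the retry to trip over
    throw err;
  }
}
```

With this shape, a failure partway through leaves no charge row at all, so the retry that the idempotency layer allows starts from a clean slate.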

Gotcha 3 — TTL too short, too long

24 hours is the default for Stripe and Shopify. Why?

  • Too short (1 hour): a mobile client puts the phone in airplane mode for an afternoon flight, retries on landing — your record has expired, the request runs again, double charge.
  • Too long (30 days): legitimate reuse of human-friendly keys (order-123) becomes a footgun, and the table holds roughly 30 times as many rows as it would with a 24-hour TTL.

24h is the Goldilocks default. Go to 7 days only if you have async retry queues that span multi-day outages.

Gotcha 4 — Concurrency: SET NX is not enough

The SET NX lock guarantees that only one request enters the handler at a time. But what about the lock expiring mid-handler? Default TTL is 60s, but a slow Stripe call could take 90s. When the lock expires, a retry sneaks in and starts a parallel handler. Now you have two handlers running.

Two fixes:

  1. Ownership tokens. Store the request UUID inside the lock value. When the original handler releases the lock, use a Lua compare-and-delete so it cannot delete a lock that a later request now holds. The middleware above already does this, but it protects only the release, not the write.
  2. DB-side dedup. Add a UNIQUE constraint on a business key (e.g., (user_id, idempotency_key) on the charges table itself). Two parallel handlers will both try to insert; the second fails with a unique violation. Catch it, look up the existing row, and return its response.

Belt and suspenders. The lock is for performance (avoid running the work twice). The unique constraint is for correctness (guarantee at most one row exists). You need both.
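
The DB-side backstop could look roughly like this. It is a sketch under assumptions: db is anything with a node-postgres-style query(sql, params), the charges table has UNIQUE (user_id, idempotency_key), and the column names are hypothetical. The one fixed fact is that Postgres reports a unique violation with SQLSTATE 23505, which node-postgres exposes as err.code.

```javascript
const UNIQUE_VIOLATION = '23505'; // Postgres SQLSTATE for unique_violation

// Sketch: insert the charge, or return the row a parallel handler already wrote.
async function insertChargeOnce(db, { userId, key, amount, currency }) {
  try {
    const { rows } = await db.query(
      `INSERT INTO charges (user_id, idempotency_key, amount, currency)
       VALUES ($1, $2, $3, $4)
       RETURNING id, amount, currency`,
      [userId, key, amount, currency]
    );
    return { created: true, charge: rows[0] };
  } catch (err) {
    // Anything other than a duplicate is a real failure.
    if (err.code !== UNIQUE_VIOLATION) throw err;

    // A parallel handler won the race: return its row instead of failing.
    const { rows } = await db.query(
      `SELECT id, amount, currency FROM charges
        WHERE user_id = $1 AND idempotency_key = $2`,
      [userId, key]
    );
    return { created: false, charge: rows[0] };
  }
}
```

Note that inside an open transaction the failed INSERT aborts the transaction, so the follow-up SELECT would need a savepoint or a fresh transaction; the sketch assumes each statement runs on its own.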


Quick reference

Decision           | Default in this guide                   | Why
-------------------|-----------------------------------------|------------------------------------------
Header name        | Idempotency-Key                         | IETF Internet-Draft; widely adopted
Key format         | UUIDv4, 8–255 chars                     | Enough entropy, negligible collision risk
Required?          | Optional, recommended                   | Lets clients opt in for unsafe ops
TTL                | 24 hours                                | Survives mobile airplane mode, bounds storage
Lock TTL           | 60 seconds                              | Longer than slowest handler
Cache 5xx?         | No                                      | 5xx is retryable, not deterministic
Body fingerprint   | sha256(method+url+body)                 | Detects accidental reuse
Multi-tenant scope | (user_id, key)                          | Prevents cross-tenant key collisions
Replay marker      | X-Idempotency-Replay: true              | Lets clients distinguish first vs. retry
Storage            | Redis (lock+cache) + Postgres (durable) | Speed + durability
Backstop           | DB UNIQUE on business key               | Correctness when locks expire

Next steps

You've got the spine. To turn it into something production-ready, layer on:

  • Observability. Tag every idempotency outcome with a Prometheus counter: idem_requests_total{outcome="cache_hit|new|in_progress|fingerprint_mismatch"}. The cache_hit ratio is your duplicate-retry signal — sudden spikes tell you a client is in a retry loop.
  • Garbage collection. A nightly job that does DELETE FROM idempotency_records WHERE expires_at < now() keeps the table bounded.
  • Replay testing. Chaos test by sending every request twice in CI. If your endpoints aren't idempotent, the second response will diverge.
  • The Idempotency-Key Internet-Draft. The current IETF draft is not yet an RFC, but it is worth reading once: it documents edge cases (key length, accepted characters, deprecation paths) you'll hit eventually.
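
For the garbage-collection item, a single unbounded DELETE can hold locks for a long time on a large table. One possible variation is to delete in batches, as sketched below. It assumes db exposes a node-postgres-style query() that resolves to an object with rowCount; the function name and the batch size of 1000 are arbitrary choices for illustration.

```javascript
// Sketch: purge expired idempotency records in bounded batches.
// Returns the total number of rows deleted.
async function purgeExpired(db, batchSize = 1000) {
  let total = 0;
  for (;;) {
    const { rowCount } = await db.query(
      `DELETE FROM idempotency_records
        WHERE (user_id, key) IN (
          SELECT user_id, key FROM idempotency_records
           WHERE expires_at < now()
           LIMIT $1)`,
      [batchSize]
    );
    total += rowCount;
    // A short batch means nothing expired is left.
    if (rowCount < batchSize) return total;
  }
}
```

The existing index on expires_at keeps the inner SELECT cheap, and each batch commits on its own, so the job can be stopped and resumed at any point.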

Idempotency keys are one of those backend features that are invisible when they work and catastrophic when they don't. Get them right once, in middleware, and you can stop worrying about retries forever.
