
Transactional Outbox Pattern: Stop Losing Events
Summary
Guarantee that events get published even when Kafka is down: the outbox pattern with Postgres, step by step.
Why your events keep disappearing
You wrap a database write and a Kafka publish in the same function. The DB succeeds, then Kafka times out — or the reverse. Either way you now have an inconsistent system: an order exists with no OrderCreated event, or an event fired for a row that was never committed. This is the dual-write problem, and it shows up in every microservices stack the moment two pieces of state need to stay in sync.
The Transactional Outbox pattern fixes this by reducing the problem to one durable write. Instead of writing to two systems, you write your business row and your event to the same database, in the same transaction. A separate relay process ships the event to Kafka after the fact. If Kafka is down, the event waits. If your app crashes, the event waits. No more lost messages, no distributed transactions, no XA.
This guide walks through a production-grade outbox in PostgreSQL — schema, code, relay loop, and the four mistakes that cost teams a weekend each. By the end you can paste this into a service today.
Prerequisites
- PostgreSQL 13+ (we'll use SKIP LOCKED) or any RDBMS with row-level locks
- Kafka, RabbitMQ, SNS, or any message broker — the pattern is broker-agnostic
- Comfortable with SQL transactions and at least one backend language (we'll use Python; Node and Go versions are trivial)
- Basic familiarity with at-least-once delivery semantics
Step 1: Understand the dual-write trap
Here's the broken code that ships in nearly every greenfield service:
# BROKEN: dual write
def create_order(order):
    db.insert("orders", order)               # 1
    kafka.publish("orders.created", order)   # 2
    return order
Three failure modes:
- Step 1 commits, step 2 throws — order exists, no event. Downstream services (inventory, billing, search) never hear about it.
- Step 2 publishes, step 1 rolls back later (e.g., a constraint check) — phantom event for a row that doesn't exist.
- The process crashes between 1 and 2 — same as case 1, but harder to detect.
Wrapping both in a try/except doesn't help — there is no atomic primitive that spans Postgres and Kafka. XA transactions exist but they are slow, fragile, and most managed brokers don't support them.
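To make that concrete, here's the usual instinct sketched with the same hypothetical db and kafka helpers as above (PublishError is an assumed exception type). Compensating after a failed publish just relocates the hole:

# Still broken: the compensation can fail too, and a crash can land between any two lines.
def create_order(order):
    db.insert("orders", order)
    try:
        kafka.publish("orders.created", order)
    except PublishError:
        db.delete("orders", order.id)  # and if this delete fails, or we crash first?
        raise
    return order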
Step 2: Add an outbox table
The outbox is just another table in your service's database. Every event you want to publish gets inserted there — in the same transaction as the business write.
CREATE TABLE outbox (
id BIGSERIAL PRIMARY KEY,
aggregate_id TEXT NOT NULL, -- e.g. order id
aggregate TEXT NOT NULL, -- "order", "user", ...
event_type TEXT NOT NULL, -- "OrderCreated"
payload JSONB NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
published_at TIMESTAMPTZ, -- NULL until relayed
attempts INT NOT NULL DEFAULT 0
);
-- The relay only cares about unpublished rows.
CREATE INDEX outbox_unpublished_idx
ON outbox (id) -- id matches the relay's ORDER BY
WHERE published_at IS NULL;
A few design notes baked into that schema:
- aggregate_id + aggregate let consumers route or partition by entity. Kafka producers will use aggregate_id as the message key, so events for the same order land on the same partition in order.
- payload is JSONB so you can evolve the event shape without migrations.
- The partial index on published_at IS NULL keeps the relay query fast even when the table grows to millions of rows — the published rows fall out of the index entirely.
- We keep published rows around for audit and replay; a nightly job archives anything older than N days. Both are sketched below.
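Both jobs are plain SQL against the retained rows. A sketch, with an illustrative order id and a 7-day retention window you should tune to your own replay needs:

-- Audit / replay: every event ever emitted for one order, oldest first.
SELECT id, event_type, payload, created_at, published_at
  FROM outbox
 WHERE aggregate = 'order'
   AND aggregate_id = '42'   -- hypothetical order id
 ORDER BY id;

-- Nightly cleanup: delete (or first copy to cold storage) old published rows.
DELETE FROM outbox
 WHERE published_at IS NOT NULL
   AND published_at < now() - interval '7 days';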
Step 3: Write business + event in one transaction
Your service code now looks like this. Notice that nothing talks to Kafka — that's the whole point.
# Correct: single transaction, no broker call
def create_order(order, conn):
    with conn.transaction():
        conn.execute(
            "INSERT INTO orders (id, customer_id, total, status) "
            "VALUES (%s, %s, %s, 'PENDING')",
            (order.id, order.customer_id, order.total),
        )
        conn.execute(
            "INSERT INTO outbox "
            " (aggregate_id, aggregate, event_type, payload) "
            "VALUES (%s, 'order', 'OrderCreated', %s::jsonb)",
            (order.id, json.dumps(order.to_event())),
        )
    return order  # both rows committed, or neither
Postgres guarantees both inserts succeed together or fail together. There is no broker call in the request path — your endpoint stays fast and stops depending on Kafka's availability for correctness.
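What order.to_event() returns is up to you; the pattern only cares that it serializes to JSON. A minimal sketch, with illustrative field names:

# Hypothetical payload builder on the Order class.
from datetime import datetime, timezone

def to_event(self):
    return {
        "order_id": self.id,
        "customer_id": self.customer_id,
        "total": str(self.total),  # serialize Decimal as a string
        "status": "PENDING",
        "occurred_at": datetime.now(timezone.utc).isoformat(),
    }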
Step 4: Build the relay (poller version)
The relay is a small worker that reads unpublished rows, sends them to Kafka, and marks them published. The trick is to do this safely across multiple relay instances — you'll run more than one for HA.
# relay.py — runs as N replicas in your cluster
import json, time, psycopg
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "kafka:9092",
                     "enable.idempotence": True,
                     "acks": "all"})

BATCH = 100
SLEEP = 0.2

def relay_loop(conn):
    # conn must return dict-style rows, e.g. psycopg.connect(..., row_factory=dict_row).
    while True:
        with conn.transaction():
            rows = conn.execute(
                # SKIP LOCKED is the magic — replicas never fight over rows.
                f"""SELECT id, aggregate_id, event_type, payload
                      FROM outbox
                     WHERE published_at IS NULL
                     ORDER BY id
                     LIMIT {BATCH}
                       FOR UPDATE SKIP LOCKED"""
            ).fetchall()
            if not rows:
                time.sleep(SLEEP)
                continue

            failed = []  # delivery errors, reported by Kafka via the callback below
            for r in rows:
                producer.produce(
                    topic="orders.created",
                    key=r["aggregate_id"].encode(),
                    value=json.dumps(r["payload"]).encode(),
                    headers=[("event_type", r["event_type"].encode()),
                             ("outbox_id", str(r["id"]).encode())],
                    on_delivery=lambda err, msg: failed.append(err) if err else None,
                )
            # flush() drives the delivery callbacks and blocks until the broker acks
            # (or the timeout expires, leaving messages unacked in the queue).
            unacked = producer.flush(timeout=5)
            if unacked or failed:
                # Raise so the transaction rolls back; the rows stay unpublished
                # and will be retried on the next pass.
                raise RuntimeError(f"{unacked} unacked, {len(failed)} failed deliveries")

            ids = [r["id"] for r in rows]
            conn.execute(
                "UPDATE outbox SET published_at = now() WHERE id = ANY(%s)",
                (ids,),
            )
            # transaction commits, locks release
Three things make this safe under concurrency:
- FOR UPDATE SKIP LOCKED — each relay grabs a different batch. No coordination, no Redis lock, no leader election.
- producer.flush() drives the delivery callbacks and blocks until Kafka acks. If any message is unacked or failed we raise, the transaction rolls back, and the rows stay published_at = NULL for the next attempt.
- enable.idempotence=True + acks=all — Kafka itself dedupes producer retries, so consumers see each event at-least-once but never out of order within a partition.
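Running it is just a matter of handing the loop a connection that returns dict-style rows (the connection string here is illustrative):

# Hypothetical entrypoint for relay.py.
from psycopg.rows import dict_row

if __name__ == "__main__":
    with psycopg.connect("postgresql://app@db/orders", row_factory=dict_row) as conn:
        relay_loop(conn)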
Step 5: The CDC alternative (Debezium)
Polling works, but it puts query load on your primary DB and has ~200ms tail latency. The other option is Change Data Capture: point Debezium at the Postgres write-ahead log, watch the outbox table for inserts, and stream them straight to Kafka.
Architecturally it's the same pattern — same table, same atomic write — but the relay is replaced by a Kafka Connect worker reading the WAL. You get sub-50ms latency and zero query load on your DB. The trade-off is operational: you now run Debezium, manage replication slots, and care about WAL retention.
Rule of thumb: start with the polling relay. It's 80 lines of code and you can run it tomorrow. Move to Debezium when polling latency or DB load starts to hurt.
Common pitfalls (the ones that cost a weekend)
1. Forgetting the partial index. Without WHERE published_at IS NULL on the index, the relay's SELECT ... ORDER BY id turns into a sequential scan once you're past a few million rows. The relay falls behind, the outbox grows, the index gets even slower — death spiral. Always include the partial index from day one.
2. Letting the outbox grow forever. Even with the partial index, VACUUM has to walk the whole table. Run a daily job that deletes (or archives to cold storage) rows where published_at < now() - interval '7 days'. Keep enough history for replay; not enough to wreck autovacuum.
3. Treating consumers like exactly-once. The outbox guarantees at-least-once delivery, not exactly-once. A relay can publish a row, crash before UPDATE published_at, and a sibling relay republishes the same event. Consumers must be idempotent — typically by deduping on outbox_id in a header, or on a natural business key; see the consumer sketch after this list.
4. Holding the transaction open too long. Every relay batch holds row locks until the Kafka flush completes. If Kafka is slow, those locks pile up and block writes from your service. Mitigation: set a short request.timeout.ms on the producer, keep batches small (50-200 rows), and put a hard timeout on the entire transaction. Better one slow batch retried than your API going read-only because the relay is wedged.
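For pitfall 3, one common consumer-side dedup shape, sketched with a hypothetical processed_events table (unique on outbox_id) and a hypothetical apply_order_created() holding the real business logic:

import json

# Claim the outbox_id first; only apply the event if the claim is new.
def handle_order_created(msg, conn):
    headers = dict(msg.headers() or [])
    outbox_id = int(headers["outbox_id"].decode())
    with conn.transaction():
        claimed = conn.execute(
            "INSERT INTO processed_events (outbox_id) VALUES (%s) "
            "ON CONFLICT DO NOTHING RETURNING outbox_id",
            (outbox_id,),
        ).fetchone()
        if claimed is None:
            return  # duplicate delivery; already handled, safe to ack and move on
        apply_order_created(json.loads(msg.value()), conn)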
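And for pitfall 4, the producer-side guard rails can look like this, as a drop-in replacement for the relay's Producer config (the values are illustrative, not tuned recommendations):

from confluent_kafka import Producer

# Fail fast instead of sitting on row locks while Kafka is slow.
producer = Producer({
    "bootstrap.servers": "kafka:9092",
    "enable.idempotence": True,
    "acks": "all",
    "request.timeout.ms": 5000,     # per-request broker timeout
    "delivery.timeout.ms": 10000,   # total time a message may spend retrying
})
# On the database side, a Postgres setting such as idle_in_transaction_session_timeout
# for the relay's role puts a hard cap on how long a wedged batch can hold its locks.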
Quick reference
| Concern | What to do |
|---|---|
| Atomicity | INSERT business row + INSERT outbox row in one DB transaction |
| Concurrency | FOR UPDATE SKIP LOCKED — multiple relays, no coordination |
| Ordering | Use aggregate_id as Kafka message key; one partition per aggregate |
| Throughput | Batches of 50-200 rows; tune by p99 publish latency, not throughput |
| Cleanup | Daily job: delete WHERE published_at < now() - interval '7 days' |
| Latency budget | Polling: ~200ms. Debezium/CDC: ~50ms. Both at-least-once. |
| Consumer contract | Idempotent on outbox_id header or business key — always |
Next steps
Three places to take this further once the basic pattern is humming:
- Saga orchestration. Once your services emit reliable events, you can chain them into multi-step business workflows with compensation steps. The outbox is the foundation; saga is the choreography on top.
- Event sourcing. If you find yourself writing every state change as both a row mutation and an outbox event, consider going further — store the events as the source of truth and project the orders table from them.
- Schema registry. Outbox payloads are JSONB today. Lock them down with Avro or Protobuf in a schema registry before consumers start to drift. Future-you will be grateful.
That's the whole pattern. Two tables, one transaction, an 80-line relay. It is the cheapest reliability win in distributed systems — apply it before you reach for sagas, eventual consistency frameworks, or distributed transactions.