PostgreSQL 18 Async I/O: Tune for 3x Faster Reads

Kodetra Technologies · April 29, 2026 · 9 min read · Advanced

Summary

Configure io_method, io_workers, and io_uring for big PostgreSQL read throughput gains.

Why PostgreSQL 18's Async I/O Changes Everything

PostgreSQL 18 ships the biggest read-path overhaul in over a decade. The new asynchronous I/O subsystem lets a single backend issue many disk reads in flight at once, instead of blocking on each page like every prior version. On NVMe and cloud block storage, that translates to 2-3x throughput improvements on cold sequential scans, vacuum, and analytics queries — without rewriting a single query.

If you run Postgres on AWS RDS, GCP Cloud SQL, Aurora, Azure, or your own bare metal, this is the upgrade that lets you stop over-provisioning IOPS to compensate for a synchronous engine. But the defaults are conservative, and on a many-core box they leave most of the win on the table. This guide walks through the architecture, the three I/O methods (sync, worker, io_uring), how to tune them for your workload, and the gotchas that bite people in production.

Prerequisites

  • PostgreSQL 18.0 or newer (released September 25, 2025)
  • Linux kernel 5.1+ if you want to use the io_uring backend
  • A build of Postgres compiled with --with-liburing for io_uring (most Linux packages already include it; see the check below)
  • Superuser access to edit postgresql.conf and restart the cluster
  • Comfort reading EXPLAIN (ANALYZE, BUFFERS) output and basic pg_stat_io queries
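
A quick way to verify the kernel and build prerequisites up front (a sketch; pg_config --configure reports flags for autoconf-style builds, so a meson-built package may need a different check):

# Kernel must be 5.1+ for io_uring
uname -r

# Look for liburing in the build configuration
pg_config --configure | tr ' ' '\n' | grep -i liburing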

How the New I/O Subsystem Works

In Postgres 17 and earlier, when a query needed a page that wasn't in shared_buffers, the backend made a blocking pread() syscall and stalled until the kernel returned the page. The only concurrency hint was posix_fadvise(), which asked the kernel to prefetch — useful, but limited.

Postgres 18 replaces that with a true I/O queue. Every backend pushes read requests onto a shared ring; an executor (an I/O worker process or an io_uring submission) services them; results stream back as buffers in shared memory. The query process keeps doing useful work — evaluating predicates on already-fetched pages, computing joins — while the next batch is in flight. That pipelining is the entire reason cold scans get faster.

Three concrete backends ship in 18:

  • sync — Legacy behavior. Reads still block, posix_fadvise() still prefetches. Use only if you hit a regression.
  • worker — Default. A pool of I/O worker processes performs reads on behalf of backends. Works on every platform, including macOS for development.
  • io_uring — Linux-only. Submits reads through a shared kernel ring buffer; minimal syscall overhead. Best raw throughput on modern kernels.
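
You can watch the difference directly on a test box. Attach strace to a backend running a cold sequential scan: under sync you'll see a long stream of pread64 calls, while under io_uring the same work collapses into a handful of io_uring_enter calls (12345 below is a placeholder for your backend's pid):

# In psql, find the backend pid for your session
psql -tAc "SELECT pg_backend_pid();"

# In another shell, tally syscalls while the scan runs
sudo strace -c -p 12345
# sync:     pread64 dominates the call count
# io_uring: io_uring_enter does the same work in far fewer calls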

Step 1 — Pick the Right io_method for Your Stack

Run this query first to see what you're working with:

SHOW server_version;
SHOW io_method;
SHOW io_workers;
SHOW effective_io_concurrency;
SHOW shared_buffers;

Decision matrix:

Environment                        | Recommended io_method | Why
-----------------------------------|-----------------------|-------------------------------------------
Linux 5.1+ bare metal / NVMe       | io_uring              | Lowest latency, fewest syscalls, peak IOPS
Linux 5.1+ on cloud (RDS, Aurora)  | worker (often forced) | Managed services may disable io_uring
Linux <5.1 or older kernels        | worker                | io_uring not supported
macOS / Windows / dev              | worker                | Only portable async option
Single-tenant analytics / OLAP     | io_uring              | Cold scans dominate; biggest win
High-concurrency OLTP              | worker                | More predictable under churn

To enable io_uring, edit postgresql.conf:

# /etc/postgresql/18/main/postgresql.conf
io_method = 'io_uring'         # or 'worker', or 'sync'
io_workers = 8                 # only used when io_method='worker'
effective_io_concurrency = 32  # backend-level prefetch depth
maintenance_io_concurrency = 32
io_max_concurrency = -1        # -1 = auto-size from shared_buffers

Then restart (io_method is one of the few new GUCs that requires a full restart rather than a reload):

sudo systemctl restart postgresql@18-main
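
Verify the setting took, and in worker mode that the pool actually spawned ('io worker' is how the pool appears as a backend_type in pg_stat_activity):

psql -tAc "SHOW io_method;"
psql -tAc "SELECT count(*) FROM pg_stat_activity WHERE backend_type = 'io worker';"  # >0 in worker mode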

Step 2 — Size io_workers for Your CPU

The default io_workers = 3 exists so that a tiny VM doesn't accidentally spawn 30 processes. It is wrong for almost every real server. The current consensus from EnterpriseDB, CYBERTEC, and the pgsql-hackers benchmarks is to set it to roughly 1/4 to 1/2 of available CPU threads, capped where your storage stops scaling.

Quick sizing recipe on Linux:

# Count logical CPUs
nproc
# Example output: 32

# Pick io_workers in the 1/4-1/2 range
echo "io_workers = 12"  >> /etc/postgresql/18/main/postgresql.conf

# Restart and verify
sudo systemctl restart postgresql@18-main
psql -c "SHOW io_workers;"

Watch for the failure mode: too many workers thrash the run queue and inflate context-switch overhead. The cleanest signal is the number of backends stuck in wait_event_type = 'IO' (per pg_stat_activity, queried below) staying high after you raise io_workers. That usually means storage is the real bottleneck, not the worker pool — back the number down.
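
A minimal sketch of that check; run it during a heavy scan, then again after changing io_workers, and see whether the count drops:

-- Backends currently stalled on I/O, grouped by specific wait event
SELECT wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event_type = 'IO'
GROUP BY wait_event
ORDER BY count(*) DESC;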


Step 3 — Tune effective_io_concurrency the Way It Was Always Meant to Be

Before 18, effective_io_concurrency only nudged posix_fadvise() for bitmap heap scans. In practice the parameter was inert for most workloads, which is why so many production tunings still leave it at 1.

In Postgres 18 it controls how many concurrent read requests a backend can have outstanding through the AIO subsystem. The new default is 16. For NVMe and modern cloud volumes, push it higher per-database or per-role for analytical workloads:

-- Per session for a heavy analytics query
SET effective_io_concurrency = 64;
EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;

-- Per role (e.g., your reporting user)
ALTER ROLE bi_reader SET effective_io_concurrency = 64;

-- Per database
ALTER DATABASE warehouse SET effective_io_concurrency = 128;

maintenance_io_concurrency is the same idea for VACUUM, ANALYZE, CREATE INDEX, and pg_basebackup. Default is now 16. On a beefy machine, set it to 64 or 128 — vacuum will reclaim dead tuples noticeably faster on big tables.
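
Unlike io_method, both concurrency GUCs take effect without a restart, so a cluster-wide bump is just:

ALTER SYSTEM SET maintenance_io_concurrency = 64;
SELECT pg_reload_conf();
SHOW maintenance_io_concurrency;  -- reflects the new value after the reload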


Step 4 — Benchmark Before You Trust It

Numbers from blog posts are interesting; numbers from your hardware are decisive. Here's a minimal cold-cache benchmark you can run end-to-end.

First, build a table that actually exceeds shared_buffers. If your shared_buffers is 8 GB, you need at least ~12 GB of data on disk to force real reads.

-- 50 million rows, roughly 7 GB of heap plus a few GB of indexes
CREATE TABLE events (
    id          bigserial PRIMARY KEY,
    user_id     bigint    NOT NULL,
    event_type  text      NOT NULL,
    payload     jsonb     NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now()
);

INSERT INTO events (user_id, event_type, payload)
SELECT
    (random() * 1_000_000)::bigint,
    (ARRAY['click','view','purchase','signup'])[1 + (random()*3)::int],
    jsonb_build_object('ip', ('10.'|| (random()*255)::int ||'.0.1'),
                       'session', md5(random()::text))
FROM generate_series(1, 50_000_000);

CREATE INDEX ON events (user_id);
CREATE INDEX ON events (created_at);
ANALYZE events;

Now run a cold-cache scan. The trick is to drop the OS page cache between runs — otherwise your second run is in RAM and tells you nothing about disk.

# Drop OS page cache (Linux, requires root)
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

# Restart Postgres to clear shared_buffers
sudo systemctl restart postgresql@18-main

# Run the scan
psql -d testdb <<'SQL'
\timing on
EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
SELECT count(*)
FROM events
WHERE event_type = 'purchase'
  AND created_at > now() - interval '90 days';
SQL

Capture three numbers: total runtime, Buffers: shared read=N, and I/O Timings. Then change io_method, restart, drop caches, and run again.
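
To keep the three runs honest, script the whole cycle. A sketch, assuming the Debian-style service name used above, a testdb database, and passwordless sudo:

#!/usr/bin/env bash
set -euo pipefail

for method in sync worker io_uring; do
    sudo -u postgres psql -c "ALTER SYSTEM SET io_method = '$method';"
    sudo systemctl restart postgresql@18-main
    sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
    echo "=== io_method = $method ==="
    sudo -u postgres psql -d testdb -c "EXPLAIN (ANALYZE, BUFFERS)
        SELECT count(*) FROM events
        WHERE event_type = 'purchase'
          AND created_at > now() - interval '90 days';" |
      grep -E 'Execution Time|shared read'
done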

Representative results we've seen on a 16-vCPU c7gd.4xlarge with local NVMe (your numbers will differ):

io_method              | Cold scan time | Pages/sec read | Notes
-----------------------|----------------|----------------|------------------------------
sync (PG17 baseline)   | 47.2 s         | ~104k          | Single in-flight read
sync (PG18)            | 44.8 s         | ~110k          | Slight win from misc 18 fixes
worker, io_workers=12  | 19.6 s         | ~250k          | 2.4x on cold reads
io_uring               | 14.1 s         | ~348k          | 3.4x on cold reads

Step 5 — Watch It Run with pg_stat_io

Postgres 18 expanded pg_stat_io with async-specific columns. This is the single best view for confirming AIO is actually being used:

SELECT backend_type,
       object,
       context,
       reads,
       read_bytes,
       read_time,
       writes,
       write_bytes,
       writebacks,
       extends,
       hits,
       evictions
FROM pg_stat_io
WHERE reads > 0
ORDER BY read_bytes DESC
LIMIT 20;

Two things to verify:

  • backend_type = 'io worker' rows show up when io_method=worker — if they're empty, you're still on sync.
  • read_time / reads drops dramatically vs. PG17. If it doesn't, look at wait_event = 'DataFileRead' in pg_stat_activity — your storage is the limit, not the engine.

For long-running diagnostics, pair this with pg_stat_statements and capture deltas every 60 seconds. Tools like pganalyze, OtterTune, and Datadog's DBM already render pg_stat_io natively, so you don't have to write the dashboards yourself.
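
If you'd rather start without a tool, a crude snapshot table gets you deltas from plain cron (a sketch; io_snap is a hypothetical name):

-- One-time setup: empty table with pg_stat_io's columns plus a timestamp
CREATE TABLE IF NOT EXISTS io_snap AS
  SELECT now() AS ts, * FROM pg_stat_io WHERE false;

-- Run this from cron every 60 s
INSERT INTO io_snap SELECT now(), * FROM pg_stat_io;

-- Delta between the two most recent snapshots
SELECT b.backend_type, b.object, b.context,
       b.reads - a.reads           AS reads_delta,
       b.read_bytes - a.read_bytes AS bytes_delta
FROM io_snap a
JOIN io_snap b USING (backend_type, object, context)
WHERE a.ts = (SELECT max(ts) FROM io_snap WHERE ts < (SELECT max(ts) FROM io_snap))
  AND b.ts = (SELECT max(ts) FROM io_snap)
ORDER BY bytes_delta DESC NULLS LAST;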


Step 6 — A Real-World Workload That Wins Big

Bulk analytics over a JSONB column on a 200 GB table is a classic AIO winner. Imagine an events table where product wants the daily count of purchase events per user cohort, joined to a dimension table:

WITH cohort AS (
  SELECT user_id, date_trunc('week', created_at) AS signup_week
  FROM users
  WHERE created_at >= '2026-01-01'
)
SELECT c.signup_week,
       date_trunc('day', e.created_at) AS day,
       count(*) FILTER (WHERE e.event_type = 'purchase') AS purchases,
       count(*) AS total_events,
       sum((e.payload->>'amount')::numeric) FILTER (WHERE e.event_type = 'purchase')
         AS revenue
FROM cohort c
JOIN events e USING (user_id)
WHERE e.created_at >= '2026-01-01'
GROUP BY 1, 2
ORDER BY 1, 2;

On PG17 this query did a parallel sequential scan of events with each worker blocking on its own reads. On PG18 with io_uring + effective_io_concurrency=64, the parallel workers also overlap reads, so total wall-clock drops 2.5-3x without changing the plan. Verify with:

EXPLAIN (ANALYZE, BUFFERS, SETTINGS, VERBOSE)
WITH cohort AS (...) SELECT ...;

Look for Parallel Seq Scan with a low I/O Timings: read=... figure — that's AIO doing its job.


Common Pitfalls and Gotchas

1. RDS / Aurora may pin you to worker

AWS RDS for PostgreSQL 18 supports the new AIO subsystem, but at the time of writing only io_method=worker is exposed in parameter groups. io_uring is not allowed because RDS's host kernels are configured without it. Don't plan a migration on benchmarks you ran on a self-managed EC2 — you won't get the io_uring numbers there.

2. Do not enable io_uring on Linux <5.1

Postgres will refuse to start if the kernel doesn't expose io_uring_setup. The error message is clear, but if you're running Ansible across a fleet, one stale node will fail to come back up. Always check uname -r in the playbook before flipping the GUC.
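
A guard you can drop into the playbook's pre-task (a sketch; sort -V does the version compare):

# Abort before flipping io_method if the kernel predates io_uring (5.1)
required=5.1
current=$(uname -r | cut -d. -f1,2)
if [ "$(printf '%s\n' "$required" "$current" | sort -V | head -n1)" != "$required" ]; then
    echo "kernel $current is older than $required; keeping io_method=worker" >&2
    exit 1
fi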

3. Connection poolers + io_workers math

io_workers are cluster-wide, not per-backend. If you run pgBouncer with 500 server connections, all of them share the same pool. Sizing io_workers to a quarter of your CPUs is fine — sizing it to half your connections is a recipe for thrashing. Tune for CPU, then verify under real concurrency.

4. Writes are still synchronous (mostly)

AIO in 18 is read-focused. WAL writes still go through the WAL writer, and most checkpoint writes are still synchronous. The headline 3x number is for reads. Don't expect a write-heavy OLTP system to magically speed up — the wins there are smaller and come from background writer changes, not AIO itself.

5. Watch for kernel-level rate limits

On cloud volumes (gp3, premium SSD, etc.) the bottleneck after AIO is almost always the volume's provisioned IOPS or throughput cap. AIO will saturate that cap faster than sync ever could, which means burst credits run out faster too. Move to provisioned IOPS or a larger volume tier before rolling AIO out to production.
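
To confirm the volume cap, rather than Postgres, is the ceiling, watch device-level stats while a benchmark runs (the device name below is an assumption; find yours with lsblk first):

# Extended per-device stats in MB, refreshed every second
iostat -xm 1 /dev/nvme1n1
# r/s parked at your provisioned IOPS with %util near 100 means you've
# hit the volume's limit, not the AIO subsystem's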

6. shared_buffers still matters

AIO accelerates the path from disk to shared_buffers. It does not replace caching. If your hot working set fits in RAM, AIO is invisible — you were never disk-bound to begin with. Look at pg_stat_io hits-vs-reads ratio: if hits dominate, save the tuning effort for a real disk-bound query.
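
A rough heuristic for that check:

-- Buffer hits vs. physical reads per object and context
SELECT object, context,
       sum(hits)  AS hits,
       sum(reads) AS reads,
       round(100.0 * sum(hits) / nullif(sum(hits) + sum(reads), 0), 1) AS hit_pct
FROM pg_stat_io
GROUP BY object, context
ORDER BY reads DESC NULLS LAST;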


Quick Reference Table

Parameter                   | Default (PG18) | Recommended (16-32 cores, NVMe) | When to change
----------------------------|----------------|---------------------------------|------------------------------------
io_method                   | worker         | io_uring on Linux 5.1+          | Linux bare metal/EC2; never on RDS
io_workers                  | 3              | 8-16 (¼-½ of vCPUs)             | Only used when io_method=worker
effective_io_concurrency    | 16             | 32-128 for analytics            | Per-role/per-DB for BI workloads
maintenance_io_concurrency  | 16             | 64-128                          | Speeds up VACUUM, REINDEX
io_max_concurrency          | -1 (auto)      | Leave at -1                     | Only override for benchmarking
shared_buffers              | 128 MB         | 25% of RAM                      | Always tune this first

Next Steps

  • Run the benchmark in Step 4 against a staging copy of your production data — use pg_dump --schema-only + a synthetic data generator if you can't copy real rows.
  • Add pg_stat_io snapshots to your monitoring (pganalyze, Datadog DBM, or a simple cron + Prometheus exporter).
  • Read the official PG18 release notes for the full list of AIO-adjacent fixes — checkpointer, bgwriter, and parallel workers all got tuning.
  • If you're on RDS / Aurora, file a parameter-group change request to switch io_method=worker and bump io_workers; that alone is a free ~2x for cold reads.
  • Pair this with PG18's new B-Tree skip scan and UUIDv7 — the three together cover most read-side wins in the release.

PostgreSQL 18's async I/O is one of those rare upgrades where you flip a config knob and watch a graph drop. Tune it, measure it, and you'll have headroom for another two years before storage becomes the bottleneck again.
