
PostgreSQL 18 Async I/O: Tune for 3x Faster Reads
Summary
Configure io_method, io_workers, and io_uring for big PostgreSQL read throughput gains.
Why PostgreSQL 18's Async I/O Changes Everything
PostgreSQL 18 ships the biggest read-path overhaul in over a decade. The new asynchronous I/O subsystem lets a single backend issue many disk reads in flight at once, instead of blocking on each page like every prior version. On NVMe and cloud block storage, that translates to 2-3x throughput improvements on cold sequential scans, vacuum, and analytics queries — without rewriting a single query.
If you run Postgres on AWS RDS, GCP Cloud SQL, Aurora, Azure, or your own bare metal, this is the upgrade that lets you stop over-provisioning IOPS to compensate for a synchronous engine. But the defaults are conservative, and on a many-core box they leave most of the win on the table. This guide walks through the architecture, the three I/O methods (sync, worker, io_uring), how to tune them for your workload, and the gotchas that bite people in production.
Prerequisites
- PostgreSQL 18.0 or newer (released September 25, 2025)
- Linux kernel 5.1+ if you want to use the io_uring backend
- A build of Postgres compiled with --with-liburing for io_uring (most Linux packages already include it)
- Superuser access to edit postgresql.conf and restart the cluster
- Comfort reading EXPLAIN (ANALYZE, BUFFERS) output and basic pg_stat_io queries
How the New I/O Subsystem Works
In Postgres 17 and earlier, when a query needed a page that wasn't in shared_buffers, the backend made a blocking pread() syscall and stalled until the kernel returned the page. The only concurrency hint was posix_fadvise(), which asked the kernel to prefetch — useful, but limited.
Postgres 18 replaces that with a true I/O queue. Every backend pushes read requests onto a shared ring; an executor (worker process or io_uring submission) drives them; results stream back as buffers in shared memory. The query process keeps doing useful work — evaluating predicates on already-fetched pages, computing joins — while the next batch is in flight. That pipeline is the entire reason cold scans get faster.
Three concrete backends ship in 18:
- sync — Legacy behavior. Reads still block, posix_fadvise() still prefetches. Use only if you hit a regression.
- worker — Default. A pool of I/O worker processes performs reads on behalf of backends. Works on every platform, including macOS for development.
- io_uring — Linux-only. Submits reads through a shared kernel ring buffer; minimal syscall overhead. Best raw throughput on modern kernels.
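Before committing to io_uring fleet-wide, it helps to gate the config change on the kernel version. A minimal sketch in shell: the version-parsing helper is my own, not part of Postgres, and it assumes uname -r output shaped like 5.15.0-89-generic.

```shell
#!/bin/sh
# Decide whether this host's kernel is new enough for io_method = 'io_uring'.
# io_uring needs Linux 5.1+; the helper below only parses major.minor.
kernel_supports_io_uring() {
  release=$1                      # e.g. "5.15.0-89-generic"
  major=${release%%.*}            # text before the first dot
  rest=${release#*.}
  minor=${rest%%.*}
  minor=${minor%%[!0-9]*}         # strip suffixes like "0-89-generic"
  if [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 1 ]; }; then
    echo yes
  else
    echo no
  fi
}

# On a real host you would call: kernel_supports_io_uring "$(uname -r)"
kernel_supports_io_uring "5.15.0-89-generic"   # prints: yes
kernel_supports_io_uring "4.19.0-25-amd64"     # prints: no
```

Drop the function into your Ansible pre-flight check so a stale node fails the playbook instead of failing to start Postgres.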
Step 1 — Pick the Right io_method for Your Stack
Run this query first to see what you're working with:
SHOW server_version;
SHOW io_method;
SHOW io_workers;
SHOW effective_io_concurrency;
SHOW shared_buffers;
Decision matrix:
| Environment | Recommended io_method | Why |
|---|---|---|
| Linux 5.1+ bare metal / NVMe | io_uring | Lowest latency, fewest syscalls, peak IOPS |
| Linux 5.1+ on cloud (RDS, Aurora) | worker (often forced) | Managed services may disable io_uring |
| Linux <5.1 or older kernels | worker | io_uring not supported |
| macOS / Windows / dev | worker | Only portable async option |
| Single-tenant analytics / OLAP | io_uring | Cold scans dominate; biggest win |
| High-concurrency OLTP | worker | More predictable under churn |
To enable io_uring, edit postgresql.conf:
# /etc/postgresql/18/main/postgresql.conf
io_method = 'io_uring' # or 'worker', or 'sync'
io_workers = 8 # only used when io_method='worker'
effective_io_concurrency = 32 # backend-level prefetch depth
maintenance_io_concurrency = 32
io_max_concurrency = -1 # -1 = auto-size from shared_buffers
Then restart the cluster. io_method is one of the few new GUCs that requires a full restart rather than a reload (io_workers, by contrast, can be changed with a plain reload):
sudo systemctl restart postgresql@18-main
Step 2 — Size io_workers for Your CPU
The default io_workers = 3 exists so that a tiny VM doesn't accidentally spawn 30 processes. It is wrong for almost every real server. The current consensus from EnterpriseDB, CYBERTEC, and the pgsql-hackers benchmarks is to set it to roughly 1/4 to 1/2 of available CPU threads, capped where your storage stops scaling.
Quick sizing recipe on Linux:
# Count logical CPUs
nproc
# Example output: 32
# Pick io_workers in the 1/4-1/2 range
echo "io_workers = 12" >> /etc/postgresql/18/main/postgresql.conf
# Restart and verify
sudo systemctl restart postgresql@18-main
psql -c "SHOW io_workers;"
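The quarter-to-half rule is easy to script when rolling config across a fleet. A hedged sketch: the helper and its 32-worker ceiling are my own convention, not a Postgres default, so treat the output as a starting point rather than gospel.

```shell
#!/bin/sh
# Suggest a starting io_workers value: CPUs/4, clamped to [1, CAP].
# CAP defaults to 32 here purely as a sanity ceiling for huge boxes;
# the real ceiling is wherever your storage stops scaling.
suggest_io_workers() {
  cpus=$1
  cap=${2:-32}
  workers=$((cpus / 4))
  if [ "$workers" -lt 1 ]; then workers=1; fi
  if [ "$workers" -gt "$cap" ]; then workers=$cap; fi
  echo "$workers"
}

# On a real host: suggest_io_workers "$(nproc)"
suggest_io_workers 32    # prints: 8
suggest_io_workers 4     # prints: 1
suggest_io_workers 256   # prints: 32
```

Start at the quarter mark the helper emits, benchmark, and only then creep toward the half mark if the storage keeps scaling.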
Watch for the failure mode: too many workers thrash the run queue and inflate context-switch overhead. The cleanest signal is pg_stat_activity.wait_event_type = 'IO' staying high after you raised io_workers. That usually means storage is the real bottleneck, not the worker pool — back the number down.
Step 3 — Tune effective_io_concurrency the Way It Was Always Meant to Be
Before 18, effective_io_concurrency only nudged posix_fadvise() for bitmap heap scans. In practice the parameter was inert for most workloads, which is why so many production tunings still leave it at 1.
In Postgres 18 it controls how many concurrent read requests a backend can have outstanding through the AIO subsystem. The new default is 16. For NVMe and modern cloud volumes, push it higher per-database or per-role for analytical workloads:
-- Per session for a heavy analytics query
SET effective_io_concurrency = 64;
EXPLAIN (ANALYZE, BUFFERS) SELECT ... ;
-- Per role (e.g., your reporting user)
ALTER ROLE bi_reader SET effective_io_concurrency = 64;
-- Per database
ALTER DATABASE warehouse SET effective_io_concurrency = 128;
maintenance_io_concurrency is the same idea for VACUUM, ANALYZE, CREATE INDEX, and pg_basebackup. Default is now 16. On a beefy machine, set it to 64 or 128 — vacuum will reclaim dead tuples noticeably faster on big tables.
Step 4 — Benchmark Before You Trust It
Numbers from blog posts are interesting; numbers from your hardware are decisive. Here's a minimal cold-cache benchmark you can run end-to-end.
First, build a table that actually exceeds shared_buffers. The example below produces roughly 3 GB of heap and about 4 GB with indexes; if your shared_buffers is larger than that, scale up the generate_series count until the on-disk footprint comfortably exceeds it, so scans are forced to do real reads.
-- 50 million rows, ~3 GB heap, ~4 GB with indexes
CREATE TABLE events (
id bigserial PRIMARY KEY,
user_id bigint NOT NULL,
event_type text NOT NULL,
payload jsonb NOT NULL,
created_at timestamptz NOT NULL DEFAULT now()
);
INSERT INTO events (user_id, event_type, payload)
SELECT
(random() * 1_000_000)::bigint,
(ARRAY['click','view','purchase','signup'])[1 + floor(random() * 4)::int],  -- floor() gives a uniform 1..4 pick; ::int alone rounds and skews the ends
jsonb_build_object('ip', ('10.'|| (random()*255)::int ||'.0.1'),
'session', md5(random()::text))
FROM generate_series(1, 50_000_000);
CREATE INDEX ON events (user_id);
CREATE INDEX ON events (created_at);
ANALYZE events;
Now run a cold-cache scan. The trick is to drop the OS page cache between runs — otherwise your second run is in RAM and tells you nothing about disk.
# Drop OS page cache (Linux, requires root)
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
# Restart Postgres to clear shared_buffers
sudo systemctl restart postgresql@18-main
# Run the scan
psql -d testdb <<'SQL'
\timing on
EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
SELECT count(*)
FROM events
WHERE event_type = 'purchase'
AND created_at > now() - interval '90 days';
SQL
Capture three numbers: total runtime, Buffers: shared read=N, and I/O Timings. Then change io_method, restart, drop caches, and run again.
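To keep the A/B/C comparison honest, script the whole cycle so every method gets identical treatment. A sketch only: the database name testdb, the service unit, and a bench.sql file holding the Step 4 query are assumptions carried over from this guide, and the destructive commands are left commented so you can review them before a real run.

```shell
#!/bin/sh
# Run the Step 4 cold-cache benchmark once per io_method.
# Assumes: database "testdb", unit "postgresql@18-main", and the
# EXPLAIN query from Step 4 saved in bench.sql. Adjust to taste.
run_one_method() {
  method=$1
  echo "== benchmarking io_method = $method =="
  # Uncomment for a real run:
  # psql -d testdb -c "ALTER SYSTEM SET io_method = '$method';"
  # sudo systemctl restart postgresql@18-main
  # sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
  # psql -d testdb -f bench.sql
}

for m in sync worker io_uring; do
  run_one_method "$m"
done
```

Running all three back-to-back on the same box is the only way the resulting numbers are directly comparable.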
Representative results we've seen on a 16-vCPU c7gd.4xlarge with local NVMe (your numbers will differ):
| io_method | Cold scan time | Pages/sec read | Notes |
|---|---|---|---|
| sync (PG17 baseline) | 47.2 s | ~104k | Single in-flight read |
| sync (PG18) | 44.8 s | ~110k | Slight win from misc 18 fixes |
| worker, io_workers=12 | 19.6 s | ~250k | 2.4x on cold reads |
| io_uring | 14.1 s | ~348k | 3.4x on cold reads |
Step 5 — Watch It Run with pg_stat_io
Postgres 18 expanded pg_stat_io with async-specific columns. This is the single best view for confirming AIO is actually being used:
SELECT backend_type,
object,
context,
reads,
read_bytes,
read_time,
writes,
write_bytes,
writebacks,
extends,
hits,
evictions
FROM pg_stat_io
WHERE reads > 0
ORDER BY read_bytes DESC
LIMIT 20;
Two things to verify:
- backend_type = 'io worker' rows show up when io_method = worker. If they're empty, you're still on sync.
- read_time / reads drops dramatically vs. PG17. If it doesn't, look at wait_event = 'DataFileRead' in pg_stat_activity: your storage is the limit, not the engine.
For long-running diagnostics, pair this with pg_stat_statements and capture deltas every 60 seconds. Tools like pganalyze, OtterTune, and Datadog's DBM already render pg_stat_io natively, so you don't have to write the dashboards yourself.
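When eyeballing those 60-second snapshots, the number you actually want is MB/s between two read_bytes samples. A tiny helper for the arithmetic, assuming you have already pulled the two counter values out of pg_stat_io yourself:

```shell
#!/bin/sh
# Convert two pg_stat_io read_bytes samples into MB/s over the interval.
read_mb_per_sec() {
  # $1 = earlier read_bytes, $2 = later read_bytes, $3 = seconds between
  awk -v a="$1" -v b="$2" -v s="$3" \
    'BEGIN { printf "%.1f\n", (b - a) / s / 1048576 }'
}

# 600 MiB read in 60 seconds:
read_mb_per_sec 0 629145600 60   # prints: 10.0
```

Compare that figure against your volume's provisioned throughput to see how much headroom AIO is actually using.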
Step 6 — A Real-World Workload That Wins Big
Bulk analytics over a JSONB column on a 200 GB table is a classic AIO winner. Imagine an events table where product wants the daily count of purchase events per user cohort, joined to a dimension table:
WITH cohort AS (
SELECT user_id, date_trunc('week', created_at) AS signup_week
FROM users
WHERE created_at >= '2026-01-01'
)
SELECT c.signup_week,
date_trunc('day', e.created_at) AS day,
count(*) FILTER (WHERE e.event_type = 'purchase') AS purchases,
count(*) AS total_events,
sum((e.payload->>'amount')::numeric) FILTER (WHERE e.event_type = 'purchase')
AS revenue
FROM cohort c
JOIN events e USING (user_id)
WHERE e.created_at >= '2026-01-01'
GROUP BY 1, 2
ORDER BY 1, 2;
On PG17 this query did a parallel sequential scan of events with each worker blocking on its own reads. On PG18 with io_uring + effective_io_concurrency=64, the parallel workers also overlap reads, so total wall-clock drops 2.5-3x without changing the plan. Verify with:
EXPLAIN (ANALYZE, BUFFERS, SETTINGS, VERBOSE)
WITH cohort AS (...) SELECT ...;
Look for Parallel Seq Scan with a low I/O Timings: read=... figure — that's AIO doing its job.
Common Pitfalls and Gotchas
1. RDS / Aurora may pin you to worker
AWS RDS for PostgreSQL 18 supports the new AIO subsystem, but at the time of writing only io_method=worker is exposed in parameter groups. io_uring is not allowed because RDS's host kernels are configured without it. Don't plan a migration on benchmarks you ran on a self-managed EC2 — you won't get the io_uring numbers there.
2. Do not enable io_uring on Linux <5.1
Postgres will refuse to start if the kernel doesn't expose io_uring_setup. The error message is clear, but if you're running Ansible across a fleet, one stale node will fail to come back up. Always check uname -r in the playbook before flipping the GUC.
3. Connection poolers + io_workers math
io_workers are cluster-wide, not per-backend. If you run pgBouncer with 500 server connections, all of them share the same pool. Sizing io_workers to a quarter of your CPUs is fine — sizing it to half your connections is a recipe for thrashing. Tune for CPU, then verify under real concurrency.
4. Writes are still synchronous (mostly)
AIO in 18 is read-focused. WAL writes still go through the WAL writer, and most checkpoint writes are still synchronous. The headline 3x number is for reads. Don't expect a write-heavy OLTP system to magically speed up — the wins there are smaller and come from background writer changes, not AIO itself.
5. Watch for kernel-level rate limits
On cloud volumes (gp3, premium SSD, etc.) the bottleneck after AIO is almost always the volume's provisioned IOPS or throughput cap. AIO will saturate that cap faster than sync ever could, which means burst credits run out faster too. Move to provisioned IOPS or a larger volume tier before rolling AIO out to production.
6. shared_buffers still matters
AIO accelerates the path from disk to shared_buffers. It does not replace caching. If your hot working set fits in RAM, AIO is invisible — you were never disk-bound to begin with. Look at pg_stat_io hits-vs-reads ratio: if hits dominate, save the tuning effort for a real disk-bound query.
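A quick way to check that before spending tuning effort: compute the hit ratio from the hits and reads columns. A sketch that does the arithmetic locally, assuming you have already queried the two totals from pg_stat_io:

```shell
#!/bin/sh
# Percentage of lookups served from shared_buffers, given pg_stat_io
# totals: hit_ratio <hits> <reads>. Near 99%+ your workload is
# RAM-bound and AIO tuning won't show up.
hit_ratio() {
  awk -v h="$1" -v r="$2" 'BEGIN {
    if (h + r == 0) { print "n/a"; exit }
    printf "%.1f\n", 100 * h / (h + r)
  }'
}

hit_ratio 9900 100    # prints: 99.0
hit_ratio 500 1500    # prints: 25.0
```

If the first call's shape describes your cluster, stop here; if it looks like the second, the rest of this guide is for you.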
Quick Reference Table
| Parameter | Default (PG18) | Recommended (16-32 cores, NVMe) | When to change |
|---|---|---|---|
| io_method | worker | io_uring on Linux 5.1+ | Linux bare metal/EC2; never on RDS |
| io_workers | 3 | 8-16 (¼-½ of vCPUs) | Only used when io_method=worker |
| effective_io_concurrency | 16 | 32-128 for analytics | Per-role/per-DB for BI workloads |
| maintenance_io_concurrency | 16 | 64-128 | Speeds up VACUUM, REINDEX |
| io_max_concurrency | -1 (auto) | Leave at -1 | Only override for benchmarking |
| shared_buffers | 128 MB | 25% of RAM | Always tune this first |
Next Steps
- Run the benchmark in Step 4 against a staging copy of your production data. Use pg_dump --schema-only plus a synthetic data generator if you can't copy real rows.
- Add pg_stat_io snapshots to your monitoring (pganalyze, Datadog DBM, or a simple cron + Prometheus exporter).
- Read the official PG18 release notes for the full list of AIO-adjacent fixes: checkpointer, bgwriter, and parallel workers all got tuning.
- If you're on RDS / Aurora, file a parameter-group change request to switch io_method to worker and bump io_workers; that alone is a free ~2x for cold reads.
- Pair this with PG18's new B-Tree skip scan and UUIDv7; the three together cover most read-side wins in the release.
PostgreSQL 18's async I/O is one of those rare upgrades where you flip a config knob and watch a graph drop. Tune it, measure it, and you'll have headroom for another two years before storage becomes the bottleneck again.