Querk · Field notes

Postgres Bloat: Autovacuum and VACUUM Tuning

Published June 30, 2026 · Querk — Postgres review pipeline

Why Postgres tables bloat: MVCC and dead tuples

Postgres never overwrites a row in place. Under its MVCC design, an UPDATE writes a new row version and marks the old one as a dead tuple, and a DELETE marks the row dead without immediately freeing its space. Those dead tuples stay on disk until they are cleaned up, so a table under heavy write traffic steadily accumulates wasted pages. This wasted space is what people mean by postgres bloat, and it hits both heap tables and their indexes, because every row version also leaves stale index entries behind.

Bloat is not only disk waste. A bloated table holds more pages than its live data needs, so sequential scans read more blocks, the buffer cache holds less useful data, and the planner's row estimates drift as statistics fall out of date. To see how stale statistics push the optimizer toward bad plans, the guide on reading a query plan covers the cost model in detail.

How autovacuum keeps dead tuples in check

The autovacuum daemon is the background process that reclaims dead tuples. A VACUUM marks dead-tuple space as reusable for future inserts and updates, refreshes the visibility map so index-only scans work, and advances transaction IDs to prevent wraparound. Note that an ordinary VACUUM returns space for reuse inside the table rather than shrinking the file back to the operating system.

Autovacuum fires per table once dead tuples cross a threshold: autovacuum_vacuum_threshold plus autovacuum_vacuum_scale_factor times the table's estimated row count. The default scale factor of 0.2 means a table must accumulate roughly twenty percent dead rows before a vacuum begins. On a hundred-million-row table that is twenty million dead tuples of bloat before anything happens, which is why default settings are often wrong for large, churn-heavy tables.

Spotting bloat before it hurts

The quickest signal lives in pg_stat_user_tables: compare n_dead_tup against n_live_tup, and check last_autovacuum to confirm the daemon is keeping up. The pgstattuple extension reports a more precise free-space and dead-tuple percentage per table or index when you need hard numbers. Watch for tables whose on-disk size keeps growing while row counts hold steady, and for index scans that touch far more pages than the rows they return. Pairing this with a query-level view from a pg_stat_statements triage workflow helps connect a slow statement to the bloated object underneath it.

Vacuum tuning, VACUUM, and REINDEX without downtime

Effective vacuum tuning usually means lowering the scale factor for your largest tables rather than changing it globally. Set per-table storage parameters so a big table vacuums at, say, one or two percent dead tuples, and raise autovacuum_vacuum_cost_limit or add workers so vacuum keeps pace with write volume. When an EXPLAIN plan points at a bloated index or heap, pasting it for a free analysis returns the concrete vacuum settings or index rebuild worth trying, the kind of diagnosis Querk is built for. Plain VACUUM runs online and rarely blocks reads or writes, while VACUUM FULL rewrites the whole table under an ACCESS EXCLUSIVE lock, so reserve it for maintenance windows or reach for an online repack tool. For index bloat specifically, REINDEX rebuilds the index, and REINDEX CONCURRENTLY does it without a long exclusive lock so production traffic keeps flowing.

Querk reviews a slow Postgres query in ~30 seconds — index DDL, rewrites, write-path impact, and a verification command. Paste a query →