How Postgres Parallel Query Execution Works
What a parallel query actually does
A normal query runs inside a single backend process, using one CPU core no matter how many sit idle. Parallel query in Postgres breaks that ceiling: the planner can hand part of the work to a set of background parallel workers that scan and process rows at the same time as the original backend, which is called the leader. The results funnel back through a Gather node (or a Gather Merge, when sorted order has to be preserved), which collects tuples from every worker and the leader before passing them up the plan tree. Parallelism arrived in version 9.6 and has widened with almost every major release since, so the exact set of plan shapes that qualify depends partly on which Postgres you run.
Which operations can go parallel
Not every node parallelizes, and the ones that do split work differently. Parallel Seq Scan is the classic case: each worker reads a disjoint slice of the heap. Btree index scans, index-only scans, and bitmap heap scans can run in parallel too. Joins participate as well, though with an important distinction: hash join has a genuinely parallel-aware form, a Parallel Hash Join in which workers cooperate to build and probe one shared hash table, while nested loop and merge join are not parallel-aware nodes themselves and simply execute inside a worker over the parallelized outer relation. Aggregates split into a Partial Aggregate computed per worker and a Finalize Aggregate that merges the partial states above the Gather. Parallel Append spreads the child plans of a partitioned table or a UNION across workers. The planner refuses to parallelize anything marked PARALLEL UNSAFE, most data-modifying statements, queries that lock rows with FOR UPDATE, and CTEs referenced more than once. Functions default to parallel-unsafe unless explicitly labeled, so a single custom function in a WHERE clause can quietly disable the whole feature for that query.
The GUCs that control it
The most consequential knob is max_parallel_workers_per_gather, which caps how many workers one Gather may request; it defaults to 2, so a query that looks embarrassingly parallel still uses only two extra cores until you raise it. That request is bounded in turn by max_parallel_workers (the per-cluster pool, default 8) and max_worker_processes (the hard ceiling on all background workers). A table qualifies for a parallel scan only once it exceeds min_parallel_table_scan_size (8MB) or min_parallel_index_scan_size, and the planner weighs parallel_setup_cost and parallel_tuple_cost against the expected saving before committing. You can override the size heuristic per table with ALTER TABLE ... SET (parallel_workers = N). Whether the estimate is right depends on accurate statistics and the cost model, which reading a query plan from EXPLAIN covers in depth.
When it helps and when it hurts
Parallelism pays off on large analytical scans and aggregations where CPU is the bottleneck and cores sit idle: counting, summing, or joining millions of rows. It is the wrong tool for short transactional lookups, where the fixed cost of spawning workers dwarfs the query itself, and it can backfire under heavy concurrency, where many sessions each grabbing workers drain the shared pool and starve one another. Watch for a line like "Workers Planned: 4 / Workers Launched: 2" in EXPLAIN ANALYZE; that gap means the pool was already exhausted at run time. When a slow plan never goes parallel at all, Querk points at the GUC or the parallel-unsafe operator holding it back and suggests the index or rewrite that changes the arithmetic, so it helps to paste the plan into the free analyzer before reaching for the config file. For large tables the right index can matter as much as the worker count, a trade-off that the read-versus-write cost of composite indexes digs into.
