
Postgres as a Queue: The Locking Hotspot Pattern

Published June 23, 2026 · Querk — Postgres review pipeline

The Allure of Postgres Queues

For many engineering teams, the "Postgres as a queue" pattern is the path of least resistance. `SELECT ... FOR UPDATE SKIP LOCKED` feels like a stroke of genius: jobs are enqueued and claimed inside ordinary ACID transactions, and there is no extra infrastructure to run because you reuse the database you already operate. It works well in staging and handles low-volume production loads with ease.

However, once throughput approaches roughly 500 jobs per second, the cracks begin to show. You don’t get a loud crash; you get a gradual degradation that shows up as latency spikes and connection pool exhaustion.

The Locking Hotspot Pattern

The culprit is how `SELECT ... FOR UPDATE` behaves under concurrency. When a worker runs this query, Postgres takes a row-level lock on each row it returns, and that lock is held until the worker’s transaction ends. `SKIP LOCKED` stops other workers from waiting on those rows, but it does not stop them from scanning the same index pages, and stepping over every locked row, to find the next available job.

At high concurrency, your worker processes are constantly contending for access to the same index pages. Even though they are "skipping" locked rows, the underlying B-tree index remains a bottleneck. As you scale to hundreds of jobs per second, the overhead of managing these row-level locks and the constant index page contention causes CPU usage to spike on the database primary. Your database is no longer just storing data; it is spending a growing share of its cycles on lock contention. This is the "locking hotspot" phenomenon.
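
To make the scanning cost concrete, here is a minimal sketch. The table name `jobs`, its columns, and the query text are assumptions for illustration and are not taken from any particular system. The functions are a toy model that only counts rows inspected; they are not a benchmark and do not talk to a database.

```python
# Hypothetical dequeue query; table and column names are assumptions.
DEQUEUE_SQL = """
SELECT id, payload
FROM jobs
WHERE status = 'pending'
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED
"""


def rows_inspected_per_claim(locked_at_head: int) -> int:
    """Rows a worker examines before it can lock one: every locked row
    ahead of it in queue order, plus the row it finally claims."""
    return locked_at_head + 1


def total_rows_inspected(workers: int) -> int:
    """Toy model: claims arrive back to back while every earlier claim
    still holds its lock, so the k-th claim (0-based) skips k locked rows."""
    return sum(rows_inspected_per_claim(k) for k in range(workers))


if __name__ == "__main__":
    for workers in (1, 10, 100):
        print(workers, total_rows_inspected(workers))
```

In this model the number of jobs claimed grows linearly with the worker count, while the total number of rows inspected grows roughly with its square. That is the shape of the hotspot described above, under the simplifying assumption that all locks are still held when the next claim arrives.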

Why Throughput Stalls at 500 Jobs/Sec

Once you hit this wall, adding more worker processes is often counterproductive. Increasing the number of concurrent connections only increases the pressure on the lock manager and the index pages. You’ll notice that your `pg_stat_activity` shows a high number of connections in an "active" state, but your actual job throughput remains flat or begins to drop.
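
One way to check for this symptom is to sample `pg_stat_activity` alongside your own throughput metric. The sketch below is a hedged example: the query uses the view's `state`, `wait_event_type`, and `backend_type` columns (the last one assumes PostgreSQL 10 or later), while `summarize_activity` and `looks_stalled` are hypothetical helpers, and the stall heuristic is an assumption rather than an established rule.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Query against the pg_stat_activity view; backend_type needs PostgreSQL 10+.
ACTIVITY_SQL = """
SELECT state, wait_event_type
FROM pg_stat_activity
WHERE backend_type = 'client backend'
"""

Row = Tuple[Optional[str], Optional[str]]


def summarize_activity(rows: Iterable[Row]) -> Counter:
    """Count sessions per (state, wait_event_type) pair.
    A NULL wait_event_type is reported as the string "none"."""
    return Counter((state, wait_type or "none") for state, wait_type in rows)


def looks_stalled(active_before: int, active_after: int,
                  jobs_per_sec_before: float, jobs_per_sec_after: float) -> bool:
    """Hypothetical heuristic: more active sessions without more throughput."""
    return (active_after > active_before
            and jobs_per_sec_after <= jobs_per_sec_before)


if __name__ == "__main__":
    # Hand-written sample rows standing in for a real query result.
    sample = [("active", None), ("active", "Lock"), ("idle", "Client")]
    print(summarize_activity(sample))
    print(looks_stalled(50, 100, 500.0, 480.0))
```

Feeding it two samples taken before and after adding workers gives a quick signal of whether the extra connections bought any throughput.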

The database becomes the single point of failure for your entire asynchronous processing architecture. Because these locks are held within a transaction, any delay in your application logic—such as an external API call or a slow data transformation—keeps the transaction open, holding the lock longer and exacerbating the bottleneck.
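
A rough way to see the effect of hold time is Little's law: the average number of jobs in flight equals throughput multiplied by the time each job holds its lock. The sketch below applies it with made-up inputs; the function name is hypothetical and the figures are not measurements.

```python
def locked_rows_in_flight(jobs_per_sec: float, hold_seconds: float) -> float:
    """Little's law: average number of rows locked at any instant."""
    return jobs_per_sec * hold_seconds


if __name__ == "__main__":
    # Illustrative inputs only: 500 jobs/s with a 20 ms and a 400 ms transaction.
    print(locked_rows_in_flight(500, 0.020))
    print(locked_rows_in_flight(500, 0.400))
```

At the same throughput, the longer transaction leaves about 200 locked rows at the head of the queue instead of about 10, and every new claim has to step over them.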

Moving Beyond the Database

To move past this limit, you must decouple your job queuing from your primary transactional database. The goal is to move the queuing logic to a system designed for high-throughput, ephemeral data—a system that doesn’t require heavy-duty ACID locking for every single operation.

If you are building logistics-heavy applications where reliability is non-negotiable but throughput is essential, consider offloading your message processing to a dedicated broker. Querk provides a high-performance, developer-first infrastructure designed to handle the massive scale of modern logistics data without the locking overhead of a traditional relational database. Visit https://querk.io to see how we help engineering teams offload their heavy-duty queuing tasks and regain control over their database performance.
