Are your database read operations unexpectedly slowing down as your workload scales? Many organizations running PostgreSQL-based systems encounter performance bottlenecks that aren’t immediately obvious. When many concurrent read operations access tables with numerous partitions or indexes, they can even exhaust PostgreSQL’s fast path locking mechanism, forcing the system to use shared memory locks. The switch from fast path to shared memory locking creates lightweight lock (LWLock) contention in the lock manager, affecting database performance even during read-only operations.
In this post, we explore how read-heavy workloads can trigger LWLock contention by exceeding fast path locking limits, an issue that can occur in any system based on the PostgreSQL engine and its locking mechanism. For demonstration purposes, we use Amazon Aurora PostgreSQL-Compatible Edition in this post. Through practical experiments, we demonstrate how partition scanning, index usage, and complex joins affect locking behavior. We also show how to identify when your workload is transitioning to slow path locking and implement specific techniques to optimize query design and schema structure for better performance.
As your read-heavy workload scales, database administrators (DBAs) need to monitor and address LWLock contention to keep your database healthy. Similarly, developers must avoid patterns that trigger excessive contention. When your database transitions from fast path to slow path locking, throughput can drop significantly; our tests in this post show up to a 34 percent performance difference. You can identify this throughput drop by monitoring wait events such as LWLock:LockManager and by examining the pg_locks view to see whether the fast path slots of the backend processes running the workload are exhausted. To address these bottlenecks, you need strategies such as effective partition pruning, careful index management, and simplified join patterns that keep your workload within PostgreSQL's limit of 16 fast path slots per backend process.
Understanding LWLock contention
LWLock is a synchronization primitive in PostgreSQL used to coordinate access to shared memory structures. Unlike heavyweight locks (used for user-driven database object-level locking), LWLocks are lightweight and optimized for high performance, providing low overhead while managing concurrent access to shared data.
LWLock contention occurs when multiple processes compete to acquire the same LWLock protecting a lock data structure in shared memory, leading to delays. This contention typically happens when many backend processes need to access a heavily shared resource, such as:
- Buffer Manager – Protects shared buffers during read/write operations
- Lock Manager – Coordinates access to lock-related data structures
- WAL management – Synchronizes writes to the write-ahead log (WAL)
LWLock contention can increase as the number of concurrent database connections grows. This is especially true in the high-throughput environments that Aurora PostgreSQL-Compatible Edition supports, particularly workloads that involve heavy parallelism, tables with many partitions, or tables with many indexes.
While executing a SQL query on a table, PostgreSQL tries to acquire a lock on the table and its associated indexes. If this is a partitioned table, a lock is acquired on each table partition the SQL query accesses. To make locking faster and more efficient, fast path locking is used for less restrictive weak locks (such as AccessShareLock, RowShareLock, or RowExclusiveLock) when the system can quickly confirm there's no conflict with other locks.
Fast path locking: Operational mechanics
In PostgreSQL, the fast path locking mechanism lets nonconflicting operations bypass the shared memory lock hash table and its associated LWLocks. Fast path locking is designed for the most common use case: frequent, concurrent queries that acquire weak locks as long as no conflicting strong locks exist on the same relation.
Here’s how fast path locking works:
- Per-session cache – Each backend process allocates up to 16 slots (FP_LOCK_SLOTS_PER_BACKEND = 16 by default) in its private PGPROC structure to store fast path locks.
- Quick fast path eligibility check – When you run SELECT * FROM my_table;, PostgreSQL grabs a tiny per-backend LWLock (MyProc->fpInfoLock) to check whether the fast path locking mechanism can be used for the current SQL query. It verifies that:
  - The lock mode is a qualifying weak mode.
  - No other session holds a conflicting lock.
  - The backend hasn't already used all 16 slots of its local backend memory array.
- Local grant – If the fast path eligibility check passes, FastPathGrantRelationLock() stores the lock in the backend's local cache. No LWLock guarding the shared memory lock hash table is acquired, and the function returns immediately with success.
In practice, this means the first 16 unique tables (or indexes) a transaction touches incur almost zero lock manager overhead.
The fast path cache is small, and when you exceed 16 locks or request a stronger lock mode, PostgreSQL must fall back to the slow path:
- Lock requests beyond the 16 fast path locks a backend already holds are recorded in the shared memory lock table; similarly, when a conflicting strong lock is requested, existing fast path locks for that relation are migrated there by FastPathTransferRelationLocks().
- The lock tag (which includes the relation OID and lock mode) is hashed to one of 16 lock partitions of the shared memory lock hash table.
- PostgreSQL then acquires the partition LWLock (LWLockAcquire(partitionLock, LW_EXCLUSIVE)), updates the shared hash table, and releases the LWLock.
From that point on, additional lock acquisitions, whether on tables or indexes, go through the lock manager, producing LWLock:LockManager wait events under concurrency.
By understanding the transition from fast path optimization to slow path contention, you can design queries and schemas that stay within the fast path limits, avoiding the lock manager bottleneck altogether.
Solution overview
The following sections detail LWLock contention through three controlled experiments:
- Observe locks on a partitioned table
- Observe locks on a nonpartitioned table with multiple unused or unnecessary indexes
- Observe locking behavior with a multi-join query
Each controlled experiment demonstrates schema setup, workload execution, lock monitoring through PostgreSQL system views, and analysis of concurrency impacts using pgbench. All controlled experiments were conducted on Aurora PostgreSQL-Compatible (compatible with PostgreSQL 16.6) by using db.r7g.4xlarge instances.
Prerequisites
Before you begin, make sure you have the following:
- An AWS account with access to Aurora PostgreSQL-Compatible database
- Access to the AWS Management Console
- An Amazon Elastic Compute Cloud (Amazon EC2) instance with connectivity to the Aurora PostgreSQL instance
- A PostgreSQL client (such as psql) installed on the Amazon EC2 instance
- pgbench installed on the Amazon EC2 instance
- AWS Identity and Access Management (IAM) permissions to create and manage Aurora PostgreSQL clusters
Controlled experiment 1: Observe locks on a partitioned table
In this experiment, we investigate PostgreSQL’s locking behavior when working with partitioned tables, focusing on three key test scenarios:
- First, we query all partitions of an orders table to observe how PostgreSQL handles locks across the entire partition set
- Then, we examine a more targeted approach using partition pruning to access specific partitions
- Finally, we stress test both approaches under high concurrency using pgbench to simulate real-world workloads
Through these queries, we demonstrate how PostgreSQL's fast path lock optimization works, what happens when fast path slots are exhausted, and how partition pruning can significantly improve performance under concurrent workloads. You use a partitioned orders table where data is partitioned by month using the order_ts timestamp column.
This experiment will reveal important insights about:
- How PostgreSQL manages locks even during read-only operations
- The impact of fast path vs. slow path locks
- How partition pruning can reduce lock contention
- Performance implications for high-concurrency environments
Schema preparation
Using your preferred PostgreSQL client (such as psql) connected to your Aurora PostgreSQL instance from your EC2 instance, create a partitioned orders table with 12 monthly child partitions. Run the following SQL code:
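The original SQL isn't included in this section; the following is a minimal sketch of such a schema. The table name (orders), partition key (order_ts), schema name (experiment_1), and the specific monthly ranges are assumptions that match the descriptions used later in this post.

```sql
-- Minimal sketch (assumed names and layout): a partitioned orders table
-- with 12 monthly range partitions on order_ts.
CREATE SCHEMA IF NOT EXISTS experiment_1;
SET search_path TO experiment_1;

CREATE TABLE orders (
    order_id    bigint GENERATED ALWAYS AS IDENTITY,
    customer_id bigint NOT NULL,
    amount      numeric(10,2),
    order_ts    timestamp NOT NULL,
    PRIMARY KEY (order_id, order_ts)
) PARTITION BY RANGE (order_ts);

-- One child partition per month of 2024 (12 in total).
DO $$
DECLARE
    m int;
BEGIN
    FOR m IN 1..12 LOOP
        EXECUTE format(
            'CREATE TABLE orders_2024_%s PARTITION OF orders
               FOR VALUES FROM (%L) TO (%L)',
            to_char(make_date(2024, m, 1), 'MM'),
            make_date(2024, m, 1),
            make_date(2024, m, 1) + interval '1 month');
    END LOOP;
END $$;
```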
Test 1: Query all partitions of the orders table to observe locking behavior
Now, start a transaction and query all partitions (unpruned partitions) of the partitioned orders table. When we query without partition pruning, PostgreSQL must access every partition, significantly increasing lock overhead. To begin this test, open a new connection to the Aurora PostgreSQL database, and run the following commands (we'll call this session 1):
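The exact statements aren't shown here; assuming the orders table sketched above, session 1 could look like the following:

```sql
-- Session 1: open a transaction that scans every partition and keep it open.
BEGIN;
SET search_path TO experiment_1;
SELECT count(*) FROM orders;  -- no filter on order_ts, so all 12 partitions are locked
-- Do not COMMIT, ROLLBACK, or END yet; we want to inspect the locks it holds.
```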
The SQL statements initiate a transaction that scans all 12 partitions. Keep the transaction open to preserve the locks by not executing COMMIT, ROLLBACK, or END.
While the transaction in session 1 remains open, open a second session (we'll call this session 2) and run the following SQL query to check the state of locks in the database:
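The lock inspection query isn't reproduced here; a typical sketch using the pg_locks and pg_class catalog views (the column selection is an assumption) is:

```sql
-- Session 2: list relation-level locks held by other backends,
-- showing which were granted through the fast path.
SELECT l.pid,
       c.relname,
       l.locktype,
       l.mode,
       l.granted,
       l.fastpath
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE l.locktype = 'relation'
  AND l.pid <> pg_backend_pid()
ORDER BY l.fastpath DESC, c.relname;
```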
Notice the fastpath column, with values t (true) and f (false), in the preceding output: the last 10 rows show f, and the total number of rows returned is 26. The value f means that the 16 fast path slots have been exhausted and the remaining partition and index AccessShareLock entries have been migrated to the shared memory lock hash table (the slow path). When the transaction completes, these locks are released.
Test 2: Query specific partitions of the orders table by using partition pruning to observe change in locking behavior
In session 1 of the previous test, run a query that uses the partition pruning approach inside a new transaction, as shown below. If you continue to touch more partitions, additional fast path locks are acquired.
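A sketch of such a transaction, assuming the orders table and monthly ranges above, follows; the explicit date range lets the planner prune to a single partition:

```sql
-- Session 1: new transaction that touches only the March 2024 partition.
BEGIN;
SET search_path TO experiment_1;
SELECT count(*)
FROM orders
WHERE order_ts >= DATE '2024-03-01'
  AND order_ts <  DATE '2024-04-01';
-- Leave the transaction open again so the locks can be inspected from session 2.
```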
While the transaction in session 1 remains open, check the state of locks in the database by running the same pg_locks query from the previous test in session 2:
The output shows that all the locks are fast path locks, denoted by the value t (true) in the fastpath column, removing the need to acquire slow path locks.
This demonstrates fast path optimization but doesn't explain scenarios involving concurrency, where each backend either exhausts its fast path slots or remains within the 16-slot limit. Let's dive deep into this specific scenario. We'll use pgbench to simulate a multiuser workload.
Test 3.1: Query all partitions of the orders table under high concurrency to observe change in locking behavior
To simulate a high-concurrency read workload that accesses unpruned partitions, use the following pgbench command. This command continuously issues SELECT count(*) FROM orders queries across multiple threads. In this test, we evaluate how PostgreSQL's fast path lock optimization behaves under high concurrency when transactions exhaust fast path slots, forcing lock acquisition through the main lock manager (triggering LWLock:LockManager waits).
In pgbench, the -c and -j options control the concurrency and parallelism of the benchmark workload. The -c option specifies the number of concurrent clients, meaning how many simulated user sessions or database connections are active at the same time. This number determines the level of load applied to the PostgreSQL database. The -j option defines the number of worker threads that pgbench uses to manage these client connections. Each thread handles a subset of the total clients, and the workload is evenly distributed across threads, allowing pgbench to better use multicore systems and avoid bottlenecks on the client side.
To run the pgbench command without entering credentials at runtime, you can set the following environment variables: PGHOST for the Aurora cluster endpoint, PGPORT for the port number (such as 5432), PGDATABASE for the database name (such as postgres), PGUSER for the database user, and PGPASSWORD for the database user password.
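For example, with placeholder values (replace them with your own endpoint and credentials):

```bash
export PGHOST=<your-aurora-cluster-endpoint>
export PGPORT=5432
export PGDATABASE=postgres
export PGUSER=<your-db-user>
export PGPASSWORD=<your-db-password>
```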
The pgbench command simulates 100 concurrent clients (-c 100) running the transaction defined in transaction.sql for 15 minutes, or 900 seconds (-T 900).
The transaction.sql file contains the following SQL, along with setting the search path to the experiment_1 schema:
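The file isn't reproduced here; a minimal sketch consistent with that description would be:

```sql
-- transaction.sql: scan all partitions on every pgbench transaction.
SET search_path TO experiment_1;
SELECT count(*) FROM orders;
```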
Execute the following pgbench command from your session 1 terminal from the previous test. This command completes the test in 15 minutes. While this command is running, you can monitor database wait events from Amazon CloudWatch Database Insights.
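The exact command isn't shown; a representative invocation, assuming the environment variables above, a transaction.sql file in the current directory, and 16 worker threads, is:

```bash
pgbench -n -c 100 -j 16 -T 900 -f transaction.sql
```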
In this workload, the average transactions per second (tps) achieved was 46,672, and 42 million transactions were processed in 15 minutes of test time.
The following screenshot from CloudWatch Database Insights shows the database experiencing high load, with active sessions exceeding the instance’s CPU capacity and significant lock contention.
The preceding screenshot shows that the Aurora PostgreSQL 16.6 cluster running on a db.r7g.4xlarge instance has elevated database load driven by both CPU utilization and lock contention. The average active sessions (AAS) metric was sustained at approximately 21, higher than the instance's 16 vCPU capacity. Although 66 percent of the load was attributable to CPU utilization, a significant 34 percent was spent waiting on LWLock:LockManager, indicating contention for internal PostgreSQL lock structures.
Next, we'll use the partition pruning method and evaluate the performance.
Test 3.2: Query specific partitions of the orders table by using partition pruning under high concurrency
To contrast with our previous test using unpruned partitions, we now demonstrate how partition pruning can significantly improve performance. For this workload, the transaction.sql file contains a PL/pgSQL block instead of a plain SQL query, because PL/pgSQL enables efficient partition pruning in PostgreSQL when dealing with partitioned tables and runtime-generated values. You could also use plain SQL, as shown in the following query, but the filter on order_ts would be derived from a randomly generated date within a common table expression (CTE), which means the query planner, which creates the execution plan, can't determine the order_ts value at planning time. As a result, PostgreSQL must consider all partitions, leading to unnecessary locking and scanning of every partition. By switching to a PL/pgSQL block that computes the random date and constructs the query dynamically by using EXECUTE, the actual date value is injected directly into the SQL string. This turns the filter into a constant from the query planner's perspective, enabling effective partition pruning and making sure only the relevant partition is accessed and locked.
The following is the CTE-based SQL query described above, which can lock all partitions:
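A sketch of that CTE-based query follows (the random-date logic is an assumption); because the filter value is computed inside the query, the planner can't prune partitions at planning time:

```sql
-- Random date computed inside the query: the planner sees a non-constant
-- filter on order_ts and must plan for (and lock) every partition.
SET search_path TO experiment_1;

WITH random_date AS (
    SELECT DATE '2024-01-01' + (floor(random() * 365))::int AS d
)
SELECT count(*)
FROM orders, random_date
WHERE order_ts >= random_date.d
  AND order_ts <  random_date.d + 1;
```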
Use the following SQL, which takes the PL/pgSQL approach, for effective partition pruning as described above:
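A sketch of the PL/pgSQL version follows (again, the random-date logic is an assumption); EXECUTE injects the computed date as a constant, so only the matching partition is planned, scanned, and locked:

```sql
SET search_path TO experiment_1;

DO $$
DECLARE
    d   date;
    cnt bigint;
BEGIN
    -- Compute the random date first, then embed it as a constant in the SQL text.
    d := DATE '2024-01-01' + (floor(random() * 365))::int;
    EXECUTE format(
        'SELECT count(*) FROM orders WHERE order_ts >= %L AND order_ts < %L',
        d, d + 1)
    INTO cnt;
END $$;
```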
Follow the same steps as in the previous test and run the pgbench command from the session 1 terminal. This command will complete the test in 15 minutes, and you can monitor database wait events from CloudWatch Database Insights while the command is executing.
For this run, the workload achieved an average of 59,255 transactions per second (tps), and approximately 53.3 million transactions were processed in the 15 minutes of test time. In the absence of lock contention, the system processed roughly 11 million additional transactions compared with the previous test.
The following screenshot from CloudWatch Database Insights shows improved database performance after implementing partition pruning, with stable load patterns and without any lock contention.
With the introduction of partition pruning in the preceding workload, the performance of the Aurora PostgreSQL 16.6 cluster has significantly improved. Previously, the workload was characterized by high LWLock:LockManager wait events, consuming nearly 34 percent of database load alongside CPU utilization.
In contrast, the current workload shows a well-balanced profile: average active sessions (AAS) hover comfortably under the max vCPU threshold, and waits are minimal, with only a small fraction of Timeout:SpinDelay observed. CPU is now the dominant contributor to load (as expected in well-optimized OLTP systems), and lock contention has been greatly reduced. This implies that partition pruning successfully reduced the number of locks acquired, allowing each session to access only the relevant partition and stay within its per-session fast path slots, drastically improving concurrency.
Controlled experiment 2: Observe locks on a nonpartitioned table with multiple unused or unnecessary indexes
In this experiment, we examine how PostgreSQL handles locking behavior on a nonpartitioned table with multiple B-tree indexes. We use a nonpartitioned table here to keep the focus on the impact of unused or unnecessary indexes. Using an items table that represents an ecommerce or inventory system, we'll explore two key test scenarios:
- First, we perform a simple query using index-only scan to observe:
- How PostgreSQL manages locks across multiple unused or unnecessary indexes
- What happens when fast path slots are exhausted
- The impact of having 20 B-tree indexes on lock acquisition
- Then, we stress test the system under high concurrency to demonstrate:
- How excessive indexes affect lock manager behavior
- The performance impact of lock contention with multiple indexes
- The relationship between index count and LWLock:LockManager waits
Through these tests, we reveal important insights about index-related lock overhead and provide practical guidance for index management in high-concurrency environments.
Schema preparation
Use the following SQL code to create the items table schema and its associated indexes in your Aurora PostgreSQL database:
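The full definition isn't reproduced here; the following abbreviated sketch shows the shape described (the real table has 26 columns and 20 B-tree indexes), with column and index names as assumptions apart from idx_items_name, which is referenced later in this experiment:

```sql
-- Abbreviated sketch: the real table has 26 columns and 20 B-tree indexes.
CREATE TABLE items (
    item_id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    sku         text NOT NULL,
    name        text NOT NULL,
    category    text,
    subcategory text,
    brand       text,
    price       numeric(10,2),
    inventory   int,
    created_at  timestamp DEFAULT now()
    -- ...additional descriptive columns up to 26 in total
);

CREATE INDEX idx_items_name          ON items (name);
CREATE INDEX idx_items_sku           ON items (sku);
CREATE INDEX idx_items_category      ON items (category);
CREATE INDEX idx_items_subcategory   ON items (subcategory);
CREATE INDEX idx_items_brand         ON items (brand);
CREATE INDEX idx_items_price         ON items (price);
CREATE INDEX idx_items_inventory     ON items (inventory);
CREATE INDEX idx_items_cat_sub_price ON items (category, subcategory, price);
-- ...and so on, up to 20 indexes in total
```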
The preceding SQL code creates an ecommerce items table with 26 columns for product details (such as SKU, price, and inventory) and 20 B-tree indexes on commonly queried columns and column combinations.
Test 1: Query a nonpartitioned table with multiple indexes
To begin our examination of how excess indexes impact PostgreSQL's locking behavior, let's perform our first test by querying the items table. We'll run a simple name lookup query to observe how PostgreSQL handles lock management across all 20 indexes, even though most of them are unnecessary for this query. This will help us understand the baseline locking overhead created by maintaining excessive indexes. Start a transaction and query specific columns of the items table by using the index access path:
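A sketch of session 1 follows (the lookup value is a placeholder); selecting only the indexed name column lets the planner choose an index-only scan on idx_items_name:

```sql
-- Session 1: open a transaction and perform a name lookup, then keep it open.
BEGIN;
SELECT name
FROM items
WHERE name = 'sample-item-001';
-- Do not COMMIT or ROLLBACK yet; inspect the locks from another session.
```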
For the preceding SQL query, the query planner chose the index-only scan path because we select only the name column from the items table. Now, in another session, observe the locking behavior by running the pg_locks query from Controlled experiment 1.
After executing the query in the other session, notice in the output that the table's indexes acquired an AccessShareLock in addition to the items table, its primary key, and the index (idx_items_name) the planner used for query execution. The total number of rows returned is 22, the fast path slots are exhausted, and six locks are moved to the shared memory lock table. If we had more indexes on this table, those indexes would require an AccessShareLock as well and would be placed in the shared memory lock table because the fast path slots are exhausted. During high-concurrency workloads on this table, performance would degrade because of contention on the shared memory lock table.
Test 2: Query a nonpartitioned table with multiple indexes under high concurrency
To understand how these unnecessary indexes affect performance at scale, let's stress test our nonpartitioned items table under high concurrency. Like our earlier experiment with partitioned tables (in Controlled experiment 1), we use pgbench to simulate multiple concurrent users accessing the table simultaneously.
Using the same pgbench command from our first experiment, run the following from your session 1 terminal. This test will run for five minutes. While the test is running, you can monitor database wait events from CloudWatch Database Insights.
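A representative command, assuming the same environment variables and client and thread counts as before but a 5-minute duration (-T 300), is:

```bash
pgbench -n -c 100 -j 16 -T 300 -f transaction.sql
```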
The transaction.sql file contains the same name lookup SQL query on the items table as the previous test.
The following screenshot from CloudWatch Database Insights illustrates how excessive indexes on the items table create significant LWLock:LockManager waits, resulting in increased database load and CPU utilization.
The LWLock:LockManager waits observed here are primarily driven by the excessive number of indexes on the items table. Even with no data, PostgreSQL incurs overhead during query planning and execution because it must examine all 20 indexes, acquire related locks, and access catalog metadata. Due to the high concurrency, a large number of locks were involved; database sessions exhausted their fast path locks, forcing backend processes to fall back to the main lock manager, which introduced additional contention. This resulted in increased CPU usage caused by repeated catalog scans and elevated database load from lock acquisition overhead. Reducing the number of unnecessary indexes would not only decrease query planning complexity but also help preserve fast path locking, improving system efficiency under high-concurrency workloads.
Controlled experiment 3: Observe locking behavior with a multi-join query
In this experiment, we investigate how PostgreSQL manages locks when executing complex queries across multiple related tables. We’ll use a realistic ecommerce database schema to explore two key test scenarios:
- First, we examine a single multi-join query that:
- Retrieves a user’s cart contents, item prices, order status, and payment details in a single read-only operation.
- Connects six interrelated tables (users, carts, cart_items, items, orders, and payments).
- Demonstrates how PostgreSQL handles locks across multiple tables and their indexes.
This query provides an opportunity to observe how multiple table joins influence the cumulative lock footprint. Because each table has its own primary key and foreign key indexes, PostgreSQL can potentially use fast path locks for these simple reads, avoiding the overhead of acquiring entries in the shared memory lock table.
- Then, we stress test the system under high concurrency to:
- Observe how multiple sessions executing complex joins affect lock management.
- Measure the performance impact of lock acquisition across multiple indexed tables.
- Demonstrate how join complexity influences both CPU utilization and lock contention.
Through these tests, we reveal important insights about:
- Lock management in complex, interconnected table structures
- The relationship between table joins, indexes, and lock overhead
- Performance considerations for multi-join queries in high-concurrency environments
Schema preparation
Use the following SQL code to create the ecommerce schema, which includes six interconnected tables (users, carts, cart_items, items, orders, and payments) with their associated indexes and foreign key constraints. The schema is intentionally kept comprehensive to simulate a real-world ecommerce database, including multiple indexes per table and proper referential integrity constraints:
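The full schema isn't reproduced here; the following abbreviated sketch shows the six tables with representative indexes and foreign keys (column names, data types, and index names are assumptions). Create these in their own schema so the orders table doesn't clash with the one from Controlled experiment 1.

```sql
CREATE TABLE users (
    user_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email   text UNIQUE NOT NULL,
    name    text
);

CREATE TABLE items (
    item_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    sku     text UNIQUE NOT NULL,
    name    text NOT NULL,
    price   numeric(10,2)
);
CREATE INDEX idx_items_name ON items (name);

CREATE TABLE carts (
    cart_id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id    bigint NOT NULL REFERENCES users (user_id),
    created_at timestamp DEFAULT now()
);
CREATE INDEX idx_carts_user_id ON carts (user_id);

CREATE TABLE cart_items (
    cart_item_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    cart_id      bigint NOT NULL REFERENCES carts (cart_id),
    item_id      bigint NOT NULL REFERENCES items (item_id),
    quantity     int NOT NULL DEFAULT 1
);
CREATE INDEX idx_cart_items_cart_id ON cart_items (cart_id);
CREATE INDEX idx_cart_items_item_id ON cart_items (item_id);

CREATE TABLE orders (
    order_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id  bigint NOT NULL REFERENCES users (user_id),
    cart_id  bigint NOT NULL REFERENCES carts (cart_id),
    status   text NOT NULL,
    order_ts timestamp DEFAULT now()
);
CREATE INDEX idx_orders_user_id ON orders (user_id);
CREATE INDEX idx_orders_cart_id ON orders (cart_id);

CREATE TABLE payments (
    payment_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    order_id   bigint NOT NULL REFERENCES orders (order_id),
    amount     numeric(10,2),
    method     text,
    paid_at    timestamp
);
CREATE INDEX idx_payments_order_id ON payments (order_id);
-- Each table in the real schema carries additional columns and several more indexes.
```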
Test 1: Execute a multi-join query across multiple indexed tables
To begin our examination of how PostgreSQL handles locks during complex join operations, let’s execute our first test using a multi-join query that retrieves a user’s complete shopping cart information. This query demonstrates how PostgreSQL manages locks across multiple tables and their associated indexes. In session 1, start a transaction and execute this query:
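A sketch of such a query, assuming the schema above (the user_id value is a placeholder), is:

```sql
-- Session 1: read a user's cart contents, item prices, order status, and payment details.
BEGIN;
SELECT u.user_id,
       i.name,
       i.price,
       ci.quantity,
       o.status,
       p.amount,
       p.method
FROM users u
JOIN carts c         ON c.user_id  = u.user_id
JOIN cart_items ci   ON ci.cart_id = c.cart_id
JOIN items i         ON i.item_id  = ci.item_id
LEFT JOIN orders o   ON o.cart_id  = c.cart_id
LEFT JOIN payments p ON p.order_id = o.order_id
WHERE u.user_id = 1;
-- Keep the transaction open to inspect the locks from session 2.
```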
While keeping the transaction above open in session 1, run the following query in session 2 to examine how PostgreSQL manages locks across all the tables and indexes involved in our multi-join query:
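You can reuse the pg_locks query from Controlled experiment 1; the following sketch adds a summary of fast path versus slow path counts:

```sql
-- Per-relation lock detail for the other backend(s).
SELECT l.pid, c.relname, l.mode, l.fastpath
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE l.locktype = 'relation'
  AND l.pid <> pg_backend_pid()
ORDER BY l.fastpath DESC, c.relname;

-- Summary: how many locks were granted via the fast path vs. the shared lock table.
SELECT fastpath, count(*)
FROM pg_locks
WHERE locktype = 'relation'
  AND pid <> pg_backend_pid()
GROUP BY fastpath;
```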
In the preceding output, we can see that PostgreSQL acquired a total of 39 locks. Looking at the fastpath column, 23 rows show f (false), indicating these locks had to use the slower path through the shared memory lock table instead of the fast path. This demonstrates that even a seemingly simple nested join query can encounter significant lock contention when fast path slots are exhausted.
Test 2: Execute a multi-join query under high concurrency
To understand how the complex joins in the preceding SQL query perform at scale, let's stress test our ecommerce schema under high concurrency. Like our previous experiments, we use pgbench to simulate multiple concurrent users executing our multi-join query simultaneously.
Using the same pgbench command from our earlier experiments (in Controlled experiments 1 and 2), run the following from your session 1 terminal. This test will run for five minutes:
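A representative command, matching the earlier experiments but with a 5-minute duration (-T 300), is:

```bash
pgbench -n -c 100 -j 16 -T 300 -f transaction.sql
```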
The transaction.sql file contains the same multi-join SQL query as in the previous test. While the test is running, you can monitor database wait events from CloudWatch Database Insights.
The following screenshot from CloudWatch Database Insights demonstrates how multi-join queries across heavily indexed tables create a dual performance impact, with LWLock:LockManager contention consuming 20 percent of active sessions and CPU utilization approaching saturation at 80 percent.
The database load graph reveals significant LWLock:LockManager contention (20 percent of AAS) and high CPU utilization (80 percent) because of multi-join queries operating across tables with several indexes. Each join forces PostgreSQL to obtain AccessShareLock locks on indexes, exhausting fast path locks and falling back to the slower main lock manager. Repeated catalog scans during query planning drive CPU utilization near saturation. This lock manager bottleneck stems from the combined overhead of maintaining locks across multiple indexed tables during join planning. Reducing redundant indexes and simplifying join patterns will alleviate both the lock contention and the CPU pressure visible in the workload.
Key mitigation strategies for managing PostgreSQL lock contention
Consider the following key mitigation strategies for reducing PostgreSQL lock contention:
- Optimized access to partitioned tables:
  - Enable partition pruning – Use explicit date ranges (WHERE order_ts BETWEEN X AND Y) instead of full-table scans. Learn more about partition pruning in the PostgreSQL documentation.
  - Avoid dynamic SQL without constants – Replace CTEs with PL/pgSQL blocks (as in this post's first controlled experiment) to force pruning.
  - Limit partition count – Reduce partitions where possible (for example, consider using quarterly partitions instead of monthly) to stay within fast path limits.
- Index rationalization:
  - Audit and remove unused indexes – Use pg_stat_user_indexes to identify low-usage indexes.
  - Consolidate redundant indexes – Replace single-column indexes with composite indexes (for example, (category, subcategory, price) instead of three separate indexes).
  - Avoid overindexing – Cap indexes per table (for example, less than or equal to 10) unless critical for OLTP.
- Schema design adjustments:
  - Avoid unnecessary partitioning – Only partition tables exceeding 100 GB or with clear access patterns. For detailed guidance on when and how to implement table partitioning, see Improve performance and manageability of large PostgreSQL tables by migrating to partitioned tables on Amazon Aurora and Amazon RDS.
  - Use covering indexes – Add INCLUDE columns to avoid table access (reduces relation locks). Learn more about implementing covering indexes and index-only scans in the PostgreSQL documentation.
  - Normalize high-concurrency tables – Split wide tables (for example, items) to reduce index sprawl.
- Query optimization:
  - Simplify joins – Break multi-join queries into materialized views or staged queries. For implementing materialized views, see the PostgreSQL documentation.
  - Batch small reads – Combine small lookups (for example, IN (...) clauses) to reduce lock frequency.
- PostgreSQL tuning:
  - Adjust max_locks_per_transaction – Increase it (for example, from 256 to 512) if partitioning is unavoidable and excess locks are moved to the shared memory lock hash table (monitor memory usage).
  - Monitor fast path usage – Track pg_locks to identify slot exhaustion, as shown in the sketch after this list.
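As a starting point, the following sketch counts fast path and slow path relation locks per backend process; a persistently high slow path count indicates sessions exceeding their 16 fast path slots:

```sql
SELECT pid,
       count(*) FILTER (WHERE fastpath)     AS fastpath_locks,
       count(*) FILTER (WHERE NOT fastpath) AS slowpath_locks
FROM pg_locks
WHERE locktype = 'relation'
GROUP BY pid
ORDER BY slowpath_locks DESC;
```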
Clean up
To avoid incurring future charges related to implementing the solution in this blog post, delete the resources you created:
- Delete the test schemas and tables you created in the controlled experiments.
- If you created a dedicated Aurora PostgreSQL cluster for testing, delete it.
- Remove associated snapshots, if you no longer need them.
Conclusion
LWLock contention in PostgreSQL read-heavy workloads stems from exceeding fast path locking limits, which is triggered by unpruned partitions, redundant indexes, and complex joins. The controlled experiments in this post demonstrated the following:
- Partition pruning reduced lock overhead, improving throughput from roughly 46,700 tps to 59,300 tps by confining locks to fast path slots.
- Each unused index added lock pressure, forcing slow path fallback even in empty tables.
- Multi-join queries amplify contention, with 60 percent of locks spilling to slow path in tested scenarios.
By prioritizing partition-aware queries, rigorous index hygiene, and join simplification, teams can maintain fast path efficiency and provide linear read scalability on Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL. As databases grow, these optimizations become foundational to harnessing the elasticity of AWS without lock bottlenecks.
For more information about performance tuning and designing scalable PostgreSQL workloads on Aurora, see Essential concepts for Aurora PostgreSQL tuning.