Everything you don’t need to know about Amazon Aurora DSQL: Part 1 – Setting the scene

Launched in 2014, Amazon Aurora is a relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases, offering MySQL and PostgreSQL compatibility with enhanced performance. At re:Invent 2024, AWS further advanced its customer-centric innovation by announcing Amazon Aurora DSQL, a new serverless, distributed SQL database designed for the cloud and optimized for transaction processing. In this blog post series, I share technical details about Aurora DSQL, diving into the underlying components and the design choices made throughout the development of the service.

In this post, I dive deep into the fundamental concepts that are important for understanding the benefits of Aurora DSQL, its feature set, and its underlying components.

Amazon Aurora DSQL implements a multi-Region, active-active, distributed database engine, supporting both read and write workloads in all Regions of a database cluster. The service is serverless and PostgreSQL-compatible, offering a subset of the PostgreSQL feature set. You’re charged for the resources you consume and the amount of data you store.

Availability and durability in the cloud

Cloud services rely on critical measures such as availability and durability to ensure their reliability, performance, and trustworthiness. Availability refers to a system or data’s accessibility, often measured as a percentage of uptime. For instance, a database with 99.99% availability would experience less than an hour of downtime annually. Durability ensures long-term data preservation without loss or corruption. At Amazon Web Services (AWS), services are designed with both high availability and durability in mind so customers can focus on innovation rather than infrastructure management. Aurora DSQL offers a Service Level Agreement (SLA) of 99.999% availability for a multi-Region setup, which corresponds to an allowed downtime of about 5 minutes and 16 seconds per year. It provides data durability by storing the data in multiple Regions, ensuring that no data is lost after a successful commit.
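As a quick sanity check on that figure, you can work out the allowed downtime yourself. The following back-of-the-envelope query is purely illustrative and runs on any PostgreSQL-compatible engine:

```sql
-- 99.999% availability leaves a downtime budget of 0.001% of the year.
-- Minutes in an average year (365.25 days) multiplied by that fraction:
SELECT round(365.25 * 24 * 60 * 0.00001, 2) AS downtime_minutes_per_year;
-- => 5.26 minutes, roughly 5 minutes and 16 seconds
```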

Although availability and durability are instrumental for cloud services, understanding the specific type of workload a database is designed to handle is equally important for optimal performance, which brings us to the distinction between online transaction processing (OLTP) and online analytical processing (OLAP) systems.

OLTP and OLAP

OLTP excels at handling high-volume, day-to-day transactions with rapid query processing. For example, an ecommerce platform processing customer orders would typically rely on an OLTP database. OLAP, conversely, is optimized for complex queries and data analysis, making it ideal for business intelligence (BI) applications where teams analyze vast datasets to derive actionable insights. Aurora DSQL is designed for OLTP workloads.
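To make the distinction concrete, here is a small sketch against a hypothetical orders table (all names are illustrative). The first two statements are typical OLTP operations: short, targeted reads and writes. The last one is a typical OLAP-style query that scans and aggregates a large amount of history:

```sql
-- OLTP: short transactions that touch a handful of rows
INSERT INTO orders (customer_id, total, placed_at)
VALUES (4711, 59.90, now());

SELECT status
FROM orders
WHERE order_id = 123456;

-- OLAP: analytical aggregation over the whole order history
SELECT date_trunc('month', placed_at) AS month,
       count(*)   AS order_count,
       sum(total) AS revenue
FROM orders
GROUP BY date_trunc('month', placed_at)
ORDER BY month;
```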

Whichever workload type a database targets, it also has to protect the integrity of every individual transaction. That is where the ACID properties come in.

ACID

Atomicity, consistency, isolation, durability (ACID) is a set of properties a database must have to maintain data integrity and reliability in its transactions.

  • Atomicity ensures that a transaction is treated as a single unit, resulting in its complete success or failure.
  • Consistency implies that a transaction transitions the database from one valid state to another. This concept is distinct from other consistency models, such as read-after-write consistency, where “consistency” refers to the expectation of reading the data that the client has just written, not the database state itself.
  • Isolation ensures that concurrent transactions have the same effect as sequential execution.
  • Durability guarantees that committed changes aren’t lost.

Aurora DSQL provides interactive, ACID-compliant transactions. “Interactive” means that a client can start a transaction, issue queries, inspect the results, run additional application logic, and send further statements before committing. The practical implementation of these principles, particularly isolation, requires sophisticated concurrency control mechanisms.
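As a minimal sketch of what an interactive transaction looks like (the accounts table and its columns are hypothetical), the client opens the transaction, reads data, runs its own logic on the result, and only then writes and commits; atomicity guarantees that either both updates below take effect or neither does:

```sql
BEGIN;

-- Read the current state; the application can inspect this result before continuing
SELECT balance FROM accounts WHERE account_id = 1;

-- ... application logic runs here, for example checking that the balance covers the transfer ...

UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

-- Atomicity: both updates become visible together, or not at all (ROLLBACK instead of COMMIT)
COMMIT;
```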

Database isolation levels

Database isolation levels define the visibility of changes made by one transaction to other transactions. The standard isolation levels include:

  • Read uncommitted – Allows transactions to read data that hasn’t yet been committed.
  • Read committed – Ensures that a transaction only reads data that has been committed, potentially allowing non-repeatable reads and phantom reads. A phantom read happens when a transaction retrieves a set of rows based on a certain condition, but before the transaction is complete, another transaction inserts or deletes rows that affect that condition. For instance, suppose transaction A runs a query to count all orders placed today, and transaction B then adds a new order and commits. If transaction A runs the same count query again, it will see this new “phantom” row.
  • Repeatable read – Guarantees that if a transaction reads a row, it will retrieve the same data on subsequent reads within the same transaction, preventing non-repeatable reads but still allowing phantom reads. This is because it holds locks on individual rows but doesn’t lock the “gaps” where new rows might be inserted.
  • Snapshot isolation – Allows a transaction to view a consistent snapshot of the database as it was at the beginning of the transaction, preventing dirty reads, non-repeatable reads, and phantom reads without the overhead of full serializability. The underlying rule is that a transaction T commits only if no other transaction has committed writes to rows in T’s write set between T’s start time and its commit time. In other words, if rows that the current transaction also intends to write have been changed in the meantime, the transaction aborts (see the two-session sketch after this list).
  • Serializable – Provides the highest level of isolation by guaranteeing that the outcome of concurrent transactions is the same as if they had been executed one after another, preventing dirty reads, non-repeatable reads, and phantom reads. The underlying rule is that a transaction T commits only if no other transaction has committed writes to rows in T’s read set between T’s start time and its commit time. In other words, if rows that the current transaction has read have been modified in the meantime, the transaction aborts.

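To illustrate snapshot isolation, here is a two-session sketch against a hypothetical accounts table. It uses standard PostgreSQL, whose REPEATABLE READ level is implemented as snapshot isolation, so the behavior mirrors the description above: the reading transaction keeps seeing its snapshot, and a conflicting write is rejected:

```sql
-- Session A
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE account_id = 1;   -- e.g. returns 500; the snapshot is taken here

-- Session B, running concurrently
BEGIN;
UPDATE accounts SET balance = 400 WHERE account_id = 1;
COMMIT;

-- Session A, continued
SELECT balance FROM accounts WHERE account_id = 1;   -- still 500: A keeps reading its snapshot
UPDATE accounts SET balance = balance - 50 WHERE account_id = 1;
-- ERROR:  could not serialize access due to concurrent update
ROLLBACK;
```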
Amazon Aurora DSQL operates at the snapshot isolation level; how this plays out at the level of individual transaction types will be covered in a later post in this series.

Concurrency control

Modern databases often implement more sophisticated concurrency control mechanisms, with Multi-Version Concurrency Control (MVCC) being one of the most widely adopted solutions.

MVCC is a technique employed in databases to manage concurrent access and modifications of data. MVCC reduces the need for locks by creating distinct versions of data, allowing readers and writers to operate concurrently without blocking each other. In PostgreSQL, this is achieved by maintaining multiple versions of each row, each containing hidden system columns that record the transaction responsible for its creation (xmin) and deletion or update (xmax). When a transaction updates data, it creates a new version of the row while preserving the original version. This is done by marking the old version with an xmax value while the new row gets a new xmin (matching the xmax of the old version) and no xmax value is set for this new entry. When other transactions read the data, they view the version that was valid at their start time, determined by these transaction IDs, disregarding changes made by transactions still in progress. This facilitates simultaneous reading and writing without blocking, while a background process eventually removes obsolete row versions that are no longer accessible to any transaction.
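In standard PostgreSQL you can observe these hidden columns directly by selecting them; the table, values, and transaction IDs below are purely illustrative:

```sql
-- xmin and xmax are system columns; they only show up when selected explicitly
SELECT xmin, xmax, account_id, balance FROM accounts WHERE account_id = 1;
--  xmin | xmax | account_id | balance
--   731 |    0 |          1 |     500    -- created by transaction 731, not superseded

UPDATE accounts SET balance = 400 WHERE account_id = 1;

SELECT xmin, xmax, account_id, balance FROM accounts WHERE account_id = 1;
--  xmin | xmax | account_id | balance
--   735 |    0 |          1 |     400    -- new version created by transaction 735; the old
--                                        -- version now carries xmax = 735 and is removed
--                                        -- later by VACUUM once no transaction can see it
```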

Amazon Aurora DSQL also uses MVCC, which entails storing multiple data versions within the database.

Within MVCC implementations, databases employ different high-level approaches to handling concurrent modifications: pessimistic and optimistic locking schemes. Pessimistic concurrency control proactively locks resources to prevent conflicts, thereby preventing concurrent transactions from modifying data that has already been altered or locked within the current transaction. However, this approach can lead to performance bottlenecks due to excessive resource locking. Conversely, optimistic concurrency control assumes conflicts are infrequent and checks for them at transaction commit time. It comprises three phases:

  1. All SQL statements from the client are processed, and all writes are logged locally within the transaction.
  2. Upon client commit, the database evaluates the transaction’s reads and mutations to determine whether it can commit. Two validation methods are possible: forward validation and backward validation.
    1. Forward validation checks for conflicts with any currently in-flight transactions.
    2. Backward validation checks for conflicts with previously committed transactions.
  3. If no conflicts are detected, the changes are written to storage; otherwise, the transaction is aborted.

Amazon Aurora DSQL and standard PostgreSQL both use MVCC but choose different approaches to locking. Amazon Aurora DSQL employs optimistic concurrency control with backward validation while standard PostgreSQL uses a pessimistic approach.
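The following two-session sketch (again with a hypothetical accounts table) shows how that difference surfaces for a client: under PostgreSQL’s pessimistic locking, the second writer blocks as soon as it touches the locked row, whereas under Aurora DSQL’s optimistic approach neither transaction blocks and the conflict is only detected at commit time, at which point the later transaction is aborted and the application is expected to retry it:

```sql
-- Session A
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE account_id = 1;

-- Session B, running concurrently
BEGIN;
UPDATE accounts SET balance = balance + 10 WHERE account_id = 1;
-- Pessimistic (standard PostgreSQL): this statement blocks until session A commits or rolls back
-- Optimistic (Aurora DSQL): the write is accepted locally and nothing blocks

-- Session A
COMMIT;   -- succeeds in both models

-- Session B
COMMIT;
-- Optimistic (Aurora DSQL): commit-time validation detects the write-write conflict
-- and aborts the transaction, so the application has to retry it
```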

Conclusion

In this post, I explored fundamental database concepts and their application to Amazon Aurora DSQL, providing a first overview of the service. Aurora DSQL offers a Service Level Agreement (SLA) of 99.999% availability for a multi-Region setup, provides interactive ACID-compliant transactions, operates at the snapshot isolation level, and uses MVCC with optimistic concurrency control. In the next post of this series, I will provide a more comprehensive look at the service itself, its advantages and limitations, and the underlying architecture.


About the author

Katja-Maja Kroedel

Katja is a passionate Advocate for Databases and IoT at AWS. She helps customers leverage the full potential of cloud technologies. With a background in computer engineering and extensive experience in IoT and databases, she works with customers to provide guidance on cloud adoption, migration, and strategy in these areas.
