NEVO

Transactions & Concurrency Control

In database systems, transactions are fundamental units of work that consist of multiple operations executed as a single logical unit. Transactions ensure data consistency, integrity, and reliability by following the ACID properties (Atomicity, Consistency, Isolation, Durability).

What is a Transaction?

A transaction is a sequence of database operations (such as reads and writes) that are executed as a single unit of work. Transactions are used to ensure that multiple operations on the database are performed atomically, either entirely or not at all. The primary goal of transactions is to maintain data integrity and consistency.

Consider a banking application where transferring money from one account to another involves multiple steps like deducting the amount from one account and crediting it to another. These steps need to be executed as a single transaction to ensure that the money transfer is consistent and reliable.

ACID Properties of Transactions

Transactions in database systems adhere to the ACID properties, which define the characteristics of a reliable transaction:

1. Atomicity

Atomicity ensures that a transaction is treated as a single unit of work that either completes entirely or is rolled back entirely. If any part of the transaction fails, the entire transaction is aborted, and the database is left unchanged. For example, a bank transfering money from one account to another should either complete successfully or not at all.

2. Consistency

Consistency ensures that a transaction takes the database from one consistent state to another consistent state. It enforces data integrity rules and constraints, preventing the database from being left in an inconsistent state. For example, a transaction that updates a student's grade should ensure that the grade remains within a valid range.

3. Isolation

Isolation ensures that the intermediate state of a transaction is not visible to other transactions until the transaction is completed. It prevents interference between concurrent transactions, maintaining data integrity and consistency. For example, A transaction updating a product's price should not be affected by another transaction updating the same product's quantity.

4. Durability

Durability ensures that once a transaction is committed, its changes are permanent and persist even in the event of system failures. The changes made by a committed transaction are stored in non-volatile memory, ensuring data reliability. For example, a transaction updating a customer's address should be durable to prevent data loss.

States of a Transaction

Transactions in a database system go through different states during their lifecycle. Understanding these states is crucial for implementing transaction management and recovery mechanisms. The typical states of a transaction are:

1. Active

The transaction is in the active state when it is being executed. In this state, the transaction is performing read and write operations on the database.

2. Partially Committed

The transaction enters the partially committed state after executing its final operation but before it is committed. In this state, the transaction is waiting for a signal to commit or abort.

3. Committed

The transaction is in the committed state after it has successfully completed all its operations and has been permanently saved to the database. In this state, the changes made by the transaction are visible to other transactions.

4. Aborted

The transaction enters the aborted state if it encounters an error or is explicitly rolled back. In this state, the changes made by the transaction are undone, and the database is restored to its state before the transaction began.

5. Failed

The transaction is in the failed state if it cannot be completed due to an error or system failure. In this state, the transaction is terminated, and its changes are rolled back.

Concurrency Control in Database Systems

Concurrency control is the process of managing simultaneous access to the database by multiple transactions to ensure data consistency and integrity. In a multi-user environment, concurrent transactions can lead to issues like lost updates, uncommitted data, and inconsistent reads. Concurrency control mechanisms prevent these problems by coordinating the execution of transactions. For example, Multiple users updating the same bank account balance simultaneously should not result in incorrect balances.

Common Concurrency Control Techniques

Several techniques are used to manage concurrency in database systems. Some of the common concurrency control mechanisms include:

1. Locking

Locking is a technique that restricts access to data by acquiring locks on database objects. Locks prevent other transactions from modifying the data until the lock is released. There are different types of locks, such as shared locks and exclusive locks, to control read and write operations.

2. Two-Phase Locking (2PL)

Two-Phase Locking is a concurrency control protocol that ensures serializability by acquiring locks in two phases: a growing phase where locks are acquired and a shrinking phase where locks are released. This protocol prevents conflicts between transactions and ensures that transactions are executed in a serializable order.

3. Timestamp Ordering

Timestamp Ordering is a concurrency control technique that assigns a unique timestamp to each transaction based on its start time. Transactions are ordered based on their timestamps, and conflicts are resolved by comparing timestamps. This technique ensures that transactions are executed in a serializable order.

4. Optimistic Concurrency Control

Optimistic Concurrency Control is a technique that assumes conflicts between transactions are rare. Transactions are executed without acquiring locks, and conflicts are detected at the end of the transaction. If a conflict is detected, the transaction is rolled back and re-executed. This technique reduces lock contention and improves concurrency.

5. Multi-Version Concurrency Control (MVCC)

Multi-Version Concurrency Control is a technique that maintains multiple versions of data to allow concurrent read and write operations. Each transaction sees a consistent snapshot of the database based on its start time, ensuring that transactions do not interfere with each other. This technique is commonly used in databases like PostgreSQL and Oracle.

Problem Caused by Concurrency

Concurrent transactions can lead to several problems in a database system, including:

When two transactions update the same data concurrently, one transaction's changes may be lost if the other transaction commits first.
A transaction reads data that has been modified by another transaction but not yet committed, leading to inconsistent reads.
Transactions may interfere with each other, causing conflicts and violating the ACID properties of transactions.
Concurrent transactions may result in deadlocks, where transactions wait indefinitely for resources held by other transactions.
Inefficient use of system resources due to contention for locks and delays in transaction processing.

Special thanks to Prince Kumar Prasad for contributing to this guide on Nevo Code.