Database Transactions Notes | DBMS SPPU 2024

Introduction to Transaction Management

In modern Database Management Systems (DBMS), multiple users access and update data simultaneously. To ensure that the database remains accurate and reliable during these concurrent operations, the system uses a concept called Transaction Management.

Transaction Management is the core mechanism that guarantees data consistency, especially in environments where hardware failures, software crashes, or network interruptions can occur. This comprehensive guide covers the fundamental concepts of transactions, the strict rules they must follow (ACID properties), how multiple transactions are scheduled, and the methods used to control concurrent executions.

Basic Concept of a Transaction

Definition

A transaction in a DBMS is a logical unit of work that contains one or more SQL operations (such as Read, Write, Update, or Delete). A transaction must be executed entirely, or not at all. It transforms the database from one consistent state to another consistent state.

Real-Life Example

Consider a banking system where Person A wants to transfer 1000 rupees to Person B. This transfer process is a single transaction, but it involves multiple database operations:

Read the account balance of A.
Deduct 1000 rupees from A's account.
Write the new balance back to A's account.
Read the account balance of B.
Add 1000 rupees to B's account.
Write the new balance back to B's account.

If the system crashes after step 3, Person A will lose 1000 rupees, but Person B will never receive it. To prevent this, the DBMS treats all six steps as a single transaction. Either all six steps complete successfully, or none of them do.

Hinglish Explanation: Transaction ka matlab ek poora task hota hai jismein chote-chote operations hote hain. Jaise bank se paise bhejna. System yeh dhyan rakhta hai ki ya toh poore paise transfer ho jayein, ya fir transaction cancel ho jaye aur paise wapas aa jayein. Beech mein process nahi rukna chahiye.

Transaction Operations

The core operations performed inside a transaction are:

Read(X): Reads the value of data item X from the database and stores it in a temporary memory variable.
Write(X): Writes the modified value of data item X from the temporary memory back into the database.
Commit: A signal that the transaction has finished its execution successfully. The changes are permanently saved in the database.
Abort / Rollback: A signal that an error has occurred. The transaction is cancelled, and all partial changes made to the database are undone.

Properties of Transactions: The ACID Properties

To maintain the integrity of the database, every transaction must strictly follow four fundamental properties, collectively known as the ACID properties.

1. Atomicity

Atomicity is the "all-or-nothing" rule. It states that a transaction is an indivisible unit of work. Either all the operations of the transaction are executed and saved into the database, or none of them are. There is no concept of a partially completed transaction.

Responsibility: The Transaction Management component of the DBMS is responsible for ensuring Atomicity.

2. Consistency

Consistency ensures that a transaction takes the database from one valid state to another valid state. The data must strictly follow all predefined rules, constraints, and triggers before and after the transaction.

Responsibility: The Application Programmer and the DBMS Integrity constraints are responsible for ensuring Consistency.

3. Isolation

In a multi-user system, hundreds of transactions run concurrently. Isolation ensures that multiple transactions can execute at the same time without interfering with each other. The intermediate state of a transaction must remain completely invisible to other transactions until the original transaction is committed.

Responsibility: The Concurrency Control Manager is responsible for ensuring Isolation.

4. Durability

Durability guarantees that once a transaction completes successfully and commits, the changes made to the database become permanent. Even if a system failure, power loss, or crash occurs immediately after the commit, the data will survive and be available when the system restarts.

Responsibility: The Recovery Manager is responsible for ensuring Durability.

Hinglish Explanation: ACID properties database ki safety ke chaar pillars hain. Atomicity (poora kaam hoga ya bilkul nahi), Consistency (data ke rules break nahi honge), Isolation (ek transaction doosre ko disturb nahi karega), aur Durability (ek baar data save ho gaya, toh system crash hone par bhi delete nahi hoga).

Concept of Schedule in DBMS

When multiple transactions are executed concurrently in a database, the system must decide the exact order in which the operations of these transactions will be executed by the CPU. This execution sequence is called a Schedule.

Definition

A Schedule (or history) is a chronological sequence of operations (Read, Write, Commit, Abort) from one or more transactions. It preserves the order of operations within each individual transaction.

Types of Schedules

1. Serial Schedule

In a serial schedule, transactions are executed completely one after the other. A new transaction does not begin until the currently running transaction has finished (committed or aborted).

Features: There is absolutely no overlapping or interleaving of operations.
Advantages: It is highly secure and guarantees database consistency.
Disadvantages: It is extremely inefficient. The CPU remains idle while waiting for disk input/output, leading to poor system performance and long waiting times for users.

2. Non-Serial (Concurrent) Schedule

In a non-serial schedule, the operations of multiple transactions are interleaved (mixed together) to utilize the CPU efficiently.

Features: Multiple transactions progress at the same time.
Advantages: Maximizes resource utilization, reduces waiting time, and increases system throughput.
Disadvantages: If not controlled properly, it can lead to data inconsistency and errors like the lost update problem.

Hinglish Explanation: Schedule ek time-table ki tarah hai jo batata hai ki kaunsa read ya write operation pehle execute hoga. Serial schedule matlab line mein ek ke baad ek kaam karna. Non-serial matlab sabka kaam mix karke ek sath chalana taaki samay bache.

Serializability: Conflict and View

Because non-serial schedules are fast but risky, database systems use a concept called "Serializability" to ensure safety.

What is Serializability?

Serializability is a theoretical concept used to check if a non-serial schedule is correct. A non-serial schedule is considered "Serializable" if its final outcome is exactly the same as the outcome of a purely Serial schedule. If a schedule is serializable, it is guaranteed to leave the database in a consistent state.

There are two primary ways to test for serializability: Conflict Serializability and View Serializability.

1. Conflict Serializability

This method focuses on the order of conflicting operations. Two operations from different transactions are said to be in conflict if they satisfy all three of these conditions:

They belong to two different transactions.
They operate on the exact same data item.
At least one of the operations is a Write operation.

Types of Conflicts:

Read-Write (R-W) Conflict: T1 reads X, and T2 writes X.
Write-Read (W-R) Conflict: T1 writes X, and T2 reads X. (Also known as Dirty Read).
Write-Write (W-W) Conflict: T1 writes X, and T2 writes X. (Also known as Blind Write).

Conflict Serializable Schedule:

A schedule is conflict serializable if we can transform it into a serial schedule by swapping non-conflicting operations. To test this mathematically, database systems construct a Precedence Graph (Serialization Graph). If the graph contains no cycles (loops), the schedule is Conflict Serializable.

2. View Serializability

View serializability is a slightly less restrictive method than conflict serializability. Two schedules are considered "View Equivalent" if they present the same "view" of the data to the transactions.

Conditions for View Equivalence:

For two schedules (S1 and S2) to be view equivalent, they must meet three conditions for every data item:

Initial Read: If T1 reads the initial value of X in S1, then T1 must also read the initial value of X in S2.
Updated Read: If T1 reads a value of X written by T2 in S1, then T1 must also read the value written by T2 in S2.
Final Write: If T1 performs the final write operation on X in S1, then T1 must also perform the final write operation on X in S2.

If a schedule is view equivalent to any serial schedule, it is called a View Serializable Schedule. Every conflict serializable schedule is view serializable, but a view serializable schedule may not always be conflict serializable.

Hinglish Explanation: Serializability ek testing method hai. Iska kaam ye check karna hai ki agar hum multiple transactions ko ek sath (mix karke) run karte hain, toh kya unka final result waisa hi aayega jaisa unhe ek-ek karke run karne par aata? Agar haan, toh wo schedule safe aur serializable hai.

Cascaded Aborts, Recoverable and Non-recoverable Schedules

When transactions run concurrently, the failure of one transaction can directly impact others. Based on how they handle failures, schedules are classified into different types.

1. Non-Recoverable Schedules

A schedule is non-recoverable if a transaction T2 reads data written by another transaction T1, and T2 commits before T1 commits.

The Problem: If T1 later fails and aborts, the value read by T2 becomes invalid. But since T2 has already committed, the database cannot undo T2's actions. The database is now permanently in an inconsistent state.

2. Recoverable Schedules

A schedule is recoverable if, whenever a transaction T2 reads data written by T1, the commit operation of T1 appears before the commit operation of T2.

Advantage: If T1 fails and rolls back, the system can also roll back T2 before it commits, saving the database from corruption.

3. Cascading Rollbacks (Cascaded Aborts)

Even in recoverable schedules, a problem can occur called a cascading rollback. This happens when the failure of a single transaction causes a chain reaction, forcing multiple dependent transactions to abort and roll back.

Scenario: T1 writes X. T2 reads X and writes Y. T3 reads Y. If T1 aborts, the system must abort T2 (because it read T1's uncommitted data), and then it must abort T3 (because it read T2's data).
Disadvantage: This wastes a massive amount of CPU time and system resources because completed work has to be undone.

Cascadeless Schedules

To prevent cascaded aborts, DBMS uses Cascadeless Schedules. In a cascadeless schedule, a transaction is only allowed to read data that has already been committed by other transactions. It strictly prevents reading uncommitted data.

Hinglish Explanation: Recoverable schedule ka rule simple hai: Jo transaction data de raha hai, usko pehle commit hona chahiye. Agar T1 ne T2 ko data diya, toh T1 pehle save (commit) hoga. Cascaded abort ka matlab hai ek transaction fail hone ki wajah se baaki aage wale saare dependent transactions ka bhi fail ho jana (chain reaction).

Concurrency Control: Need and Importance

What is Concurrency?

Concurrency means executing multiple transactions at the same physical time using an interleaved approach.

Need for Concurrency Control

While concurrent execution is highly desirable for system performance, it can lead to severe database anomalies if multiple transactions attempt to update the same data simultaneously. Concurrency Control is the management of these parallel operations.

We need concurrency control to prevent the following major problems:

Lost Update Problem (Write-Write Conflict):

Occurs when two transactions read the same data, modify it, and write it back. The second transaction overwrites the changes made by the first, causing the first transaction's update to be entirely lost.

Dirty Read Problem (Write-Read Conflict):

Occurs when a transaction reads data that has been updated by another transaction that has not yet committed. If the first transaction aborts, the data read by the second transaction becomes completely invalid (dirty).

Unrepeatable Read Problem:

Occurs when a transaction reads the same data item twice during its execution, but gets different values because another transaction modified the data between the two reads.

Phantom Read Problem:

Occurs when a transaction executes a query to read a set of rows, but another transaction inserts or deletes rows matching that query before the first transaction finishes.

Locking Methods for Concurrency Control

To control concurrency and solve the problems mentioned above, database systems use Locking Protocols. A lock is a variable associated with a data item that controls whether multiple transactions can access the data at the same time.

Types of Locks

1. Shared Lock (S-Lock) / Read Lock

If a transaction places a Shared Lock on a data item, it is only allowed to Read that data. It cannot update it.

Key Rule: Multiple transactions can hold a Shared Lock on the same data item simultaneously. Reading data does not cause conflicts.

2. Exclusive Lock (X-Lock) / Write Lock

If a transaction places an Exclusive Lock on a data item, it can both Read and Write (update) that data.

Key Rule: If an Exclusive Lock is placed on a data item by one transaction, no other transaction can place any lock (Shared or Exclusive) on that data item until the first transaction releases the lock.

Hinglish Explanation: Locking mechanism ek security guard ki tarah kaam karta hai. 'Shared Lock' ka matlab hai ki sab log data dekh sakte hain, par change koi nahi karega. 'Exclusive lock' ka matlab hai ki sirf ek hi user data ko dekh aur change kar sakta hai, baaki sabko wait karna padega.

Two-Phase Locking Protocol (2PL)

The Two-Phase Locking protocol is the most widely used locking method in DBMS. It guarantees conflict serializability by ensuring that transactions acquire and release locks in two distinct phases.

Phase 1: Growing Phase (Lock Acquisition)

In this phase, a transaction can acquire new locks (both shared and exclusive) on data items as it needs them.
However, the transaction is strictly prohibited from releasing any locks during this phase.

Lock Point:

The exact moment when a transaction has acquired all the locks it will ever need is called the Lock Point. It marks the end of the growing phase.

Phase 2: Shrinking Phase (Lock Release)

In this phase, the transaction begins to release the locks it acquired previously.
Once a transaction releases its first lock, it enters the shrinking phase. It is strictly prohibited from acquiring any new locks.

Advantages of 2PL:

It guarantees that all schedules produced will be conflict serializable, maintaining complete data consistency.

Disadvantages of 2PL:

It does not prevent deadlocks. A deadlock can occur if two transactions are indefinitely waiting for each other to release locks.

Comparison Tables

Serial vs. Non-Serial Schedule

Feature	Serial Schedule	Non-Serial Schedule
Execution Style	One transaction executes entirely before the next begins.	Operations of multiple transactions are interleaved.
CPU Utilization	Very poor; CPU sits idle during I/O operations.	Excellent; keeps the CPU busy continuously.
Throughput	Low throughput.	High throughput.
Consistency	Always guarantees database consistency naturally.	Requires Concurrency Control to guarantee consistency.
Waiting Time	High waiting time for subsequent transactions.	Lower waiting time for users.

Conflict vs. View Serializability

Feature	Conflict Serializability	View Serializability
Focus	Focuses strictly on the order of conflicting Read/Write operations.	Focuses on ensuring both schedules read and write the same values.
Strictness	Highly strict.	Less strict than Conflict Serializability.
Checking Difficulty	Easy to check using a Precedence Graph.	NP-Complete problem; mathematically very difficult to verify in large systems.
Relationship	All conflict serializable schedules are also view serializable.	View serializable schedules are not always conflict serializable (e.g., in the presence of Blind Writes).

Important Points to Remember

A transaction is a logical unit of work; it is not a physical file. It must transition the database between consistent states.
ACID stands for Atomicity, Consistency, Isolation, and Durability.
The primary goal of Transaction Management is to allow concurrent access to data while maintaining the illusion that each user is accessing the database alone.
Serializability is the benchmark for correctness in concurrent execution.
Cascadeless schedules save system resources by strictly reading only committed data.
Two-Phase Locking (2PL) is the standard method for guaranteeing serializability, working through distinct growing and shrinking phases.

Short Conclusion

Transaction Management and Concurrency Control are the backbone of any robust Database Management System. By strictly enforcing the ACID properties and utilizing advanced scheduling and locking mechanisms like Two-Phase Locking, a DBMS ensures that massive amounts of data can be processed rapidly and safely by thousands of users simultaneously. Understanding these concepts is essential for designing systems that are both highly performant and secure against data corruption.

SEO Keywords Section

Search keywords related to this topic:

Basic concept of transaction in DBMS, DBMS Transaction Management, ACID properties of database transactions, Atomicity Consistency Isolation Durability, DBMS schedule concept, Serial schedule vs non serial schedule, Serializability in DBMS, Conflict serializability and View serializability difference, Cascaded aborts cascading rollbacks, Recoverable and non recoverable schedules, Need for concurrency control in database, Lost update problem dirty read problem, Locking methods in concurrency control, Shared lock and exclusive lock, Two phase locking protocol 2PL, DBMS engineering notes, beginner guide to transaction management, database locking mechanisms explained.

Unit 4 – Database Transactions

Introduction to Transaction Management

Comments (0)