
Chapter 5






Transaction Processing and Management

Transactions make processing of complex logical operations reliable. Without them, it is hard to imagine serious business applications functioning correctly.

Chapter Highlights

  • Transaction
  • Recovery Principles
  • Transaction Models
  • Two-phase Commit Protocol
  • Concurrency Issues, Locking and Two-phase Locking

5.1 TRANSACTION

5.1.1 Transactions - Need and Mechanisms

We have discussed the concept of transaction earlier. It is now time to take a look at the same in greater detail. We define a transaction as follows.

A transaction is a logical unit of work.

Let us understand the concept of transaction with a simple example. We had earlier taken the case of funds transfer from one account to another to illustrate the concept. Let us now assume that we have a database that tracks the number of items that a firm has sold so far. There are two tables in this database: Sales and Total. The Sales table contains the item code and quantity sold. The Total table contains the item code and the total quantity of that item sold so far. Of course, immediate questions arise, such as: (a) Is the Total table itself not redundant? (b) Does it not violate the normalisation process? Well, for the time being, we shall ignore such questions and focus on understanding the idea behind transactions.

Imagine that three units of a product, with code P5, have just been sold. Now, we need to update the Sales and Total tables with this information. The following two simple SQL queries should accomplish this task:

INSERT INTO Sales (Item_code, Quantity)
VALUES ('P5', 3);

UPDATE Total
SET Quantity = Quantity + 3
WHERE Item_code = 'P5';

At this stage, it is important to clarify that we are considering SQL (and, because of that, RDBMS) just as an example. The concept of transactions is not at all restricted to RDBMS; it is a generic concept which covers all kinds of databases. However, because the focus of this book is on RDBMS, we will discuss transactions with respect to RDBMS.

As already known, there would be an application program which would execute the above two queries. There may be a number of instructions before, in-between, and after these two SQL statements. This idea is shown in Fig. 5.1.

Let us imagine that after the execution of the INSERT statement, but before the execution of the UPDATE statement, a problem such as a hard disk crash, a power failure or simply an arithmetic overflow occurs. What would be the result? Obviously, the Sales table would get updated, but the Total table would not. This would mean that the total units sold for item P5 would show an incorrect figure, as it would not consider the latest sales transaction. (We reiterate that this is bad database design and a contrived example. A better mechanism is to get rid of the Total table and use an aggregate function such as SUM to find the total units of an item sold from the Sales table itself.)

Fig. 5.1 Program to update Sales and Total tables without a transaction

Any program must guard itself against such situations. It is the concept of transaction that helps a program in achieving this objective. We can make a program transaction-enabled, which means that it will execute as a logical unit of work, thus preventing the database from reaching an inconsistent state. With the use of transactions, a database can always attain a correct state. Let us write the same program such that it now uses the concept of transaction. Fig. 5.2 shows the modified program.

How does a transaction-enabled program differ from the earlier one? The changes are summarised below.

  1. The program now starts with a BEGIN TRANSACTION statement. This statement informs the database manager that the program involves a transaction, and that the transaction should take effect immediately. We shall soon study the implications of this declaration.
  2. Following the INSERT statement, we now have an IF statement. This statement checks to see if the INSERT operation was successful. If it fails for any reason, the program passes control to a paragraph/section named Trans-Error. However, in the case of success, the program passes through and executes the next statement, which is UPDATE.
  3. After the UPDATE statement, we have inserted a similar IF statement. In the case of errors, the program jumps to Trans-Error, as before. Otherwise, it jumps to another paragraph/section named Trans-Success.
  4. In Trans-Error, we perform a ROLLBACK operation and terminate the program. On the other hand, in Trans-Success, we perform a COMMIT and end the program.
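
To make this concrete, the same logic can be sketched in a modern host language. The following Python fragment uses the sqlite3 module purely as an illustration (a minimal sketch, assuming the Sales and Total tables of our running example already exist; it is not the exact listing of Fig. 5.2):

import sqlite3

def record_sale(conn, item_code, quantity):
    # One logical unit of work: both statements take effect, or neither does.
    try:
        cur = conn.cursor()
        cur.execute("INSERT INTO Sales (Item_code, Quantity) VALUES (?, ?)",
                    (item_code, quantity))
        cur.execute("UPDATE Total SET Quantity = Quantity + ? WHERE Item_code = ?",
                    (quantity, item_code))
        conn.commit()        # Trans-Success: make both updates permanent
    except sqlite3.Error:
        conn.rollback()      # Trans-Error: undo the partial work
        raise

Note how the COMMIT happens only after both statements succeed; an error in either statement triggers the ROLLBACK, exactly as in the Trans-Error paragraph.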

Transaction is an often overused and misunderstood term in computer technology. In daily life, we refer to any kind of operation as a transaction. For example, if we lend money to someone, we call it a transaction. In technical terms, a transaction must be a logical and meaningful unit of work which must be done completely or not at all.

Fig. 5.2 Transaction-enabled program to update Sales and Total tables

Of course, there are better ways to write a program to make it more structured. More specifically, we can get rid of the GO TO statements by using nested IF-ELSE or other constructs. However, we shall ignore these for the sake of simplicity.

The important aspect of the above process is that only when both the SQL statements (i.e. INSERT and UPDATE), that update the database, complete successfully do we perform a COMMIT. If any one of them fails, we immediately perform a ROLLBACK.

In other words, we now treat the INSERT and UPDATE operations as one logical unit of work. This is precisely what we wanted to achieve. Thus, creating a sales record is no longer a two-step process for the external world. It is actually an atomic operation. Internally, the DBMS manages this two-step process in a transaction-enabled environment to achieve the state of atomicity (or consistency).

One very interesting fact is often overlooked. Do we realise that the database is actually inconsistent for a temporary period? Note that after the INSERT statement succeeds but before the UPDATE statement executes, we have updated just one table and not the other, so the database is not accurate. However, the transaction hides this temporary inconsistency and provides an abstraction that makes the overall operation atomic and consistent. The idea is shown in Fig. 5.3.

Fig. 5.3 Transaction hides the internal inconsistencies from the external world

A transaction is not a single database operation but a sequence of such operations. What is significant, though, is that it transforms the database from one consistent state to another. The initial consistent state of a database is indicated by the BEGIN TRANSACTION declaration, and the final consistent state by the END TRANSACTION (or COMMIT/ROLLBACK) operation. As we have mentioned earlier, any inconsistent state that the database attains during the execution of the transaction is not known to the outside world. The transaction hides them.

5.1.2 Transaction Processing (TP) Monitor

Various types of errors can occur during the execution of a program, such as power failure, non-numeric data errors, memory errors and so on. We cannot assume that things will never go wrong; rather, we must assume that they might, and provide for such situations. A Transaction Processing (TP) system (also called a Transaction Processing monitor, or TP monitor) provides for this. The TP monitor is a special program that oversees transaction processing in a DBMS. It allows the system to avoid or recover from problematic situations, and ensures an all-or-none operation. Thus, the TP monitor ensures that while a transaction may consist of many steps, it is a single atomic operation from the outside world's perspective.

When it sees a COMMIT statement, the TP monitor informs the DBMS that the database is in a consistent state. In effect, it asks the DBMS to make the updates made during the lifetime of the transaction permanent on the physical database. However, when it encounters a ROLLBACK statement, it realises that there is some error. Hence, it asks the DBMS to roll back or undo all the changes made as a part of the transaction. This concept is shown in Fig. 5.4. Here we must clarify one technical detail. When we say that the DBMS makes changes permanent to the database, we do not imply that the actual tables are updated immediately. Instead, what we mean is that the DBMS guarantees that the updates would not be lost. Internally, it may keep the data in its buffers and provide other mechanisms (discussed later) to ensure this.

Fig. 5.4 Effect of Commit and Rollback on the database

Although it is fine to know about the COMMIT and ROLLBACK operations conceptually, it is also important to understand how they are achieved in real life. A system log or journal is used for this purpose. This is a file that contains the details of all modifications to data (i.e. INSERT, UPDATE and DELETE statements), and the before-and-after images of the objects being updated. We shall soon study this in more detail.

For the time being, we should note that whenever there is a ROLLBACK operation, the system uses the corresponding log entries to restore each object to its last consistent state.
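
To illustrate, here is a minimal Python sketch of how a before-image from the log can be used to undo an update (the record layout is our own assumption for illustration, not the log format of any particular DBMS):

# Each log record carries the before-and-after images of the value it changed.
log = [
    {"txn": "T1", "table": "Total", "key": "P5", "before": 10, "after": 13},
]

def rollback(txn, log, db):
    # Undo a transaction by restoring before-images, newest entry first.
    for record in reversed(log):
        if record["txn"] == txn:
            db[(record["table"], record["key"])] = record["before"]

db = {("Total", "P5"): 13}         # state after T1's uncommitted update
rollback("T1", log, db)
print(db)                          # {('Total', 'P5'): 10}, the last consistent state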

Another important point that must be stressed is that although a transaction makes a series of database operations atomic, individual statements must be atomic in the first place! Thus if we put two UPDATE statements inside a transaction, the TP monitor guarantees that the result of these two statements is atomic. However, it does not give any assurance regarding the atomicity of these two statements themselves. For example, consider the following two statements.

UPDATE Employee
SET Salary = Salary + (Salary * 0.10)
WHERE Department = 'Projects';

UPDATE Projects
SET Cost = Cost + 10000
WHERE Department = 'Projects';

The first UPDATE statement increments the salaries of the people working in the Projects department by 10 percent. The second UPDATE statement increases the cost incurred in this department by Rs. 10000. If we embed these two statements inside a transaction, we will be sure that either both of them take effect, or neither does. However, what we have to note here is that the TP monitor does not specify anything about the atomicity of the individual UPDATE statements themselves! In other words, if there are 10 records in the Employee table with Department = 'Projects', it may so happen that after the first five of the 10 rows are updated with the new salary figure, some problem occurs, because of which the updates to the remaining five rows are not applied. In other words, the first five rows are successfully updated, but not the remaining five! This is shown in Fig. 5.5.

Fig. 5.5 Atomicity of a single update operation

TP monitors do not provide any guarantees regarding such atomicity. Ensuring that the portions of an individual statement are atomic is the responsibility of the DBMS. It must ensure that one statement is executed in its entirety or not at all. It cannot allow some of a statement's updates to succeed while the rest fail. Note that we are referring to the INSERT, UPDATE and DELETE statements together as database updates.

Another point that should be noted is that one program can contain one or more transactions. Therefore, there may be multiple COMMIT and ROLLBACK statements in the same program. These are not necessarily related to each other. Fig. 5.6 shows an example of a program that contains two transactions. The first transaction is successful and, therefore, committed. The second transaction is unsuccessful and, therefore, rolled back.

Fig. 5.6 One program can contain multiple transactions

This is quite interesting. Inside one program, not only can there be multiple transactions, but they can have different outcomes, that is, success (COMMIT) or failure (ROLLBACK).

5.1.3 Transaction Properties

Transactions are said to depict the ACID properties. Each letter in ACID stands for a keyword, as follows:

  • A stands for Atomic
  • C stands for Consistent
  • I stands for Isolation
  • D stands for Durable

Let us understand the above.

  • Atomic: A transaction is atomic in nature. This means that it executes exactly once, and ensures that either all the updates to the database are completed successfully, or none are implemented at all.
  • Consistent: We have studied this earlier. A transaction ensures that a database is in a consistent state after it completes (either successfully or in an aborted state). Furthermore, we know that the database consistency is end-to-end (i.e. a database is consistent before a transaction starts and after it ends, but during the lifetime of a transaction, it may not be consistent).
  • Isolation: A transaction provides isolation from other transactions. In other words, there could be multiple simultaneous ongoing transactions in a database. However, one transaction never interferes with another ongoing transaction. All transactions are isolated from each other, that is, one transaction does not see the intermediate states of another transaction. For example, if two transactions, T1 and T2, are taking place at the same time in a database system, then neither must see the intermediate results of the other while that other is still incomplete. However, let us assume that T1 completes before T2. Then T2 should be able to know what happened in T1, but T1 still should not know what is happening in T2; it should be allowed access only after transaction T2 is over.
  • Durable: A transaction is a unit of recovery. We shall study this soon. If a transaction is successful, its updates persist even if there is some sort of system crash or another kind of problem. If a transaction fails, the system is guaranteed to be in the same state as it was before the transaction started.

5.2 RECOVERY

5.2.1 Classification of Recovery

The concept of recovery is closely related to transaction processing. It refers to the recovery of the database itself. Using recovery, we can restore a database to a consistent state. We can perform recovery operations by making copies of the same information elsewhere in the system, so that any missing piece of information can be reconstructed. Note that this is completely different from the ideas related to data redundancy and normalisation.

We can classify recovery as shown in Fig. 5.7.

Let us briefly explain the meaning of these items.

  • Transaction recovery: This term refers to the recovery of data from transaction failures. Whatever we have discussed so far falls into this category. More specifically, transaction recovery talks of recovery of data related to a single transaction. It is not linked with the global system or its state.
  • System recovery: This category refers to more serious, global failures. System recovery is linked with the entire system as a whole. It is not associated with a single transaction. Clearly, its implications are far more serious as compared to transaction recovery. System recovery is classified further into two subcategories, as we shall discuss soon.

Two-phase commit is like a unified effort for achieving something together (like the definition of ideal democracy). When everybody agrees to do something, the result is much better! In two-phase commit, precisely the same thing happens. Only when all concerned parties agree is the action done; even if one participant says no, the action is cancelled. It is a collective decision.

Fig. 5.7 Classification of recovery

We have studied transaction recovery in detail. Hence, we shall not repeat the discussion. We shall now take a look at system recovery.

5.2.2 System Recovery

System recovery can be classified into two categories, as follows.

  • Failure recovery: This is necessary when a system fails because of reasons such as power failure. However, failure recovery is not associated with the media, such as the hard disk. Failure recovery is necessary to recover from soft failures so that the associated transactions are not affected.
  • Media recovery: The failures in this category are caused because the media that hold the data, such as the hard disk, fail. This affects the database, the media, and the transactions that were being processed at the time of the media failure. Media recovery is, thus, an attempt to recover from such problems.

Let us now discuss the two categories in detail.

5.2.2.1 Failure recovery System failures caused by reasons such as power failure, loss of main memory contents or loss of data in the database buffers need to be examined and dealt with. System recovery is the solution to such problems. When a system failure occurs, the status of all the ongoing transactions is invariably lost. Therefore, these transactions need to be rolled back when the system restarts.

Another interesting situation can also occur as a result of system failure. Imagine that a transaction was successful. So, we do a COMMIT on the transaction, and rest assured that the database is in a consistent state. However, as soon as the COMMIT statement is executed, there is a system failure. What would happen? It is possible that although the application program believes that the data is committed or made permanent on the hard disk, in reality, things are different. It may so happen that the data is still in the database buffers as the system failure (say disk crash) occurred before the TP monitor could apply the result of the transaction to the physical database on the disk. This is shown in Fig. 5.8.

Fig. 5.8 System failure after COMMIT but before physical updates to the database

The result of such a failure is that the database may be in an inconsistent state when the system recovers. How can a system take care of such situations and deal with the resulting problems?

The only way to recover from such failures is to write these values to the log before the failure occurs, so that when the system restarts, the TP monitor can find these values in the log and write them onto the database. Thus, when a COMMIT statement is encountered, the system should first write the updated values to the log. Only then should it attempt to write these updated values to the database. This technique, wherein the log entries precede the database changes, is called a write-ahead log.
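
The write-ahead discipline can be sketched in Python as follows (a minimal sketch; the log entry layout and the apply_to_db function are our own assumptions for illustration):

import json, os

def commit_with_write_ahead_log(log_path, log_entries, apply_to_db):
    # Step 1: write all updated values to the log and force it to disk.
    with open(log_path, "a") as log:
        for entry in log_entries:   # e.g. {"table": "Total", "key": "P5", "after": 13}
            log.write(json.dumps(entry) + "\n")
        log.flush()
        os.fsync(log.fileno())      # the log must reach stable storage first
    # Step 2: only now apply the updates to the physical database.
    for entry in log_entries:
        apply_to_db(entry)

If a crash occurs after step 1 but before step 2, a restart can re-read the log and redo the lost updates.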

At this juncture, another question arises: how would the system know which transactions should be redone and which ones should be undone? For this purpose, the system keeps taking periodic checkpoints (also called synch-points). Taking a checkpoint is a two-step process:

(a) The contents of the database buffers are force-written to the database.

(b) When this is successful, the system writes special checkpoint records to the log. A checkpoint record is a list of transactions that were in progress when the checkpoint was being taken.

We shall illustrate this concept with the help of an example, shown in Fig. 5.9. Suppose we have five transactions: A, B, C, D and E. These transactions start and end at different times. Two points in time are important, namely:

  • TC: This is the time when the last checkpoint was taken.
  • TF: This is the time when the system failure occurred.

Fig. 5.9 Process of taking checkpoints

Armed with this information, we will draw a hypothetical chart that shows the progress and status of the five transactions A, B, C, D and E.

Let us now examine what is happening with the five transactions.

  • Transaction A: This transaction started before the checkpoint was taken at time TC. It successfully committed before the system failure occurred. A checkpoint of this transaction was taken at time TC.
  • Transaction B: This transaction started and also got completed before the checkpoint was taken at time TC.
  • Transaction C: This transaction started before the checkpoint was taken at point TC and was not completed when the system failure occurred at time TF.
  • Transaction D: This transaction started after the checkpoint was taken at time TC and was not completed when the system failure occurred at time TF.
  • Transaction E: This transaction started after the checkpoint was taken at time TC and was completed before the system failure occurred at time TF.

Based on the above descriptions, we need to decide which transactions need no action when the computer starts functioning again and which ones need some action (in the form of a transaction undo or redo).

No action needed : Transaction B
Undo needed : Transactions C and D
Redo needed : Transactions A and E

The above is an informal but effective way of identifying transactions requiring a redo operation or undo operation.

Having identified what needs to be done with all the transactions in the system informally, let us now figure out how we can overcome the possible problems. In other words, we need to define an algorithm for performing the undo and redo operations in a more formal way. This algorithm is shown in Fig. 5.10. For ease of understanding, we also provide examples wherever applicable.

Fig. 5.10 Algorithm for identifying transactions that need an undo or redo operation

We can see that the formal algorithm also yields the same list of undo and redo transactions as was found by the informal procedure earlier.
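
For illustration, here is a minimal Python sketch of such a restart classification (the log record layout is our own assumption; real DBMS logs are far richer):

def classify_for_restart(log_since_checkpoint, active_at_checkpoint):
    # Returns the (undo, redo) sets for restart processing.
    undo = set(active_at_checkpoint)     # in progress when the checkpoint was taken
    redo = set()
    for record in log_since_checkpoint:  # scanned forward from the checkpoint
        txn, action = record["txn"], record["action"]
        if action == "BEGIN":
            undo.add(txn)                # started after the checkpoint
        elif action == "COMMIT":
            undo.discard(txn)
            redo.add(txn)                # committed: its updates must be redone
    return undo, redo

# The five transactions of Fig. 5.9: A and C were active at time TC.
log = [{"txn": "A", "action": "COMMIT"},
       {"txn": "D", "action": "BEGIN"},
       {"txn": "E", "action": "BEGIN"},
       {"txn": "E", "action": "COMMIT"}]
print(classify_for_restart(log, active_at_checkpoint={"A", "C"}))
# undo = {'C', 'D'}, redo = {'A', 'E'} (set ordering may vary when printed)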

Having identified the transactions that need an undo operation or a redo operation, we do the following:

(a)  Work backwards in the system log (i.e. from point TF backwards) and undo the transactions that belong to the undo list (i.e. transactions C and D). In simple terms, this means that any data updates to any tables that occurred as a part of these two transactions need to be reversed. This is called backward recovery.

(b)  In the next step redo the transactions that belong to the redo list (i.e. transactions A and E). For this, move in the forward direction from the time the last checkpoint was taken (i.e. time TC). To do so, apply the database updates specified in the log since the time the last checkpoint was taken. This is called forward recovery.

It is important to state that only when the system recovery is complete can the system accept any new work.

5.2.2.2 Media recovery Media recovery is not strictly related to the DBMS. However, because media failures can affect a DBMS and its processing severely, we will discuss it in brief.

Media recovery refers to the recovery attempts undertaken when a storage medium, such as a hard disk, that holds a database crashes. Usually, there is a backup procedure that ensures that the database is backed up once a day or so (the frequency can change, depending on requirements). What happens to the transactions that have taken place since the last backup when a media failure occurs? How do we recover any data lost because of such a failure?

For this purpose, a two-step process is used, as follows:

  1. Firstly, we need to restore or reload a database from its backup or dump. This would take the database into the state it was in when the last backup was taken.
  2. The system log is then used to redo all the changes. For this purpose, we need to consider all the activities in the system log since the time the last backup was taken.

This also makes one point clear—the media for transaction processing, system logs and backups must be different. If they are the same, then there is little hope of recovering from any sort of failure.

Note that in the case of media failures, there is no place for undo activities. This is because such transactions are completely lost. In other words, there is nothing to undo. Instead, what we require is a dump/restore or unload/load utility.

  • The dump portion of the utility backs up a database as and when needed.
  • The restore portion recreates a database from an earlier dump.

Fig. 5.11 illustrates these concepts.

We use recovery principles quite often in our daily life. For example, when we submit original documents for any reason (e.g. insurance claim, getting a visa), we take a photocopy of the same as a precaution. Database recovery operations do the same thing. They take a copy of the data that exists before any update operations are performed in a transaction-enabled environment, so that we can back out in the case of any problems.

Fig. 5.11 Dump and restore concepts

5.3 TRANSACTION MODELS

Transaction models can be classified into three major types: flat, chained and nested, as shown in Fig. 5.12.

Fig. 5.12 Transaction models

These models are based on questions like:

  • When should a transaction start?
  • When should a transaction end?
  • When should the effects of a transaction be made visible to the outside world?
  • What is the right way of recovery in the case of transaction failures?

Depending on the answers to these questions, let us discuss the three transaction models in brief.

5.3.1 Flat Transactions

Flat transactions involve three main steps:

  • The transaction begins with an identifier such as BEGIN TRANSACTION, which identifies that a transaction has begun.
  • In the second step (which is actually the main reason why the transaction is required), the actual operations in the transaction (such as database reads, updates, deletions or insertions) take place.
  • If the second step is successfully completed in all respects, the third step finalises the transaction with the help of an identifier such as COMMIT. However, if even a single operation in the second step fails, the third step cancels the whole transaction, which is signified by a ROLLBACK operation.

This is shown in Fig. 5.13.

Fig. 5.13 Flat transaction

As is evident, the approach taken by flat transactions is quite simple and straightforward. Therefore, it is very widely used in real practice. However, it also has some weaknesses. For example, suppose a transaction needs to update one million records in a single database. Assume that the transaction is almost through and has updated 999900 out of the one million records when an error occurs. Obviously, to maintain consistency, the transaction must be aborted (i.e. rolled back) and all the database updates done up to this point will need to be discarded! Such an experience can be quite wasteful and frustrating.

5.3.2 Chained Transactions

In chained transactions, we have many mini-transactions within a main transaction. Therefore, extending our earlier example, we can have one mini-transaction for every 1000 records, and thus commit the database changes after every 1000th update in an update operation involving one million records. With this scheme, in the event of any error, we need not roll back the whole transaction; we need to roll back at most the last 999 updates. Clearly, this is highly convenient. This is shown in Fig. 5.14.

Fig. 5.14 Chained transaction
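
To see how an application programmer might chain such mini-transactions, consider the following minimal Python sketch (again using sqlite3 purely for illustration; conn is an open connection and rows holds the updates to be applied):

def chained_update(conn, rows, batch_size=1000):
    # Commit after every batch_size updates; a failure now costs at most
    # the uncommitted updates of the current mini-transaction.
    cur = conn.cursor()
    for i, (quantity, item_code) in enumerate(rows, start=1):
        cur.execute("UPDATE Sales SET Quantity = ? WHERE Item_code = ?",
                    (quantity, item_code))
        if i % batch_size == 0:
            conn.commit()    # end one mini-transaction, begin the next
    conn.commit()            # commit the final, possibly partial, batch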

We will realise that this is quite similar to taking periodic checkpoints. The only difference between the checkpoints that we had discussed earlier and these checkpoints is that the former is handled at a system level (i.e. by the DBMS), whereas the latter is an application-level feature in which the application programmer provides for periodic checkpoints.

5.3.3 Nested Transactions

The concept of nested transactions introduces a hierarchical relationship of master and sub-transactions. Here, a master (higher-level) transaction can start one or more sub (lower-level) transactions. Each of the sub-transactions can, in turn, start one or more sub-sub-level transactions, and so on. The advantage of this approach is that in the event of the failure of a sub-transaction (or sub-sub-transaction), the immediate higher-level transaction can trap it and make an attempt to redo it using an alternative approach. This helps the application developers write more granular (smaller) transactions. This is shown in Fig. 5.15.

Fig. 5.15 Nested transaction

5.4 TWO-PHASE COMMIT

At times, nested transactions can span across databases. For example, one top-level transaction may consist of one sub-transaction on the DB2 database and another on the IMS database. Thus, only when the sub-transactions on both the databases are successfully completed can we consider the whole top-level (master) transaction successful.

How do we make sure that not only the individual sub-transactions but also the overall top-level transaction is successful? We must have some sort of mechanism to ensure this. It is for this reason that a special technique called two-phase commit is used in such cases. The two-phase commit protocol is used to synchronise updates on two or more databases that run on physically different computers. This ensures that either all of them succeed or all of them fail. Thus, it is a transaction of transactions.

The database as a whole, therefore, is not left in an inconsistent state. To achieve this, a central coordinator (one of the database machines) is designated, which coordinates the synchronisation of the commit/rollback operations. Other participants in the two-phase commit protocol have the right to either say OK, I can COMMIT now or No, I cannot COMMIT now. Accordingly, the central coordinator takes a decision about committing the transaction or rolling it back. For committing, the central coordinator must get an OK from all the participants. Even if one of the participants cannot commit for any reason, all the participants must roll the transaction back. This works as follows:

  • Phase 1 (Prepare phase): In this phase, the central coordinator sends a Prepare to COMMIT message to all the participants of the transaction. In response, each participant sends either a Ready to COMMIT or Cannot COMMIT message back to the central coordinator, depending on whether they can go ahead with the committing of the transaction or not. This is shown in Fig. 5.16
  • Phase 2 (Commit phase): Only if the central coordinator receives a Ready to COMMIT reply from all the participants in Phase 1, can it send a COMMIT message to all the participants. Otherwise, it sends a ROLLBACK message to all the participants. Assuming that the central coordinator has sent a COMMIT message, all the participants now commit the transaction and send a Completed message back to the coordinator. If any of the participants fails in the commit process, for any reason, that participant sends back a ROLLBACK message to the central coordinator. If the central coordinator receives even one ROLLBACK, it asks all the participants to roll back. Otherwise, the whole transaction is considered as successful. This is shown in Fig. 5.17

Fig. 5.16 Two phase commit – Prepare phase

Fig. 5.17 Two phase commit – Commit phase
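
The coordinator's logic can be sketched in a few lines of Python. The participant class and its prepare/commit/rollback method names below are our own illustrative assumptions, not the API of any real system:

class Participant:
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit = name, can_commit
    def prepare(self):                  # Phase 1 reply: Ready/Cannot COMMIT
        return self.can_commit
    def commit(self):
        print(self.name, "committed")
    def rollback(self):
        print(self.name, "rolled back")

def two_phase_commit(participants):
    # Phase 1 (Prepare): poll every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2 (Commit): unanimous yes means global commit; otherwise rollback.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False

two_phase_commit([Participant("DB2"), Participant("IMS", can_commit=False)])
# Both participants roll back, since IMS voted Cannot COMMIT.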

5.5 CONCURRENCY PROBLEMS

A database can have multiple transactions running at the same time. This is called concurrency.

These concurrent transactions may perform various database updates simultaneously. Therefore, a mechanism is required to ensure that these transactions do not interfere with each other.

A mechanism which ensures that simultaneous execution of more than one transaction does not lead to any database inconsistencies is called a concurrency control mechanism.

Suppose transaction A is updating table X. What if another transaction, B, also updates the same table X a bit later, before transaction A is complete? Clearly, this could lead to a lot of problems; as a result, table X may not retain its integrity. This is precisely the kind of problem that can occur in the absence of concurrency control mechanisms. Any such problem is called a concurrency problem.

Classically, concurrency problems are classified into four categories, as shown in Fig. 5.18

Fig. 5.18 Concurrency problems

These problems are best explained with the help of examples rather than abstract descriptions. So, we shall use examples to illustrate the possible problems in the various categories of concurrency problems.

5.5.1 Lost Update Problem

Consider the situation depicted in Fig. 5.19

The sequence of events is as follows:

  1. At time t1, transaction A reads a row r of a table.
  2. At time t2, transaction B also reads the same row r.
  3. At time t3, transaction A updates a column value in row r.
  4. At time t4, transaction B updates the same column value in row r.

Fig. 5.19 Lost update problem

The result of this sequence of events is that the update made by transaction A in step 3 is overwritten or lost because of the update made by transaction B in step 4. Therefore, such a problem is termed the lost update problem.

Let us illustrate this with the help of an example in SQL as shown in Fig. 5.20.

We can see that at times t1 and t2, transactions A and B respectively read (SELECT) the value of the column Salary from table Employee for Emp_number 101. The value turns out to be 10000. At time t3, transaction A updates the salary value to 12000, and commits its changes. Next, at time t4, transaction B sets the salary value to 14000 without even looking at what transaction A has done in step 3.

It can be argued that even if these (i.e. A and B) were not concurrent transactions (say, transaction A happens today and transaction B happens a week later), the result would have been the same. That is, transaction B would have anyway overwritten the updates of transaction A. Then, is this a concurrency problem at all? Well, it is. Ideally, any update operation should know what it is trying to update and also its value before it begins updating. That is the problem here. More specifically, transaction B believed that the value of the Salary column for this employee was 10000 (whereas it was actually 12000) and then set it to 14000. So, was the intention to add 4000 to the value that it believed to be current? If that was the case, would it have added just 2000 if it knew that the current value was actually 12000 and not 10000? Or did it want to add 4000 to whatever was the current value (in which case, the final value should have been 16000)? All this creates confusion.

Fig. 5.20 Lost update – SQL example

Of course, such problems could also be avoided by writing the queries in another fashion. For instance, transaction A could have written the following query:

UPDATE Employee
SET Salary = Salary * 1.20
WHERE Emp_Number = 101;

COMMIT;

Next, transaction B could have written the following query:

UPDATE Employee
SET Salary = Salary * 1.40
WHERE Emp_Number = 101;

COMMIT;

Now, the intention of both the transactions, A and B, is quite clear. The two transactions wanted to increment the current value of the salary by 20% and 40% respectively, whatever the current value may be.

To summarise, the updates made by transaction A are overwritten or lost in this problem.

5.5.2 Dirty (Uncommitted) Read Problem

The dirty (uncommitted) read problem, also called uncommitted dependency problem, is caused when a transaction (say A) retrieves, or even worse, updates a row updated by another uncommitted transaction (say B). We need to discuss two cases in this context.

Case 1

Fig. 5.21 illustrates the first case. Here, transaction A retrieves a row updated but not yet committed by transaction B.

Fig. 5.21 Dirty (Uncommitted) read problem: Case 1

The sequence of events is as follows.

  1. At time t1, transaction B updates a row r of a table.
  2. At time t2, transaction A reads the same row r.
  3. At time t3, transaction B does a ROLLBACK.

The problem with this scheme is that at time t2, transaction A has retrieved a row that has an uncommitted value. After transaction B does a ROLLBACK at time t3, this uncommitted value disappears from the table. Therefore, if transaction A performs any action based on this value, it would be faulty.

In other words, at time t2, transaction A believes that the value in the row is as before time t1, whereas, in reality, it has seen a non-existent value, which is going to disappear soon.

It is also worth noting that transaction B cannot be blamed for this problem. It has faithfully performed an update, only to realise later that the update operation should not have been executed. Therefore, in its own right, it has done a ROLLBACK. It does not even know that transaction A would suffer because of this.

Fig. 5.22 shows an example of this problem.

Fig. 5.22 Dirty (Uncommitted) read problem: Case 1 – SQL example

We can see that at time t1, transaction B updates the salary for employee number 101 to 14000. However, it neither commits the transaction, nor rolls it back. Let us assume further that at time t2, transaction A selects the same row, thus fetching a value of 14000. For some reason, the update performed by transaction B at time t1 has problems. Therefore, at time t3, transaction B performs a ROLLBACK. Thus, the value of the salary again becomes 10000. Now imagine that transaction A performs a few operations, such as taking a decision based on the value of the salary for this employee, at a subsequent time t4. Note that this decision is based on the assumption that the value of the salary is 14000, whereas it actually is 10000. Thus, we can see that transaction A performs a dirty read, also called an uncommitted read, at time t2. The value it read subsequently gets rolled back, and thus transaction A is misled.

Case 2

This problem is quite similar to the problem in the earlier situation, except for one difference. Recall that in the earlier case, at time t2, transaction A performs a dirty (uncommitted) read. In other words, it retrieves a value that is not committed, and which subsequently gets rolled back. Now, the problem is worse. Transaction A does not merely retrieve the uncommitted value: it updates it once more, after which transaction B wipes out both updates anyway, by doing a ROLLBACK.

The sequence of events is shown in Fig. 5.23.

Fig. 5.23 Dirty (Uncommitted) read problem: Case 2

As we can see, there are two updates now, one over the other, in quick succession. After this, the transaction that made the first update (transaction B) does a ROLLBACK. As a result, the update done by transaction A at time t2 is lost. At the end of time t3, the database would go back to the state that it was in prior to time t1. Transaction B thinks that it has rolled back its update done at time t1. Instead, it has unknowingly rolled back the update done by transaction A at time t2.

Let us illustrate this problem with the help of an example, as depicted in Fig. 5.24.

Fig. 5.24 Dirty (Uncommitted) read problem: Case 2 – SQL example

The initial value of the salary for employee number 101 is 10000. At time t1, transaction B updates it to 14000 without committing its changes. At time t2, transaction A overwrites it with 12000, again without performing a COMMIT. At time t3, transaction B performs a ROLLBACK, undoing the changes done by both transactions A and B at times t1 and t2. Thus, the database goes back to the state it was in prior to time t1. In other words, the salary is again 10000.

5.5.3 Non-Repeatable Read Problem

The non-repeatable read problem, also known as the inconsistent analysis problem, occurs when a transaction sees two different values for the same row within its lifetime. For example, let us assume that transaction A retrieves a row from a table, after which transaction B updates the same row and commits its changes. Immediately afterwards, transaction A once again retrieves the same row from the same table. Because transaction B has updated the row in the meantime, transaction A would now see a value that is different from the value it had retrieved earlier. We can see that transaction A cannot repeat its read operation (actually, the results of its read operation do not repeat). Hence the name non-repeatable read.

Fig. 5.25 shows the idea behind a non-repeatable read problem.

Fig. 5.25 Non-repeatable read problem

The sequence of events is as follows.

  1. At time t1, transaction A reads a row r of a table.
  2. At time t2, transaction B updates the same row r, and performs a COMMIT operation.
  3. At time t3, transaction A again attempts to retrieve the same row r. Because transaction B has changed the value of the row r at time t2, transaction A will now get a value that is different from what it had obtained at time t1.

Note that this is quite different from the dirty (uncommitted) read problem, studied earlier. Here, we are talking about one transaction A being fooled because the other transaction B has committed its changes during the lifetime of A. This can be a problem if transaction A needs to perform some sort of decision-making operation based on the old value. For example, what if transaction A reads the database, performs some operations, and again reads the database to derive a new value based on the old value? It would get inconsistent results.

We shall consider two examples to illustrate this problem, since it can be quite easily confused with the dirty (uncommitted) read problem. Fig. 5.26 shows the first example.

Fig. 5.26 Non-repeatable read problem: SQL example

Transaction A reads the salary value as 10000 and as 15000 at times t1 and t3 respectively. These two steps can take place within a few seconds. In the meantime, transaction B has changed the value of the salary from 10000 to 15000 at time t2. This change fooled transaction A.

Let us consider a more detailed example, which can explain the problem in depth, as illustrated in Fig. 5.27. We shall avoid SQL syntax to save space.

Fig. 5.27 Non-repeatable read problem: second example

We consider three variables: X, Y and Z. Transaction A wants to add the values of these three variables to calculate the sum. It performs three read and summation operations at times t1, t2 and t8. In the meantime, transaction B reads and updates the value of Z at times t3 and t4, and that of X at times t5 and t6. It commits its changes at time t7. Therefore, between times t2 and t8, the state of the database changes considerably. Thus, transaction A calculates the sum of X, Y and Z as 210, based on the original values of X (50) and Y (100) and the changed value of Z (60). However, it does not realise that X has also changed to 70 and, therefore, the sum of X, Y and Z should actually be 70 + 100 + 60 = 230, and not 210.

5.5.4 Phantom Read Problem

The phantom read problem (also called the phantom insert problem) is a special case of the non-repeatable read problem. We know that the non-repeatable read problem occurs when transaction B does some updates before transaction A completes its reading operations. In the phantom insert problem, however, transaction A is fooled because transaction B inserts a row before transaction A completes its reading operation. In principle, it is similar to the non-repeatable read problem. This is shown in Fig. 5.28.

Fig. 5.28 Phantom read problem

In the given case, the sequence of events is as follows.

  1. At time t1, transaction A reads some rows of a table.
  2. At time t2, transaction B inserts a new row in the table, and performs a COMMIT operation.
  3. At time t3, transaction A again attempts to retrieve the same rows as it had done at time t1. Because transaction B has inserted a new row at time t2, transaction A will now get a value that is different from what it had obtained at time t1.

Let us understand this with a simple example, as shown in Fig. 5.29.

Fig. 5.29 Phantom read problem: SQL example

Transaction A calculates the average salary at times t1 and t3. At time t1, the average salary figure turns out to be 10000 and at time t3, all of a sudden, it turns out to be 11000. The reason for this change is a phantom row inserted by transaction B at time t2.

5.6 LOCKING

The concept of locking is used to solve concurrency problems. The idea behind this concept is that when one transaction needs an object (for example, a row in a database table), it should acquire a lock on that object. In other words, it should lock the other transactions in the system out of that object. This would disallow those transactions from making any changes to that object. In effect, the locking transaction can make any changes that it wants, without encountering any problems. Locks can be classified into two types: exclusive (X), also called write locks, and shared (S), also called read locks. This is shown in Fig. 5.30.

Fig. 5.30 Types of database locks

The properties of these types of locks need to be clarified. Let us assume that we have two transactions, A and B, in our database system.

  • If transaction A locks a row r in the table in the exclusive mode (X), then transaction B or any other transaction cannot acquire either an exclusive lock (X) or a shared lock (S) on the same row.
  • If transaction A locks a row r in the table in the shared mode (S), then transaction B or any other transaction cannot acquire an exclusive lock (X) on the same row, but it can acquire a shared lock (S) on the same row.

(Note that although we talk of "locking a row", in reality it means either locking a table or locking a page of rows.)

We can prepare a matrix to show these results, as in Table 5.1.

Table 5.1 Locking possibilities

                      X (held)    S (held)
X (requested)         No          No
S (requested)         No          Yes

The way this table should be read is as follows:

A column indicates a lock that is already acquired. A row indicates a lock that is being requested.

For example, we shall consider two samples.

(a)  If transaction A holds an exclusive lock (X) on a row r (see the column titled X) and transaction B requests a shared lock (S) on the same row r (see the row titled S), then it is not allowed.

(b)  If transaction A holds a shared lock (S) on a row r (see the column titled S) and transaction B requests a shared lock (S) on the same row r (see the row titled S), then it is allowed.

Database recovery can be implemented in various ways. Some database technologies make updates to a backup of the tables, and not the tables themselves, until a transaction is committed. Upon commit, they apply these changes from the backup to the actual tables. Some other databases apply changes directly to the database tables and, in the case of failures, read the backups for restoration of the old values. How this is implemented is not so important, as long as we are able to recover data in some manner.

The other combinations can be similarly understood.
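
In fact, the whole of Table 5.1 fits in a few lines of code, as the following minimal Python sketch shows (purely illustrative):

# Table 5.1 as code: may a requested lock be granted, given a held lock?
COMPATIBLE = {
    ("X", "X"): False,   # two writers can never coexist
    ("X", "S"): False,   # a reader must wait for a writer
    ("S", "X"): False,   # a writer must wait for readers
    ("S", "S"): True,    # shared alongside shared is fine
}

def can_grant(held, requested):
    return COMPATIBLE[(held, requested)]

print(can_grant("X", "S"))   # False, as in sample (a) above
print(can_grant("S", "S"))   # True, as in sample (b) above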

In general, a transaction needing to simply retrieve data from a table should acquire a shared (S) lock. However, if it intends to make any updates (i.e. inserts/updates/deletes), then it should acquire an exclusive (X) lock.

If the locking request of transaction B is denied because transaction A has locked the object as per the table shown earlier, then transaction B goes into a wait state. It is important that the duration of the wait is limited; it must not be infinite, otherwise it may lead to system problems and delayed responses.

An exclusive (X) lock is held till the end of the transaction (i.e. until the transaction holding the lock performs a commit or a rollback). A shared (S) lock is also generally held till the end of the transaction, but not always.

5.7 CONCURRENCY PROBLEMS REVISITED

Armed with the facility of locking, let us revisit our concurrency problems to see if they can be solved now. We shall not discuss these problems in detail, as they have been covered earlier. We shall simply focus on the possible solutions to these problems.

5.7.1 Lost Update Problem Revisited

Armed with locking, consider the problem depicted in Fig. 5.31.

Fig. 5.31 Lost update problem revisited

There is no change in events between times t1 and t2. Note that transactions A and B implicitly acquire a shared (S) lock on row r at times t1 and t2 respectively, as they want to perform a read operation. At time t3, transaction A makes an attempt to update row r. This causes transaction A to automatically request for an exclusive (X) lock. However, because transaction B already has a shared (S) lock on the row, this request of transaction A is not fulfilled. At time t4, transaction B wants to perform an update and, therefore, requests for an exclusive (X) lock. This request is also not granted, because transaction A has held a shared (S) lock since time t1.

As a result, both transactions A and B wait from time t3 and t4 respectively. More specifically:

  • Transaction A is waiting for transaction B to release something (i.e. a lock on row r).
  • Transaction B is waiting for transaction A to release something (i.e. a lock on row r).

This situation is peculiar, and is defined as follows.

A deadlock occurs when two or more transactions wait at the same time for each other to release their locks, so that they can proceed.

In other words, this situation is called a deadlock; it is also known as a deadly embrace.

We shall discuss deadlocks in detail soon. For now, we realise that although locking seems to have solved the problem of concurrency in the case of lost update, it has also introduced a new one – that of the deadlock.

5.7.2 Dirty (Uncommitted) Read Problem Revisited

We know that the dirty (uncommitted) read problem is caused when a transaction (say A) retrieves, or even worse, updates a row updated by another uncommitted transaction (say B). Let us revisit this with the introduction of locking as shown in Fig. 5.32.

The sequence of events is as follows.

  1. At time t1, transaction B performs an update on row r, thereby acquiring an exclusive (X) lock.
  2. At time t2, transaction A needs to perform a read operation on row r. Therefore, it requests for a shared (S) lock on the row. This request is not granted, as transaction B still holds an exclusive (X) lock.
  3. At time t3, transaction A is waiting for transaction B to release its lock, and free row r. Transaction B obliges, by performing a ROLLBACK operation, which causes the lock to be released.
  4. At time t4, transaction A obtains the shared (S) lock that it was looking for. It can now proceed with its data retrieval operation.

Locking of tables can provide significant benefits in terms of ensuring that we are not causing any accidental problems due to concurrency issues. When a user locks a table, another user cannot update it, thus preventing any actions that would cause the database to reach an inconsistent or invalid state.

Fig. 5.32 Dirty (Uncommitted) read problem revisited

We can see that locking solves the problem of dirty (uncommitted) read.

If transaction A needs to perform an update, rather than a read, at time t2, things do not change much. The only thing that would change is that transaction A would now request for an exclusive (X) lock, instead of a shared (S) lock. This request would be granted only after transaction B releases its lock at time t3. As we can see, this is quite similar to the steps discussed above.

5.7.3 Non-repeatable Read Problem Revisited

We know that the non-repeatable read problem, also known as the inconsistent analysis problem, occurs when a transaction sees two different values for the same row within its lifetime. Fig. 5.33. shows the non-repeatable read problem after incorporating the locking mechanisms.

The sequence of events is as follows.

  1. At time t1, transaction A reads row r of a table. Therefore, it implicitly acquires a shared (S) lock on the row.
  2. At time t2, transaction B wants to update the same row r. Therefore, it requests for an exclusive (X) lock on the row. However, because transaction A has already acquired a shared (S) lock, transaction B must wait.
  3. At time t3, transaction A again retrieves the same row r. Because it already holds a shared (S) lock on the row, this is not a problem at all. Note that transaction B is waiting.
  4. At time t4, transaction A releases its shared (S) lock. Transaction B is still waiting.
  5. At time t5, transaction A does not hold a lock on the row. Therefore, transaction B would be granted an exclusive (X) lock on the row, as desired. Transaction B can now perform an update operation, as required.

Fig. 5.33 Non-repeatable read problem revisited

We can see that locking has solved the problem of non-repeatable read as well.

5.7.4 Phantom Read Problem Revisited

We know that the phantom read problem (also called as phantom insert problem) is a special case of the non-repeatable read problem. In the phantom insert problem, transaction A is fooled because transaction B inserts a row before it completes its reading operation. Otherwise, it is similar to the non-repeatable read problem. We show the effect of locking on this problem in Fig. 5.34.

Fig. 5.34 Phantom read problem revisited

The sequence of events is exactly the same as was in the case of the non-repeatable read problem. Hence, we will not describe it, except pointing out that the only change is that transaction B intends to insert a row, rather than updating one. Other things remain the same.

5.8 DEADLOCKS

We have touched upon the concept of deadlocks earlier. A deadlock is a specific kind of concurrency problem: the very locking used to resolve the classical concurrency problems can itself lead to a deadlock. There are several ways in which we can illustrate a deadlock. For the present, we shall depict a very simple case.

  • Transaction A holds a lock on Table T1. Another transaction B also wants to acquire a lock on the same table.
  • Transaction B holds a lock on Table T2. Transaction A also wants to acquire a lock on the same table.

This situation is depicted in Fig. 5.35.

Fig. 5.35 Concept of Deadlock

The thick lines show a lock that is already acquired (e.g. transaction A on Table T1 and transaction B on Table T2). The dotted lines show a lock that is desired (e.g. transaction A on Table T2 and transaction B on Table T1).

We will realise that unless transaction B gives up its lock on Table T2, transaction A cannot proceed. On the other hand, unless transaction A gives up its lock on Table T1, transaction B cannot proceed.

Thus, both the transactions A and B depend on each other to release their respective locks. This can never happen automatically. Hence, it is a condition in which a transaction waits endlessly for resources held by another. This is precisely what is called a deadlock. We can illustrate this with the help of a time graph, as shown in Fig. 5.36.

Fig. 5.36 Deadlock

The sequence of events is as follows.

  1. At time t1, transaction A acquires an exclusive (X) lock on Table T1.
  2. At time t2, transaction B acquires an exclusive (X) lock on Table T2.
  3. At time t3, transaction A requests for an exclusive (X) lock on Table T2.
  4. At time t4, transaction B requests for an exclusive (X) lock on Table T1.
  5. At time t5, transaction A enters into a wait state, as it cannot get a lock on Table T2.
  6. At time t6, transaction B enters into a wait state, as it cannot get a lock on Table T1.

Thus, we enter into a deadlock.

How do we resolve a deadlock? For this purpose, one of the transactions (called the victim) needs to be identified/detected. This transaction is then rolled back. This causes all the locks that it has acquired to be released. After this, the other transactions that have been waiting because of the deadlock can proceed.

Thus, in our example, we must roll back either transaction A or transaction B. This would free the system resources for the other transaction and allow it to complete. After the completion, we can restart the rolled back transaction from scratch. Of course, when such an action is taken, it is always advisable that the users of the transaction being rolled back be quickly informed about this.
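
Deadlock detection is commonly modelled as finding a cycle in a wait-for graph, in which an edge from A to B means that transaction A is waiting for a lock held by transaction B. A minimal Python sketch follows (the graph layout is our own assumption for illustration):

# The situation of Fig. 5.36: A waits for B, and B waits for A.
waits_for = {"A": ["B"], "B": ["A"]}

def has_deadlock(graph):
    # Depth-first search; revisiting a node on the current path means a cycle.
    visiting, done = set(), set()
    def visit(node):
        if node in visiting:
            return True
        if node in done:
            return False
        visiting.add(node)
        if any(visit(n) for n in graph.get(node, [])):
            return True
        visiting.remove(node)
        done.add(node)
        return False
    return any(visit(n) for n in list(graph))

print(has_deadlock(waits_for))   # True: one transaction must be chosen as the victim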

5.9 TRANSACTION SERIALISABILITY

If the result of the execution of a set of transactions is the same as executing them serially, one-by-one, then these transactions are serialisable.

In other words, a given set of transactions executes perfectly if it is serialisable.

The following theory is proposed in the context of serialisable transactions.

(a) Individually, these transactions are supposed to be all right. This is because they transform the database from one consistent state to another.

(b)  Therefore, it should be all right to execute them one after the other in any order. This is because all these transactions are independent of each other.

(c)  Therefore, an interleaved (or combined) execution is also acceptable if the result is the same as the serial execution of these transactions. That is, a set of these transactions is then serialisable.

If we revisit the set of transactions that we have been discussing so far, we would note that none of them was serialisable. That is, transactions A and B were being executed in an interleaved manner. By using the concept of locking, we actually forced these sets of transactions to become serialisable. That is, we enforced executions such as First A then B or First B then A. No longer did we allow Some A some B or Some B some A.

Any execution of a set of transactions (either interleaved or serial) is called its schedule.

If we execute the transactions strictly in a sequence, one at a time, then the schedule is called a serial schedule. On the other hand, if there is an overlap of transactions, then the schedule is called an interleaved schedule or non-serial schedule. Two schedules are equivalent if they produce the same results, i.e. if they move the database from the same initial consistent state to the same final consistent state.

It is extremely important to note that two different transaction schedules (either serial or interleaved) involving the same transactions may yield different results, and yet they are correct! For example, consider two transactions, A and B, as shown in Fig. 5.37.

Fig. 5.37 Different transaction schedules

Let us assume that these transactions are operating on a value T whose initial value is 50. See Fig. 5.38 for how dramatically different the results are if the execution is 'First A, then B' versus 'First B, then A'.

Fig. 5.38 Different transaction schedules may produce different results

The reason we state that either sequence of transactions (i.e. either of the schedules) is acceptable is that both leave the database in a consistent state. Transaction management has no say in which transaction should start first, i.e. which transaction should get precedence. That decision is best left to the application software.
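As a concrete illustration, suppose (purely as an assumption, since the same point holds for any similar pair of updates) that transaction A adds 10 to T and transaction B doubles T; the Totals table and its column T below are hypothetical.

-- Schedule 1: 'First A, then B'
UPDATE Totals SET T = T + 10;   -- A: T goes from 50 to 60
UPDATE Totals SET T = T * 2;    -- B: T goes from 60 to 120

-- Schedule 2: 'First B, then A'
UPDATE Totals SET T = T * 2;    -- B: T goes from 50 to 100
UPDATE Totals SET T = T + 10;   -- A: T goes from 100 to 110

The two serial schedules end with T = 120 and T = 110 respectively. Both outcomes are consistent, and transaction management treats both as correct.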

5.10 TWO-PHASE LOCKING

Two-phase locking is a protocol that guarantees that if all the transactions obey it, all the possible interleaved schedules become serialisable.

Before we proceed, the most important thing to remember is that this protocol is completely different from the two-phase commit protocol. Except for the similarity in naming, the two protocols are unrelated in both theory and practice.

In other words, this protocol ensures the serialisability of any schedule, provided all the transactions in it adhere to the conditions specified by the protocol. The protocol mandates the following.

(i)  A transaction must acquire a lock on any object that it intends to open (i.e. work with).

(ii)  After releasing a lock, a transaction must never attempt to acquire any more locks.

Thus, a transaction consists of two phases:

(a)  In the acquire phase, a transaction goes on acquiring locks as and when it needs to open new objects.

(b)  The release phase is indicated by a single COMMIT or ROLLBACK operation.

This protocol is shown in Fig. 5.39. A small code sketch follows the figure.

Fig. 5.39 Two-phase locking protocol
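The following is a minimal sketch of one transaction obeying the protocol. The explicit LOCK TABLE statements (Oracle/PostgreSQL-style syntax) stand in for locks that most products would acquire implicitly, and the Balance and Acc_no columns are hypothetical.

BEGIN;
LOCK TABLE T1 IN EXCLUSIVE MODE;                            -- acquire phase: first lock
UPDATE T1 SET Balance = Balance - 500 WHERE Acc_no = 101;
LOCK TABLE T2 IN EXCLUSIVE MODE;                            -- acquire phase: second lock; nothing released yet
UPDATE T2 SET Balance = Balance + 500 WHERE Acc_no = 202;
COMMIT;                                                     -- release phase: all locks given up in one step

Because every lock is acquired before the single COMMIT releases them all, the transaction never acquires a lock after releasing one, so conditions (i) and (ii) are both satisfied.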

5.11 ISOLATION LEVELS

Isolation level is defined as the degree of interference that a transaction can tolerate in a concurrent operation.

The higher the isolation, the lower the interference and, therefore, the lower the concurrency. Higher isolation also means lower overall system throughput. This is shown in Table 5.2.

Table 5.2 Effect of isolation levels

Isolation level    Interference    Throughput
Low                High            High
High               Low             Low

If a transaction has to be serialisable, then it must not be interfered with at all, i.e. the degree of interference must be zero. Thus, the isolation level must be at its maximum. This is fine in theory, but difficult to achieve in practice.

SQL provides four possible isolation levels. A transaction could be assigned one of these four possible isolation levels before it starts. The business requirements and priority of a transaction together determine the isolation level that must be chosen for that transaction. The four isolation levels are read uncommitted, read committed, repeatable read and serialisable. They are shown in Fig. 5.40.
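For instance, a transaction can declare its isolation level just before it starts, along the following lines. The SET TRANSACTION ISOLATION LEVEL statement is standard SQL, although its exact placement and behaviour vary across products, and the Accounts table used here is hypothetical.

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;   -- choose one of the four levels
BEGIN;
SELECT Balance FROM Accounts WHERE Acc_no = 101;   -- first read
-- ... other work ...
SELECT Balance FROM Accounts WHERE Acc_no = 101;   -- at this level, sees the same value as the first read
COMMIT;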

These four isolation levels are shown in the increasing order of isolation in Table 5.3. As we can see, at one end, uncommitted read offers the lowest isolation (and highest interference). At the other extreme, serialisable offers the highest isolation (and lowest interference).

Fig. 5.40 Isolation levels

Table 5.3 Effect of isolation levels

Isolation level     Isolation    Interference
Uncommitted read    Lowest       Highest
Committed read      Medium       High
Repeatable read     High         Medium
Serialisable        Highest      Lowest

Let us now describe the effects of these isolation levels.

  • Uncommitted read: This isolation level offers the lowest isolation. As a result, the same query can return different results when executed multiple times within a transaction, because it sees the changes made by other transactions regardless of whether those changes are committed or not.
  • Committed read: This isolation level also allows the same query to return different rows, but only when the transaction that updated the database has committed its changes (see the sketch after this list).
  • Repeatable read: This isolation level allows phantom inserts, but prevents everything else (including the non-repeatable read problem).
  • Serialisable: This isolation level offers the highest isolation. It treats every transaction as a separate entity and leaves no room for the occurrence of concurrency problems. In other words, it makes a set of transactions completely serialisable.
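To illustrate the difference between the first two levels, consider the following sketch with a hypothetical Accounts table; what session 2 sees depends on its isolation level.

-- Session 1
BEGIN;
UPDATE Accounts SET Balance = Balance - 500 WHERE Acc_no = 101;
-- (no COMMIT yet)

-- Session 2
SELECT Balance FROM Accounts WHERE Acc_no = 101;
-- At uncommitted read: sees the reduced balance (a dirty read).
-- At committed read: sees the original balance until session 1 commits.

-- Session 1
ROLLBACK;   -- the value that session 2 may have seen at uncommitted read never became real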

We tabulate the effects of various isolation levels on concurrency problems in Table 5.4.

Table 5.4 Effect of isolation levels on concurrency problems

Isolation level     Lost update    Dirty read      Non-repeatable read    Phantom read
Uncommitted read    Cannot occur   Can occur       Can occur              Can occur
Committed read      Cannot occur   Cannot occur    Can occur              Can occur
Repeatable read     Cannot occur   Cannot occur    Cannot occur           Can occur
Serialisable        Cannot occur   Cannot occur    Cannot occur           Cannot occur

We shall provide one example that shows how to interpret this table.

If the isolation level is uncommitted read, then the lost update concurrency problem cannot occur, but the other three problems can. The rows for the other three isolation levels can be read similarly.

KEY TERMS

ACID properties

Acquire phase

Atomicity

Backward recovery

Chained transaction

Checkpoint record

Commit

Commit phase

Concurrency problems

Concurrent transaction

Consistency

Coordinator

Deadlock

Deadly embrace

Dirty (Uncommitted) read problem

Durability

Exclusive (X) lock

Failure recovery

Flat transaction

Force-writing

Forward recovery

Inconsistent analysis problem

Interleaved schedule

Isolation

Isolation level

Journal

Lock

Lost update problem

Media recovery

Nested transaction

Non-repeatable read problem

Non-serial schedule

Participant

Phantom insert problem

Phantom read problem

Prepare phase

Read committed

Read lock

Read uncommitted

Release phase

Repeatable read

Rollback

Schedule

Serial schedule

Serialisable

Serialisable transaction

Shared (S) lock

Synch-point

System log

System recovery

Transaction

Transaction Processing (TP) monitor

Transaction recovery

Two-phase commit

Two-phase locking

Write lock

Write-ahead log

CHAPTER SUMMARY

  • A transaction is a logical unit of work. It signifies that all the operations must be executed in their entirety or not at all. A transaction must not execute partially.
  • A transaction consists of distinct steps, namely Begin transaction, Commit/Rollback, and End transaction.
  • If a transaction is committed, all the database changes made inside the transaction are made permanent.
  • If a transaction is rolled back, all the database changes made inside the transaction are undone.
  • A Transaction Processing (TP) monitor ensures that transactions are performed as expected.
  • A system log or journal is used internally to record database operations. It is used while committing or rolling back a transaction.
  • A transaction has four properties, together called ACID.
  • The letter A in ACID stands for atomicity. It means that a transaction must execute in its entirety or not at all.
  • The letter C in ACID stands for consistency. It means that, when it ends, a transaction must leave the database in a consistent state.
  • The letter I in ACID stands for isolation. It means that transactions must not interfere with each other – they must be completely isolated from each other.
  • The letter D in ACID stands for durability. It means that a transaction must make its changes permanent to the database when it ends.
  • Database recovery consists of transaction recovery and system recovery.
  • Transaction recovery deals with individual transactions.
  • System recovery deals with all the transactions in a system as a whole.
  • System recovery is sub-classified into failure recovery and media recovery.
  • Failure recovery deals with soft errors, such as power failure.
  • Media recovery deals with disk errors.
  • DBMS takes frequent checkpoints, which are used in the recovery process.
  • Transactions can be flat, chained or nested.
  • Two-phase commit protocol is used to perform multiple transactions that execute on different databases/computers.
  • Concurrency means multiple transactions running at the same time.
  • Concurrent execution of transactions can lead to four possible problems, called concurrency problems. These are: lost update, dirty (uncommitted) read, non-repeatable read and phantom read.
  • In the lost update problem, one transaction overwrites the changes of another transaction.
  • In the dirty (uncommitted) read problem, one transaction reads an uncommitted value of another transaction.
  • In the non-repeatable read problem, one transaction sees two different values for the same row in two successive executions.
  • In the phantom read problem, one transaction inserts a row into the table while another transaction is halfway through browsing the table.
  • Locking helps solve concurrency problems.
  • Locks can be shared (S) or exclusive (X).
  • If a transaction obtains a shared lock on a row, it means that the transaction wants to read that row.
  • If a transaction obtains an exclusive lock on a row, it means that the transaction wants to update that row.
  • Deadlock is a specific concurrency problem wherein two transactions wait endlessly for each other to release their locks.
  • Transaction serialisability ensures that the result of executing a set of transactions is the same as that of executing them serially, in some order.
  • Two-phase locking protocol guarantees that a set of transactions becomes serialisable.
  • Isolation level is the degree of interference that a transaction can tolerate in a concurrent operation.
  • SQL defines four isolation levels: read uncommitted, read committed, repeatable read, and serialisable.

TRUE OR FALSE

State whether each of the following statements is true or false.

  1. A transaction is a logical unit of work.
  2. A COMMIT statement indicates the end of a transaction.
  3. Transactions are said to depict the ACID properties.
  4. If a transaction executes only halfway, the system becomes unstable.
  5. In a nested transaction, we have many mini-transactions within a main transaction.
  6. A database can have multiple transactions running at the same time. This is called concurrency.
  7. The non-repeatable read problem is also known as the inconsistent analysis problem.
  8. A deadlock occurs when two or more transactions wait at the same time for others to release their locks, so that they can proceed.
  9. Locking solves the problem of phantom read.
  10. If there is an overlap of transactions, then the schedule is called a serial schedule.

MULTIPLE-CHOICE QUESTIONS

1.  A transaction processing system is also called _________.

(a) TP monitor

(b) transaction monitor

(c) processing monitor

(d) monitor

2.   _________ is/are used for COMMIT and ROLLBACK operations.

(a) System log

(b) Journal

(c) Both a and b

(d) None of a, b or c

3.  Failure recovery and media recovery fall under _________.

(a) transaction recovery

(b) system recovery

(c) database recovery

(d) none of a, b and c

4.  Check-points are sometimes also called ________.

(a) crash-points

(b) sync-points

(c) function-points

(d) none of a, b or c

5.  Flat, chained, nested are the types of ________.

(a) locks

(b) transaction models

(c) database models

(d) system models

6.  In __________ transactions, we have a hierarchical relationship between master and child transactions.

(a) chained

(b) flat

(c) overlapped

(d) nested

7.  The phantom read problem is a special case of __________.

(a) non-repeatable read problem

(b) uncommitted read problem

(c) lost update problem

(d) nested read problem

8.  If we execute the transactions strictly in sequence, one at a time, then the schedule is called a __________.

(a) serial schedule

(b) interleaved schedule

(c) non-serial schedule

(d) simple schedule

9.  __________ is a protocol that guarantees that if all the transactions obey it, all the possible interleaved schedules become serialisable.

(a) Single-phase locking

(b) Two-phase locking

(c) Isolation levels

(d) Scheduling

10.  _________ is defined as the degree of interference that a transaction can tolerate in a concurrent environment.

(a) Key

(b) Locking

(c) Transaction

(d) Isolation level

REVIEW QUESTIONS

  1. Explain the concept of a transaction processing monitor.
  2. Explain the ACID properties of transactions.
  3. Explain the types of recovery, with short descriptions.
  4. What are the various transaction models?
  5. Explain the concept of concurrency problems.
  6. Describe the lost update problem.
  7. Explain the dirty read problem.
  8. What is the non-repeatable read problem?
  9. Explain the locking mechanisms, with short descriptions.
  10. Describe the concept of deadlock.

EXERCISES

  1. Is the concept of a transaction needed in a single-user, single-database environment?
  2. Does the concept of transaction apply only to computer-based operations? Is the idea of a transaction important even in our daily life?
  3. Investigate whether your bank ensures transaction management.
  4. Would concurrency problems occur if there are only two transactions in the system, and both are performing read-only operations? Why?
  5. Which concurrency problems can happen if one transaction is reading and another is updating data?
  6. Can we do logging by using a database? What are the pros and cons of this approach?
  7. Find out the various isolation levels defined in the different DBMS products such as Oracle, SQL Server and DB2.
  8. How can one code nested transactions programmatically? Write a short algorithm for it.
  9. What are the various TP monitors available in the market?
  10. Think of a practical situation where two-phase commit would be needed.