Appendix A: Data Structures – Introduction to Database Management Systems

Appendix A

Data Structures

We have referred to several data structures, such as lists, queues, binary trees, and so on. There is a very interesting theoretical basis behind these data structures. In this appendix, we shall study some of the most common data structures with several examples in the C programming language. Those not familiar with C can skip the programming examples, but are recommended to study the discussions of the various data structures.

A.1 LINKED LISTS

A linked list consists of data items that are arranged in a sequential order in such a manner that every data item contains the address of the next item in the list. In other words a data item points to the next item in the list. Therefore, an item contains data as well as a pointer to the next item in the list. The basic concept is illustrated in Fig. A.1.

Fig. A.1Linked list

As we can see, the first item contains data (shown as A), and a pointer to the next item, that is, B. Similarly, the second item contains data as A and a pointer to the next item, that is, C. The third item contains data as C, but its pointer is set to 0 (that is, null). This is because it is the last item in the list, and it has nothing more to point to. The null indicates this fact.

Because of this architecture, where an item points to the next item in the forward direction, all the items are linked together to form a list of items. Because of this property, such a linked list is also called as a simple linked list or a single linked list.

Linked list provides useful facility for storing data items in such a manner that the list can expand or shrink dynamically. For example, suppose that we need to store the details of employees in a linked list. It is possible that some employees resign, and new ones get added. Also, some employees who had resigned earlier may rejoin. Suppose we need to insert or delete records for these employees in/from the list as and when they join or resign.

What we could do in such a situation is to maintain a linked list of employee records. An employee record would contain the details of the employee, say the Employee ID, Employee name, and Department. Additionally, it would also contain a pointer to the next employee record in the list. Suppose that currently we just have three employees, whose records are shown in Table A.1.

Table A.1 Employee records

         Employee ID Employee Name Department
004 Sachin Statements
007 Mithoo PROMO
012 Javed AR

Then a possible linked list implementation of this situation is shown in Fig. A.2.

Fig. A.2Employee records implemented as a linked list

Let us now transform this list into a C language syntax. The result is shown in Fig. A.3. We have not shown the data values to keep things simple. We shall study this in subsequent examples.

Fig. A.3 Linked list in C

We would realise that several operations on this linked list must be allowed. We should be able to insert an employee at the beginning, in the middle, or at the end of the list. We should be able to modify the values of any employee record. Similarly, we should be able to search for, update, or delete employee records.

We shall not provide the C language code for such a linked list. Instead, we shall wait until we discuss Doubly linked lists.

Figure A.4 shows what happens if we want to insert a new employee record at the first position of the list. See how all records shift one position ahead.

Fig. A.4 Inserting new record at the beginning of the linked list

Figure A.5 shows what happens if we insert a new record in-between the existing linked list records.

Figure A.6 shows the effect of inserting a record at the end of the linked list.

Let us now consider the linked list as consisting of four records as shown in the last diagram. Figure A.7 shows the effect of deleting the last record.

Note that the deletion of items in a list is logical, and not physical. That is, the record to be deleted (that is, Radhika in this case) is not physically removed, but instead, the mere adjustments of pointer of Javed causes the removal of Radhika automatically from the list.

Now suppose we delete the middle record. The result is shown in Fig. A.8.

Fig. A.5 Inserting new record in the middle of the linked list

Fig. A.6 Inserting new record at the end of the linked list

Fig. A.7 Deleting record from the end of the linked list

Fig. A.8 Deleting a record from the middle of the linked list

Instead, if we had deleted the first record from the linked list, the result would have been as shown in Fig. A.9.

Fig. A.9 Deleting a record from the end of the linked list

In all these operations, we need to adjust the next pointer of the structure declared in C, manipulating it so that the items in the list are added or deleted appropriately.

A.2 DOUBLY LINKED LISTS

A doubly linked list, also called two-way linked list, consists of data as well as pointers to the next item, as well as the previous item. Remember that a single linked list contains just one pointer (usually in the forward direction). Figure A.10 depicts the concept of a doubly linked list.

Fig. A.10 Doubly linked list

As shown in the diagram, every item in the list has two links or pointers:

One link points to the next item in the list in the forward direction.

The second link points to the previous item in the list in the backward direction.

As before, the first item in the list (shown here as A) has nothing to point back to. This is because it is the very first item in the list. Because we are using the pointers in both directions, we have shown that the previous pointer of A points to 0, that is, null. We had not done so in the case of a single linked list, because there was no previous pointer at all. Similarly, the last item in the list (shown here as C) has no more items to point to. Therefore, the next pointer of the last item in the list contains a dummy value of 0. The other (previous and next) pointers point to the next or the previous item in the list, as appropriate.

Using two links rather than just one provides us with two main advantages:

1.

A single linked list has pointers only in one direction (which is usually the forward direction). Therefore, a single linked list cannot be traversed backwards, starting with the last item in the list. In contrast, a doubly linked list can be read in either direction, as follows:

(a) We can start with the first item in the list, and traverse up to the last item in the list in the forward direction. In the example, it means reading the three items in the order A-B-C.

(b) We can start with the last item in the list, and traverse it back to the first item in the list in the backward direction. In our example, it means reading the three items in the order C-B-A.

 

This property of doubly linked lists can be useful in practical situations. It not only simplifies the management of the list, but also can help in reading the records of a file or a database, if they are implemented as a doubly linked list.

2.

If there is some problem that causes one of the links (either forward or backward) to fail or to become invalid, that link can be constructed with the help of the other link, which is still intact. Note that this is not possible in the case of a single linked list, because there is only one list, and if it is corrupted, there is no way of re-creating it. Recall our discussion of maintaining pointers in both directions in the Library Management System.

A.2.1 Doubly Linked List Operations

In principle, there is not a major difference between creating a doubly linked list and creating a single linked list. But recall that now we need to maintain two links, in place of just one link. As a result, the basic organisation of a doubly linked list must allow for maintaining two links. We need to focus on the following operations in relation with a doubly linked list:

1. Inserting an item in the list

2. Deleting an item from the list

3. Displaying all the items in the list

4. Searching for an item in the list

The following simple C structure can be used for describing these operations:

struct linked_list

{

int emp_id;

char emp_name[30];

char department[10];

struct linked_list *next;

struct linked_list *previous;

};

A.2.1.1Inserting an item There are three situations in which a new item can be inserted into a doubly linked list:

1. Insert a new item at the beginning of the list

Figure A.11 shows the insertion of an item to the list at its end. We know that the original list was A-B-C. Because A is the first item in the list, the previous pointer of A is 0.

Fig. A.11 Insertion of a new item at the beginning of the doubly linked list

The item K is being inserted at the first position of the list. Consequently, the previous pointer of the item that was the first item in the list (that is, A) now points to K. Also, because this is a doubly linked list, the next pointer of K points to A. The remaining items of the list are unchanged. In other words, A points to B and B points to C. The next pointer of C continues to be 0, which means nothing.

2. Insert a new item in the middle of the list

Let us study the same original linked list, which consisted of three items, A, B and C. Imagine further that the new item K is now to be added in between A and B. We depict the result in Fig. A.12. As we can see, it consists of two mini-operations:

(a)

The next pointer of A now points to K. Additionally, the previous pointer of K now points to A.

(b)

The next pointer of K points to B. Similarly, the previous pointer of B points to K.

Note that the pointers between B and C remain unchanged.

Fig. A.12 Insertion of a new item in the middle of a doubly linked list

3. Insert a new item at the end of the list

This is perhaps the simplest of the three insertion possibilities. Here, the item K will be inserted at the end of the list, that is, after C. Thus, the next pointer for item C will not be 0 (null) now. Rather, it will point to K. By the same logic, the previous pointer of K will point to C. Last but not the least, the next pointer of K will now be 0. This is depicted in Fig. A.13.

Fig. A.13 Insertion of a new item at the end of a doubly linked list

A.2.1.2 Deleting an item The deletion of an item also has three possible situations, exactly similar to those of the insertion of an item to a doubly linked list. The item to be deleted can be the very first one, somewhere in the middle, or at the end of the linked list. We shall now examine these situations.

1.   Delete the first item from the list

Let us think about the original list, that is, A-B-C. In order to delete the first item (that is, A) from the list, we need to set the next pointer of A to 0, and the previous pointer of B also to 0. (Remember that A being the first item in the list, its previous pointer is already 0). After this operation, item B automatically becomes the first item in the list. This is depicted in Fig. A.14.

Fig. A.14 Deletion of the first item in a doubly linked list

2.   Delete an item in the middle of the list

The deletion of an item somewhere in the middle of a list is more complex than deleting the first or the last item of the list. This is due to the fact that here we need to have the adjustment of the previous and the next pointers of the two items that are adjacent to the item that we want to delete. In this case, from the list A-B-C, we are deleting item B. As a result, the following changes are needed:

(a)

The next pointer of A, which points to B at the moment, should now point to C.

(b)

The previous pointer of B is pointing to A. Instead, the previous pointer of C should now point to A.

In order to do this, both the previous pointer and the next pointer of B should be set to 0. This automatically causes the deletion of item B from the list. This is illustrated in Fig. A.15.

Fig. A.15 Deletion of an item from the middle of a doubly linked list

3.   Delete the last item in the list

The deletion of the last item in the list (that is, C in our example) is perhaps the simplest of all deletion operations. To perform this task, we need to reset the next pointer of B to 0, which normally points to C. This causes two operations to be performed: B is now the last item in the list; and C is automatically deleted from the list. Similarly, C's previous pointer should be reset to 0. This operation is depicted in Fig. A.16.

Because there is nothing to diagrammatically show them, we shall depict the other two operations (that is, displaying all the items in the list and searching for a specific item in the list) with the help of a C program. For simplicity, we shall automatically sort the list while inserting a new item in the list. That is, a new item in the list is automatically added in its right ordered location. For simplicity, we shall consider that the linked list contains just one data item: the employee ID.

Fig. A.16 Deletion of the last item of a doubly linked list

The program using doubly linked list is shown in Fig. A.17.

Fig. A.17 Doubly linked list program _ Part A

Fig. A.17 Doubly linked list program _ Part B

Fig. A.17 Doubly linked list program _ Part C

Fig. A.17 Doubly linked list program _ Part D

Fig. A.17 Doubly linked list program _ Part E

Fig. A.17 Doubly linked list program _ Part F

A.3 QUEUES

Queuing up for anything is generally disliked. A queue is very similar to the manner in which we (are supposed to) queue up at reservation counters or at bank cashiers' desks. A queue has two important properties:

A queue is a linear, sequential list of items.
The items in a queue must be accessed in the order First In First Out (FIFO).

In other words, the first item added to a queue must also be the first one to be retrieved. The second item added to a queue is also the second one to be retrieved, and so on. Finally, the last item to be added to the queue is also the last one to be retrieved.

Remember that (unlike queues of humans) we can neither store nor retrieve the items in a queue arbitrarily or in any random order.

Imagine that we are creating a queue of items. To do so, let us consider two very simple operations, write and read. The write operation inserts a new item at the end of the queue, whereas the read item reads one item from the queue (that is, the first item of the queue). If we go on using these two operations with a few insertions and a few retrievals, the resulting values of the queue would look as shown in Table A.2.

Table A.2 Queue operations

   Operation Contents of the queue after the operation
write (p) p
write (q) p q
write (r) p q r
            read (returns p) q r
write (s) q r s
            read (returns q) r s
            read (returns r) s
write (t) s t

As we can see, the operations on a queue are always sequential and are FIFO. Moreover, when one item is read from the queue, it is automatically destroyed. In other words, a read operation on a queue is destructive, unlike what happens with other data structures, such as linked lists. Therefore, if the programmer needs to implement a non-destructive reading of a queue, the values, as they are read (and therefore, automatically removed) from the queue, must be stored somewhere else for later access/retrieval.

In order to implement the write and read operations on a queue, two pointers, start and end are needed. One pointer (start) points at the current beginning of the queue (that is, at the item that is the first in the queue at that point of time). The other pointer (end) points at the current end of the queue (that is, at the last storage location available in the queue for a new item). To repeat, insertions into and retrievals from a queue, as discussed earlier, would have the effects on the pointers as shown in Fig. A.18.

Fig. A.18 Pointer movements because of queue operations _ Part A

Fig. A.18 Pointer movements because of queue operations _ Part B

Figure A.19 shows the C language implementation for the operations of a queue.

Fig. A.19 Queue program _ Part A

Fig. A.19 Queue program _ Part B

Fig. A.19 Queue program _ Part C

A.3.1 Circular Lists/queues

Interestingly, a list/queue can be circular. In such a case, it is called as a circular linked list or a circular queue. We may have got an impression that when the items from a queue get deleted, the space for those items is reclaimed. However, it is not true. Those queue slots remain empty. This problem is solved by circular linked lists, which are usually in the form of circular queues.

Put simply, rather than using a linear approach, a circular list takes the circular approach. In other words, items in a circular list are arranged to form a circular shape. This is also the reason why a circular list does not have a beginning or end at all! The representation of a circular queue is as shown in Fig. A.20.

Fig. A.20 Circular queue

What is the effect of making a queue circular, as against keeping it linear?

Well, the functions queue_store and queue_retrieve would have to be altered in the case of a circular queue, as shown in Fig. A.21.

Fig. A.21 Circular queue implementation

A.4 BINARY TREES

There is another interesting data structure by the name tree. Trees come in many forms and shapes (just like real trees!) However, the binary tree is a special case, because when it is in a sorted order, it allows us to perform quick search, insertion and deletion operations.

Note that trees in computer science grow from top to bottom, unlike real trees, which grow from bottom to top.

An item in any tree has two pointers or links: one to a left member, and another to a right member. This concept is shown in Fig. A.22.

Fig. A.22 Binary tree

In this tree, we say that d is the root of the tree. A data item located on the tree (here a through g) is called as a node of the tree. Any portion of a tree (e-f-g) is called as a sub-tree. A node that does not have any sub-tree is called as a terminal node. For example, a, c, e and g are all examples of terminal nodes, but b, d and f are not. The height of the tree is the number of the vertical node positions. In the example, the height is 3. Accessing the items in a tree is called the process of tree traversal.

Items of a tree can be retrieved, inserted or deleted in any order. Note that every sub-tree of a tree is itself a tree! Clearly, a tree is a recursive (that is, self-repeating) data structure, and therefore, recursive programming techniques are quite commonly used with trees. But it is not mandatory that tree-related programming must always be recursive. Non-recursive tree programming is also possible, but is harder to code and comprehend, as compared to the recursive code version.

A tree can be traversed in three ways:

In-order tree traversal: In this type of traversal, the left sub-tree is traversed first, followed by the root, and finally by the right sub-tree. Thus, the in-order traversal of our tree shown earlier would yield the sequence a-b-c-d-e-f-g.

Pre-order tree traversal: In this traversal, the root is traversed first, followed by the left sub-tree, followed by the right sub-tree. Thus, the pre-order traversal of our sample tree would yield the sequence d-b-a-c-f-e-g.

Post-order tree traversal: In this type, the left sub-tree is accessed first, followed by the right sub-tree, and finally followed by the root. Thus, the post-order traversal of our sample tree shown earlier would yield the sequence a-c-b-e-g-f-d.

Although a tree need not necessarily be in a sorted form, generally, all the practical applications require the use of a sorted tree. We shall also consider sorted trees. Moreover, we shall consider only in-order trees. Note that the other two tree types are simply variations of this type.

The actual view of a tree that has pointers or links between the various items is shown in the form of another tree in Fig. A.23. This view should be kept in mind while working with trees in computer programs. Note how the next or previous pointers have a value 0 for the nodes that do not point to other nodes in the tree, similar to the empty pointers in linked lists discussed earlier.

Fig. A.23 Another view of the same tree

We can represent each node in the tree in the form of a C structure, as shown in Fig. A.24.

Fig. A.24 Tree represented as a C structure

We can build a sorted binary tree as shown in Fig. A.25.

Fig. A.25 Building a sorted binary tree

Once the tree is created, it can be traversed in either of the three ways discussed earlier (in-order, pre-order and post-order). The in-order traversal logic is shown in Fig. A.26.

Fig. A.26 In-order retrieval of binary tree

Note that this logic is also recursive in nature. How would this program work? To understand this, refer to the same sample tree shown in Fig. A.27.

Fig. A.27 Sample binary tree

Let us understand the steps of execution.

  1. First, the program checks if the root exists. In this case, the root (d) does exist. So, the program proceeds further.
  2. Now, the function calls itself, passing the left node of itself (b). Since this is a recursive call, it would enter the in_order function once more, and call itself once again, passing the new left node of itself (a). This also being a recursive call, it will result into a call to itself passing the new left node of itself (null, this time). This call to itself will now cause the if condition (to check if the root is null) to become true. As a result, the recursion ends here. The functions are now executed in the reverse order (with values a and b). This will display a and b.
  3. Next, the following function call executes:
    in_order (root->right);
  4. This causes a call to itself with the value c now, so it will display c.
  5. At this stage, the traversal from d to the left sub-tree is over. So, the function will display d.
  6. Now, the right sub-tree of the root (e-f-g) gets displayed similar to the manner in which the left sub-tree was displayed earlier (by using recursive function calls). We will not elaborate on it, as it is exactly similar to the way a-b-c were displayed earlier.

Figure A.28 depicts the pre-order traversal.

Fig. A.28 Pre-order traversal of the tree

Similarly, the function for a post-order traversal would be as shown in Fig. A.29.

Fig. A.29 Post-order traversal of the tree