Chapter 2 – Introduction to Database Systems – Introduction to Database Management Systems

Chapter 2






Introduction to Database Systems

A Database Management System (DBMS) acts as the interface between data stored on the disk and its users. In that sense, it creates a boundary between data and its users.

Chapter Highlights

  • Overview of Database Management Systems (DBMS)
  • Need for DBMS
  • Basics of Structured Query Language (SQL)
  • Dynamics and Embedded SQL
  • Database Models

2.1 WHAT IS DBMS?

We have studied the concepts of data item (also called field), record and file. We have seen information regarding all the employees being stored in a file called “employee-file". The file contained many records, one per employee. Each employee record consisted of data items such as employee number, name and basic pay. Similar examples of files could be student-file, purchase-order-file, invoice-file and so on.

In typical business environments, it is always essential to be able to produce the right information at the right time with minimum efforts. Assume that a manufacturer of goods uses 10 such different files (one for suppliers, one for customers, one for accounts, etc.). It might not be very easy to answer a query such as:

How many of our customers have credit balance with us for over a month now and whose average purchases from December last year to February this year have been above average?

You can imagine the complexity involved in providing this information. It is not that such a report cannot be generated at all. It certainly can be produced, however, it will require a lot of effort. We shall see the reasons behind this and also study what better systems exist. More specifically, we shall study how a database is a better solution than a set of files. Also, a Database Management System (DBMS) scores over File Management System (FMS) on many counts. We shall explore the reasons behind this. Fig. 2.1 shows this evolution.

The DBMS has evolved over the years from being a simple means of arranging data to a much more sophisticated organisation and retrieval of data as and when required, in real-time. We shall study the different types of databases and understand the differences between them. Relational Database Management Systems (RDBMS), about which we shall study too, have become the most popular of them all for many reasons. Our aim is also to see how we can access data from a RDBMS using the Structured Query Language (SQL).

Fig. 2.1 Files, FMS, Databases, DBMS

2.2 FILE MANAGEMENT SYSTEMS (FMS)

We can imagine that a personnel department would use many files in addition to the employee file shown earlier. In fact, the early years of data processing was based on these ideas. Large organisations often had many end-users. Each end-user had a variety of tasks for the computer. For instance, one manager could request for a report of employees who joined after a certain date, another could need information about a specific department, the third could ask about employees in a particular department in a particular grade, and so on. Each such request meant that one or more programs had to be written. Moreover, since there would be many such files in an organisation (for instance, the attendance file, pay-slips file etc), a complex query would mean interaction with more than one file in a single program. Thus, even a simple request could take a few hours of programming effort. This was frustrating, although it was better than a manual system.

The other problem in using files was the tight locking of programs and files. For instance, if someone in the personnel department decided to add a field called Blood group to the employee file shown earlier, all programs interacting with the employee file would need a change to accommodate this new field. This was extremely annoying. Since writing new programs is time-consuming and changing existing ones is even more so, changes to a file-based system were approached with a lot of caution.

As the use of computers in an organisation grew, different departments needed to carry out their data processing jobs independent of each other. The obvious consequence of this was a lot of data duplication or data redundancy.

Data duplication or data redundancy means multiple (and possibly different) copies of the same data items.

This led to further related problems. For instance, suppose the personnel, payroll and administration departments have different employee files. This meant three problems:

1. Extra efforts to enter the duplicate data.
2. Additional storage requirements.
3. Different values for the same data items, also called as data inconsistency.

The last problem was the most serious one. For example, each department might follow a different convention for representing information. Thus, one may not know if John Jameson, Jameson J, and J Jameson referred to the same employee or three different people! This problem is shown in Fig. 2.2.

Interestingly enough, the duplication of information was more serious when not done! Imagine that J Jameson left the job. The personnel department would update its employee file. But if one of the other two departments does not, because of a misplaced memo, the consequences could be comical or even damaging. Jameson might continue to receive his salary, or be informed that his chair was being replaced with a modern-cushioned one!

Database systems became popular in the 1960s when IBM introduced the IMS and IDMS database systems. IMS is a hierarchical database, whereas IDMS is network.

Fig. 2.2 Data inconsistency

The file-based approach is shown in Fig. 2.3. As the figure shows, a variety of application programs interact with a number of files. They are tightly coupled with each other, so that a change in one file could necessitate a change in many programs as explained earlier.

Fig. 2.3 File-based approach

The most important problem was in the area of data consistency. Imagine that a customer invoice is entering the system. It will update customer record for the outstanding amount that the customer owes. It will also add a record to the invoices register or sales daybook file. It may also add it to the sales statistics file, which maintains product-wise, quarter-wise sales figures. Also imagine that after the system updates the customer record, and before it updates the other two files, the system gets switched off, say due to power cut. What should be done in this case? If the invoice is discarded altogether, the data will be inconsistent (first file updated, but the other two not updated). If, on the other hand, we allow the invoice to be reentered, all the three files will be updated, thus duplicating the updating of the customer record. This is a serious problem and FMS cannot solve it.

FMS suffers from the possibility of lack of data consistency.

DBMS solves such problems by defining a new concept called transaction in which all these updates are encapsulated. DBMS has a mechanism to execute the transaction entirely or not execute it at all. This principle of atomicity generates consistency, which is absent in FMS.

DBMS solves the problem of the possibility of lack of data consistency by using the concept of transaction.

We shall discuss these concepts later in great detail.

2.3 DATABASE MANAGEMENT SYSTEMS (DBMS)

Although using files was a satisfactory approach for small organisations and businesses, it was not quite easy to work with for larger establishments. Hence, a need for storing information centrally and using it as and when needed was felt. This would take care of the problems with files.

A scheme of grouping a set of integrated data files together is called as database.

As we shall see, the term integrated is extremely important.

A Database Management System is a set of prewritten programs that are used to store, update and retrieve a database.

The most important change brought about by DBMS is that the programs no longer interact with the data files directly. Instead, they communicate with the DBMS, which acts as a middle agency. It controls the flow of information from and to the database, as shown in Fig. 2.4.

Fig. 2.4 DBMS approach

If we compare this figure with the earlier one, the initial reaction might be that an extra layer of complexity has been added. However, this extra layer is not a cause for worry as it is completely transparent to the end user and, in fact, it helps.

As the figure shows, the files are integrated. This means that there is no duplication of data. Instead, all the files are stored together. They are managed by DBMS. In fact, the user or programmer does not even know about the files used by DBMS. DBMS internally uses data structures such as chains, pointers and indexes. However, the user or programmer need not worry about the internal details of how the data is stored such as whether it is on one disk or more, on which sectors, in a continuous pattern or in chunks, in what data structures (e.g. chains/indexes) and so on.

If the user wants to find all the invoices in which the value is > $500, DBMS can produce the result. It may use the indexes on invoice value to achieve this, or it may go through the invoices record sequentially. The user need not worry. Only the category of people called Data Base Administrator (DBA) need to know the details of data storage. This is because they are concerned with the performance and security aspects of DBMS.

This is how DBMS hides all the complexities involved in maintaining files and provides a common and simple interface. There is another interesting consequence illustrated in Fig. 2.4. In it, User 2 is interacting directly with the database, without needing to use an application program. This is possible since DBMS provides a set of commands for interacting with and manipulating the database. At the same time, User 1 wants to access/manipulate the database, which is not directly possible by using its set of DBMS commands. Therefore, the user's interaction is through an application program. The third case is a batch program that executes without a user, sitting and interacting with the program/database continuously through a terminal. The batch program executes on its own, once scheduled to run at a specific time. Thus, online (simple and complex) as well as batch data processing is easily handled by DBMS.

IBM's DB2 rules the world of mainframe databases. Oracle is immensely popular in the non-IBM world.

2.4 FMS VERSUS DBMS

Let us summarise the advantages offered by DBMS over FMS.

1. Quicker response – By using DBMS, in many cases, even end-users can prepare simple reports; extract data directly without the need to write an application program. This facilitates quicker responses to queries and also reduces a lot of unnecessary programming effort. This is not possible in the case of FMS. Extracting data from files requires the writing of programs. The idea is illustrated in Fig. 2.5.

Fig. 2.5 Report generation using DBMS and FMS

2. Data independence – One of the most significant features of DBMS that we have already discussed is that it frees the users and/or programmers from the responsibility of knowing the physical details of data. The users need not worry about the size, layout and storage of data. They can concentrate on what they want (that is, the logical view), rather than how to get it (that is, the physical view). The DBMS approach handles these internal details on behalf of the users.

The separation of physical and logical views of data is called as data independence.

Another example of data independence is the way the fields or data items are stored physically one after another in a record and the way they are presented to different programs. In fact, once the fields are defined, the program can create their own logical records by just stating the names of the data fields in different sequence. The DBMS internally extracts the required fields in the desired sequence and presents them to the application programs. Thus, different programs can have different views of the same record. The application programmer has only to bother about the logical record and not to worry about how it is stored. This lends adaptability to DBMS. Fig. 2.6 illustrates the idea.

Fig. 2.6 Data independence

Changes to a DBMS are far simpler than those to a file-based system. For instance, in this approach adding the Blood group field to the employee record contained in a database is very simple. In contrast to this, a similar change in a file-based system would be much more complicated. The main reason for this is, of course, that the application programs in a DBMS are written with the logical organisation of the database in view.

3. Security – The security and protection of sensitive information are very important issues. This is handled by a DBMS easily and efficiently. Permissions and access rights can be defined for one user or a group of users so that any unauthorised user cannot update or even read sensitive information. Implementing this needs a lot of effort in a file-based system.
4. Data sharing _ DBMS offers features that allow easy data sharing. This comes in four aspects/combinations, as shown in Table 2.1.

Table 2.1 Data sharing

                Application Data
                Old Old
                Old New
                New Old
                New New

In simple terms, old (i.e. existing) applications can share old or new data. Additionally, new applications can also use old or new data. File systems also allow data sharing. However, it is not as powerful or rich in terms of features as DBMS.

5. Data redundancy – We have discussed this earlier. DBMS provides good features so that the redundancy in data can be controlled or minimised, if not completely eliminated. Thus, data duplication is controlled.
6. Data consistency – We also discussed this point earlier. DBMS features allow us to make sure that data in a database are consistent. For example, let us assume that an employee E1 is working in department D5 is recorded at two places in a file (or in two separate files) and that the FMS is not aware of this duplication. (In other words, the redundancy is not controlled). In such a case, it may easily so happen that one of these entries is updated to say that employee E1 now works in department D8. However, the other entry is not modified. This leads to data inconsistency. This can be taken care of in DBMS.
7. Transactions – The concept of transactions is perhaps the most important aspect of DBMS. This simply does not exist in a file-based system, as discussed earlier.

A transaction is a set of related operations that must be performed completely or not at all.

For example, if 100 dollars are to be transferred from one account to another, debiting one and crediting the other are two activities that must be done to complete this transaction. If only one goes through, it will lead to chaos. The idea is shown in Fig. 2.7.

Fig. 2.7 Importance of transactions

For example, the money would disappear from the first account, but would not be credited to the other account. This is shown in Fig. 2.8.

Fig. 2.8 Problems due to lack of transactional capabilities

As a result of this half-successful operation, although account number 7890 would be correctly debited with USD 100, account number 1234 would not be credited with that much amount, leading to chaos.

DBMS supports the concept of transactions and thus allows data integrity. This is extremely important for any business application. The way this works is as follows.

Typically DBMS encapsulates all the database updates between two instructions Start-transaction and End-transaction. Start-transaction denotes the beginning of a transaction and End-transaction denotes the end of it. All the database updates between these two should be treated as atomic, that is, they are done completely or not done at all.

If the transaction goes through without any problems, it is committed (i.e. all the updates take place).
However, if it does not go through it, it is rolled back (i.e. all the updates done so far have to be undone before we can start again!).

In our earlier example, if step 1 (i.e. debiting an account) is successful, but step 2 (i.e. crediting the corresponding account) is not successful, then the complete transaction would be rolled back. That is, the DBMS would cancel the effect of step 1 as well. The data would thus assume the original state, that is, the one before the debit step happened. This is shown in Fig. 2.9.

Fig. 2.9 Concept of a transaction

Note that when step 2 fails, both the accounts retain the opening balances as the closing balances. That is, the effect of the debit step is cancelled automatically.

How is this done technically?

To be able to undo database changes, a log of all database changes is kept. The log contains a chronological list of database changes. For undoing a transaction (that is, for rolling it back), the log contains a copy of every database record before it was changed. Such records are called before image records.

Roll back: When a transaction needs to be rolled back, the DBMS simply reads all the before image records for that transaction from the log and applies them to the database.
Commit: For committing a transaction, of course, the DBMS might discard all the before image records – and also write them in a file for future reference.

For instance, consider our earlier example of transfer of 100 dollars from one account to another. When 100 dollars are taken out from one account, DBMS would write the record as it was - before subtracting 100 from its balance – to the log file, and then make changes to it in the database. Now if for some reason, there was some failure and the transaction was aborted, the DBMS has to simply take the before image record of the account and write it on to the database to take it to its original state. This will clarify that in principle, a transaction implemented though actual implementation may be different and more complex. The idea is shown in Fig. 2.10.

8. Concurrency – Suppose we go to a travel agent for booking a flight ticket from New York to Delhi. Imagine that only one seat is available. So, we place the order for the ticket. However, between the time we were told that the ticket was available and our order for the booking (only a few seconds might have passed between the two actions), someone else from Washington books the ticket. The airlines reservation system fails to detect this and issues tickets to both of us!

Surely, it is much better not to get a ticket than to find just before the departure that someone else has booked the same seat! This problem of allowing two or more users simultaneous access to common data and making sure that it remains correct (integrated) can be very frustrating in real life. Most file-based systems have simply no way of preventing such chaotic situations.

DBMS makes sure that two or more users can access the same database at the same time and that the data would remain in the correct state, by using a feature called concurrency.

This is achieved by locking. When one user is making changes to database, other users are not allowed even to view the data. This ensures that everyone gets the correct picture of the database. For this, DBMS simply locks, or makes unavailable, one or more records in the database to users other than the one who is changing them.

Fig. 2.10 Technicalities behind handling rollbacks

In the above example, when we inquire about the ticket with the intent of booking it, DBMS would internally lock the record. Thus, the other user from Washington is asked to wait until our transaction is over. If we book the ticket, that user is told that tickets are not available any more. However, if we do not book a ticket after our inquiry, that user is given a chance to book it. Of course, this is always done on a first-come-first-serve basis. Whosoever makes the inquiry first gets a chance to update the record.

2.5 AN OVERVIEW OF DATABASE MANAGEMENT

2.5.1 DBMS Basics

Simply put, a database system is a computerised record-keeping system with a lot of facilities. It is convenient to keep records and information in the form of computer databases rather than in manual systems. Fig. 2.11 shows the three DBMS models: Hierarchical, Network and Relational.

Relational DBMS or RDBMS is the most popular DBMS model.

Fig. 2.11 DBMS models

The term model refers to the way data is organised in and accessible from DBMS. These three models differ in a number of ways, as we shall study later.

We will concentrate on RDBMS and, as such, use an example from RDBMS to understand how users can access and manipulate databases. In RDBMS, data records (e.g. customer, student, book etc.) are stored on the hard disk by the operating system (O/S) such as UNIX and Windows 2000. The RDBMS interacts with the O/S and allows the user/programmer to view the records in the form of tables. Obviously, there is no such thing as a table on the hard disk.

A table is a two-dimensional view of the database.

Suppose I have some books in my collection and I have used a computer database to keep a track of these books in my library, since I intend to purchase many more. (We shall see how to create a database subsequently.) For now, let us assume that the database exists. To see which books exist in my library, I can type a simple command on my computer:

          SELECT * FROM Book

We shall look at the details of this command soon. But for now, let us see what happens as a result of this command. The computer comes up with the information I was looking for, on the screen, as shown in Table 2.2.

Table 2.2 Effect of a SELECT command

As we can see, the information is arranged in the form of a table. So, this table – the Book table – represents a complete listing of all the books I have in my library. Let us dissect the command I had typed:

SELECT Informs the RDBMS that I want to select (or retrieve) something from some table.
* Indicates that I want to select all information from the table.
FROM Specifies that the table name would follow now.
Book This is the name of the table from which I want to retrieve data.

Thus, when I want to retrieve any data from any table, I should use a SELECT command. Actually, it is called as query in RDBMS terms, and henceforth, we shall also follow that practice.

Now suppose I had decided that tonight, just before going to bed, I would read a book on home improvement. However, I was not sure which book I wanted. So, I enter another simple command:

         SELECT * FROM Book WHERE Subject = ‘Home improvement’

The computer responds with the result shown in Table 2.3.

C.J. Date has done pioneering work in the area of RDBMS and has demystified the RDBMS principles like no one else. The contribution of E.F. Codd is unmatched in the history of RDBMS.

Table 2.3 Effect of a SELECT…WHERE query

Compare this table with the earlier one. You will notice that by adding a WHERE clause, we have filtered our list. The computer does not show the entire table now. Instead, it shows only the records I want. So, it restricts the output and displays books belonging to the Home improvement category of subjects. It is not only possible to restrict the number of rows displayed but also the columns. For example, I enter a command:

SELECT Title, Author

FROM Book

WHERE Publication = ‘AMC’

The computer displays the result shown in Table 2.4.

Table 2.4 Effect of a SELECT…WHERE query with column selection

Title Author
                            Computer Fundamentals AM John
                   Operating Systems Peter Parker
        Home PC PJ Dilbert

We will notice that the computer displays only the book title and author, as requested. Also, it restricts the output to only books published by AMC.

The question now is – who is doing all these filtering of rows and columns based on the conditions I specify? Of course, it is DBMS! DBMS allows me to enter extremely simple commands which I can learn and experiment with quickly. I need not have any formal computer knowledge. I can use simple command in English to see the contents of my database. The DBMS, thus, simplifies the tasks of a common user.

2.5.2 Internal Process

Of course, DBMS has to carry out a number of tasks in order to present a very friendly face to the end user. The relationship between the hardware (H/W), the operating system (O/S), the DBMS and the application program (AP) is shown in Fig. 2.12. Of course, the AP is optional, and is not required in the case of simple manipulations like the one illustrated earlier. It is needed when the user wants to perform more complex tasks, which cannot directly be done by typing queries.

Let us imagine that the Results application of a school has an AP to display records of students who have scored marks > 70%, in the alphabetic order. We will assume that a database of student records exists (with an index on percent age of marks maintained by the DBMS internally for faster access). The actual data records and the index records are stored on the hard disk, on various sectors, by the operating system. The O/S keeps track of the sectors allocated to various files of different users and also the sectors that are free. The O/S also provides some access rights denoting who can read or write on which records. Thus, it is only the O/S, which interacts with the H/W to ensure security.

Fig. 2.12 Relationship between H/W, O/S, DBMS and AP

The DBMS builds various data structures like chains/pointers on top of these. Thus, in our example, the user will execute the AP. When the AP needs the specific records (students with marks > 70%), it will make a request to the DBMS. The DBMS will go through the data structures to select the records to be read. After this, the DBMS will make a request to the O/S to read those specific records. The O/S will instruct the H/W (i.e. disk device drivers) to read those records in the memory. The DBMS will present those records to the AP. The AP then can print/display them or perform any calculations or any other action on them as required.

2.5.3 Tables, Rows and Columns

In our example, without stating too much, we used two more terms: row and column.

A table is a set of related pieces of information stored in rows and columns.

Information, when represented in the form of such rows and columns is called a table in database terminology. Conceptually, a table is similar to what we have been calling a file, so far. Similarly, a column is similar to a field, and a row to a record. These ideas are illustrated in Fig. 2.13.

Fig. 2.13 Equivalence between FMS and DBMS

Fig. 2.14 shows the corresponding terminology.

Fig. 2.14 Terminology for FMS and DBMS

Henceforth, we shall use the new terms just introduced. Thus, we can conclude the following:

At the intersection of every row and column, there is one value of interest.

Another important point to note is that many related tables make up a database. This is similar to how many files make up the data portion of a FMS. The concept is illustrated in Fig. 2.15.

When information can be presented in the form of tables in a database, regardless of how it is stored, the arrangement of data is called a relational database. The term comes from the fact that data columns are related to each other by virtue of their values, and nothing else.

EF Codd first introduced the relational databases in 1970s.

The concept of relational database was based on EF codd's research work on the set theory in relation to computers. Relational databases have gained immense popularity and have captured a large market share mainly due to their simplicity, apart from other standard database features.

Fig. 2.15 One database contains many tables

2.5.4 SQL and its Power

The language that we use to extract data from a relational database is called Structured Query Language (SQL). Thus, SELECT is a statement in SQL. SQL belongs to the category of Fourth Generation Languages (4GLs). We can produce a list of all the books in a library using a C program, as shown in Fig. 2.16.

Fig. 2.16 C program equivalent to one SELECT statement

The corresponding COBOL program is shown in Fig. 2.17.

Fig. 2.17 COBOL program equivalent to one SELECT statement

Our SQL SELECT statement does the job in one line! Thus, SQL is like a grown-up child who understands an instruction such as brush teeth and does not need to be told in detail about how to do it (such as go to wash basin, take the brush, and apply toothpaste). In simple terms, it expects us to tell what to do. It will decide how to do it! Thus, we are not to bother with opening and closing of files, reading one record at a time, filtering and other such issues.

SQL is, therefore, extremely powerful. It requires only very high-level commands. One SQL command translates into many Third Generation Language (3GL) statements, as illustrated above. We can show the equivalence between SQL, 3GLs such as C/COBOL and Second Generation Languages (2GLs) such as assembly language, as illustrated in Fig. 2.18. Here, we show a simple instruction provided to some children. If the child is grown up, a simple instruction such as brush your teeth would suffice. However, if the child is younger, then we need to provide slightly more detailed instructions — similar to the way we provide instructions in C or COBOL. Finally, if the child is very young, then we must provide very detailed instructions — similar to the way we write assembly language programs.

We should note that SQL is not a product. It is a standard language that many products such as DB2, Oracle, Ingres, Informix, Sybase, SQL Server and MS-Access have adopted. These products are called Relational Database Management Systems (RDBMS). In the following sections, we shall study the basics of SQL in brief.

Fig. 2.18 Comparison between SQL and other languages

2.6 BRIEF INTRODUCTION TO SQL

SQL is concerned with three most common tasks needed by any user of a database management system, namely: data definition, data manipulation and data control. SQL allows us to define databases and tables, add/update/view their contents and allows security features. More specifically, SQL commands can be divided into three main categories: Data Definition Language (DDL), Data Manipulation Language (DML) and Data Control Language (DCL). These categories are shown in Fig. 2.19.

Fig. 2.19 Classification of SQL commands

Let us study each of these.

2.6.1 Data Definition Language (DDL)

Data definition language is concerned with the definition of data

More specifically, it allows us to create databases, tables, and indexes. Furthermore, it allows changes to these objects and also their deletion. For instance, to create our Book table, we can use the Create table command provided by SQL as follows:

CREATE TABLE Book

(Id        INTEGER,  
  Title        CHAR (30),
  Author        CHAR (20),
  Publication        CHAR (30),
  Subject        CHAR (20))

We specify the name by which the table should be referred - in this case Book. The columns that form the table are specified within the brackets. Along with every column we specify the type and the size of data allowed for that column. For example, the title can contain up to a maximum of 30 characters, whereas the Id is an integer. Note that we have not specified any length for the Id. For integers, the DBMS uses its own pre-specified length.

After executing the Create table statement, only an empty structure (called as the template) of the table is created, as shown in Table 2.5.

Table 2.5

No information about any book exists at this point. All we are telling the DBMS is that there exists a table called Book in which there are five columns as specified. The DBMS stores this information in a special area called as database catalog.

The catalog is a special area used by the DBMS to maintain information about various tables, their inter-relationships, etc.

Therefore, database catalog is a form of meta-data, that is data about data. Whenever information on any new book is to be added to the Book table, the DBMS will use the catalog to obtain the details of the table, such as columns, their sizes, data types and so on.

If we need to delete this table in the future, a single line will do the job:

DROP TABLE Book

The DBMS has to internally update its catalog with the information that no Book table exists now. This is hidden from a user. From a user's perspective, the Book table no longer exists.

Other simple things such as adding a new column are allowed. For example, if we decide to add a new column — Price — to our Book table, all we need to do is:

ALTER TABLE Book

ADD Price INTEGER

As a result, the structure of the Book table is modified to the one shown in Table 2.6.

Table 2.6 Modified Book table structure

Indexes allow faster access to data as studied earlier. In SQL, an index can be created as follows:

CREATE INDEX BookIndex

ON Book

(Id)

What we are asking is to, create an index on the Book table and name it as BookIndex. Further, the column on which the index is to be created is the book's Id. This will enable faster access to books when retrieved using the Id (e.g. display all books with Id between 20 and 30). We can create an index on any column, or indeed, any combination of more than one column of a table. The index can be in ascending or descending order of column values.

Interestingly, the DBMS might do this by creating chains, pointers, indexes and their combinations. We have already seen how the librarian maintained chains and indexes manually for faster query responses. A similar mechanism is used by DBMS, but it is hidden from the user. The DBMS manipulates the chains and indexes as and when required, internally.

2.6.2 Data Manipulation Language (DML)

2.6.3 Select, Insert, Update and Delete

We have already seen some examples of data manipulation. The SELECT statement allows viewing of the contents of a table in a variety of ways. All or only a few columns, rows or their combinations can be selected, based on the search criteria. SELECT * gives everything: all the rows and all the columns of a table. The WHERE clause does the job of restriction. Only the rows satisfying one or more conditions can be retrieved by using it. SELECT allows selection of one or more columns. For example,

SELECT Name, Author

FROM Book

WHERE Author = ‘P JONES’

This is a very simple data-filtering example. More sophisticated search criteria can be specified. For this, we will use an example for the rest of the discussion.

SQL is a very simple language. It allows manipulation of RDBMS data with great ease and minimum learning curve.

Consider a Student table containing details about a student and the marks obtained in various subjects. Let the columns of this table be: Name, Math, Science, Languages, Total, Average and Result. Assume that the Total and Average columns are expected to contain the total marks obtained in the three subject categories and their average, respectively. If the student has obtained less than 35 marks in any of the subjects, the result should be Fail; otherwise it should be Pass. At the moment, the table looks as shown in Table 2.7.

Table 2.7 Student table – Initial contents

Note that the Total, Average, and Result columns are currently blank. We shall now carry out a variety of operations on this table, to make ourselves familiar with the various aspects of SELECT statement. It will also help us understand the other data manipulation statements.

1. Obtain a list of students who have failed in Math.
SELECT *
FROM Student
WHERE Math < 35
Output: See Table 2.8.

Table 2.8 Students failing in Math

2. Display names of students who have passed either in Math or Science.
SELECT Name
FROM Student
WHERE Math > 34 OR Science > 34
Output: See Table 2.9.

Table 2.9 Students passing either in Math or in Science

Name
P Jones  
M Henry
J Man    
A Bryan  
W Fringe 
A Lawson
B Marsh  
C Lloyd  
3. Display names of students who have passed in Languages and also in either Science or Math.
SELECT *
FROM Student
WHERE (Languages > 34) AND (Math > 34 OR Science > 34)
Output: See Table 2.10.

Table 2.10 Students passing in Languages and in either Science or Math

Note that C Lloyd's record is not displayed because she has failed in languages.

4. What is the average Math score?
SELECT AVG (Math)
FROM Student
Output:
60.11

Note that SQL provides many in-built functions such as AVG or MAX, which perform these operations on the specified columns of the table. It should be obvious by now that it is possible to query the Student table in a number of ways: almost any requirements can be satisfied with the help of simple SELECT statements. There are many more features of SELECT, which we shall ignore for the moment.

5. What is the maximum score in Languages?
SELECT MAX (Languages)
FROM Student
Output:
80

The question now is, we have been selecting data from the Student table, but who entered it there in the first place? For this, SQL provides an INSERT command. For example:

INSERT INTO Student

(Name, Math, Science, Languages)

(‘P JONES’, 68, 90, 72)

An INSERT statement will insert one row at a time. We have not manually calculated and entered the values for total marks, average and result. If we were to do that, the purpose of using a computer for database management would be defeated! How can we calculate the totals? For that, another DML statement is required:

UPDATE Student

SET Total = Math + Science + Languages

Result: See Table 2.11.

Table 2.11 Result of calculating total marks

This would update the Student table with the total marks calculated as a sum of the three subjects. Further, we can find out and store the average marks:

UPDATE Student

SET Average = Total / 3

Result: See Table 2.12.

Note that Average is a column in our table, whereas AVG is a build-in SQL function. These two are completely different things.

Table 2.12 Result of calculating averages

Now, the result for the passed students can be determined:

UPDATE Student

SET Result = ‘Pass’

WHERE Math > 34 AND Science > 34 AND Languages > 34

Result: See Table 2.13.

Table 2.13 Determining which students have passed

Finally,

UPDATE Student

SET Result = ‘Fail’

WHERE Math < 35 OR Science < 35 OR Languages < 35

Result: See Table 2.14.

Table 2.14 Determining which students have failed

This would lead to further queries.

1. List the students in the order of merit list.
SELECT *
FROM Student
ORDER BY Average DESC
Output: See Table 2.15.

Note that ORDER BY is a special clause, which allows us to sort data in the ascending/descending order of a column. In this case, we have specified that the sorting should happen on the basis of the average marks column. Furthermore, a DESC word in the end specifies that the sorting should happen in the descending order of average marks.

Table 2.15 Listing students in order of merit

2. How many students have passed?
SELECT COUNT (*)
FROM Student
WHERE Result = ‘Pass’
Output:
6
3. Which students have failed in more than one subject?
SELECT *
FROM Student
WHERE (Math < 35 AND (Science < 35 OR Languages < 35)) OR
               (Science < 35 AND (Languages < 35 OR Math < 35)) OR
               (Languages < 35 AND (Math < 35 OR Science < 35))
Output: See Table 2.16.

Table 2.16 Students failing in more than one subject

Suppose we now want to retain only data for students who have passed the examination. A DELETE statement would do the job:

DELETE FROM Student

WHERE Result = ‘Fail’

Result: See Table 2.17.

Note that we can specify which records are to be deleted. If we do not specify any WHERE clause, that is just say DELETE FROM Student, then SQL will delete all rows from the Student table, making it empty.

Table 2.17 Retaining records of passed students only

It should be clear that Data Manipulation Language is extremely powerful. Only four basic statements, namely SELECT, INSERT, UPDATE and DELETE, allows almost any operation on the database.

2.6.4 Multiple Tables and Joins

A database usually comprises of many tables. For example, suppose some other information on the students is available in another table called as StudentInfo. It contains the name, address, phone number and date of birth of the students, as shown in Table 2.18.

Table 2.18 StudentInfo table

Now we wish to call the students up and inform them about the result. For this, we need the student names, averages, results and phone numbers to be displayed together. However, the phone number appears in a separate table, as noted earlier. Thus, we need to combine two tables. This process is called joining one table to another. The query would take the form:

SELECT Student.Name, Student.Average, Student.Result, StudentInfo.Phone-Number

FROM Student, StudentInfo

WHERE Student.Name = StudentInfo.Name

Resulting output: See Table 2.19.

Table 2.19 Result of joining two tables

The above statement mentions two table names in the FROM clause. Thus, data is being selected from two tables and compiled in one. As usual, the SELECT statement lists the columns to be selected. Note that the column is prefixed with the table name (as there are two tables now) with a period (.) in-between. For instance, Student.name is the name from the Student table. The main logic is in the WHERE clause. It specifies that the items where the names match are to be displayed. In other words, for a row in the Student table, only one row that has the same name in the Studentinfo table is to be selected. This means that only one row per student is to be displayed. In the absence of this clause, all rows of the Student table would be joined to all rows of the Studentinfo table! That is, it would result a display of 10 × 10 = 100 rows!

Joins are extremely powerful, but can demand a lot of resources, and thus be slow. Hence, they should be used judiciously.

2.6.5 Nested Queries

Sometimes, we might need to write one SELECT statement inside another. This is called a nested query.

Fig. 2.20 illustrates the concept. The outer SELECT is called as the outer query, whereas the inner SELECT is called as the inner query, or the sub-query.

Fig. 2.20 Nested query

For instance, suppose we want a list of students whose total marks are greater than the overall average. We would write a nested query like this:

SELECT Name, Total Outer query

FROM Student

WHERE Total >

         (SELECT AVG (Total) Inner query (sub-query)

         FROM Student)

The query works like this:

1. The inner query or the sub-query is executed first, and exactly once. Thus, the average of totals for all the students is calculated (which is 187.33).
2. Next, the outer query is executed. For every row of the Student table, the inner query is tested once.

Thus, the query takes the form:

SELECT Name, Total

FROM Student

WHERE Total > 187.33

Output: See Table 2.20.

Table 2.20 Result of executing nested query

Name   Total
P Jones 230
M Henry 267
W Fringe 223
A Lawson 198

Since we have one query within another, this is an example of two-level nested query. In practice, nesting is rarely required beyond three levels.

2.6.6 Data Control Language

Having understood how to create tables and access/manipulate data in them, we must know how to give access to and revoke accesses from various tables to different users. This is extremely important in case of a multi-user database system. We can achieve this with the use of Data Control Language.

Suppose there are two persons working in the personnel department: John and Peter. Only they should have access to the Employee table. We can secure access by using the GRANT and REVOKE statements provided by SQL as follows. First, we remove all accesses to the table by everybody.

REVOKE ALL ON Employee

FROM PUBLIC

Note that ALL means the ability to perform any DML operation such as SELECT, INSERT, UPDATE or DELETE. Thus, we do not allow anyone (because we specify PUBLIC) to perform any operation on the Employee table. In other words, PUBLIC means everybody.

At this stage, no one can do anything with the Employee table. Now, we shall allow only John and Peter the necessary authority to access then Employee table.

GRANT ALL

ON Employee

TO John, Peter

This would allow John and Peter to perform any DML operation (namely SELECT, INSERT, UPDATE and DELETE) on the Employee table. Suppose at some stage, Peter is on leave and a third employee, David, temporarily takes his place. We need to provide only SELECT access to David on the Employee table. This can be done as follows.

GRANT SELECT

ON Employee

TO David

David cannot update the information in the Employee table in any manner. He can just see what it contains.

Thus, SQL enables allowing or taking back accesses to/from sensitive pieces of information very easily. This enforces security and assures the confidentiality of information.

2.7 EMBEDDED SQL

2.7.1 Embedding SQL Statements inside 3GL

We have mentioned that SQL can be used by end users to produce ad-hoc reports and also by application programs. The examples we have seen so far deal with the former.

When we embed SQL statements in an application program, it is called an embedded SQL.

The programming language in which we write SQL statements is usually a 3GL such as COBOL or C. The 3GL is called the host language. The reasons we need embedded SQL are:

Online transaction processing involves formatting of screens for easier data entry and validations. This ensures database integrity. Interactive SQL does not support these features.
Many applications need batch processing. In this, data are collected together and processed in groups called as batches. Such applications are normally developed by using a host language.

The most interesting aspect of this is that as we know, SQL deals with an entire table at a time. For example, when we issue a SELECT statement, it goes through the entire set of rows in one go and comes back with a result. On the other hand, a 3GL deals with one record at a time. Embedded SQL programming also deals with this issue. Therefore, we need a mechanism by which the entire data returned by a SQL query is examined one row at a time, validations, if any are performed and the row is processed. Then, the second row is examined, validated and processed, and so on. The concept is illustrated in Fig. 2.21. Note that we have a C program, which includes SQL statements embedded inside it.

A question that arises is, how would we know whether a statement belongs to the C portion of the code or the SQL portion? Of course, we would know by looking at the statement itself. For instance, the moment we see a SELECT statement, we know that it is an SQL statement. Similarly, we know that if is a C statement. However, to remove the scope for any ambiguity, an SQL statement in an embedded SQL program needs to be delimited inside special boundaries. These take the form EXEC SQL and END-EXEC verbs (of course, these differ from one language such as C to another such as COBOL. In C, it is just a semicolon).

Fig. 2.21 Embedded SQL

2.7.2 Embedded SQL Program Lifecycle

The lifecycle of an embedded SQL program is pretty interesting. We know that a normal C program is compiled and linked before it can be executed. However, an embedded SQL program needs to be pre-compiled before it can be compiled. The difference is shown in Fig. 2.22.

Fig. 2.22 Life cycle of a 3GL program and an embedded SQL program

What is the significance of the pre-compiler and the pre-compilation step? We have noted earlier that any SQL statement inside an embedded SQL program must be inside the boundaries of EXEC SQL and END-EXEC verbs. The pre-compiler precisely looks at these blocks, contained within EXEC SQL and END-EXEC verbs. It ignores everything else. The pre-compiler translates the SQL statements inside these blocks into the appropriate 3GL statements (e.g. C or COBOL, depending on which 3GL is being used). The idea is illustrated in Fig. 2.23.

Fig. 2.23 Effect of pre-compilation

Thus, after pre-compilation, the program does not contain any SQL statements at all. It becomes a 3GL program in the concerned language. This program is passed on to the compiler of the language, which then translates it into the appropriate object code, like any other 3GL program. This is followed by the linking step as usual.

Because the pre-compiler translates the SQL statements inside an embedded SQL program into the appropriate 3GL statements, the compiler of the language need not change at all. We can summarise this as:

The pre-compilation stage translates the SQL statements inside an embedded SQL program into appropriate 3GL statements.

2.7.3 Cursors

We have mentioned that SQL deals with multiple rows at a time, whereas 3GLs deal with one record at a time. This means that if we embed SQL statements inside a 3GL program, which return multiple rows from a table, a 3GL program would not be able to handle them. Such a problem is called impedance mismatch and is illustrated in Fig. 2.24.

Fig. 2.24 Impedance mismatch

Clearly, this is a problem. If a SQL SELECT query hands 200 rows over to a C program, what does that C program do with them? After all, it can process only the first of these 200 records!

A cursor is an important concept in this context. Note that it is completely different from the term cursor associated with the pointer on the screen of the computer, which captures the current row and column position. A database cursor solves the problem of impedance mismatch. It acts as a filter between the result of a SQL query and the 3GL statements that process this result. It is like a buffer. It accepts multiple rows from a SQL query, stores them, and hands them one-by-one to the 3GL program, as and when required. This is shown in Fig. 2.25.

Fig. 2.25 Cursor

In principle, this is similar to the way a spooler works. A spooler allows a user to send a big document for printing and then to continue working on something else. The user need not wait for the printer to print the whole document. It is the spooler which holds the document in its memory while the printer gets ready and accepts the document for printing.

2.8 DYNAMIC SQL

In embedded SQL, the SQL statements to be executed are known at the time of coding the program statements. This is called as static SQL. However, in some cases, the SQL statements to be executed are not known when the program is written. They might be constructed based on user options or, in fact, be written by the user directly in the form SELECT – FROM – WHERE. Such SQL queries, which are not known prior to their execution, form dynamic SQL.

Dynamic SQL offers the flexibility of writing SQL statements during the program's execution.

However, this comes with two main drawbacks.

1. In case of static SQL, the SQL statements are pre-written. Therefore, if there are any errors in the SQL statements, the pre-compiler points them out. In case of dynamic SQL, the statements are pre-compiled, compiled and executed directly, like an interpreter. So, the errors would also be shown at run time.
2. Static SQL allows the DBMS to do optimisation (discussed later). For example, the DBMS may choose to use a particular index to make the SQL statements execute faster. In the case of dynamic SQL, there is no such provision. Therefore, the SQL statements may not choose the best possible execution path.

Dynamic SQL SELECT statements take the form:

EXECUTE IMMEDIATE <SQL statement>

The EXECUTE IMMEDIATE portion tells the operational environment that the statement belongs to dynamic SQL category.

For example, consider the following dynamic SQL statements. Here, we store the dynamic SQL INSERT statement into a variable called My_statement, and pass that variable at the time of execution of dynamic SQL.

My_statament = ‘INSERT INTO Student VALUES (10, ‘Cryptography’, ‘Atul’, ‘THM’, ‘Security’)’

EXECUTE IMMEDIATE:My_statement

Dynamic SQL is useful in the case of online applications. In such cases, it may not be possible every time to anticipate the kind of queries that the user wants to execute. The user may want to construct them at execution time and execute them immediately. For instance, in a shopping cart application on the Internet, the user may choose to buy 0, 1 or 10 items. In such cases, dynamic SQL would facilitate an elegant creation of a query to process these many items. It may not be possible to achieve the same result using embedded SQL.

There is a variation of the dynamic SQL scheme. We can treat an SQL statement as a temporary object and create it repeatedly. The advantages offered by this approach are as follows:

We can write queries.
Performance can be far better. This is because the statement is parsed, validated and optimised far ahead of execution time.
Using parameters, the same statement can be executed in various ways.

In this model, two statements are required: PREPARE and EXECUTE.

The PREPARE statement creates an SQL object that contains the SQL statement to be executed. When this statement is executed, the DBMS examines, interprets, validates and optimises it. It is stored until an EXECUTE (and not EXECUTE IMMEDIATE) statement is encountered. At this stage, the statement is actually executed. This approach is contrasted with the earlier (EXECUTE IMMEDIATE) approach in Fig. 2.26.

Fig. 2.26 Dynamic SQL approaches

2.9 DBMS MODELS

We have mentioned earlier that there are primarily three DBMS models: hierarchical, network and relational. We have seen relational databases (RDBMS) in operation. We shall not discuss the others in detail as they were developed earlier and they are rarely used today. Only a few legacy old systems use them. If we have to maintain them, we have to understand them and that is the only reason that we shall study them.

In these two models (i.e. hierarchical and network), the relationships between records has to be pre-defined. For instance, if one doctor treats many patients, there is said to be a 1:m (1 to many) relationship between these two record types. The doctor record is called a parent record and the patient record the child record. Having established these parent and child records, links are created such as next patient for the same doctor as others, which we have studied earlier.

For the pre-decided queries such as List all patients for doctor XYZ, this is, in fact, faster. That is why these models were popular. The hardware speeds for RDBMS, which resolves the queries at run time, were not very high. RDBMS really became popular when hardware speeds and response times improved. We will now present a quick overview of all the three models.

2.9.1 The Hierarchical Model

Fig. 2.27 shows a database containing information about students, the courses they have opted for and the marks they have obtained in them. In short, each student can opt for many courses and is given marks in each one of them. Hence, there are three types of record here: students, courses and marks.

Fig. 2.27 Hierarchical model

The student record, which is at the top of the hierarchy, is called the parent or root. The course-marks record is a child of the student record type. Hence, we have one parent and one child record. In general, we can have as many child records as needed. Furthermore, each child record can, in turn, have many child records, and so on. However, given a child record type, we can have only one parent. Hence, the hierarchical model is based on the principle of one-to-many record types. Let us discuss the operations such as retrieving, inserting, removing and deleting records from such a database.

2.9.1.1 Retrieval There are two main ways to study this. Fig. 2.28 shows the first approach.

Fig. 2.28 Record retrieval from hierarchical databases – Approach 1

Let us understand how this query would work. The processing would begin with the parent record, that is student. First, a record containing a value of student code = S901 is searched in the database. For this, the database would be read sequentially, starting with the first student record. When a match for S901 is found, the course-marks (children) records for this student are searched. For each such course-marks record, the program displays the course code, name and marks obtained on the screen. This process is repeated for all course-mark records for that student.

The second approach is illustrated in Fig. 2.29.

Fig. 2.29 Record retrieval from hierarchical databases – Approach 2

Let us now look at this query. The processing would again begin with the parent record. Actually, the student record is not of importance in terms of search. The search criterion is based on the course code. However, the database would be read sequentially, starting with the first student record. For each student record, the course-marks (children) records for this student are searched. For each such course-marks record, the course code is compared with C20. If such a record is found, the student code, name and marks obtained are displayed on the screen. This process is repeated for all student records in the database.

We will notice that although the two queries are similar, the logic used to resolve them is entirely different. This comes from the fact that the database is arranged in a parent-child fashion. The logic would be reversed if student were the child record type of course-marks record.

2.9.1.2 Insert Since there cannot be a child record without a parent record, we simply cannot have a course-marks record without a student record. So, if a new course is introduced, its details cannot be entered until at least one student opts for it.

2.9.1.3 Delete The only solution for deleting marks is to delete the course-marks record. If we delete all the records of a particular course during deletion of marks, we lose the information that such a course exists. Similarly, if only one student has opted for a course and we delete that student's record, we lose information about the course, too!

2.9.1.4 Update To change the course title for course C30 from Languages to English, the entire database needs to be searched for course C30 and appropriate changes made.

Hence, we can conclude that because of the way the one-to-many hierarchy is designed, there are inherent problems associated with such databases. The data manipulation is not easy and might lead to problems. IBM mainframes offer IMS, which is the most popular hierarchical database product.

2.9.2 Network Model

The network model is different from the hierarchical model in one important respect. Unlike the hierarchical model, here we can have many-to-many relationships. This means that a parent can have many children, and one child can belong to more than one parent. The network model for the same example is shown in Fig. 2.30.

Hierarchical and network database models are almost obsolete. However, many legacy applications rely in them heavily. Threrefore, we need to learn hierarchical and network databases from a practical point of view.

Fig. 2.30 Network model

Here, the database consists of links (also called connectors). The marks connect the students with the courses they have opted for. Note that the link records are not mere pointers. They are themselves a record – that is, they contain data values. Additionally, they also own the responsibility of connecting parent records with children records. The link records have an interesting property that differentiates network databases from hierarchical databases. The link records allow a child record to exist without any parent record. This is not possible in case of hierarchical databases. The way this is achieved is that a child record has a link record (note that there is no parent record). The other pointer of the link record then points back to the same child record – thus not requiring that it have a parent!

The network model is certainly more complex than the hierarchical model, and it can represent any structure that the hierarchical model represents. Let us look at the various operations to be performed:

2.9.2.1 Retrieval Fig. 2.31 shows the first retrieval operation.

Fig. 2.31 Record retrieval from network databases – Approach 1

This query works exactly in the same manner as in the case of a hierarchical database. The only difference is the addition of the link records. Rather than looking for a child record directly from a parent record, here we first traverse to the link (marks) record and use that to obtain the child record (course). The rest of the query works in the same fashion and we will not describe it. Fig. 2.32 shows the other query.

Fig. 2.32 Record retrieval from network databases – Approach 2

Interestingly, this query starts with the child record type (course). Remember that in case of hierarchical databases and even in this kind of query, the execution began with the parent record (student). However, because network databases allow you to start with a child record and then find its parent, the query is more natural. It starts with the course record, rather than the student record. For each course record, it checks the course code. If it matches with the required value (C20), it gets the parent record – that is student record – by using the link record. It then displays the student code and marks. This process is carried out for all course records that belong to course C20.

As we can see, the two queries can be resolved using a very similar logic. This is because the database can be accessed from top to bottom of the hierarchy in the same fashion as from bottom to top.

2.9.2.2 Insert There are no problems in inserting a student record without a course or a course without a student record. They can exist independently until a connector connects them.

2.9.2.3 Delete Deleting any of the record types, that is, student, course or marks, will result in an automatic adjustment of chain.

2.9.2.4 Update Updates also pose no problems and can be done easily.

2.9.3 Relational Model

The main difference between relational and the other two models is the links between different data items. In case of the relational model, the links are entirely due to the values of various data items, and nothing else. There are no internal links, connectors or parent-child relationships. For instance, a student is connected to a marks record by virtue of a common student code in both the tables. Table 2.21 shows the earlier example using the relational model.

Table 2.21 Relational model

Now, let us look at the different operations using the relational databases.

2.9.3.1 Retrieval Fig. 2.33 finds the answer to the first query.

Fig. 2.33 Record retrieval from relational databases – Approach 1

This query looks pretty simple in comparison to the earlier queries for hierarchical and network databases. It starts with the student record. For each student record, the student code of which is S901, it displays the course code and course name. Note that there is no question of any parent-child hierarchy built into the database itself. A mere join would suffice to answer this query. The join internally would use the logic as stated earlier. Fig. 2.34 depicts this.

Fig. 2.34 Record retrieval from relational databases – Approach 2

This query also looks quite simple. It now starts with the course record. For each course record with a course code of C20, it displays the student code and marks. Again, there is no question of any parent-child hierarchy built-into the database itself. As before, a join would be employed, which would internally use the logic stated earlier.

The logic of the queries is not only uniform; it is also simpler to build. As stated earlier, relational databases also make insert, update and delete operations easier. We will not go into the details of the same. Since the three tables exist physically separately, there should not be any adverse effect if operations performed on one table are performed on others.

2.10 DATABASE SYSTEM ARCHITECTURE

We have briefly discussed the concept of levels or views (e.g. logical and physical) that different kinds of users have in relation to a DBMS. We will now discuss it more formally.

Classically, we have three levels of the same DBMS, depending on what type of user we are. Fig. 2.35 shows these three levels.

Fig. 2.35 DBMS levels

Let us discuss these three levels now.

External level: This is the view of the database that most end users have. For example, when we talk of Employee record or Student record, we have the external view, also called the community user view, in mind. This level represents the end-user view. It is quite abstract in nature and closer to the way a programming language would model a record. For example, an Employee record in C would consist of a structure data type, and as a 01 level record in COBOL. This level sees the data in a database with this view.
Conceptual level: The conceptual level, also called the logical level, is not tied to a programming language or to a specific technology. It merely describes the data as any user would see it, regardless of any technology differences. For instance, this view would inform us that the employee number is a string that can hold 6 characters at the most, and that the employee salary is an integer, and so on. This level offers a layer of indirection between the external and the internal views.
Internal level: The internal level provides the physical view of the database. More specifically, this view provides information regarding the physical organisation of the data in the database. For example, this view may inform us that the employee record consists of 20 bytes, of which the first 6 are occupied by the employee number, and so on.

It is worth pointing out that there would be as many external views as the number of different types of users of the database. Therefore, there can be many external views for the same database. However, there is always precisely one conceptual and internal view for a database, regardless of the different number or types of users. This is illustrated in Fig. 2.36.

Fig. 2.36 Database architecture levels

Let us illustrate these concepts with the help of an example. Fig. 2.37 shows the different levels of an Employee record.

Fig. 2.37 Three levels of the DBMS architecture – An example

1:m relationship

Child record

Commit

Atomicity

Column

Community user view

Conceptual level

Cursor

Data Control Language (DCL)

Data inconsistency

Data integrity

Data redundancy

Database

Database catalog

DBMS models

Embedded SQL

File Management System (FMS)

Host language

Inner query

Join

Logical view

Nested query

Outer query

Physical view

Relational Database Management System (RDBMS)

Row

Static SQL

Sub-query

Transaction

Concurrency

Data consistency

Data Definition Language (DDL)

Data independence

Data Manipulation Language (DML)

Data sharing

Database Administrator (DBA)

Database Management System (DBMS)

Dynamic SQL

External level

Hierarchical DBMS

Impedance mismatch

Internal level

Logical level

m:n relationship

Network DBMS

Parent record

Physical view

Rollback

Security

Structured Query Language (SQL)

Table

A database is a well-organised set of data.
A Database Management System (DBMS) is a set of programs that helps us to manage databases.
Relational Database Management System (RDBMS) is the most popular of all DBMS models.
Structured Query Language (SQL) is the language for working with RDBMS.
File Management Systems (FMS) suffers from many drawbacks such as data redundancy, data inconsistency, lack of transactions, and so on.
Data redundancy refers to duplication of data.
Data inconsistency refers to different values for the same data items in a database, and is caused by redundancy.
Transaction is a logical unit of work. It must complete in entirety or not at all, but never only partly.
DBMS scores over FMS in terms of quicker response, data independence, security, data sharing, data consistency, removing data redundancy and providing for transaction management and concurrency.
In concurrency, one or more users/programs attempt to access the same data at the same time.
DBMS can be hierarchical, network or relational.
Hierarchical databases employ parent-child relationships. This poses certain problems in data insertion, updates and deletion.
Network databases employ parent-child as well as child-parent relationships. They remove some drawbacks of hierarchical databases.
Relational databases relate data items purely based on their values.
The equivalent terms for file, record, and field in RDBMS are table, row and column respectively.
SQL is a very high-level language that provides simple instructions for accessing data from tables.
SQL consists of three sets of languages: Data Definition Language (DDL), Data Manipulation Language (DML) and Data Control Language (DDL).
DDL is used for creating and destroying tables, indexes, and other forms of structures.
DML is used to retrieve or manipulate data stored in a database.
DCL controls the access to various tables, indexes and other structures.
DML is most widely used and provides four major statements: SELECT, INSERT, UPDATE and DELETE.
Joins can be used to retrieve data from multiple tables.
Embedded SQL programming facilitates the use of SQL with 3GLs such as C and COBOL.
Dynamic SQL provides for entering of SQL queries at execution time, rather than at the development stage.
A database can be considered as consisting of three layers: External, Conceptual and Internal.
External view of a database presents the view that the end users have.
Conceptual view describes the data without stress on any technology or programming language.
Internal view provides information regarding the organisation of a database.

1. Using a set of files is better than using a database.
2. RDBMS is the least popular of the possible DBMS models.
3. Data redundancy means multiple copies of the same data item.
4. Data inconsistency means the same values for different data items.
5. A scheme of grouping a set of integrated data files is called FMS.
6. A transaction is a set of operations that must be performed completely or not at all.
7. Database concurrency is achieved by using the locking mechanisms.
8. A table is a three-dimensional view of data.
9. DML is concerned with data definition.
10. In nested queries, one SELECT statement is written inside another.

1. Data is better organised and is easily accessible by using _________.
(a) DBMS
(b) FMS
(c) EMS
(d) OOAD
2. A _________ is a set of operations that must be performed completely or not at all.
(a) query
(b) command
(c) data sharing
(d) transaction
3. Commit and rollback are related to ___________.
(a) data integrity
(b) data consistency
(c) data security
(d) data sharing
4. Data is processed by using _________.
(a) DDL
(b) DML
(c) DCL
(d) DPL
5. When SQL statements are embedded inside 3GL, we call such a program as _________.
(a) nested query
(b) distinct query
(c) embedded SQL
(d) nested programming
6. ___________ translates the SQL statements inside a 3GL program into appropriate 3GL statements.
(a) Compiler
(b) Linker
(c) Loader
(d) Pre-compiler
7. If SQL statements are known before the program executes, we call them _________.
(a) cursor
(b) dynamic SQL
(c) static SQL
(d) embedded SQL
8. If SQL statements are not known before the program executes, we call them _________.
(a) cursor
(b) dynamic SQL
(c) static SQL
(d) embedded SQL
9. In _________, we have a strict parent-child relationship only.
(a) hierarchical databases
(b) network databases
(c) relational databases
(d) all of the above
10. In _________, we have a parent-child as well as child-parent relationship.
(a) hierarchical databases
(b) network databases
(c) relational databases
(d) all of the above

1. Explain DBMS and FMS with short explanations.
2. How is DBMS better than FMS?
3. Explain the terms table, row and column.
4. What is SQL? Why is it a powerful language?
5. Explain the terms Data Definition Language, Data Manipulation Language and Data Control Language in SQL.
6. Write a note on embedded SQL.
7. What is a cursor? How is it useful?
8. Explain the difference between static and dynamic SQL.
9. Write a note on hierarchical and network databases.
10. How is relational database technology different from the other two DBMS technologies?

1. Write SQL command to create the following table, named Student:
Roll number Number Maximum length 5
Student name Character Maximum length 30
Rank Number Maximum length 2
Total marks Number Maximum length 3
2. Create the above table in MS-Access.
3. Assume that the Student table contains the following data. Write a SELECT statement to print the average marks and the maximum marks.

4. Consider the following table (called Details) in addition to the above table.
         Roll number Birth date
        17 7-Apr-1973
        90 26-Oct-1972
        27 19-Oct-2001
        56 22-Dec-2002

Write a SELECT query to print the roll numbers, names, birth dates and ranks of all the students with the help of a join.

5. Produce the same output as above by using a sub-query.
6. Provide read-only access of the Student table to all users.
7. How would you ensure that only the user named User1 can perform insertions in the Details table?
8. Study Pro*C or Pro*COBOL to learn more about embedded SQL.
9. Study what is meant by isolation level in embedded SQL programming.
10. Learn more about hierarchical and network databases by reading about the products called IMS and IDMS.