# Chapter 3 – The Relational Model – Introduction to Database Management Systems

Chapter 3

 The Relational Model

Ted Codd came up with the relational model which disconnects the logical organisation of a database from its physical storage methods during the early 1970s.

Chapter Highlights

• RDBMS in Depth
• Relational Algebra and Relational Calculus
• Database Integrity and Constraints
• Meaning and Types of Keys
• Views and their Usage

3.1 RELATIONAL DATABASES PRIMER

3.1.1 Tabular Representation of Data

The most natural way of organising data so that even people not familiar with computers can easily understand it is to arrange it in a tabular format. This means that when data is arranged in a two-dimensional table, it can be easily visualised and understood by everyone. These tables have the following characteristics:

1. Each entry in a table occurs exactly once. There are no repeating rows.
2. All data items in the same column are of the same kind.
3. Every column has a distinct name to identify it.

3.1.2 Some Terminology

Readers who do not wish to know about mathematical jargon can skip this section without any loss of continuity. However, for the purists, we shall discuss some of the jargon in relational database technology.

Technologists like to use complex words for simple concepts, which sometimes makes comprehension difficult. Thus, a table is called a relation and a database constructed by using such relations is called a relational database. Actually, a relational database is nothing but a flat collection of data items.

Relational databases are based on the mathematical theory of relations. As such, the vocabulary used in relational mathematics is used here as well. Also, we can apply the results of relational mathematics to relational databases straightway. This makes predictions of the results of operations on relational databases quite precise.

There are a few more terms that complicate things a bit further. A row or a record is called as a tuple. Thus, the following two statements mean the same thing:

• A table is a collection of rows
• A relation is a set of tuples

If there are n columns in the table, then the relation is said to be of degree n. Relations of degree 2 are binary, relations of degree 3 are ternary, and soon.

The number of rows in a table is called the cardinality of the table. The number of columns in a table is called its degree. For example, the table in Fig. 3.1 has a cardinality of 5 and a degree of 4.

Fig. 3.1 Cardinality and degree

As we shall study soon, a set of values of one column in a table is called a domain.

3.1.3 Domains

Domain is an important concept in relational databases.

A domain is a set of all possible values for a column or a relation.

For example, suppose we have a column called Student_name in a Student table. Then, the set of all possible student names is the domain. In other words, the domain is similar to a data type. We know that integer, string and character are common data types in many programming languages. They define the set of possible values and the rules associated with those values. For instance, when we declare a variable as of type integer in a programming language, we implicitly know the possible range of values that it can accommodate, the arithmetic operations that can be performed on that variable, and so on.

Similarly, Student_name is a set of all possible student names. It is as good as a data type. However, the idea of a domain has further connotations. Strictly speaking, a domain is a set of values that allows direct comparisons. Thus, there is no point in comparing student numbers with employee numbers, although both are integers. They have different domains.

Based on these ideas, we can define a few domains as shown in Table 3.1.

Relational databases are similar to any tabular organisations of data. In fact, when we create spreadsheets, those are quite similar to the tables in relational databases.

Table 3.1 Examples of domains

 Domain Significance Indian_Phone_Numbers Set of 8-digit phone numbers valid in India. Names Set of names of people. Department_codes Set of 3-character department codes valid in an organisation, such as EDP, MKT, ADM, etc. Player_ages Set of ages of cricket players; must be between 12 and 75.

We can see that a domain has some or all of the following characteristics:

• Name
• Data type
• Format
• Unit of measurement
• Range or list of allowed values

We shall now show the mathematical terms in action, as depicted in Fig. 3.2.

Fig. 3.2 Relational database terminology

3.2 RELATIONAL DATABASE CHARACTERISTICS

Relational databases have some peculiar characteristics which distinguish them from file-based structures. We shall now study them.

• Ordering of rows: We have seen that a table is a set of rows. A table in relational databases is similar to a set of values in mathematics, as shown in Fig. 3.3.

Fig. 3.3 Table is similar to a set

If we think about this concept further, we will note that there is absolutely no importance of the order of values in a set in mathematics. Similarly, the order of rows has no significance in a table. This is in stark contrast to files, where the order of records can be quite significant. For example, we talk of the 1st, 2nd and nth record in a file. However, no such terminology is used in the case of rows in a table in relational database.

Of course, physically, a table in a relational database is stored on the disk. Therefore, there is some logical ordering of records. However, this must not be visible, nor should it have any importance from an external or logical perspective.

Ordering of columns: Just as the ordering of rows in a table has no importance, the ordering of columns in a table is also of no significance. As before, we will note that there will be some physical ordering of columns on the disk. However, from an external point of view, this has no meaning.

Atomicity of values: Each value in a row is atomic or indivisible. In other words, no value should repeat in the same row. To understand this, consider Table 3.2. Here, we show a Customer table containing three columns — Name, Status, and Address. We can see that one customer can have multiple addresses, all stored in the same row. This is non-atomic, and hence, non-relational. We shall later study how to take care of such problems and make such values atomic.

Table 3.2 Non-atomic values in the Address column

 Name Status Address Atul Inactive PuneSolapurMumbai Anita Active PuneMumbai Jui Active Pune

Table 3.3 shows the atomic view of the same table, which now follows the principle of relational databases.

Table 3.3 Atomic values in the Address column

 Name Status Address Atul Inactive Pune Atul Inactive Solapur Atul Inactive Mumbai Anita Active Pune Anita Active Mumbai Jui Active Pune

Uniqueness of rows: In a table, no two rows should be exactly the same. That is, there must be something different between any two rows of a table. Some of the values in any two rows may repeat, but not all of them. Table3.4 shows what is not permitted. We can see that rows 1 and 3 are exactly the same. This is not allowed.

Table 3.4 Repetition of complete rows is not allowed

 Name Status Address Atul Inactive Pune Atul Inactive Solapur Atul Inactive Pune Anita Active Pune Anita Active Mumbai Jui Active Pune

Relational database theory consists of two primary concepts, relational algebra and relational calculus, as shown in Fig. 3.4.

Fig. 3.4 Relational databases theory

Now we shall discuss these concepts.

3.3 RELATIONAL ALGEBRA

3.3.1 Relational Algebra Operators

Relational algebra is a set of operations on relational databases that allow retrieval of data.

There are 8 such operations defined in relational algebra, as listed in Table 3.5.

Table 3.5 Operations specified in relational algebra

 Operation Meaning Restrict Returns rows that satisfy a specified condition. Project Returns rows with specified columns. Product Returns rows after combining two tables. Union Returns all rows that appear in either or both of two tables. Intersect Returns all rows that appear in both of two tables. Difference Returns all rows that appear in the first but not in the second of two tables. Join Returns rows after combining two tables based on common values. Divide Returns all rows that appear after dividing one table by another.

Some of these operations are self-explanatory. However, others are not so easy to understand. Therefore, we shall try to depict them in a diagrammatic form, as shown in Fig. 3.5.

Fig. 3.5 Relational algebra operations illustrated

Let us now discuss these operations one-by-one.

3.3.1.1 Restrict We know that in mathematics, a set can have any number of subsets. A set is the subset of another if all its members are also members of the other set. For example, consider the following two sets:

S1 = {A, B, C, D, E}

S2 = {B, C, D}

We can state here that S2 is a subset of S1, because all the elements of S2 are also the elements of S1.

As in the case of set theory in mathematics, we can treat a table as a set and derive some rows from that table as its subset. In effect, we will have another table with the same header, but with a different name. For example, from a Student table, we can derive another table that contains records for only those students who have passed. Thus we restrict the number of rows that we want.

The idea in a restrict operation is to select only the desired rows and eliminate the unwanted rows.

Inside a SQL SELECT operation, we can achieve restriction by using the comparison operators shown in Table 3.6.

Table 3.6 Operators used for restriction

 Operator Meaning Example = Equal to Result = ‘Pass’ <> Not equal to City <> ‘Pune’ > Greater than Salary > 1000 < Less than Commission < .50 >= Greater than or equal to Age >= 18 <= Less than or equal to Profit <= 50

Consider a simple Book table shown in Table 3.7.

Table 3.7 Book table

 Id Title Author Publication Subject 1 Computer Fundamentals AM John AMC Computer 2 Operating Systems Peter Parker AMC Computer 3 Computer Networks AB Simpson Bernard's Computer 4 My Autobiography MK Gandhi Goyal Autobiography 5 Chinese Cooking Rumala Rao Diwan Cooking 6 The Freedom Struggle MK Bose Bose General 7 Home PC PJ Dilbert AMC Computer 8 Gardening SK Das Das & Das Home improvement 9 You and Your Dog ERP Slater Home and Away Home improvement 10 Math for Small Children B Reid Children Books Children

If we apply the restriction of Publication = ‘AMC’ to the above table, we will get the result shown in Table 3.8.

Table 3.8 Restrict on Book table

 Id Title Author Publication Subject 1 Computer Fundamentals AM John AMC Computer 2 Operating Systems Peter Parker AMC Computer 7 Home PC PJ Dilbert AMC Computer

3.3.1.2 Project We know that the unwanted rows are eliminated in a restrict operation. Similarly, we can also eliminate the unwanted columns.

The operation of eliminating columns in a table is called a project operation.

Projection operation on a table leads to the formation of another table by copying specified columns of the original table. During the process, if any duplicate rows are found because of this elimination, they are ignored too.

For example, if we consider the same Book table defined earlier and perform a projection operation to consider only the Title, Author, and Publication columns, the result is as shown in Table 3.9.

Table 3.9 Project on Book table

 Title Author Publication Computer Fundamentals AM John AMC Operating Systems Peter Parker AMC Computer Networks AB Simpson Bernard's My Autobiography MK Gandhi Goyal Chinese Cooking Rumala Rao Diwan The Freedom Struggle MK Bose Bose Home PC PJ Dilbert AMC Gardening SK Das Das & Das You and Your Dog ERP Slater Home and Away Math for Small Children B Reid Children Books

Note that a projection operation does not eliminate any rows, unless there are duplicates. It only eliminates columns.

3.3.1.3 Product We know of Cartesian product in mathematics. Cartesian product of two sets yields a third set. This set contains pairs of elements. Here, the first element in each pair belongs to the first set, and the second element in each pair belongs to the second set.

For example, let us consider two sets as follows.

S1 = {A, B, C}

S2 = {D, E, F}

Then, the Cartesian product S1 × S2 is the following set:

{(A, D), (A, E), (A, F), (B, D), (B, E), (B, F), (C, D), (C, E), (C, F)}

Similarly, let us consider two tables shown in Table 3.10 and Table 3.11.

Table 3.10 Male table

 Person Department Ram Accounts Sham Sales

Table 3.11 Female table

 Person Department Radha Accounts Geeta Sales

If we decide to obtain a product of these two tables so as to identify the possible marriage pairs, the result would be as shown in Table 3.12. We have slightly modified the column names for ease of understanding, as well as to preserve the uniqueness of the column names.

Table 3.12 Male_Female table

 Male_Person Male_Department Female_Person Female_Department Ram Accounts Radha Accounts Ram Accounts Geeta Sales Sham Sales Radha Accounts Sham Sales Geeta Sales

Thus, we can conclude the following.

Product returns rows after combining two tables.

3.3.1.4 Union In mathematical set theory, the union of two sets is a set that contains elements present in either or both of the sets. In the process, any resulting duplicates are eliminated.

For example, let us consider two sets as follows.

S1 = {A, B, C, D, E}

S2 = {D, E, F, G}

The union of the two sets is the following set:

{A, B, C, D, E, F, G}

The concept of union in relational databases works slightly differently, as follows.

By using union, multiple queries can be put together and their output merged.

When we use union, the output of two or more SQL queries is merged into a single set of rows and columns. For example, consider two tables shown in Table 3.13 (Salespeople table) and Table 3.14 (Customers table).

Table 3.13 Salespeople table

 SID Sname City Commission 101 Ram Pune 10 102 Ana Pune 12 103 Jyoti Mumbai 11 104 Raguh Delhi 14 105 Deepa Chennai 12

Table 3.14 Customers table

 Cnum Cname City Orderamt Snum 201 Amrita Pune 10000 102 202 Rekha Pune 12700 101 203 Dheeraj Delhi 18080 104 204 Prasad Delhi 12810 104 205 Srinath Mumbai 15610 103 206 Meena Chennai 14374 105

Now, let us take a union of sales people and customers based in the city of Pune. Let us assume that we only want to have a look at the sales person/customer numbers and names. That is, the following query in English is constructed:

Select number and name of salespeople working in Pune

UNION

Select number and name of customers residing in Pune

The effect of union is either or, or both. Thus, we are saying the following:

Give us a list (containing numbers and names) of salespeople and customers based in Pune.

The resulting output is shown in Table 3.15.

Table 3.15 Result of union

 Number Name 101 Ram 102 Ana 201 Amrita 202 Rekha

The column headings (that is, Cnum, Cname, Snum, Sname) are changed, because the result is based on two distinct tables. That is, the columns in a union are not extracted from a single table – but from two or more tables. Therefore, the names of the columns from which the output is formed can (and usually does) vary. For instance, here, the number is stored in a column called as Snum in the Salespeople table, and as Cnum in the Customers table.

An important prerequisite for unions is that the output columns should be union-compatible. That is, all the SELECT queries that make up the union statement need to select the same number of columns, and their data types should also match. For example, in the above case, we cannot choose Sname from the Salespeople table and Orderamt column from the Customers table. This is because these two columns are incompatible with each other in terms of their data types. In any case, such a union would make little sense.

When we do a union of two or more tables, any duplicates arising out of the operation are automatically eliminated.

3.3.1.5 Intersection In mathematics, the intersection of two sets is the set of elements that are common to both the sets.

For example, let us consider two sets as follows.

S1 = {A, B, C, D, E}

S2 = {D, E, F, G}

Then, the intersection of the two sets is the following set:

{D, E}

In the case of relational databases, to perform an intersection operation on two tables, the two tables must be union-compatible.

The result of intersection is the rows common to the rows produced by the individual queries.

In other words, it is the intersection of the rows produced by two or more queries.

For example, consider two tables as shown in Table 3.16 and Table 3.17.

Table 3.16 Table A

 Cname City Ruchi Pune Subha Mumbai

Table 3.17 Table B

 Cname City Ravi Delhi Ruchi Pune

If we take an intersection of tables A and B, then the result is as shown in Table 3.18.

Table 3.18 A INTERSECT B

 Cname City Ruchi Pune

As we can see, the intersection operator is used in a manner similar to the union operator. However, intersection provides the and functionality. If this sounds confusing, let us consider a more useful example of union and intersection.

Let us consider the tables shown in Table 3.19 and Table 3.20. The first table shows the languages that the students are learning, and the second table shows the languages that the teachers teach.

Table 3.19 Student table

 Student Language Ruchi English Subha Hindi Ruchi Hindi Ravi Marathi

Table 3.20 Teacher table

 Teacher Language Sandeep English Sunil English Sunil Marathi Abhijit Hindi Dhananjay French

View is a window to a database. It allows us to view data in one or more tables and provides for access control. What is more, it allows us to update data in certain cases!

Now, let us first do a union as follows:

Select all languages that the students are learning

UNION

Select all languages that the teachers are teaching

The result is shown in Table 3.21. As expected, the languages present in either or both of the tables are shown.

Table 3.21 Result of UNION

 Language English Hindi Marathi French

Now let us do an intersection as follows:

Select all languages that the students are learning

INTERSECT

Select all languages that the teachers are teaching

The result is shown in Table 3.22. As expected, the languages present in both the tables are shown. French, which is present only in the second table, but not in the first, is not shown.

Table 3.22 Result of INTERSECT

 Language English Hindi Marathi

3.3.1.6 Difference In mathematics, the difference between two sets is a set that contains members of the first set that are not in the second set.

For example, let us consider two sets as follows.

S1 = {A, B, C, D, E}

S2 = {D, E, F, G}

Then, the difference of the two sets S1 _ S2 is the following set:

{A, B, C}

Similarly, the difference of the two sets S2 _ S1 is the following set:

{F, G}

Like union and intersection, we can perform a difference operation only on those tables that are union-compatible. The difference between tables A and B defined earlier in the chapter (i.e. A – B) is as shown in Table 3.23.

************

Table 3.23 Difference A – B

 Cname City Subha Mumbai

Similarly, the difference between tables B and A defined earlier in the chapter (i.e. B - A) is as shown in Table 3.24.

Table 3.24 Difference B – A

 Cname City Ravi Delhi

The difference operator provides a not functionality. Thus, we can state:

When we take the difference between two tables we get rows that are present in the first table but are not present in the second one.

3.3.1.7 Join Join joins two or more tables by using operators such as equal to, not equal to, greater than, etc.

Joining two (or more) tables has some prerequisites, if it has to be of some practical value:

• The tables should be joined based on a common column.
• The common column should be compatible in terms of domain.

For example, we will consider the Student and Teacher tables mentioned earlier. Let us join them based on the Language column. The condition is that the language in the Student table should match that in the Teacher table. The result is shown in Table 3.25. Note that we have renamed the Language column to preserve uniqueness.

Table 3.25 Joining Student and Teacher tables based on common language

 Student Student_Language Teacher Teacher_Language Ruchi English Sandeep English Ruchi English Sunil English Subha Hindi Abhijit Hindi Ravi Hindi Sunil Marathi Ravi Marathi Abhijit Hindi

As we have seen, the commonality is based on the Language column between the two tables being equal. As we have tested to see if the language is same (or equal), this is called as equi-join. This is the most common form of join in relational databases. There are other forms of joins as well. For example, in a non-equi-join, we create a join based on an operator such as not equal to.

Another important type of join is the natural join. In this case, the relationship between tables is already known, and comes out as a result of the natural join operation. For example, let us revisit our Salespeople and Customers tables. We will recollect that both the tables contain a column called as Snum. This column is a must in the Salespeople table, as it is the primary key, and identifies a salesperson uniquely. However, we have also added that column to the Customers table to know which salesperson serves which customer. In effect, there is naturally a correspondence between the two tables, based on this common thread of salesperson number. Therefore, we can perform a natural join on these tables based on the salesperson number to see which customer is served by which salesperson. Note that we have given prominence to the customer table here. So, if a salesperson does not serve any customer, her name would not appear in the result. The result is shown in Table 3.26.

Table 3.26 Natural join

 Cname Sname Amrita Ana Rekha Ram Dheeraj Raghu Prasad Raghu Srinath Jyoti Meena Deepa

If we observe carefully, we will realise that all the sales people from the Salespeople table are present in Table 3.26. This is because every salesperson serves at least one customer. However, let us assume that we have a new salesperson numbered 106 having the record shown in Table 3.27. This salesperson is not assigned to any customer as of now.

Table 3.27 Unassigned salesperson

 SID Sname City Commission 107 Deepak Mumbai 12

If we perform a natural join again, the output will be the same as shown earlier. In effect, this new salesperson would be ignored because he does not have a customer assigned yet. This is fine, as long as what we are looking for is a list of customers and their assigned salespersons. However, if our objective is to see all salespeople in the output, regardless of whether they have a customer assigned or not, this objective would not be served.

The form of join we have shown so far is called as inner join. On the other hand, when we write a query to bring records of salespeople regardless of whether they have a customer assigned or not, it is called as an outer join. An outer join shows both matching as well as non-matching values from the tables being joined.

3.3.1.8 Divide Explaining the divide operator is slightly cumbersome. Therefore, we shall straightaway take an example. Let us again consider the Student table defined earlier in the chapter. Let us assume that we have another table called as Language, as shown in Table 3.28. If we now divide the Student table with the Language table, we get the result as shown.

Table 3.28 Division example

As we can see, the result is the selection of all the students – in this case just 1 – who know both the English and Hindi languages.

3.3.2 Grouping

Grouping is an additional feature of relational algebra. It facilitates the combination of values based on a common reference. For example, let us consider the Salespeople table shown in Table 3.29. The three columns in the table convey the following information:

 SID Salesperson's ID PID Product ID Quantity Quantity sold

Table 3.29 Salesperson table

 SID PID Quantity S1 P1 10 S1 P1 12 S1 P2 5 S2 P2 18 S2 P2 7 S2 P2 5 S2 P3 3 S2 P3 12 S3 P1 11 S3 P2 8 S3 P2 1

Let us first group the data based on salesperson ID. This will provide us information regarding the quantity of products sold by each salesperson. The result is shown in Table 3.30.

Table 3.30Grouping on SID

 SID Quantity S1 27 S2 45 S3 20

Although this information is useful, it would be still better if we had information regarding the quantity of each product sold by each salesperson. That is, the grouping should happen on SID as well as PID. Table 3.31 shows the result.

Table 3.31 Grouping on SID and PID

 SID PID Quantity S1 P1 22 S1 P2 23 S2 P2 12 S2 P3 15 S3 P1 11 S3 P2 9

The SQL language also provides easy-to-use facilities for grouping data based on relational algebra. To achieve the above result, we can specify the following SQL query:

SELECT   SID,   PID,   SUM (Quantity)

FROM   Salesperson

GROUP   BY   SID,   PID

3.4 RELATIONAL CALCULUS

Relational calculus is a formal declarative query language.

At this juncture, it is important to distinguish between relational algebra and relational calculus.

Relational algebra is procedural in nature. In other words, we need to specify what data to be retrieve, and also how to retrieve it. On the other hand, relational calculus is declarative in the sense that we need to only specify what data needs to be retrieved. We need not specify how to retrieve it. Fig.3.6 shows the distinction between the two.

Fig. 3.6 Relational algebra versus relational calculus

Because of this, we can state the following:

• Relational algebra is procedural
• Relational calculus is non-procedural

In other words, relational algebra is a bit like the 3GLs such as C or COBOL, where we need to specify the algorithms (i.e. sequence of operations) for performing a specific task. On the other hand, relational calculus does not require detailed algorithms to be specified. We need to only mention what needs to be achieved and leave it at that.

Interestingly, any retrieval operation performed by using relational algebra can be performed by using relational calculus, and vice versa. In other words, both of them are equal in terms of their capabilities. Their expressive power is identical.

Let us consider an example to understand relational calculus. Suppose we have an Employee table and we want to find out the list of employees whose salary is above Rs 25,000. We can write the following SQL query (which is in the form of relational algebra):

SELECT Name, Salary

FROM Employee

WHERE Salary > 25000

Those not interested in the mathematical theory behind this can skip the next few sentences safely without any loss of continuity.

If we want to write this in the language of relational calculus, then the query takes the following form:

{t.Name | Employee (t) and t.Salary > 25000}

The meaning of this query is as follows:

• t.Name specifies the column to be retrieved (t is called as tuple variable in classical terminology)
• Employee (t) is a condition that specifies that we want to retrieve data from the Employee table
• t.salary > 25000 is the condition for satisfying the query

We shall not discuss relational calculus any further, as the discussion focuses more on the mathematics behind the scene, which is not within the scope of the current text.

3.5 DATABASE INTEGRITY

3.5.1 Constraints

Database integrity refers to the accuracy or correctness of data in a database.

A database may have a number of integrity constraints. For example, let us revisit the Salesperson table. We may have many constraints on this table, such as the following:

• The salesperson ID and product ID must consist of two digits.
• Every salesperson ID and product ID must start with an alphabet.
• Every salesperson ID and product ID must end with a number between 1 and 9.
• Quantity must be a number between 1 and 999.

We can think of any number of additional constraints. For example:

• Salesperson S2 will be able to sell only product P2.
• Whenever product P3 is sold, the quantity must be over 2.

All these constraints would be based on the rules and demands of the business. Therefore, we need to inform the DBMS of such constraints, and make sure that they are not violated.

If the database is empty, there is no problem in specifying constraints. However, if the database already contains some data, then we may have a problem. What if we specify a new constraint, which is already violated in the database? Imagine the same happening in case we have to add the following constraint:

• Salesperson S2 will be able to sell only product P2.

What if there is already a record in the Salesperson table which shows that salesperson S2 has sold product P3? There would be no point in even trying to force this constraint. Therefore, when we specify a constraint on a database, the DBMS has to check if the constraint is currently valid. If it is not, the DBMS will reject the constraint. Otherwise, the DBMS will store it and enforces it with immediate effect. This is shown in Fig. 3.7.

Fig. 3.7 Implementing database constraints

Similarly, there is a need to be able to remove existing constraints.

3.5.2 Declarative and Procedural Constraints

Database constraints can be of two possible types: declarative and procedural. Fig. 3.8 shows this.

Fig. 3.8 Database constraints

The type of constraints that we have discussed so far falls under the declarative category. In other words, we have considered that these constraints will be specified at the time of table creation or even later. However, they would still be a part of the database design scope.

On the other hand, the procedural constraints come in the form of triggers or stored procedures. These are precompiled procedures, which can be invoked from an application program.

Generally, it is far better to have declarative constraints than procedural constraints. This is because the former can be specified at design time. The latter need to be specified at the time of program development, which is both time-consuming and error-prone. It is worth mentioning that among the list of things that have changed in the relational database world, the topic of database constraints is perhaps at the top. It has evolved significantly from the early days.

There is another way to classify database integrity constraints. They can be classified into four categories, as shown in Fig. 3.9.

Fig. 3.9 Types of database constraints

Let us discuss these types.

3.5.2.1 Type constraints

Type constraints specify what is legal to a given type.

For example, consider that we want to define Age as a type. Then, we can specify the following type constraint in plain English:

TYPE Age Representation (Rational)

CONSTRAINT Age > 0

This constraint specifies that the type Age would allow rational numbers with a value greater than 0. In other words, an expression that does not produce a value greater than 0 for Age will fail.

3.5.2.2 Attribute constraints

Attribute constraints specify that a specified column is of a specified type.

For example, consider the following declaration:

CREATE TABLE Employee

 (Emp_No Integer, Emp_Name Char (30), Department Char (5), … )

We can see that the employee number is an integer; the employee name is a string, and so on. These declarations automatically add attribute constraints to column definitions.

In other words, if we attempt to store a string in the employee number column, it will fail.

3.5.2.3 Instance constraints

Instance constraints apply to a specific instance or condition.

SQL provides good features for specifying instance constraints.

For example, consider the following CREATE TABLE statement in SQL:

CREATE TABLE Student

 (Snum INTEGER NOT NULL PRIMARY KEY, Sname CHAR (30) NOT NULL, Total_Marks INTEGER CHECK (Total_Marks <= 100))

SQL provides a keyword CHECK for specifying constraints. Here, we have specified a constraint that the total marks of a student cannot exceed 100.

Let us consider another example.

CREATE TABLE Salespeople

 (Snum INTEGER NOT NULL PRIMARY KEY, Sname CHAR (30) NOT NULL, Region CHAR (10) CHECK (Region IN ‘West’, ‘East’), Commission INTEGER CHECK (Commission < 80))

Note how we have specified two different constraints:

1. The region that a salesperson can work in can only be West or East.
2. The commission paid to a salesperson cannot exceed 79.

A logical question at this stage is: can we combine two or more constraints, so that they work together? Yes, we can, and that is what we will discuss next.

3.5.2.4 Database constraints

Database constraints group two or more instance constraints.

Continuing with the Salespeople table, let us assume that only the salespersons in the Northern region are allowed commissions of .20 or above. How can we explain this condition in an IF-ELSE form in plain English?

IF Region = ‘North’

Do not bother about the commission value

ELSE

The commission must be < 20

ENDIF

Translating into SQL-like form, this condition becomes as shown below in the table definition. Note that we have got rid of the earlier constraints to keep things simple.

CREATE TABLE Salespeople

 (Snum INTEGER NOT NULL PRIMARY KEY, Sname CHAR (30) NOT NULL, Region CHAR (10), Commission INTEGER,

CHECK (Region = ‘North’ OR Commission < 20))

We can also write more complex constraints.

3.5.3 More on Constraints

This discussion raises two questions:

1. Can we specify constraints in SQL only while creating a table? What do we do if a table already exists, and we simply need to add constraints to it?
2. We have still not seen any constraint that applies to multiple rows of a table. The constraints that we have discussed may even reference multiple columns, but still, they apply to the same row in the table. How do we take care of this problem? For example, suppose we want to be sure that all the salespeople in a given region must have the same commission.

The answer to the first question is No. We can also add constraints to existing tables by using an ALTER TABLE clause.

Using a CHECK constraint cannot solve the problem specified in the second question. There is a work-around when we use views. We shall discuss it later.

RDBMS has two main aspects: Relational algebra and relational calculus. These names come from the fact that RDBMS has its roots in mathematics and set theory.

3.6 KEYS

Keys are fundamental to relational databases. In fact, without keys, relational databases will not be usable at all. The power of relational databases comes from the fact that we can split related data into different tables and logically link them together by using keys. In this context, we need to discuss several terms.

Database keys can be of various types, as follows.

• Superkey
• Key
• Composite key
• Candidate key
• Primary key
• Alternate key or Secondary key
• Foreign key

Let us discuss these types of keys.

3.6.1 Superkey and Key

A superkey is a set of columns that uniquely identifies every row in a table, while a key is a minimal set of such columns.

The word minimal comes from the fact that we cannot exclude any column from a key and still identify a row. All the columns that make up the key must be available — otherwise, it is no longer a key. In other words, a key is a minimal set of columns that uniquely identifies or functionally determines every column value in a row.

Let us understand this with an example. Consider a simple Employee table containing just two columns: Emp_ID and Emp_Name. Then the concepts of a superkey and a key are illustrated in Fig. 3.10.

Fig. 3.10 Example of superkey and key

We can see that the two columns together make up the superkey. However, it is not a key because it is not a minimal set of columns. On the other hand, Emp_ID is a key, because it is a minimal set of columns that can identify a row uniquely.

3.6.2 Composite Key

There are situations when a single column cannot constitute a key. This is because a single column cannot uniquely identify every row in the table. Instead, we need to have two or more columns together in order to identify every row in the table uniquely.

A key consisting of two or more columns is called as a composite key.

This concept is shown in Fig. 3.11. Here we have a table containing five columns. The first three of them together make up the composite key.

Fig. 3.11 Composite key concept

For example, consider the table structure and data for the SP table as shown in Table 3.32. This table tells us which supplier sells which part. As we can see, neither the supplier ID nor the part ID can identify a row in the table uniquely. However, the two of them together can easily identify any row in the table uniquely. Hence, it is a composite key.

Table 3.32 SP table

 Supplier_ID Part_ID Quantity S1 P1 10 S1 P2 5 S2 P2 8 S2 P3 6 S2 P4 5

The concept is illustrated in Fig. 3.12.

Fig. 3.12 Composite key example

3.6.3 Candidate Key

A table can have more than one set of columns that could be chosen as the key. These are called candidate keys.

For example, consider again the Salespeople table containing the following columns — Snum, Sname, Region and Commission. From the list of columns, it may appear that apart from Snum, Sname can also be a key. This assumption will prove right as long as we always have unique salesperson names. However, if we cannot make this assumption, Sname cannot be a candidate key. This is shown in Fig. 3.13.

Fig. 3.13 Candidate key possibilities - Example 1

Let us now consider that there is one more column in the Salesperson table, called as PN (short form for Passport Number). Since we can identify a person uniquely based on the passport number, this can certainly be another candidate key. This is shown in Fig. 3.14.

Fig. 3.14 Candidate key possibilities - Example 2

3.6.4 Primary Key

We have discussed the idea of primary keys earlier.

The primary key identifies every record in a table uniquely.

When we have two or more candidate keys, we have to decide which of them becomes the primary key. The criterion for this decision is based on the ease of use in the day-to-day working as well as data entry. Emp_ID, Snum and PN are all examples of primary key.

Sometimes, a table just does not have a primary key. In such cases, we may need to introduce an additional column which contains unique values (such as a running sequence number). That is, we may need to artificially add a primary key.

3.6.5 Alternate Key or Secondary Key

Alternate keys or secondary keys are keys that may or may not identify a record uniquely, but help in faster searching.

There are varying definitions of the alternate key concept. Another definition of alternate key is — any candidate key, which is not the primary key is an alternate key. However, we shall follow the definition mentioned earlier.

The significance of defining an alternate key is as follows:

• The DBMS creates and uses an index based on the alternate key (just as it would, based on a primary key).
• Searching based on an alternate key is quicker than searching based on a non-key column, because of the index.
• Unlike a primary key, an alternate key can have duplicates.

As per our definition, in the Salespeople table, any column that is not the primary key is likely to be an alternate key. What are these columns? As we know, they are Sname, Region and Commission. Would all of these be alternate keys? It depends. Would we do frequent searches based on all of these columns? If yes, then we might as well make all of these alternate keys. If, however, we write frequent queries to search only on one column — say the Sname column — then we should make that column alone the alternate key.

We outline the differences between a primary key and an alternate key as shown in Table 3.33.

Table 3.33 Primary key versus alternate key

 Primary key Alternate key Requires the column to have unique values in all the rows. Uniqueness of values is not required. Duplicates are allowed. There can be only one primary key per table. There can be any number of alternate keys per table. A primary key has to be a candidate key also. An alternate key may or may not be a candidate key.

3.6.6 Foreign Key

A foreign key is a set of columns in one table, which is a key in
another table.

Foreign keys are extremely important in the context of table inter-relationships. We have seen the SP table, containing the supplier ID, part ID and the quantity. We have not worried too much about finding more details about the supplier and the part, though. For example, what is the name and address of the supplier or what part does the part ID stand for? How do we locate this information?

Relational database works by dividing data into different tables, and then linking them on the basis of common attributes/columns. For example, we would most likely have separate tables for suppliers and for parts. These tables would contain details regarding suppliers and parts respectively. The SP table would be used to establish a relationship between them. The idea is illustrated in Fig. 3.15. We have deliberately not shown the details of the Supplier and the Part tables, reasons for which will be known soon.

Fig. 3.15 SP table related to the Supplier and Part tables

The next question is, how do we establish the relationships between the SP table and the Supplier as well as Part tables? This is exactly where the vital concept of foreign key comes into the picture.

• Just as we have a Supplier_ID column in the SP table, we would also have a similar column in the Supplier table. For a given Supplier_ID in the SP table, there must be exactly one supplier row in the Supplier table.

Similarly,

• Just as we have a Part_ID column in the SP table, we would have a similar column in the Part table. For a given Part_ID in the SP table, there must be exactly one part row in the Part table.

Thus, Supplier_ID is the common link between the Supplier and SP
tables, and Part_ID is the common link between the Part and the SP
tables.

More importantly, for a given value of Supplier_ID or Part_ID in the SP table, there must be exactly one corresponding row in the Supplier and Part tables, respectively. Why is it? Let us understand with an example.

Suppose the first row in the SP table contains Supplier_ID = S1. We shall ignore the values in the other columns for this row in the SP table. Furthermore, let us assume that the Supplier table contains two rows for Supplier_ID = S1. We shall also ignore the Part table for now. The question is: which one of these suppliers does the first row in SP table refer to? This problem is shown in Fig. 3.16.

Fig. 3.16 Importance of uniqueness of values in relationships

To ensure that this does not occur, we must allow only one matching entry from the SP table to the Supplier table for a given Supplier_ID (and also to the Part table for a given Part_ID).

Interestingly, the inverse of this is not true. That is, there can be multiple rows in the SP table for the same supplier, that is they can have the same Supplier_ID. After all, Supplier_ID is not the primary key of the SP table. So, this makes sense.

We will now expand our earlier diagram to show the internal details of the Supplier and Part tables to illustrate this. Fig. 3.17 depicts this.

Based on these premises, let us now get our terminology correct. Table 3.34 tabulates the concepts.

Fig. 3.17 Establishing relationships between tables

Table 3.34 Primary and foreign keys for the example shown earlier

 Table Primary key Foreign key Supplier Supplier_ID Supplier_ID of SP table Part Part_ID Part_ID of SP table SP Supplier_ID + Part_ID Supplier_ID of Supplier table Part_ID of Part table

Firstly, let us discuss about the primary keys of the three tables. This should be quite easy to understand.

• In the Supplier table, Supplier_ID is the primary key.
• In the Part table, Part_ID is the primary key.
• In the SP table, Supplier_ID + Part_ID is the (composite) primary key.

Now, consider the foreign keys. The primary key of one table generally becomes the foreign key of the other table, if the linking of the two tables is based on that key.

• The foreign key of Supplier is (a part of) the primary key of SP, that is Supplier_ID.
• The foreign key of Part is (a part of) the primary key of SP, that is Part_ID.
• The foreign key of SP is jointly made up from the primary keys of Supplier and Part, that is Supplier_ID and Part_ID.

Just to explain further Supplier_ID is the primary key in the Supplier table but foreign key in the SP table. We leave it to the reader to describe the other two primary keys.

As an aside, we should note that the names of the columns in the two tables need not be the same in order to establish a foreign key relationship. For example, consider two tables T1 and T2. Let the primary key of T1 be X. As evident in our discussion so far, X will become the foreign key in T2. However, it is not necessary that X be named as X in table T2. It may very well be named Y, Z or be given any other column name.

3.6.7 Keys and SQL

Let us now understand the support provided by SQL to the concept of keys. We shall focus our discussion on primary and foreign keys, since they are the ones that are significant from a practical point of view. The other types of keys are more theoretical and, in fact, do not require any actual support from SQL.

3.6.7.1 Defining primary keys in SQL

SQL provides a constraint called PRIMARY KEY in order to define a primary key on a table.

As we already know, one table can have only one primary key. The value of a primary key cannot be empty (i.e. NULL).

Let us define the creation statement for the Supplier table, and its associated primary key.

CREATE TABLE Supplier

 (Supplier_ID CHAR (5) NOT NULL PRIMARY KEY, Name CHAR (30), Address CHAR (30))

As we can see, the PRIMARY KEY constraint defines that the supplier number should be treated as the primary key of this table. Defining primary keys in SQL is that simple!

How do we define a primary key on multiple columns? That is, how can we define a composite primary key? For this purpose, we need to apply the PRIMARY KEY clause to all the columns of the composite key together. Let us illustrate this with the example of the SP table.

CREATE TABLE SP

 (Supplier_ID CHAR (5) NOT NULL, Part_ID CHAR (5) NOT NULL, Quantity INTEGER,

PRIMARY KEY (Supplier_ID, Part_ID))

The primary key is now defined as a composite key on two columns. This definition is not linked to the definition of the individual columns. Instead, it is defined in the end as a separate constraint.

3.6.7.2 Defining foreign keys in SQL

SQL uses the FOREIGN KEY constraint to define foreign keys.

This constraint restricts the values that we can enter into a table to force a key and its corresponding foreign key to be in sync with each other. For example, consider the following table declaration.

CREATE TABLE SP

 (Supplier_ID CHAR (5) NOT NULL, Part_ID CHAR (5) NOT NULL, Quantity INTEGER,

PRIMARY KEY (Supplier_ID, Part_ID))

FOREIGN KEY (Supplier_ID) REFERENCES Supplier (Supplier_ID),

FOREIGN KEY (Part_ID) REFERENCES Part (Part_ID))

As before, the table declaration specifies the list of columns, their data types and the primary key. In addition, however, it also specifies two foreign key relationships. Let us dissect this.

FOREIGN KEY (Supplier_ID) REFERENCES Supplier (Supplier_ID)

This constraint specifies that in the SP table, Supplier_ID is a foreign key. Furthermore, it specifies that this foreign key references or corresponds to the Supplier_ID column of the Supplier table. We know that Supplier_ID is the primary key of the Supplier table.

We leave it to the reader to explain the other foreign key relationship.

Note that even if we omit the word Supplier_ID in reference to the Supplier table and write the foreign key as follows, it will be valid.

FOREIGN KEY (Supplier_ID) REFERENCES Supplier

The reason for this is that the Supplier_ID is named the same in the SP as well as Supplier tables. However, if the title of this column were, say Supplier_Number in the Supplier table, then we must be explicit, as follows:

FOREIGN KEY (Supplier_ID) REFERENCES Supplier (Supplier_Number)

How else would SQL know what is the equivalent of the Supplier_ID of the SP table in the Supplier table?

Going one step ahead, we can specify rules or actions regarding the relationships between the two tables based on the foreign keys. For example, suppose we have a supplier S1 in the Supplier table and that there are three rows for this supplier in the SP table. What will happen if we delete the row for S1 from the Supplier table? The three rows in the SP table would be in a hanging state, as shown in Fig. 3.18.

Database integrity ensures that only valid actions are possible against a database. Database constraints are useful in ensuring this. By using constraints, we can clearly specify what actions are possible against the database and what are not.

Fig. 3.18 Effect of removing key in one table corresponding to foreign key in another table

To take care of such problems, SQL has some provisions in the form of referential triggered actions. We can summarise them as follows.

• UPDATE rules: These rules specify what happens to a foreign key value when the primary key in the other table (also called as the parent key) is modified. For example, in the above case, a supplier ID, S1 in the Supplier table is changed — say to S9. We should be able to specify what will happen to the S1 records in the SP table.

DELETE rules: These rules specify what happens to a foreign key value when the row corresponding to the primary key in the other table (i.e. the parent key) is deleted. For example, in the above case, we should be able to specify what should happen to the S1 records in the SP table if we delete the row containing S1 in the Supplier table.

For the above two actions, SQL provides four effects, as summarised in Table3.35.

Table 3.35 Summary of effects of UPDATE/DELETE on foreign key constraints

 Effect Description CASCADE The foreign key should change to the new value of the parent key. For UPDATE operations, the foreign key value now becomes the same as the parent key value. For example, in our case, if S1 in Supplier table is changed to S9, all the S1 rows in the SP table will also be changed to S9. For DELETE operations, the row in the dependent table containing the foreign key value will be deleted. For example, in our case, if the row containing S1 as the Supplier_ID in Supplier table is deleted, all the S1 rows in the SP table will also be deleted automatically. SET NULL The foreign key value is set to NULL. Of course, if the foreign key column has a NOT NULL constraint, then this rule will be violated and an error would be flagged. SET DEFAULT The foreign key value is set to the specified DEFAULT value (SQL provides for this feature). If no DEFAULT value is specified, it will be set to NULL. Of course, if the foreign key column has a NOT NULL constraint or has no DEFAULT value specified, then this rule will be violated and an error would be flagged. NO ACTION There would be no automatic action taken, and the UPDATE/DELETE statement would be rejected if this leaves the foreign key in an inconsistent state.

Let us add these effects to our table definitions to understand things more clearly. In an example we shall specify that if the parent key in the Supplier table is updated, then the same change should be carried out in the SP table. However, if the row containing the parent key in the Supplier table is deleted, then the corresponding foreign key in the SP table should be set to NULL. This is how the new definition of the SP table would look like.

CREATE TABLE SP

 (Supplier_ID CHAR (5) NOT NULL, Part_ID CHAR (5) NOT NULL, Quantity INTEGER,

PRIMARY KEY (Supplier_ID, Part_ID))

FOREIGN KEY (Supplier_ID) REFERENCES Supplier (Supplier_ID),

ON DELETE SET NULL,

FOREIGN KEY (Part_ID) REFERENCES Part (Part_ID))

Let us understand the significance of the FOREIGN KEY clause. We state that the Supplier_ID column in the SP table depends on the Supplier_ID in the Supplier table. If we update any Supplier_ID in the Supplier table, the same change should be propagated to the SP table. However, if we delete the row containing a particular Supplier_ID in the Supplier table, then we should not delete the row for that supplier from the SP table. Instead, we should change the value of the Supplier_ID column for this row to null.

We leave it to the reader to describe other FOREIGN KEY constraints in a similar fashion.

3.7 ENTITY AND REFERENTIAL INTEGRITY

There is (yet) another way of classifying integrity constraints based on the concepts of primary key and foreign key. This classification considers two types: entity integrity and referential integrity. This is shown in Fig. 3.19.

Fig. 3.19 Database integrity types

Let us discuss these in brief.

3.7.1 Entity Integrity

We know that the rows in a table represent real-world entities (such as an employee, a student, a supplier, a part, etc). The key value uniquely identifies every row in the table. Thus, if the users of the database want to retrieve any row from a table or to perform any action on that row, they must know the value of the key for that row. This means that cannot represent any entity in the database unless we can identify it completely, based on the value of a key. Thus, we cannot allow the key (or any part of the key, if it is a composite key) to contain a null value. This is captured in the entity integrity rule, as follows.

No key column of any row in a table can have a null value.

3.7.2 Referential Integrity

We have discussed foreign keys in great detail. We know that when multiple tables need to be related or linked with each other, we must use foreign keys for doing so. For instance, we had linked the Supplier and SP tables based on the Supplier_ID, and the Part and SP tables based on the Part_ID. It is, therefore, extremely important that:

• The Supplier_ID in any row of the SP table corresponds to a Supplier_ID of the Supplier table.
• The Part_ID in any row of the SP table corresponds to a Part_ID of the Part table.

If such rules are violated, the relation between the tables would be broken. A database in which all the foreign keys either contain null or valid references to other tables is said to observe the referential integrity rule. We can summarise the referential integrity rule, as follows.

Every foreign key must contain a null value or a valid key reference in another table.

3.8 VIEWS

3.8.1 What is a View?

A view is a logical table that derives its data from other tables.

A view does not contain any data of its own. Its contents are taken from other tables through the execution of a query. The other tables that provide data to a view are called base tables. Base tables usually contain data. As views do not contain any data, when we execute a query on a view, the corresponding base tables are consulted, the appropriate data are fetched, and the view is temporarily populated. When the lifetime of the query is over, the data in the view is discarded.

Thus, a view can be considered as a window to one or more base tables. We can view data in these base tables by using a view. Therefore, we can consider a view as very similar to a query.

Let us consider an example. In SQL, we can create a view by using a CREATE VIEW statement. Let us create a view on the Salespeople table as follows:

CREATE VIEW PuneSalesTeam

AS SELECT Snum, Sname, City, Commission

FROM Salespeople

WHERE City = ‘Pune

When we execute this statement, there is no output. Instead, SQL simply informs us that a view named PuneSalesTeam has been created in the database. It is only when we make an attempt to select data from the view that the DBMS looks at the Salespeople table, brings out the rows for the salespeople working in Pune, fills the view with this information, and shows it to us.

In mathematical terms, a view is a named expression in relational algebra. When we execute a statement for creating a view, the relational algebra statement is not executed by the DBMS. It is simply remembered. From a user's point of view, PuneSalesTeam contains data on the sales staff in Pune. However, from the DBMS perspective, the view is empty.

For retrieving data from the view just created, we can use a simple SQL statement as follows.

SELECT * FROM PuneSalesTeam

In response, we will be provided information/data on the salespeople working in Pune.

Why do we create views when we can retrieve data from the base tables? The main reason is access control. We may wish to show only data pertaining to Pune employees to certain users. In such a case, we can allow them to access the view, but not the base table. Although we shall study this later, the idea should be simple to understand. This has many other interesting implications as well. For example, consider the following view definition.

CREATE VIEW SalesView

AS SELECT Sname, City

FROM Salespeople

WHERE City = ‘Pune’

Note that we have selected just two columns from the base table. Let us now attempt to execute the following SQL statement:

SELECT commission FROM SalesView

We have not made the commission column a part of the view. Therefore, the above query will fail. The DBMS will tell us that there is no such column as commission in the SalesView view. This is another facet of the access control feature.

Think of a key as a column that can open a table, just as a key in real life can open a lock. Various types of keys can be defined in a database.

Similarly, the following query would not produce any output:

SELECT Sname

FROM SalesView

WHERE City = ‘Delhi’

Note that only the salespeople from Pune are a part of the view. Hence, there is no question of being able to find any suppliers for any other city in it.

3.8.2 Updating Data through Views

Can data be updated through views? For example, consider the following view definition.

CREATE VIEW SalesViewNew

AS SELECT Snum, Sname, City

FROM Salespeople

Now, let us execute an UPDATE statement on this view, as follows.

UPDATE SalesViewNew

SET City = ‘Kanpur’

WHERE SID = 104

This statement will be executed successfully. However, recall that a view itself has no data. It always brings data from a base table. When we update a view, it updates data in the base table. Therefore, in this case, the data corresponding to the salesperson whose ID is 104 would be updated in the Salespeople table.

Let us consider another UPDATE statement.

UPDATE SalesViewNew

SET Commission = 50

WHERE SID = 104

This statement would produce any output, because there is no commission column in the view.

We can create views based on summary data. For example, consider the following query.

CREATE VIEW SalesSummary

AS SELECT City, Max (Commission)

FROM Salespeople

GROUP BY City

The resulting output is shown in Table 3.36.

Table 3.36 Use of GROUP BY clause

 City Max of Commission Pune 12 Mumbai 11 Delhi 14 Chennai 12

We cannot perform any UPDATE operation on this view. In general, an UPDATE statement cannot be executed on any view that is based on a GROUP BY clause.

Alternate key

Attribute constraints

Cardinality

Database constraints

Database integrity

Degree

Divide

Entity integrity

Foreign key

Grouping

Instance constraints

Join

Key

Outer join

Procedural constraints

Project

Referential triggered actions

Relational calculus

Secondary key

Superkey

Tuple

Union

Composite key

Database constraints

Declarative constraints

Difference

Domain

Equi-join

Functional dependency

Inner join

Intersection

Join

Natural join

Primary key

Product

Referential integrity

Relational algebra

Restrict

Stored procedures

Triggers

Type constrains

View

Atomicity

Candidate key

• A table in RDBMS is also called a relation.
• A row in RDBMS is also called a tuple.
• Domain defines a set of all the possible values for a column or a relation.
• RDBMS operations can be classified into relational algebra and relational calculus.
• Relational algebra is procedural.
• The main relational algebra operations are restrict, project, product, union, intersect, difference, join, and divide.
• Restrict operation returns rows that satisfy a specified condition.
• Project operation returns rows with specified columns.
• Product operation returns rows after combining two tables.
• Union operation returns all rows that appear in either or both of the tables.
• Intersect operation returns all rows that appear in both of two tables.
• Difference operator returns all rows that appear in the first but not in the second of two tables.
• Join operator returns rows after combining two tables based on common values.
• Divide operator returns all rows that appear after dividing one table by another.
• Grouping is a feature of relational algebra that facilitates retrieval of summary values.
• Relational calculus is a formal declarative query language. It is non-procedural.
• Database constraints can be used to ensure database integrity.
• Database constraints can be declarative or procedural.
• Examples of database constraints are type constraints, attribute constraints, instance constraints and database constraints.
• Type constraints specify the legal values for a type.
• Attribute constraints specify that a column is of a specified type.
• Instance constraints apply to a specific instance or condition.
• Database constraints group two or more instance constraints.
• Database keys allow identification of records.
• Database keys can be classified into superkey, key, composite key, candidate key, primary key, alternate/secondary key and foreign key.
• A superkey is a set of columns that identifies every row in a table.
• A key is a minimal set of key columns.
• A composite key is made up of two or more columns.
• If a table has two or more keys, they are the candidate keys.
• A primary key identifies every record in a table uniquely.
• An alternate key (also called as a secondary key) may or may not identify every record in a table uniquely.
• Foreign key relationship allows two or more tables to be related to each other.
• Entity integrity mandates that no primary key value can be null.
• Referential integrity specifies that every foreign key must contain a null or a valid primary key value.
• A view is a logical table derived from other tables.
• View definitions exist physically, but not the data in them.
• Views are used to provide access control and ease of query management.
• We can update data through views, with certain restrictions.

1. A table can also be called a relation.
2. A table is a set of tuples.
3. A domain is a set of all possible values in a row.
4. Every value in a row-column combination should be atomic in RDBMS.
5. In the Restrict operation, we select only the desired columns.
6. Product returns rows after combining two or more tables.
7. Relational calculus is a formal declarative language.
8. Database constraints group two or more instance constraints.
9. A composite key is a set of columns in one table.
10. Entity integrity specifies that a primary key column should not contain null.

1. The number of rows in a table is called as its _________.
(a) uniqueness
(b) dimension
(c) degree
(d) cardinality
2. The number of columns in a table is its ____________.
(a) degree
(b) cardinality
(c) relation
(d) tuple
3. _________ is a set of operations on RDBMS that allows retrieval of data.
(a) Relational algebra
(b) Cardinality
(c) Relational database theory
(d) Relational calculus
4. The operation of eliminating columns in a table is called ________.
(a) product
(b) project
(c) union
(d) none of the above
5. The result of _________ is the rows common to the rows produced by the individual queries.
(a) union
(b) intersection
(c) product
(d) query
6. ________ is a set of columns that uniquely identifies every row in a table.
(a) Key
(b) Superkey
(c) Foreign key
(d) Candidate key
7. A key consisting of two or more columns is called __________.
(a) composite key
(b) candidate key
(c) primary key
(d) alternate key
8. SQL provides a constraint called __________.
(a) Secondary key
(b) Primary key
(c) Database key
(d) key
9. Every _______ must contain a null or the value of another valid key.
(a) key
(b) foreign key
(c) primary key
(d) alternate key
10. ___________ is a logical table that derives data from other tables.
(a) Cursor
(b) Database
(c) Table
(d) View

1. Explain the concept of domains.
2. Explain the term relational algebra.
3. Explain the term relational calculus.
4. What is database integrity? Explain the term constraints with short descriptions.
5. Explain declarative and procedural constraints.
6. Describe a key and a superkey.
7. What are candidate key and composite key?
8. Explain the difference between primary key and foreign key.
9. What is referential integrity?
10. What are views? What is their significance?

 Account table Account_Number Numeric 5 Account_Name Alphabetic 25 Opening_balance Numeric 5 Opening_type Alphabetic 2 Journal table Account_Number Numeric 5 Transaction_date Date 8 Transaction_type Alphabetic 2 Transaction_Amount Numeric 5

The Account table specifies the account number, name and the opening balance along with its type (credit/debit). For example, an accountholder can have an opening credit balance of Rs. 10000, which means that the bank owes the accountholder Rs. 10000. Over a period of time, many transactions (debits/credits) happen on the account.

Based on this information, attempt the following.

1. Write SQL code to create the referential integrity constraint between the two tables, based on the Account_number as the common column.
2. Specify a constraint such that if an account number is deleted from the Account table, the corresponding records in the Journal table should also be deleted.
3. Check if the referential integrity condition is violated for any records in the Journal table. (Hint: Use the concept of outer join).
4. Find out the meaning of left, right and full outer joins. How would each one of them be relevant to this example?
5. Write an algorithm in plain English that will find out the current balances of all the accountholders. (Hint: Use a combination of 3GL programming and SELECT statements.)
6. Find out the number of transactions for every user on 4 January 2004. (Hint: Use the GROUP BY mechanism.)
7. How can we ensure that the Transaction_type column always contains a value of DR (for debit transactions) or CR (for credit transactions) only? (Hint: Think of constraints).
8. Find a list of accountholders who owe money to the bank (i.e. those having a debit balance). (Hint: Use a combination of 3GL programming and SELECT statements).
9. How would you enforce entity integrity in the Account table?
10. Create a view on the Journal table to allow selection of only the account numbers and amounts for all debit transactions.