Second, Third Normal Form, Boyce - Codd Normal Form, Higher Normal Forms

<< Functional Dependency, Inference Rules, Normal Forms

Normalization Summary, Example, Physical Database Design >>

Database Management System (CS403)

Lecture No. 20

Reading Material

"Database Systems Principles, Design and Implementation"

Section 7.7 7.10

written by Catherine Ricardo, Maxwell Macmillan.

"Database Management Systems", 2nd edition, Raghu Ramakrishnan, Johannes Gehrke,

McGraw-Hill

Overview of Lecture:

o Second and Third Normal Form

o Boyce - Codd Normal Form

o Higher Normal Forms

In the previous lecture we have discussed functional dependency, the inference rules

and the different normal forms. From this lecture we will study in length the second

and third normal forms.

Second Normal Form

A relation is in second normal form if and only if it is in first normal form and all

nonkey attributes are fully functionally dependent on the key. Clearly if a relation is

in 1NF and the key consists of a single attribute, the relation is automatically 2NF.

The only time we have to be concerned 2NF is when the key is composite. A relation

that is not in 2NF exhibits the update, insertion and deletion anomalies we will now

see it with an example. Consider the following relation.

CLASS (crId, stId, stName, fId, room, grade)

crId, stId

stName, fId, room, grade

stId

stName

crId

fId, room

Now in this relation the key is course ID and student ID. The requirement of 2NF is

that all non-key attributes should be fully dependent on the key there should be no

169

Database Management System (CS403)

partial dependency of the attributes. But in this relation student ID is dependent on

student name and similarly course ID is partially dependent on faculty ID and room,

so it is not in second normal form. At this level of normalization, each column in a

table that is not a determiner of the contents of another column must itself be a

function of the other columns in the table. For example, in a table with three columns

containing customer ID, product sold, and price of the product when sold, the price

would be a function of the customer ID (entitled to a discount) and the specific

product. If a relation is not in 2NF then there are some anomalies, which are as under:

�

Redundancy

�

Insertion Anomaly

�

Deletion Anomaly

�

Updation Anomaly

The general requirements of 2NF are:-

�

Remove subsets of data that apply to multiple rows of a table and place them

in separate rows.

�

Create relationships between these new tables and their predecessors through

the use of foreign keys.

Consider the following table which has the anomalies:-

crId

StId

stName

fId

room

grade

C3456

S1020

Suhail Dar

F2345

104

C5678

S1020

Suhail Dar

F4567

106

C3456

S1038

Shoaib Ali

F2345

104

C5678

S1015

Tahira Ejaz

F4567

106

Now the first thing is that the table is in 1NF because there are no duplicate values in

any tuple and all cells contain atomic value. The first thing is the redundancy. Like in

this table of CLASS the course ID C3456 is being repeated for faculty ID F2345 and

similarly the room no 104 is being repeated twice. Second is the insertion anomaly.

Suppose we want to insert a course in the table, but this course has not been registered

to any student. But we cannot enter the student ID, because no student has registered

this course yet. So we can also not insert this course. This is called as insertion

anomaly which is wrong state of database. Next is the deletion anomaly. Suppose

there is a course which has been enrolled by one student only. Now due to some

170

Database Management System (CS403)

reason, we want to delete the record of student. But here the information about the

course will also be deleted, so in this way this is the incorrect state of database in

which infact we want to delete the information about the student record but along with

this the course information has also been deleted. So it is not reflecting the actual

system. Now the next is updation anomaly. Suppose a course has been registered by

50 students and now we want to change the class rooms of all the students. So in this

case we will have to change the records of all the 50 students. So this is again a

deletion anomaly. The process for transforming a 1NF table to 2NF is:

�

Identify any determinants other than the composite key, and the columns they

determine.

�

Create and name a new table for each determinant and the unique columns it

determines.

�

Move the determined columns from the original table to the new table. The

determinate becomes the primary key of the new table.

�

Delete the columns you just moved from the original table except for the

determinant which will serve as a foreign key.

�

The original table may be renamed to maintain semantic meaning.

Now to remove all these anomalies from the table we will have to decompose this

table, into different tables as under:

CLASS (crId, stId, stName, fId, room, grade)

crId, stId

stName, fId, room, grade

stId

stName

crId

fId, room

Now this table has been decomposed into three tables as under:-

STD (stId, stName)

COURSE (crId, fId, room)

CLASS (crId, stId, grade)

So now these three tables are in second normal form. There are no anomalies

available now in this form and we say this as 2NF.

Third Normal Form

A relational table is in third normal form (3NF) if it is already in 2NF and

every non-key column is non-transitively dependent upon its primary key. In

171

Database Management System (CS403)

other words, all nonkey attributes are functionally dependent only upon the

primary key.

Transitive Dependency

Transitive dependency is one that carries over another attribute. Transitive

dependency occurs when one non-key attribute determines another non-key

attribute. For third normal form we concentrate on relations with one

candidate key, and we eliminate transitive dependencies. Transitive

dependencies cause insertion, deletion, and update anomalies. We will now

see it with an example:-

STD(stId, stName, stAdr, prName, prCrdts)

stId

stName, stAdr, prName, prCrdts

prName

prCrdts

Now here the table is in second normal form. As there is no partial dependency of any

attributes here. The key is student ID . The problem is of transitive dependency in

which a non-key attribute can be determined by a non-key attribute. Like here the

program credits can be determined by program name, which is not in 3NF. It also

causes same four anomalies, which are due to transitive dependencies. For Example:-

STUDENT

stId

stName

stAdr

prName

prCrdts

S1020

Sohail Dar

I-8 Islamabad

MCS

S1038

Shoaib Ali

G-6 Islamabad

BCS

132

S1015

Tahira Ejaz

L Rukh Wah

MCS

S1015

Tahira Ejaz

L Rukh Wah

MCS

S1018

Arif Zia

E-8,

BIT

134

Islamabad.

Now in this table all the four anomalies are exists in the table. So we will have to

remove these anomalies by decomposing this table after removing the transitive

dependency. We will see it as under: -

STD (stId, stName, stAdr, prName, prCrdts)

stId

stName, stAdr, prName, prCrdts

prName

prCrdts

The process of transforming a table into 3NF is:

172

Database Management System (CS403)

�

Identify any determinants, other the primary key, and the columns they

determine.

�

Create and name a new table for each determinant and the unique columns it

determines.

�

Move the determined columns from the original table to the new table. The

determinate becomes the primary key of the new table.

�

Delete the columns you just moved from the original table except for the

determinate which will serve as a foreign key.

�

The original table may be renamed to maintain semantic meaning.

STD (stId, stName, stAdr, prName)

PROGRAM (prName, prCrdts)

We have now decomposed the relation into two relations of student and program. So

the relations are in third normal form and are free of all the anomalies

Boyce - Codd Normal Form

A relation is in Boyce-Codd normal form id and only if every determinant is a

candidate key. A relation R is said to be in BCNF if whenever X -> A holds in R, and

A is not in X, then X is a candidate key for R. It should be noted that most relations

that are in 3NF are also in BCNF. Infrequently, a 3NF relation is not in BCNF and

this happens only if

(a) the candidate keys in the relation are composite keys (that is, they are not single

attributes),

(b)

there

than

one

candidate

key

the

relation,

and

The BCNF differs from the 3NF only when there are more than one candidate keys

and the keys are composite and overlapping. Consider for example, the relationship:

enrol (sno, sname, cno, cname, date-enrolled)

Let us assume that the relation has the following candidate keys:

173

Database Management System (CS403)

(sno,cno)

(sno,cname)

(sname,cno)

(sname, cname)

(we have assumed sname and cname are unique identifiers). The relation is in 3NF

but not in BCNF because there are dependencies

sno -> sname

cno -> cname

Where attributes are part of a candidate key are dependent on part of another

candidate key. Such dependencies indicate that although the relation is about some

entity or association that is identified by the candidate keys e.g. (sno, cno), there are

attributes that are not about the whole thing that the keys identify. For example, the

above relation is about an association (enrolment) between students and subjects and

therefore the relation needs to include only one identifier to identify students and one

identifier to identify subjects. Provided that two identifiers about the students (sno,

sname) and two keys about subjects (cno, cname) means that some information about

the students and subjects which is not needed is being provided. This provision of

information will result in repetition of information and the anomalies that we

discussed at the beginning of this chapter. If we wish to include further information

about students and courses in the database, it should not be done by putting the

information in the present relation but by creating new relations that represent

information about entities student and subject.

These difficulties may be overcome by decomposing the above relation in the

following three relations:

(sno, sname)

(cno, cname)

(sno, cno, date-of-enrolment)

174

Database Management System (CS403)

We now have a relation that only has information about students, another only about

subjects and the third only about enrolments. All the anomalies and repetition of

information have been removed.

Higher Normal Forms

After BCNF are the fourth, a fifth and domain key normal form exists. Although till

BCNF normal form tables are in required form, but if we want we can move on to

fourth and fifth normal forms as well. 4NF deals with multivalued dependency, fifth

deals with possible loss less decompositions; DKNF reduces further chances of any

possible inconsistency.

Summary

The goal of normalization is to create a set of relational tables that are free of

redundant data and that can be consistently and correctly modified. This means that

all tables in a relational database should be in the third normal form (3NF). A

relational table is in 3NF if and only if all non-key columns are (a) mutually

independent and (b) fully dependent upon the primary key. Mutual independence

means that no non-key column is dependent upon any combination of the other

columns. The first two normal forms are intermediate steps to achieve the goal of

having all tables in 3NF. In order to better understand the 2NF and higher forms, it is

necessary to understand the concepts of functional dependencies and loss less

decomposition.

Exercise:

The tables of Examination System which were brought in 1NF in previous lecture

bring those tables into 2 and 3NF.

175

Table of Contents: