|
|||||
Database
Management System
(CS403)
VU
Lecture No.
20
Reading
Material
"Database
Systems Principles, Design
and Implementation"
Section
7.7 7.10
written
by Catherine Ricardo, Maxwell
Macmillan.
"Database
Management Systems", 2nd edition, Raghu Ramakrishnan,
Johannes Gehrke,
McGraw-Hill
Overview of Lecture:
o Second
and Third Normal
Form
o Boyce -
Codd Normal Form
o Higher
Normal Forms
In the
previous lecture we have
discussed functional dependency,
the inference rules
and
the different normal forms.
From this lecture we will
study in length the
second
and
third normal forms.
Second
Normal Form
A
relation is in second normal
form if and only if it is in
first normal form and
all
nonkey
attributes are fully
functionally dependent on the
key. Clearly if a relation
is
in 1NF
and the key consists of a
single attribute, the
relation is automatically
2NF.
The
only time we have to be concerned
2NF is when the key is
composite. A relation
that is
not in 2NF exhibits the
update, insertion and
deletion anomalies we will
now
see it
with an example. Consider
the following
relation.
CLASS
(crId, stId, stName, fId,
room, grade)
crId,
stId
stName,
fId, room, grade
stId
stName
crId
fId,
room
Now in
this relation the key is
course ID and student ID.
The requirement of 2NF
is
that
all non-key attributes
should be fully dependent on
the key there should be
no
169
Database
Management System
(CS403)
VU
partial
dependency of the attributes.
But in this relation student
ID is dependent on
student
name and similarly course ID
is partially dependent on faculty ID
and room,
so it is
not in second normal form.
At this level of normalization,
each column in a
table
that is not a determiner of
the contents of another column must
itself be a
function
of the other columns in the
table. For example, in a
table with three
columns
containing
customer ID, product sold,
and price of the product
when sold, the
price
would be
a function of the customer ID (entitled
to a discount) and the
specific
product.
If a relation is not in 2NF
then there are some
anomalies, which are as
under:
-
·
Redundancy
·
Insertion
Anomaly
·
Deletion
Anomaly
·
Updation
Anomaly
The
general requirements of 2NF
are:-
·
Remove
subsets of data that apply to
multiple rows of a table and
place them
in
separate rows.
·
Create
relationships between these
new tables and their
predecessors through
the
use of foreign keys.
Consider
the following table which
has the anomalies:-
crId
StId
stName
fId
room
grade
C3456
S1020
Suhail
Dar
F2345
104
B
C5678
S1020
Suhail
Dar
F4567
106
C3456
S1038
Shoaib
Ali
F2345
104
A
C5678
S1015
Tahira
Ejaz
F4567
106
B
Now
the first thing is that
the table is in 1NF because
there are no duplicate
values in
any
tuple and all cells
contain atomic value. The
first thing is the
redundancy. Like in
this
table of CLASS the course ID
C3456 is being repeated for
faculty ID F2345 and
similarly
the room no 104 is being
repeated twice. Second is the insertion
anomaly.
Suppose we
want to insert a course in
the table, but this
course has not been
registered
to any
student. But we cannot enter
the student ID, because no
student has
registered
this
course yet. So we can also
not insert this course.
This is called as
insertion
anomaly
which is wrong state of
database. Next is the
deletion anomaly. Suppose
there is
a course which has been
enrolled by one student
only. Now due to
some
170
Database
Management System
(CS403)
VU
reason,
we want to delete the record
of student. But here the
information about the
course
will also be deleted, so in this
way this is the incorrect
state of database in
which
infact we want to delete the
information about the
student record but along
with
this
the course information has
also been deleted. So it is not
reflecting the actual
system.
Now the next is updation
anomaly. Suppose a course has been
registered by
50 students
and now we want to change
the class rooms of all the
students. So in this
case we
will have to change the records of all
the 50 students. So this is again
a
deletion
anomaly. The process for
transforming a 1NF table to
2NF is:
·
Identify
any determinants other than
the composite key, and
the columns they
determine.
·
Create
and name a new table
for each determinant and
the unique columns it
determines.
·
Move
the determined columns from
the original table to the
new table. The
determinate
becomes the primary key of
the new table.
·
Delete
the columns you just
moved from the original
table except for
the
determinant
which will serve as a foreign
key.
·
The
original table may be renamed to
maintain semantic meaning.
Now to
remove all these anomalies
from the table we will have
to decompose this
table,
into different tables as
under:
CLASS
(crId, stId, stName, fId,
room, grade)
crId,
stId
stName,
fId, room, grade
stId
stName
crId
fId,
room
Now
this table has been decomposed
into three tables as
under:-
STD
(stId, stName)
COURSE
(crId, fId, room)
CLASS
(crId, stId, grade)
So now
these three tables are in
second normal form. There
are no anomalies
available
now in this form and we say
this as 2NF.
Third
Normal Form
A
relational table is in third
normal form (3NF) if it is
already in 2NF and
every
non-key column is non-transitively
dependent upon its primary
key. In
171
Database
Management System
(CS403)
VU
other
words, all nonkey attributes
are functionally dependent only
upon the
primary
key.
Transitive
Dependency
Transitive
dependency is one that carries
over another attribute.
Transitive
dependency
occurs when one non-key
attribute determines another
non-key
attribute.
For third normal form we
concentrate on relations with
one
candidate
key, and we eliminate
transitive dependencies.
Transitive
dependencies
cause insertion, deletion,
and update anomalies. We will
now
see it
with an example:-
STD(stId,
stName, stAdr, prName,
prCrdts)
stId
stName,
stAdr, prName,
prCrdts
prName
prCrdts
Now here
the table is in second
normal form. As there is no
partial dependency of
any
attributes
here. The key is student ID .
The problem is of transitive
dependency in
which a
non-key attribute can be
determined by a non-key attribute.
Like here the
program
credits can be determined by
program name, which is not in
3NF. It also
causes
same four anomalies, which
are due to transitive dependencies.
For Example:-
STUDENT
stId
stName
stAdr
prName
prCrdts
S1020
Sohail
Dar
I-8
Islamabad
MCS
64
S1038
Shoaib
Ali
G-6
Islamabad
BCS
132
S1015
Tahira
Ejaz
L Rukh
Wah
MCS
64
S1015
Tahira
Ejaz
L Rukh
Wah
MCS
64
S1018
Arif
Zia
E-8,
BIT
134
Islamabad.
Now in
this table all the
four anomalies are exists in
the table. So we will have
to
remove
these anomalies by decomposing
this table after removing
the transitive
dependency.
We will see it as under: -
STD
(stId, stName, stAdr,
prName, prCrdts)
stId
stName,
stAdr, prName,
prCrdts
prName
prCrdts
The
process of transforming a table
into 3NF is:
172
Database
Management System
(CS403)
VU
·
Identify
any determinants, other the
primary key, and the
columns they
determine.
·
Create
and name a new table
for each determinant and
the unique columns it
determines.
·
Move
the determined columns from
the original table to the
new table. The
determinate
becomes the primary key of
the new table.
·
Delete
the columns you just
moved from the original
table except for
the
determinate
which will serve as a foreign
key.
·
The
original table may be renamed to
maintain semantic meaning.
STD
(stId, stName, stAdr,
prName)
PROGRAM
(prName, prCrdts)
We have
now decomposed the relation
into two relations of
student and program.
So
the
relations are in third normal
form and are free of
all the anomalies
Boyce -
Codd Normal Form
A
relation is in Boyce-Codd normal
form id and only if every
determinant is a
candidate
key. A relation R is said to be in BCNF
if whenever X -> A holds in
R, and
A is not
in X, then X is a candidate key
for R. It should
be noted that most
relations
that
are in 3NF are also in
BCNF. Infrequently, a 3NF
relation is not in BCNF
and
this
happens only if
(a)
the candidate keys in the
relation are composite keys
(that is, they are
not single
attributes),
(b)
there
is
more
than
one
candidate
key
in
the
relation,
and
(c)
the keys are not
disjoint, that is, some
attributes in the keys are
common.
The
BCNF differs from the
3NF only when there
are more than one
candidate keys
and
the keys are composite
and overlapping. Consider
for example, the
relationship:
enrol
(sno, sname, cno, cname,
date-enrolled)
Let us
assume that the relation
has the following candidate
keys:
173
Database
Management System
(CS403)
VU
(sno,cno)
(sno,cname)
(sname,cno)
(sname,
cname)
(we
have assumed sname and
cname are unique identifiers).
The relation is in
3NF
but
not in BCNF because there
are dependencies
sno ->
sname
cno ->
cname
Where
attributes are part of a
candidate key are dependent
on part of another
candidate
key. Such dependencies indicate that
although the relation is
about some
entity or
association that is identified by
the candidate keys e.g.
(sno, cno), there are
attributes
that are not about
the whole thing that
the keys identify. For
example, the
above
relation is about an association
(enrolment) between students and subjects
and
therefore
the relation needs to
include only one identifier
to identify students and
one
identifier
to identify subjects. Provided that
two identifiers about the
students (sno,
sname)
and two keys about subjects
(cno, cname) means that some
information about
the
students and subjects which is not needed
is being provided. This
provision of
information
will result in repetition of information
and the anomalies that
we
discussed
at the beginning of this
chapter. If we wish to include
further information
about
students and courses in the
database, it should not be
done by putting the
information
in the present relation but by
creating new relations that
represent
information
about entities student and
subject.
These
difficulties may be overcome by
decomposing the above
relation in the
following
three relations:
(sno,
sname)
(cno,
cname)
(sno,
cno, date-of-enrolment)
174
Database
Management System
(CS403)
VU
We now
have a relation that only
has information about students,
another only about
subjects
and the third only
about enrolments. All the
anomalies and repetition
of
information
have been removed.
Higher
Normal Forms
After
BCNF are the fourth, a
fifth and domain key
normal form exists. Although
till
BCNF
normal form tables are in
required form, but if we
want we can move on
to
fourth
and fifth normal forms as
well. 4NF deals with
multivalued dependency,
fifth
deals
with possible loss less
decompositions; DKNF reduces
further chances of
any
possible
inconsistency.
Summary
The
goal of normalization is to create a
set of relational tables
that are free of
redundant
data and that can be
consistently and correctly
modified. This means
that
all
tables in a relational database
should be in the third
normal form (3NF). A
relational
table is in 3NF if and only
if all non-key columns are
(a) mutually
independent
and (b) fully dependent
upon the primary key.
Mutual independence
means
that no non-key column is
dependent upon any
combination of the
other
columns.
The first two normal
forms are intermediate steps to
achieve the goal of
having
all tables in 3NF. In order
to better understand the 2NF
and higher forms, it
is
necessary
to understand the concepts of
functional dependencies and loss
less
decomposition.
Exercise:
The
tables of Examination System which
were brought in 1NF in
previous lecture
bring
those tables into 2 and
3NF.
175
Table of Contents:
|
|||||