Database Management System (CS403)                                VU

Lecture No. 23
Reading Material

"Database Systems: Principles, Design and Implementation" by Catherine Ricardo, Maxwell Macmillan.
"Database Management Systems", 2nd edition, by Raghu Ramakrishnan and Johannes Gehrke, McGraw-Hill.
Overview of Lecture

Physical Record and Denormalization
Partitioning
In the previous lecture we studied different data types and coding techniques. We have now reached the stage of implementing our database, in which the relations are normalized. Next we will make this database efficient from the implementation point of view.
Physical Record and Denormalization
Denormalization is a technique of moving from higher to lower normal forms of database modeling in order to speed up database access. The denormalization process is applied while deriving a physical data model from the logical one. In logical database design we group things that are logically related through the same primary key. In physical database design, fields are grouped according to how they are stored physically and accessed by the DBMS. In general, this may decompose one logical relation into separate physical records, combine some, or do both. The one valid reason for denormalization is to enhance performance. There are several indicators that help identify systems and tables that are potential denormalization candidates. These are:
- Many critical queries and reports exist that rely upon data from more than one table; often these requests need to be processed in an on-line environment.
- Repeating groups exist that need to be processed as a group instead of individually.
- Many calculations need to be applied to one or many columns before queries can be answered successfully.
- Tables need to be accessed in different ways by different users during the same timeframe.
- Certain columns are queried a large percentage of the time; consider 60% or greater a cautionary number flagging denormalization as an option.
We should be aware that each new RDBMS release usually brings enhanced performance and improved access options that may reduce the need for denormalization. However, most of the popular RDBMS products will on occasion require denormalized data structures. There are many different types of denormalized tables that can resolve the performance problems caused when accessing fully normalized data. Denormalization must balance the need for good system response time with the need to maintain data, while avoiding the various anomalies or problems associated with denormalized table structures. Denormalization goes hand-in-hand with the detailed analysis of critical transactions through view analysis. View analysis must include the specification of primary and secondary access paths for the tables that comprise end-user views of the database. A fully normalized database schema can fail to provide adequate system response time due to excessive table join operations.
Denormalization Situation 1:

Merge two entity types with a one-to-one relationship into one. Even if one of the entity types is optional, and joining them can therefore waste storage, merging may still be a wise decision if the two are accessed together very frequently. So two relations that have a one-to-one relationship may be merged for better performance.
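The merge described above can be sketched with Python's built-in sqlite3 module. The EMPLOYEE and PARKING_SLOT tables and their rows are hypothetical, chosen only to show a one-to-one pair where the optional side is folded into the main table:

```python
import sqlite3

# Hypothetical one-to-one pair: each EMPLOYEE has at most one PARKING_SLOT.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE EMPLOYEE (empId INTEGER PRIMARY KEY, eName TEXT);
    CREATE TABLE PARKING_SLOT (empId INTEGER PRIMARY KEY, slotNo TEXT,
                               FOREIGN KEY (empId) REFERENCES EMPLOYEE(empId));
    INSERT INTO EMPLOYEE VALUES (1, 'Ali'), (2, 'Sara');
    INSERT INTO PARKING_SLOT VALUES (1, 'A-7');

    -- Denormalized: the optional side is merged into the main table.
    -- slotNo is NULL for employees without a slot (the storage cost
    -- the lecture mentions).
    CREATE TABLE EMPLOYEE_MERGED AS
        SELECT e.empId, e.eName, p.slotNo
        FROM EMPLOYEE e LEFT JOIN PARKING_SLOT p ON e.empId = p.empId;
""")

# The frequent combined query now needs no join at all.
rows = con.execute(
    "SELECT eName, slotNo FROM EMPLOYEE_MERGED ORDER BY empId").fetchall()
print(rows)  # [('Ali', 'A-7'), ('Sara', None)]
```

The trade-off is visible in the data: Sara's row carries a NULL slot, which is the wasted storage accepted in exchange for a join-free query.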
Denormalization Situation 2:

A many-to-many binary relationship is mapped to three relations, so queries needing data from the two participating entity types require joining all three, which is expensive: join is a costly operation in terms of execution time and resources. Suppose there are two relations STUDENT and COURSE with a many-to-many relationship between them, giving three relations in all: STUDENT, COURSE and ENROLLED. If we want to see how many courses a student has enrolled in, we have to join all three relations, first STUDENT with ENROLLED and then the result with COURSE, which is quite expensive. If the relation created against the relationship is merged with one of the relations created against the participating entity types, the join operation is performed only once.
Consider the following many-to-many relationship:

EMP (empId, eName, sal)
PROJ (pjId, pjName)
WORK (empId, pjId, dtHired, sal)
This is a many-to-many relationship between EMP and PROJ through the relationship WORK. If we denormalize these relations by merging the WORK relation into PROJ, the comparatively smaller one, the result violates 2NF and the anomalies of 2NF will be present. But only one join operation is then involved, joining two tables, which increases efficiency.
EMP (empId, eName, sal)
PROJ (pjId, pjName, empId, dtHired, sal)
It is now up to you to weigh the drawbacks and advantages of denormalization.
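A sketch of this situation using Python's built-in sqlite3 module, with the lecture's EMP / PROJ / WORK schema simplified and the sample rows invented for illustration. It builds both forms and shows that a single join over the denormalized table answers the same query as the two joins over three tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE EMP  (empId INTEGER PRIMARY KEY, eName TEXT);
    CREATE TABLE PROJ (pjId INTEGER PRIMARY KEY, pjName TEXT);
    CREATE TABLE WORK (empId INTEGER, pjId INTEGER, dtHired TEXT);
    INSERT INTO EMP  VALUES (1, 'Ali'), (2, 'Sara');
    INSERT INTO PROJ VALUES (10, 'Payroll');
    INSERT INTO WORK VALUES (1, 10, '2004-01-01'), (2, 10, '2004-02-01');

    -- Denormalized: WORK is folded into PROJ. This violates 2NF,
    -- since pjName now repeats once per assigned employee.
    CREATE TABLE PROJ_D AS
        SELECT w.pjId, p.pjName, w.empId, w.dtHired
        FROM WORK w JOIN PROJ p ON w.pjId = p.pjId;
""")

# Normalized form: two joins over three tables.
three_way = con.execute("""
    SELECT e.eName, p.pjName
    FROM EMP e JOIN WORK w ON e.empId = w.empId
               JOIN PROJ p ON w.pjId = p.pjId
    ORDER BY e.empId""").fetchall()

# Denormalized form: one join answers the same question.
two_way = con.execute("""
    SELECT e.eName, p.pjName
    FROM EMP e JOIN PROJ_D p ON e.empId = p.empId
    ORDER BY e.empId""").fetchall()

print(three_way == two_way)  # True
```

The repeated pjName value in PROJ_D is exactly the 2NF violation the lecture warns about: it is the price paid for eliminating one join.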
Denormalization Situation 3:

Reference data: in a one-to-many situation where the entity type on the one side does not participate in any other relationship, the many-side entity type is appended with the reference data rather than the foreign key. In this case the reference table should be merged with the main table.
We can see this with the STUDENT and HOBBY relations. One student can have one hobby and one hobby can be adopted by many students. In this case HOBBY can be merged into the STUDENT relation. Although the data will be redundant, no join of two relations is needed, which gives better performance.
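A minimal sqlite3 sketch of the reference-data merge; the STUDENT and HOBBY column names and rows are made up for illustration. The hobby name replaces the foreign key, so the frequent query runs without a join at the cost of storing each hobby name once per student:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE HOBBY   (hobbyId INTEGER PRIMARY KEY, hobbyName TEXT);
    CREATE TABLE STUDENT (stId INTEGER PRIMARY KEY, stName TEXT,
                          hobbyId INTEGER REFERENCES HOBBY(hobbyId));
    INSERT INTO HOBBY   VALUES (1, 'Chess'), (2, 'Cricket');
    INSERT INTO STUDENT VALUES (100, 'Ahmed', 1), (101, 'Bina', 1),
                               (102, 'Dawood', 2);

    -- Denormalized: the reference value itself replaces the foreign key.
    CREATE TABLE STUDENT_D AS
        SELECT s.stId, s.stName, h.hobbyName
        FROM STUDENT s JOIN HOBBY h ON s.hobbyId = h.hobbyId;
""")

# 'Chess' is stored redundantly for Ahmed and Bina, but no join is needed.
rows = con.execute(
    "SELECT stName, hobbyName FROM STUDENT_D ORDER BY stId").fetchall()
print(rows)  # [('Ahmed', 'Chess'), ('Bina', 'Chess'), ('Dawood', 'Cricket')]
```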
Partitioning

Denormalization merges different relations, whereas partitioning splits the same relation into two or more. The general aims of data partitioning and placement in a database are to:

1. Reduce workload (e.g. data access, communication costs, search space)
2. Balance workload
3. Speed up the rate of useful work (e.g. keep frequently accessed objects in main memory)

There are two types of partitioning:

Horizontal Partitioning
Vertical Partitioning
Horizontal Partitioning:
The table is split on the basis of rows, which means a larger table is split into smaller tables. The advantage is that accessing the records of a smaller table takes much less time than accessing those of a larger one. It also helps in the maintenance of tables, security, authorization and backup. The smaller partitions can also be placed on different disks to reduce disk contention. Some of the types of horizontal partitioning are as under:
Range Partitioning:

In this type of partitioning a range is imposed on some particular attribute, so different partitions are made on the basis of those ranges with the help of a select statement. For example, students whose ID is from 1 to 1000 go into partition 1, and so on. This improves the overall efficiency of the database. In range partitioning the partitions may become unbalanced, so a few partitions may be overloaded.
Hash Partitioning:

This is a type of horizontal partitioning in which a particular hash algorithm, known to the DBMS, is applied to assign rows to partitions. Hash partitioning reduces the chance of unbalanced partitions to a large extent.
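A minimal sketch of the idea in plain Python, assuming the simplest possible hash function (key modulo the partition count; a real DBMS uses its own internal hash). Even a run of consecutive IDs, the worst case for range partitioning, spreads evenly:

```python
# Rows are assigned to partitions by hashing the key, so skewed key
# ranges still spread roughly evenly across partitions.
N_PARTITIONS = 4

def partition_of(key: int) -> int:
    # Any deterministic function works as long as the DBMS knows it;
    # modulo on an integer key is the simplest illustration.
    return key % N_PARTITIONS

partitions = {p: [] for p in range(N_PARTITIONS)}
for st_id in range(1, 21):   # 20 consecutive student IDs
    partitions[partition_of(st_id)].append(st_id)

for p, rows in sorted(partitions.items()):
    print(p, rows)           # every partition holds exactly 5 IDs
```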
List Partitioning:

In this type of partitioning the values belonging to every partition are specified explicitly, so there is a specified list of values for each partition. No range is involved; rather there is a list of values.
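A small Python sketch of list partitioning; the partition names and city values are hypothetical. Each partition owns an explicit list of values, and a row is routed by membership in one of those lists rather than by a range:

```python
# Each partition is defined by an explicit list of attribute values.
PARTITION_LISTS = {
    "north": ["Lahore", "Islamabad"],
    "south": ["Karachi", "Hyderabad"],
}

def partition_of(city: str) -> str:
    # Route a row by membership in a partition's value list.
    for name, cities in PARTITION_LISTS.items():
        if city in cities:
            return name
    raise ValueError(f"no partition lists the value {city!r}")

print(partition_of("Karachi"))    # south
print(partition_of("Islamabad"))  # north
```

Unlike hash partitioning, a value outside every list is rejected here, which mirrors how a DBMS refuses rows that match no declared partition.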
Summary:

Denormalization can lead to improved processing efficiency. The objective is to improve system response time without incurring a prohibitive amount of additional
data maintenance requirements. This is especially important for client-server systems. Denormalization requires thorough system testing to prove the effect that denormalized table structures have on processing efficiency. Furthermore, unseen ad hoc data queries may be adversely affected by denormalized table structures. Denormalization must be accomplished in conjunction with a detailed analysis of the tables required to support the various end-user views of the database. This analysis must include the identification of primary and secondary access paths to data. Similarly, before carrying out partitioning of a table, a thorough analysis of the relations is a must.
Exercise:

Critically examine the tables drawn for the Examination System and see whether there is a requirement for denormalization or partitioning; if so, carry out the process.