|
|||||
Database
Management System
(CS403)
VU
Lecture No.
14
Reading
Material
"Database
Systems Principles, Design
and Implementation"
Section
6.1 6.3.3
written
by Catherine Ricardo, Maxwell
Macmillan.
"Database
Management Systems", 2nd edition,
Raghu
Ramakrishnan,
Johannes Gehrke,
McGraw-Hill
Overview of
Lecture
Logical
Database Design
o
Introduction
to Relational Data
Model
o
Basic
properties of a table
o
Mathematical
and database
relations
o
From
this lecture we are going to
discuss the logical database
design phase of
database
development process. Logical
database design, like
conceptual database
design is
our database design; it
represents the structure of data
that we need to store
to
fulfill the requirements of
the users or organization
for which we are developing
the
system.
However there are certain
differences between the two
that are presented in
the
table below:
Conceptual
Database Design
Logical
Database Design
Developed
in a semantic data model In legacy data
models (relational
1
(generally
E-R data model)
generally
in current age)
Free of
particular DBMS in
which
Free of data
model in which going to
be
2
going to
be implemented; many/any
implemented;
many/any possible
possible
Obtained
by
translating
the
3
Results
from Analysis Phase
conceptual
database
design
into
another
data model
124
Database
Management System
(CS403)
VU
4
Represented
graphically
Descriptive
5
More
expressive
Relatively
less expressive
Going to
be transformed and
then
6
Going to
be implemented
implemented
You
can think more, give a
try
Table 1:
Differences between Conceptual
and Logical Database
Designs
As we
have already discussed in
previous lectures and as is given in
row 2 of the
above
table, the conceptual
database design can be
transformed into any data
model,
like,
hierarchical, network, relational or
object-oriented. So the study of
the logical
database
design requires first
involves the study of the
data model/(s) that we
can
possibly
use for the purpose.
However, in the current age,
since early eighties,
the
most
popular choice for the
logical database design is
the relational data model;
so
much
popular that today it can be
considered to be the only
choice. Why? Because
of
its
features we are going to discuss in
today's lecture. That is why
rather than studying
different
data models we will be studying
only the relational data
model. Once we
study
this, the development of
logical database design is
transformation of conceptual
database
design to relational one and
the process is very simple
and straightforward.
So from
today's lecture our discussion
starts on the relational
data model. Just for
the
sake of
revision we repeat the definition of data
model "a set of constructs/tools
used
to
develop a database design;
generally consists of three components
which are
constructs,
manipulation language and
integrity constraints". We have
discussed it
earlier
that the later part of
the definition (three
components) fits precisely
with the
relational
data model (RDM), that is,
it has these components defined
clearly.
Relational
Data Model
The RDM
is popular due to its two
major strengths and they
are:
o Simplicity
o Strong
Mathematical Foundation
The RDM
is simple, why, there is
just one structure and
that is a relation or a
table.
Even
this single structure is
very easy to understand, so a
user of even of a
moderate
125
Database
Management System
(CS403)
VU
genius
can understand it easily.
Secondly, it has a strong
mathematical foundation
that
gives
many advantages, like:
o Anything
included/defined in RDM has got a precise
meaning since it is based
on mathematics, so
there is no confusion.
o If we
want to test something
regarding RDM we can test it
mathematically, if it
works
mathematically it will work with RDM
(apart from some
exceptions).
o The
mathematics not only provided
the RDM the structure
(relation) but also
well
defined manipulation languages
(relational algebra and
relational calculus).
o It
provided RDM certain boundaries, so
any modification or addition we
want to
make in
RDM, we have to see if it
complies with the relational
mathematics or
not. We
cannot afford to cross these
boundaries since we will be losing the
huge
advantages
provided by the mathematical
backup.
"An IBM
scientist E.F. Codd proposed
the relational data model in
1970. At that
time most
database systems were based
on one of two older data
models (the
hierarchical
model and the network
model); the relational model
revolutionized
the
database field and largely
replaced these earlier
models. Prototype
relational
database
management systems were developed in
pioneering research projects
at
IBM and
UC-Berkeley by the mid-70s,
and several vendors were
offering
relational
database products shortly
thereafter. Today, the
relational model is by
far
the dominant data model and
is the foundation for the
leading DBMS
products,
including IBM's DB2 family,
Informix, Oracle, Sybase,
Microsoft's
Access
and SQLServer, FoxBase, and
Paradox. Relational database
systems are
ubiquitous
in the marketplace and represent a
multibillion dollar industry"
[1]
The RDM
is mainly used for
designing/defining external and
conceptual schemas;
however
to some extent physical
schema is also specified in it.
Separation of
conceptual
and physical levels makes
data and schema manipulation
much easier,
contrary
to previous data models. So the
relational data model also truly
supports
"Three
Level Schema
Architecture".
Introduction
to the Relational Data
model
The RDM
is based on a single structure
and that is a relation.
Speaking in terms of the
E-R data
model, both the entity
types and relationships are
represented using relations
126
Database
Management System
(CS403)
VU
in RDM.
The relation in RDM is similar to
the mathematical relation
however
database
relation is also represented in a two
dimensional structure called
table. A
table
consists of rows and
columns. Rows of a table are
also called tuples. A row
or
tuple of
a table represents a record or an
entity instance, where as the
columns of the
table
represent the properties or
attributes.
stID
stName
clName
doB
sex
S001
M.
Suhail
MCS
12/6/84
M
S002
M.
Shahid
BCS
3/9/86
M
S003
Naila
S.
MCS
7/8/85
F
S004
Rubab A.
MBA
23/4/86
F
S005
Ehsan M.
BBA
22/7/88
M
Table 2:
A database relation represented in the
form of a table
In the
above diagram, a table is
shown that consists of five
rows and five
columns.
The
top most rows contain the
names of the columns or
attributes whereas the
rows
represent
the records or entity instances. There
are six basic properties of
the database
relations
which are:
·
Each
cell of a table contains
atomic/single value
A cell is
the intersection of a row
and a column, so it represents a
value of an
attribute
in a particular row. The
property means that the
value stored in a single
cell
is
considered as a single value. In
real life we see many
situations when a
property/attribute
of any entity contains
multiple values, like,
degrees that a person
has,
hobbies of a student, the
cars owned by a person, the
jobs of an employee. All
these
attributes have multiple
values; these values cannot
be placed as the value of
a
single
attribute or in a cell of the
table. It does not mean
that the RDM cannot
handle
such situations, however,
there are some special
means that we have to
adopt
in these
situations, and they can
not be placed as the value
of an attribute because an
attribute
can contain only a single
value. The values of
attributes shown in table
1
are
all atomic or single.
127
Database
Management System
(CS403)
VU
·
Each
column has a distinct name;
the name of the attribute it
represents
Each
column has a heading that is
basically the name of the
attribute that the
column
represents. It has to be unique,
that is, a table cannot
have duplicated
column/attribute
names. In the table 2 above,
the bold items in the
first row
represent
the column/attribute
names.
·
The
values of the attributes
come from the same
domain
Each
attribute is assigned a domain along
with the name when it is
defined. The
domain
represents the set of
possible values that an
attribute can have. Once
the
domain
has been assigned to an attribute, then
all the rows that
are added into
the
table
will have the values from
the same domain for
that particular column.
For
example,
in the table 2 shown above
the attribute doB (date of
birth) is assigned the
domain
"Date", now all the
rows have the date
value against the attribute
doB. This
attribute
cannot have a text or
numeric value.
·
The
order of the columns is
immaterial
If the
order of the columns in a
table is changed, the table
still remains the
same.
Order of
the columns does not
matter.
·
The
order of the rows is
immaterial
As with
the columns, if rows' order
is changed the table remains
the same.
·
Each
row/tuple/record is distinct, no two
rows can be same
Two
rows of a table cannot be
same. The value of even a
single attribute has to
be
different
that makes the entire
row distinct.
There are
three components of the RDM,
which are, construct
(relation), manipulation
language
(SQL) and integrity
constraints (two). We have
discussed the relation so
far;
the
last two components will be discussed
later. In the next section
we are going to
128
Database
Management System
(CS403)
VU
discuss
the mathematical relations
briefly that will help to
link the mathematical
relations
with the database relations
and will help in a better
understanding of the
later.
Mathematical
Relations
Consider
two sets
A = {x,
y}
B = {2,
4, 6}
Cartesian
product of these sets (A x B) is a
set that consists of ordered
pairs where
first
element of the ordered pair
belongs to set A where as
second element belongs
to
set B, as
shown below:
A X B=
{(x,2), (x,4), (x,6), (y,2),
(y,4), (y,6)}
A
relation is some subset of
this Cartesian product, For
example,
·
R1=
{(x,2), (y,2),(x,6),(x,4)}
·
R2 =
{(x,4), (y,6), (y,4)}
The
same notion of Cartesian
product and relations can be
applied to more than
two
sets,
e.g. in case of three sets, we will
have a relation of ordered
triplets
Applying
the same concept in a real
world scenario, consider two
sets Name and
Age
having
the elements:
·
Name =
{Ali, Sana, Ahmed,
Sara}
·
Age =
{15,16,17,18,.......,25}
Cartesian
product of Name &
Age
Name X
Age= {(Ali,15), (Sana,15),
(Ahmed,15), (Sara,15), ....,
(Ahmed,25),
(Sara,25)}
Now
consider a subset CLASS of
this Cartesian
product
CLASS =
{(Ali, 18), (Sana, 17), (Ali,
20), (Ahmed, 19)}
This
subset CLASS is a relation
mathematically, however, it may represent
a class in
the
real world where each
ordered pair represents a
particular student mentioning
the
name
and age of a student. In the
database context each
ordered pair represents a
tuple
and
elements in the ordered pairs represent
values of the attributes.
Think in this way,
if Name
and Age represent all
possible values for names
and ages of students,
then
any
class you consider that will
definitely be a subset of the Cartesian
product of the
Name
and Age. That is,
the name and age
combination of all the students of
any class
129
Database
Management System
(CS403)
VU
will be
included in the Cartesian
product and if we take out
particulars ordered
pairs
that
are related to a class then
that will be a subset of the
Cartesian product, a
relation.
Database
Relations
Let
A1, A2, A3, ..., An be some
attributes and D1, D2,
D3,..., Dn be their domains
A
relation
scheme relates certain attributes
with their domain in context
of a relation. A
relation
scheme can be represented
as:
R =
(A1:D1, A2:D2, ......, An:Dn),
for example,
STD
Scheme = (stId:Text, stName:
Text, stAdres:Text, doB:Date)
OR
STD(stId,
stName, stAdres, doB)
Whereas
the stId, stName, stAdres
and doB are the attribute
names and Text,
Text,
Text
and Date are their
respective domains. A database
relation as per this
relation
scheme
can be:
STD={(stId:S001,
stName:Ali,
stAdres:
Lahore,
doB:12/12/76),
(stId:S003,
stName:A.
Rehman, stAdres: RWP, doB:2/12/77)}
OR
STD={(S001,
Ali, Lahore, 12/12/76), (S003, A.
Rehman, RWP,
2/12/77)}
The
above relation if represented in a two
dimensional structure will be called a
table
as is
shown below:
stId
stName
stAdres
doB
S001
Ali
Lahore
12/12/76
S002
A. Rehman
RWP
2/12/77
With
this, today's lecture is
finished; the discussion on RDM will be
continued in the
next
lecture.
Summary
In this
lecture we have started the discussion on
the logical database design
that we
develop
from the conceptual database
design. The later is
generally developed
using
E-R data
model, whereas for the
former RDM is used. RDM is based on
the theory of
mathematical
relations; a mathematical relation is
subset of the Cartesian
product of
two or
more sets. Relations are
physically represented in the form of
two-dimensional
130
Database
Management System
(CS403)
VU
structure
called table, where
rows/tuples represent records and columns
represent the
attributes.
Exercise:
Define
different attributes (assigning
name and domain to each)
for an entity
STUDENT,
then apply the concept of
Cartesian product on the
domains of these
attributes,
then consider the records of
your class fellows and
see if it is the subset
of
the
Cartesian product.
131
Table of Contents:
|
|||||