|
|||||
![]() Introduction
to Computing CS101
VU
LESSON
37
DATABASE
SOFTWARE
Focus
of the last Lesson was on
Data Management
·
First
of a two-Lesson sequence
·
We
became familiar with the
issues and problems related to
data-intensive computing
·
We
also found out about
flat-file and tabular
storage
Data
Management
·
Keeping
track of a few dozen data
items is straight forward
·
However,
dealing with situations that
involve significant number of data items,
requires more
attention
to the data handling
process
·
Dealing
with millions - even billions - of
inter-related data items requires even
more careful
thought
Issues
in Data Management
Data
Entry
·
New
titles are added every
day
·
New
customers are being added
every day
·
That
new data needs to be added
accurately
Data
Updates
·
All
those actions require updates to
existing data
·
Those
changes need to be entered
accurately
Data
Security
·
All the
data that BholiBooks has in
its computer systems is quite
critical to its
operation
·
The
security of the customers' personal data is of utmost
importance. Hackers are always
looking
for
that type of data,
especially for credit card
numbers
·
This
problem can be managed by
using appropriate security
mechanisms that provide
access to
authorized
persons/computers only
·
Security
can also be improved
through:
Encryption
Private or virtual-private
networks
Firewalls
Intrusion detectors
Virus detectors
Data
Integrity
·
Integrity
refers to maintaining the correctness and
consistency of the data
Correctness:
Free from errors
Consistency: No
conflict among related data
items
·
Integrity
can be compromised in many
ways:
Typing
errors
Transmission
errors
Hardware
malfunctions
Program
bugs
Viruses
250
![]() Introduction
to Computing CS101
VU
Fire, flood,
etc.
Ensuring
Data Integrity
·
Type
Integrity
·
Limit
Integrity
·
Referential
Integrity
·
Physical
Integrity
Data
Accessibility
·
What
is required is that:
Data be
stored in an organized
manner
Additional
info about the data be
stored so that the data
access times are
minimized
·
A
solution to this concurrency
control problem:
Lock access to data while
someone is using it
DBMS
·
A
DBMS takes care of the
storage, retrieval, and
management of large data
sets on a database
·
It
provides SW tools needed to organize
& manipulate that data in a
flexible manner
·
It
includes facilities
for:
Adding,
deleting, and modifying
data
Making
queries about the stored
data
Producing
reports summarizing the required
contents
Database
·
A
collection of data organized in
such a fashion that the computer
can quickly search for a
desired
data
item
OS
Independence
·
It
provides an OS-independent view of the
data to the user, making
data manipulation and
management
much more convenient
What
can be stored in a
database?
·
As
long as it is digital data, it can be
stored:
Numbers, Booleans, text
Sounds
Images
Video
In
the very, very old days ...
·
Even
large amounts of data was
stored in text files, known
as flat-file
databases
·
All
related info was stored in a
single long, tab- or
comma-delimited text
file
·
Each
group of info called a
record
-
in that file was separated
by a special character; vertical bar
`|'
was a popular option
·
Each
record consisted of a group of fields,
each field containing some
distinct data item
The
Trouble with Flat-File
Databases
·
The
text file format makes it
hard to search for specific
info or to create reports that
include only
certain
fields from each
record
·
Reason:
One has to search
sequentially through the entire
file to gather desired info, such as
`all
books
by a certain author'
251
![]() Introduction
to Computing CS101
VU
·
However,
for small sets of data
say, consisting of several tens of kB
they can provide
reasonable
performance
Tabular
Storage: Features &
Possibilities
1.Similar
items of data form a
column
2.Fields
placed in a
particular row same as a
flat-file record
are strongly
interrelated
3.One
can sort the table w.r.t.
any column
4.That
makes searching e.g., for all the
books written by a certain
author straight
forward
5.Similarly,
searching for the 10 cheapest/most
expensive books can be easily
accomplished through a
sort
6.Effort
required for adding a new
field to all the records of a
flat-file is much greater than
adding a new
column
to the table
CONCLUSION:
Tabular
storage is better than
flat-file storage
We
will continue on with
tables' theme today
Today's
Lecture:
Database
SW
In
our 4th & final Lesson
on productivity software, we will
continue our discussion from
last week
·
on
data management
·
We
will find out about
relational databases
·
We
will also implement a simple
relational database
Let's
continue on with the tabular approach. We
stored data in a table last
time, and liked it.
Let's revisit
that
table and then put together
another one
Table
from the Last
Lecture
Title
Author
Publisher
Price
InStock
Y
Good Bye
Mr.
Altaf Khan
BholiBooks
1000
Bhola
The
Terrible
Bhola
BholiBooks
199
Y
Twins
Champion
Calculus
&
Smith
Sahib
Good
Publishers
325
N
Analytical
Geometry
Accounting
Zamin
Geoffry
Sung-e-Kilometer
29
Y
Secrets
Publishers
Another
table ...
Customer
Title
Shipment
Type
Aadil
Ali
Good Bye
Mr. Bhola
2002.12.26
Air
Aadil
Ali
The Terrible
Twins
2002.12.26
Air
Miftah
Muslim
Calculus
&
2002.12.25
Surface
Analytical
Geometry
Karen
Kaur
Good Bye
Mr. Bhola
2002.12.24
Air
252
![]() Introduction
to Computing CS101
VU
This &
the previous table are related
·
They
share a column, & are
related through it
·
A
program can match info
from a field in one table
with info in a corresponding
field of another
table
to generate
a
3rd table that combines
requested data from both
tables
·
That
is, a program can use
matching values in 2 tables to relate
info
in one to info in the other
Q:
Who is BholiBooks' best
customer?
·
That
is, who has spent the most
money on the online bookstore?
·
To
answer that question, one
can process the inventory and the
shipment tables to generate a
third
table
listing the customer names
and the prices of the books that
they have ordered
Customer
Price
The
generated
table
Aadil
Ali
1000
Aadil
Ali
199
Miftah
Muslim
325
Can
you now process this
table to find the answer to
our question
Karen
Kaur
1000
Relational
Databases
·
Databases
consisting of two or more related
tables are called relational
databases
·
A
typical relational database
may have anywhere from 10 to
over a thousand tables
·
Each
column of those tables can
contain only a single type
of data (contrast this with
spreadsheet
columns!)
·
Table
rows are called records; row
elements are called
fields
·
A
relational database stores
all its data inside tables,
and nowhere else
·
All operations on
data are done on those
tables or those that are
generated by table operations
·
Tables, tables,
and nothing but tables!
37.1
RDBMS
·
Relational
DBMS software
·
Contains
facilities for creating,
populating, modifying, and
querying relational
databases
·
Examples:
Access
DB2
FileMaker
Pro
Objectivity/DB
SQL
Server
MySQL
Oracle
Postgres
The
Trouble with Relational
DBs
·
Much
of current SW development is done using
the object-oriented methodology
·
When
we want to store the object-oriented
data into an RDBMS, it needs
to be translated into a
form
suitable for RDBMS
The
Trouble with Relational
DBs
·
Then
when we need to read the
data back from the RDBMS, the
data needs to be translated back
into
an object-oriented form before
use
·
These
two processing delays, the associated
processing, and time spent in
writing and maintaining
the
translation code are the key
disadvantages of the current
RDBMSes
253
![]() Introduction
to Computing CS101
VU
Solution?
·
Don't
have time to discuss that,
but try searching the Web on the
following terms:
·
Object-oriented
databases
Object-relational databases
Classification
of DBMS w.r.t.
Size
·
Personal/Desktop/Single-user
(MB-GB)
Examples: Tech.
papers' list; Methai shop
inventory
Typical
DMBS: Access
·
Server-based/Multi-user/Enterprise
(GB-TB)
Examples:
HBL; Amazon.com
Typical
DMBS: Oracle, DB2
·
Seriously-huge
databases (TB-PB-XB)
Examples:
2002 BaBar experiment at
Stanford (500TB); 2005
LHC database at CERN
(1XB)
Typical
DMBS: Objectivity/DB
37.2
Some Terminology
·
Primary
Key is a
field that uniquely
identifies each record stored in a
table
·
Queries
are
used to view, change, and
analyze data. They can be
used to:
Combine data from
different tables, efficiently
Extract the exact data that is
desired
·
Forms
can
be used for entering,
editing, or viewing data, one record at a
time
·
Reports
are
an effective, user-friendly way of
presenting data. All DBMSes provide tools
for
producing
custom reports.
·
Data
normalization is the
process of efficiently organizing
data in a database. There
are two goals
of the
normalization process:
Eliminate redundant
data
Storing only related
data in a table
Before
we do a demo, let me just
mention my favorite database
application:
Data
Mining
·
The
process of analyzing large
databases to identify
patterns
·
Example:
Mining the sales records
from a BholiBooks could
identify interesting shopping
patterns
like
"53% of customers who bought
book A also bought book
B". This pattern can be
put to good use!
·
Dat a
mining often utilizes
intelligent systems' techniques
Let's
now demonstrate the use of a
desktop RDBMS
·
We
will create a new relational
database
·
It
will consist of two
tables
·
We
will populate those
tables
·
We
will generate a report after
combining the data from the
two tables
Access
Tutorial
http://www.microsoft.com/education/DOWNLOADS/tutorials/classroom/office2k/acc2000.doc
254
![]() Introduction
to Computing CS101
VU
Today's
Lecture:
·
In
this final Lesson on
productivity software, we continued our
discussion from last week on
data
management
·
We
found out about relational
databases
·
We
also implemented a simple
relational database
Next
Lecture' Goals
(Cyber
Crime)
·
To
know the different types of computer
crimes that occur over cyber
space
·
To
familiarize ourselves with with several
methods that can be used to
minimize the effect of
these
crimes
·
To get
familiar with a few policies
and legislation designed to tackle
cyber crime
255
Table of Contents:
|
|||||