Database Management System (CS403) - VU
Lecture No. 34
Reading Material

"Database Management Systems", 2nd edition, Raghu Ramakrishnan, Johannes Gehrke, McGraw-Hill
"Modern Database Management", Fred McFadden, Jeffrey Hoffer, Benjamin/Cummings
Overview of Lecture

o Data Storage Concepts
o Physical Storage Media
o Memory Hierarchy
In the previous lecture we discussed forms and their design. From this lecture onward we will discuss storage media.
Classification of Physical Storage Media

Storage media are classified according to the following characteristics:

o Speed of access
o Cost per unit of data
o Reliability

We can also differentiate storage as either:

o Volatile storage
o Non-volatile storage
Computer storage that is lost when the power is turned off is called volatile storage, while computer storage that is not lost when the power is turned off is called non-volatile storage.

A cache (pronounced "cash") is a special high-speed storage mechanism. It can be either a reserved section of main memory or an independent high-speed storage device. Two types of caching are commonly used in personal computers: memory caching and disk caching.
A memory cache, sometimes called a cache store or RAM cache, is a portion of memory made of high-speed static RAM (SRAM) instead of the slower and cheaper DRAM used for main memory. Memory caching is effective because most programs access the same data or instructions over and over. By keeping as much of this information as possible in SRAM, the computer avoids accessing the slower DRAM.

Some memory caches are built into the architecture of microprocessors. The Intel 80486 microprocessor, for example, contains an 8K memory cache, and the Pentium has a 16K cache. Such internal caches are often called Level 1 (L1) caches. Most modern PCs also come with external cache memory, called Level 2 (L2) caches. These caches sit between the CPU and the DRAM. Like L1 caches, L2 caches are composed of SRAM, but they are much larger.

Disk caching works on the same principle as memory caching, but instead of using high-speed SRAM, a disk cache uses conventional main memory. The most recently accessed data from the disk (as well as adjacent sectors) is stored in a memory buffer. When a program needs to access data from the disk, it first checks the disk cache to see if the data is there. Disk caching can dramatically improve the performance of applications, because accessing a byte of data in RAM can be thousands of times faster than accessing a byte on a hard disk.
When data is found in the cache, it is called a cache hit, and the effectiveness of a cache is judged by its hit rate. Many cache systems use a technique known as smart caching, in which the system can recognize certain types of frequently used data. The strategies for determining which information should be kept in the cache constitute some of the more interesting problems in computer science.
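To make the caching idea concrete, here is a minimal sketch in Python (not part of the original lecture text): a small in-memory buffer sits in front of the disk, every read checks the buffer first, and the hit rate is the fraction of reads served from the buffer. The read_from_disk function, the DiskCache class and the capacity of four blocks are illustrative assumptions.

    from collections import OrderedDict

    def read_from_disk(block_no):
        # Stand-in for a real (and much slower) disk read; assumed for illustration.
        return "data of block %d" % block_no

    class DiskCache:
        """A tiny disk cache: keeps the most recently used blocks in main memory."""

        def __init__(self, capacity=4):
            self.capacity = capacity
            self.buffer = OrderedDict()   # block_no -> block data
            self.hits = 0
            self.accesses = 0

        def read(self, block_no):
            self.accesses += 1
            if block_no in self.buffer:            # cache hit: served from memory
                self.hits += 1
                self.buffer.move_to_end(block_no)
                return self.buffer[block_no]
            data = read_from_disk(block_no)        # cache miss: go to the disk
            self.buffer[block_no] = data
            if len(self.buffer) > self.capacity:   # evict the least recently used block
                self.buffer.popitem(last=False)
            return data

        def hit_rate(self):
            return self.hits / self.accesses if self.accesses else 0.0

    cache = DiskCache(capacity=4)
    for block in [1, 2, 3, 1, 2, 1, 4, 5, 1, 2]:
        cache.read(block)
    print("hit rate =", cache.hit_rate())   # 0.5 for this access pattern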
The main memory of the computer is also known as RAM, standing for Random Access Memory. It is constructed from integrated circuits and needs electrical power in order to maintain its information; when power is lost, the information is lost too. The CPU can access it directly. The access time to read or write any particular byte is independent of where in memory that byte is located, and is currently approximately 50 nanoseconds (a nanosecond is a thousand-millionth of a second). This is broadly comparable with the speed at which the CPU needs to access data. Main memory is expensive compared to external memory, so it has limited capacity, although the capacity available for a given price is increasing all the time. For example, many home personal computers now have a capacity of 16 megabytes (million bytes), while 64 megabytes is commonplace on commercial workstations. The CPU will normally transfer data to and from main memory in groups of two, four or eight bytes, even if the operation it is undertaking requires only a single byte.
Flash memory is a form of EEPROM that allows multiple memory locations to be erased or written in one programming operation. Normal EEPROM only allows one location at a time to be erased or written, meaning that flash can operate at higher effective speeds when the system uses it to read and write to different locations at the same time. All types of flash memory and EEPROM wear out after a certain number of erase operations, due to wear on the insulating oxide layer around the charge storage mechanism used to store data.
Flash memory is non-volatile, which means that it stores information on a silicon chip in a way that does not need power to maintain the information in the chip. This means that if you turn off the power to the chip, the information is retained without consuming any power. In addition, flash offers fast read access times and solid-state shock resistance. These characteristics are why flash is popular for applications such as storage on battery-powered devices like cellular phones and PDAs.

Flash memory is based on the Floating-Gate Avalanche-Injection Metal Oxide Semiconductor (FAMOS) transistor, which is essentially an NMOS transistor with an additional conductor suspended between the gate and source/drain terminals.
A disk is a round plate on which data can be encoded. There are two basic types of disks: magnetic disks and optical disks. On magnetic disks, data is encoded as microscopic magnetized needles on the disk's surface. You can record and erase data on a magnetic disk any number of times, just as you can with a cassette tape. Magnetic disks come in a number of different forms:

Floppy Disk: A typical 5¼-inch floppy disk can hold 360K or 1.2MB (megabytes). 3½-inch floppies normally store 720K, 1.2MB or 1.44MB of data.

Hard Disk: Hard disks can store anywhere from 20MB to more than 10GB. Hard disks are also from 10 to 100 times faster than floppy disks.
Optical disks record data by burning microscopic holes in the surface of the disk with a laser. To read the disk, another laser beam shines on the disk and detects the holes by changes in the reflection pattern. Optical disks come in three basic forms:
CD-ROM: Most optical disks are read-only. When you purchase them, they are already filled with data. You can read the data from a CD-ROM, but you cannot modify, delete, or write new data.

WORM: Stands for write-once, read-many. WORM disks can be written on once and then read any number of times; however, you need a special WORM disk drive to write data onto a WORM disk.

Erasable optical (EO): EO disks can be read from, written to, and erased just like magnetic disks.
The machine that spins a disk is called a disk drive. Within each disk drive are one or more heads (often called read/write heads) that actually read and write data.
Accessing data from a disk is not as fast as accessing data from main memory, but disks are much cheaper, and unlike RAM, disks hold on to data even when the computer is turned off. Consequently, disks are the storage medium of choice for most types of data. Another storage medium is magnetic tape, but tapes are used only for backup and archiving because they are sequential-access devices (to access data in the middle of a tape, the tape drive must pass through all the preceding data).
RAID, short for Redundant Array of Independent (or Inexpensive) Disks, is a category of disk drives that employ two or more drives in combination for fault tolerance and performance. RAID disk drives are used frequently on servers but aren't generally necessary for personal computers.
Fundamental to RAID is "striping", a method of concatenating multiple drives into one logical storage unit. Striping involves partitioning each drive's storage space into stripes, which may be as small as one sector (512 bytes) or as large as several megabytes. These stripes are then interleaved round-robin, so that the combined space is composed alternately of stripes from each drive. In effect, the storage space of the drives is shuffled like a deck of cards. The type of application environment, I/O intensive or data intensive, determines whether large or small stripes should be used.
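As a rough sketch of this round-robin interleaving (not from the lecture; the drive count and stripe size below are assumed values), the mapping from a logical block number to a physical (drive, block-on-drive) location can be written as:

    def map_logical_block(logical_block, num_drives=4, blocks_per_stripe=8):
        # Assumed illustrative geometry: 4 drives, stripes of 8 blocks (512 bytes each).
        stripe_no = logical_block // blocks_per_stripe    # which stripe holds the block
        drive = stripe_no % num_drives                    # stripes rotate round-robin over drives
        stripe_on_drive = stripe_no // num_drives         # stripes already placed on that drive
        block_on_drive = stripe_on_drive * blocks_per_stripe + logical_block % blocks_per_stripe
        return drive, block_on_drive

    # Consecutive stripes land on consecutive drives, so a large transfer
    # is spread across the whole array instead of queuing on one disk.
    for block in range(0, 40, 8):
        print(block, "->", map_logical_block(block))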
Most multi-user operating systems today, like NT, UNIX and Netware, support overlapped disk I/O operations across multiple drives. However, in order to maximize throughput for the disk subsystem, the I/O load must be balanced across all the drives so that each drive can be kept busy as much as possible. In a multiple-drive system without striping, the disk I/O load is never perfectly balanced: some drives will contain data files which are frequently accessed, and some drives will only rarely be accessed.

In I/O intensive environments, performance is optimized by striping the drives in the array with stripes large enough that each record potentially falls entirely within one stripe. This ensures that the data and I/O will be evenly distributed across the array, allowing each drive to work on a different I/O operation and thus maximizing the number of simultaneous I/O operations that can be performed by the array.
In data intensive environments and single-user systems which access large records, small stripes (typically one 512-byte sector in length) can be used so that each record spans all the drives in the array, with each drive storing part of the data from the record. This causes long record accesses to be performed faster, since the data transfer occurs in parallel on multiple drives. Unfortunately, small stripes rule out multiple overlapped I/O operations, since each I/O will typically involve all drives. However, operating systems like DOS, which do not allow overlapped disk I/O, will not be negatively impacted. Applications such as on-demand video/audio, medical imaging and data acquisition, which utilize long record accesses, will achieve optimum performance with small stripe arrays.
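The trade-off between large and small stripes can be seen with a little arithmetic. The sketch below (not from the lecture; the record size, stripe sizes and drive count are assumed values) counts how many drives a single record access touches: with large stripes the record stays on one drive, leaving the other drives free for other I/Os, while with 512-byte stripes it spans the whole array and is transferred in parallel.

    def drives_touched(record_bytes, stripe_bytes, num_drives=4):
        # Assumes the record starts on a stripe boundary (illustrative simplification).
        stripes_spanned = -(-record_bytes // stripe_bytes)   # ceiling division
        return min(stripes_spanned, num_drives)

    record = 64 * 1024   # a 64 KB record (assumed size)

    # Large stripes (128 KB): the record falls entirely within one stripe, so
    # several such I/Os can be overlapped on different drives.
    print(drives_touched(record, 128 * 1024))   # -> 1

    # Small stripes (one 512-byte sector): the record spans every drive, so the
    # transfer is parallel but only one such I/O can be in progress at a time.
    print(drives_touched(record, 512))          # -> 4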
RAID-0

RAID Level 0 is not redundant, hence does not truly fit the "RAID" acronym. In level 0, data is split across drives, resulting in higher data throughput. Since no redundant information is stored, performance is very good, but the failure of any disk in the array results in data loss. This level is commonly referred to as striping.
RAID-1

RAID Level 1 provides redundancy by writing all data to two or more drives. The performance of a level 1 array tends to be faster on reads and slower on writes compared to a single drive, but if either drive fails, no data is lost. This is a good entry-level redundant system, since only two drives are required; however, since one drive is used to store a duplicate of the data, the cost per megabyte is high. This level is commonly referred to as mirroring.
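A minimal sketch of mirroring (not from the lecture; the MirroredPair class and the in-memory "drives" are illustrative): every write goes to both drives, a read can be served by either, and the data survives the failure of one drive.

    class MirroredPair:
        """RAID-1 mirroring sketched over two in-memory 'drives'."""

        def __init__(self):
            self.drives = [dict(), dict()]        # block_no -> data, one dict per drive

        def write(self, block_no, data):
            # Every write goes to both drives, which is why level 1 writes
            # tend to be slower than on a single drive.
            for drive in self.drives:
                drive[block_no] = data

        def read(self, block_no, failed_drive=None):
            # A read can be served by whichever copy is still available,
            # so the array tolerates the failure of either drive.
            for i, drive in enumerate(self.drives):
                if i != failed_drive and block_no in drive:
                    return drive[block_no]
            raise IOError("block lost: no surviving copy")

    array = MirroredPair()
    array.write(7, "customer record")
    print(array.read(7, failed_drive=0))          # still readable after drive 0 fails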
RAID-2

RAID Level 2, which uses Hamming error correction codes, is intended for use with drives which do not have built-in error detection. All SCSI drives support built-in error detection, so this level is of little use when using SCSI drives.
RAID-3

RAID Level 3 stripes data at a byte level across several drives, with parity stored on one drive. It is otherwise similar to level 4. Byte-level striping requires hardware support for efficient use.
RAID-4

RAID Level 4 stripes data at a block level across several drives, with parity stored on one drive. The parity information allows recovery from the failure of any single drive. The performance of a level 4 array is very good for reads (the same as level 0). Writes, however, require that parity data be updated each time. This slows small random writes in particular, though large writes or sequential writes are fairly fast. Because only one drive in the array stores redundant data, the cost per megabyte of a level 4 array can be fairly low.
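The block parity used by levels 4 and 5 can be sketched with a few bytes of made-up data (not from the lecture): the parity block is the XOR of the data blocks in a stripe, any single lost block can be rebuilt from the surviving blocks, and every small write must also update the parity block, which is what makes such writes slow.

    from functools import reduce

    def xor_blocks(blocks):
        # XOR equal-length byte blocks together; this is the parity of a stripe.
        return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

    # One stripe across three data drives plus a parity drive (contents are illustrative).
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(data)

    # Recovery: if the drive holding b"BBBB" fails, XOR-ing the surviving data
    # blocks with the parity block rebuilds the lost block.
    print(xor_blocks([data[0], data[2], parity]))   # b'BBBB'

    # Small write: replacing data[1] forces a parity update as well
    # (new parity = old parity XOR old data XOR new data), so each small random
    # write costs extra I/O on the parity drive.
    new_block = b"DDDD"
    parity = xor_blocks([parity, data[1], new_block])
    data[1] = new_block
    assert parity == xor_blocks(data)               # parity is consistent again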
RAID-5

RAID Level 5 is similar to level 4, but distributes parity among the drives. This can speed small writes in multiprocessing systems, since the parity disk does not become a bottleneck. Because parity data must be skipped on each drive during reads, however, the performance for reads tends to be considerably lower than that of a level 4 array. The cost per megabyte is the same as for level 4.
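What distinguishes level 5 is only where the parity lives. A simple rotation such as the one sketched below (an assumed layout, not taken from the lecture; real controllers may rotate differently) moves the parity block from drive to drive, stripe by stripe, so that no single disk becomes a write bottleneck.

    def parity_drive_for_stripe(stripe_no, num_drives=5):
        # Rotate the parity block backwards through the drives, one stripe at a time.
        return (num_drives - 1 - stripe_no) % num_drives

    for stripe in range(6):
        print("stripe", stripe, "-> parity on drive", parity_drive_for_stripe(stripe))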
Data records are stored on and retrieved from physical devices in a particular manner. The techniques used to find and retrieve stored records are called access methods.
Sequential File Organization

Records are arranged on storage devices in some sequence based on the value of some field, called the sequence field. The sequence field is often the key field that identifies the record.
Sequential file organization is simple, easy to understand and manage, and best for providing sequential access. It is not feasible for direct or random access; inserting or deleting a record in the middle of the sequence involves cumbersome record searches and rewriting of the file.
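A small sketch of sequential organization (not from the lecture; the record layout, keys and names are made up, with a Python list standing in for the file): records sit in order of their sequence field, a sequential scan is the natural access path, and an insertion in the middle forces everything after the insertion point to be rewritten.

    # Records ordered on their sequence (key) field; a list stands in for the file.
    records = [(10, "Ali"), (20, "Bushra"), (40, "Dawood")]

    def sequential_search(recs, key):
        # Natural access path: read records in order until the key is found or passed.
        for k, data in recs:
            if k == key:
                return data
            if k > key:
                break
        return None

    def insert(recs, new_record):
        # Every record after the insertion point must be rewritten, which is
        # what makes insertions into the middle of a sequential file cumbersome.
        position = next((i for i, (k, _) in enumerate(recs) if k > new_record[0]), len(recs))
        return recs[:position] + [new_record] + recs[position:]

    print(sequential_search(records, 20))    # Bushra
    records = insert(records, (30, "Chand"))
    print([k for k, _ in records])           # [10, 20, 30, 40]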
RAID-0 is the fastest and most efficient array type but offers no fault tolerance. RAID-1 is the array of choice for performance-critical, fault-tolerant environments. In addition, RAID-1 is the only choice for fault tolerance if no more than two drives are desired.

RAID-2 is seldom used today since ECC is embedded in almost all modern disk drives.

RAID-3 can be used in data intensive or single-user environments which access long sequential records to speed up data transfer. However, RAID-3 does not allow multiple I/O operations to be overlapped and requires synchronized-spindle drives in order to avoid performance degradation with short records.

RAID-4 offers no advantages over RAID-5 and does not support multiple simultaneous write operations.

RAID-5 is the best choice in multi-user environments which are not write-performance sensitive. However, at least three, and more typically five, drives are required for RAID-5 arrays.