
Data Storage Concepts, Physical Storage Media, Memory Hierarchy

Database Management System (CS403)
VU
Lecture No. 34
Reading Material
"Database Management Systems", 2nd edition, Raghu Ramakrishnan, Johannes
Gehrke, McGraw-Hill
"Modern  Database  Management",  Fred  McFadden,  Jeffrey  Hoffer,
Benjamin/Cummings
Overview of Lecture
o Data Storage Concepts
o Physical Storage Media
o Memory Hierarchy
In the previous lecture we discussed forms and their design. From this lecture onward we will discuss storage media.
Classification of Physical Storage Media
Storage media are classified according to the following characteristics:
Speed of access
Cost per unit of data
Reliability
We can also differentiate storage as either
Volatile storage
Non-volatile storage
Computer storage that is lost when the power is turned off is called volatile storage, while storage that is not lost when the power is turned off is called non-volatile storage.
Cache (pronounced "cash") is a special high-speed storage mechanism. It can be either a reserved section of main memory or an independent high-speed storage device. Two types of caching are commonly used in personal computers: memory caching and disk caching.
A memory cache, sometimes called a cache store or RAM cache, is a portion of memory made of high-speed static RAM (SRAM) instead of the slower and cheaper DRAM used for main memory. Memory caching is effective because most programs access the same data or instructions over and over. By keeping as much of this information as possible in SRAM, the computer avoids accessing the slower DRAM. Some memory caches are built into the architecture of microprocessors. The Intel 80486 microprocessor, for example, contains an 8K memory cache, and the Pentium has a 16K cache. Such internal caches are often called Level 1 (L1) caches. Most modern PCs also come with external cache memory, called Level 2 (L2) caches. These caches sit between the CPU and the DRAM. Like L1 caches, L2 caches are composed of SRAM, but they are much larger.
Disk caching works under the same principle as memory caching, but instead of using high-speed SRAM, a disk cache uses conventional main memory. The most recently accessed data from the disk (as
well as adjacent sectors) is stored in a memory buffer. When a program needs to
access data from the disk, it first checks the disk cache to see if the data is there. Disk
caching can dramatically improve the performance of applications, because accessing
a byte of data in RAM can be thousands of times faster than accessing a byte on a
hard disk.
When data is found in the cache, it is called a cache hit, and the effectiveness of a
cache is judged by its hit rate. Many cache systems use a technique known as smart
caching, in which the system can recognize certain types of frequently used data. The
strategies for determining which information should be kept in the cache constitute
some of the more interesting problems in computer science.
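As an illustration of cache hits and hit rate, the following small Python sketch (the cache size, key values and function names are assumptions for illustration, not part of the lecture) keeps the most recently used items in a fixed-size cache and reports the fraction of requests that were hits.

from collections import OrderedDict

class SimpleCache:
    """A tiny least-recently-used (LRU) cache to illustrate hits and hit rate."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.store = OrderedDict()   # key -> value, ordered by recency of use
        self.hits = 0
        self.requests = 0

    def get(self, key, load_from_slow_storage):
        self.requests += 1
        if key in self.store:                 # cache hit: data already in fast storage
            self.hits += 1
            self.store.move_to_end(key)       # mark as most recently used
            return self.store[key]
        value = load_from_slow_storage(key)   # cache miss: fetch from slower DRAM or disk
        self.store[key] = value
        if len(self.store) > self.capacity:   # evict the least recently used entry
            self.store.popitem(last=False)
        return value

    def hit_rate(self):
        return self.hits / self.requests if self.requests else 0.0

# Most programs touch the same keys over and over, so the hit rate is high.
cache = SimpleCache(capacity=4)
for key in [1, 2, 1, 3, 1, 2, 4, 1, 2, 3]:
    cache.get(key, load_from_slow_storage=lambda k: "block-%d" % k)
print("hit rate =", cache.hit_rate())

Because the same keys are requested repeatedly, most requests are served from the fast cache rather than from the slower storage, which is exactly why caching pays off.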
The main memory of the computer is also known as RAM, standing for Random
Access Memory. It is constructed from integrated circuits and needs to have electrical
power in order to maintain its information. When power is lost, the information is lost
too! The CPU can access it directly. The access time to read or write any particular byte is independent of where that byte is located in memory, and is currently approximately 50 nanoseconds (a nanosecond is a thousand-millionth of a second). This is broadly
comparable with the speed at which the CPU will need to access data. Main memory
is expensive compared to external memory, so it has limited capacity. The capacity available for a given price is increasing all the time. For example, many home personal computers now have a capacity of 16 megabytes (millions of bytes), while 64
megabytes is commonplace on commercial workstations. The CPU will normally
transfer data to and from the main memory in groups of two, four or eight bytes, even
if the operation it is undertaking only requires a single byte.
Flash memory is a form of EEPROM that allows multiple memory locations to be
erased or written in one programming operation. Normal EEPROM only allows one
location at a time to be erased or written, meaning that flash can operate at higher
effective speeds when the system uses it to read and write to different locations at the
same time. All types of flash memory and EEPROM wear out after a certain number
of erase operations, due to wear on the insulating oxide layer around the charge
storage mechanism used to store data.
Flash memory is non-volatile, which means that it stores information on a silicon chip
in a way that does not need power to maintain the information in the chip. This means
that if you turn off the power to the chip, the information is retained without
consuming any power. In addition, flash offers fast read access times and solid-state shock resistance. These characteristics are why flash is popular for applications such as storage on battery-powered devices like cellular phones and PDAs.
Flash memory is based on the Floating-Gate Avalanche-Injection Metal Oxide Semiconductor (FAMOS) transistor, which is essentially an NMOS transistor with an additional conductor suspended between the gate and source/drain terminals.
A disk is a round plate on which data can be encoded. There are two basic types of disks: magnetic disks and optical disks. On magnetic disks, data is encoded as microscopic magnetized needles on the disk's surface. You can record and erase data on a magnetic disk any number of times, just as you can with a cassette tape.
Magnetic disks come in a number of different forms:
Floppy Disk: A typical 5¼-inch floppy disk can hold 360K or 1.2MB (megabytes).
3½-inch floppies normally store 720K, 1.2MB or 1.44MB of data.
Hard Disk: Hard disks can store anywhere from 20MB to more than 10GB. Hard
disks are also from 10 to 100 times faster than floppy disks.
Optical disks record data by burning microscopic holes in the surface of the disk with
a laser. To read the disk, another laser beam shines on the disk and detects the holes
by changes in the reflection pattern.
Optical disks come in three basic forms:
CD-ROM: Most optical disks are read-only. When you purchase them, they are
already filled with data. You can read the data from a CD-ROM, but you cannot
modify, delete, or write new data.
WORM: Stands for write-once, read-many. WORM disks can be written on once and
then read any number of times; however, you need a special WORM disk drive to
write data onto a WORM disk.
Erasable optical (EO): EO disks can be read from, written to, and erased just like magnetic disks.
The machine that spins a disk is called a disk drive. Within each disk drive is one or
more heads (often called read/write heads) that actually read and write data.
Accessing data from a disk is not as fast as accessing data from main memory, but
disks are much cheaper. And unlike RAM, disks hold on to data even when the
computer is turned off. Consequently, disks are the storage medium of choice for
most types of data. Another storage medium is magnetic tape. But tapes are used only
for backup and archiving because they are sequential-access devices (to access data in
the middle of a tape, the tape drive must pass through all the preceding data).
RAID is short for Redundant Array of Independent (or Inexpensive) Disks, a category of disk drives that employ two or more drives in combination for fault tolerance and performance. RAID disk drives are used frequently on servers but aren't generally necessary for personal computers.
Fundamental to RAID is "striping", a method of concatenating multiple drives into
one logical storage unit. Striping involves partitioning each drive's storage space into stripes, which may be as small as one sector (512 bytes) or as large as several
megabytes. These stripes are then interleaved round-robin, so that the combined space
is composed alternately of stripes from each drive. In effect, the storage space of the
drives is shuffled like a deck of cards. The type of application environment, I/O or
data intensive, determines whether large or small stripes should be used.
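The round-robin interleaving of stripes can be sketched as a simple mapping from a logical block number to a drive and a position on that drive. The Python fragment below is a sketch only; the drive count and stripe size are assumed values for illustration.

def locate_block(logical_block, n_drives, blocks_per_stripe):
    """Map a logical block number to (drive index, block offset on that drive)
    under round-robin striping."""
    stripe_number = logical_block // blocks_per_stripe    # which stripe the block lies in
    offset_in_stripe = logical_block % blocks_per_stripe
    drive = stripe_number % n_drives                       # stripes are dealt out round-robin
    stripe_on_drive = stripe_number // n_drives            # how far down that drive the stripe sits
    return drive, stripe_on_drive * blocks_per_stripe + offset_in_stripe

# With 4 drives and one-block stripes, consecutive logical blocks land on consecutive drives.
for block in range(8):
    print(block, locate_block(block, n_drives=4, blocks_per_stripe=1))

With small stripes, consecutive logical blocks land on different drives, which is what allows a long transfer to proceed on several drives in parallel.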
Most multi-user operating systems today, like NT, UNIX and Netware, support
overlapped disk I/O operations across multiple drives. However, in order to maximize
throughput for the disk subsystem, the I/O load must be balanced across all the drives
so that each drive can be kept busy as much as possible. In a multiple drive system
without striping, the disk I/O load is never perfectly balanced. Some drives will
contain data files which are frequently accessed and some drives will only rarely be
accessed. In I/O intensive environments, performance is optimized by striping the
drives in the array with stripes large enough so that each record potentially falls
entirely within one stripe. This ensures that the data and I/O will be evenly distributed
across the array, allowing each drive to work on a different I/O operation, and thus
maximize the number of simultaneous I/O operations that can be performed by the array.
In data intensive environments and single-user systems which access large records,
small stripes (typically one 512-byte sector in length) can be used so that each record
will span across all the drives in the array, each drive storing part of the data from the
record. This causes long record accesses to be performed faster, since the data transfer
occurs in parallel on multiple drives. Unfortunately, small stripes rule out multiple
overlapped I/O operations, since each I/O will typically involve all drives. However,
operating systems like DOS, which do not allow overlapped disk I/O, will not be
negatively impacted. Applications such as on-demand video/audio, medical imaging
and data acquisition, which utilize long record accesses, will achieve optimum
performance with small stripe arrays.
RAID-0
RAID Level 0 is not redundant, hence does not truly fit the "RAID" acronym. In level
0, data is split across drives, resulting in higher data throughput. Since no redundant
information is stored, performance is very good, but the failure of any disk in the
array results in data loss. This level is commonly referred to as striping.
RAID-1
RAID Level 1 provides redundancy by writing all data to two or more drives. The
performance of a level 1 array tends to be faster on reads and slower on writes
compared to a single drive, but if either drive fails, no data is lost. This is a good
entry-level redundant system, since only two drives are required; however, since one
drive is used to store a duplicate of the data, the cost per megabyte is high. This level
is commonly referred to as mirroring.
RAID-2
RAID Level 2, which uses Hamming error correction codes, is intended for use with
drives which do not have built-in error detection. All SCSI drives support built-in
error detection, so this level is of little use when using SCSI drives.
RAID-3
RAID Level 3 stripes data at a byte level across several drives, with parity stored on
one drive. It is otherwise similar to level 4. Byte-level striping requires hardware
support for efficient use.
RAID-4
RAID Level 4 stripes data at a block level across several drives, with parity stored on
one drive. The parity information allows recovery from the failure of any single drive.
The performance of a level 4 array is very good for reads (the same as level 0). Writes,
however, require that parity data be updated each time. This slows small random
writes, in particular, though large writes or sequential writes are fairly fast. Because
only one drive in the array stores redundant data, the cost per megabyte of a level 4
array can be fairly low.
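The parity used by level 4 (and level 5) can be illustrated with exclusive-OR: the parity block is the XOR of the corresponding data blocks, so the block of any single failed drive can be rebuilt from the surviving blocks. A minimal Python sketch, using tiny 4-byte blocks purely for illustration:

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

# Data blocks on three data drives plus one parity drive (a RAID-4 style layout).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xAA\xBB\xCC\xDD"]
parity = xor_blocks(data)

# Simulate the failure of drive 1: its block is rebuilt from the survivors.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
print("recovered block:", recovered.hex())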
RAID-5
RAID Level 5 is similar to level 4, but distributes parity among the drives. This can
speed small writes in multiprocessing systems, since the parity disk does not become
a bottleneck. Because parity data must be skipped on each drive during reads,
however, the performance for reads tends to be considerably lower than a level 4
array. The cost per megabyte is the same as for level 4.
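The only difference from level 4 is where the parity block lives. A minimal sketch of one common rotation rule (other placements are used in practice; the drive count is an assumption for illustration):

def parity_drive(stripe_number, n_drives):
    """Which drive holds the parity block for a given stripe.
    RAID-4 always uses the last drive; RAID-5 rotates parity across the drives."""
    raid4 = n_drives - 1
    raid5 = (n_drives - 1 - stripe_number) % n_drives
    return raid4, raid5

for stripe in range(6):
    r4, r5 = parity_drive(stripe, n_drives=4)
    print("stripe", stripe, "RAID-4 parity on drive", r4, "RAID-5 parity on drive", r5)

Rotating the parity means no single drive absorbs every parity update, which is why small writes scale better under level 5.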
File organization refers to the manner in which data records are stored and retrieved on physical devices. The techniques used to find and retrieve stored records are called access methods.
Sequential File Organization
Records are arranged on the storage device in some sequence based on the value of some field, called the sequence field. The sequence field is often the key field that identifies the record.
This organization is simple, easy to understand and manage, and best for providing sequential access. It is not feasible for direct or random access; inserting or deleting a record in the middle of the sequence involves cumbersome record searches and rewriting of the file.
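A sequential file can be pictured as a list of records kept sorted on the sequence field: reading in key order is trivial, while a direct lookup must scan from the front. A minimal Python sketch (the record layout and field names are assumptions for illustration):

# Records are kept sorted on the sequence field, here the key "id".
records = [
    {"id": 101, "name": "Ali"},
    {"id": 105, "name": "Sara"},
    {"id": 112, "name": "Usman"},
    {"id": 120, "name": "Zara"},
]

def sequential_lookup(records, key):
    """Scan the file in order; stop early once we pass where the key would be."""
    for record in records:
        if record["id"] == key:
            return record
        if record["id"] > key:      # sorted order lets us stop early on a miss
            break
    return None

print(sequential_lookup(records, 112))   # found after scanning three records
print(sequential_lookup(records, 108))   # miss detected without reading the whole file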
RAID-0 is the fastest and most efficient array type but offers no fault-tolerance.
RAID-1 is the array of choice for performance-critical, fault-tolerant environments. In
addition, RAID-1 is the only choice for fault-tolerance if no more than two drives are
desired.
RAID-2 is seldom used today since ECC is embedded in almost all modern disk
drives.
RAID-3 can be used in data intensive or single-user environments which access long
sequential records to speed up data transfer. However, RAID-3 does not allow
multiple I/O operations to be overlapped and requires synchronized-spindle drives in
order to avoid performance degradation with short records.
RAID-4 offers no advantages over RAID-5 and does not support multiple
simultaneous write operations.
RAID-5 is the best choice in multi-user environments which are not write-performance sensitive. However, at least three, and more typically five, drives are required for RAID-5 arrays.