|
|||||
Database
Management System
(CS403)
VU
Lecture No.
44
Reading
Material
"Database
Systems Principles, Design
and Implementation"
Chapter
12
written
by Catherine Ricardo, Maxwell
Macmillan.
"Database
Management Systems" Second edition
written by
Chapter
18
Raghu
Ramakrishnan, Johannes
Gehkre.
Overview of
Lecture
o Concurrency
control problems
o Serial
and interleaved
schedules
o Serializability
theory
o Introduction
to locking
We are
studying the concurrency
control (CC) mechanism. In the
previous lecture we
discussed
that concurrent access means
the simultaneous access of data
from database
by
multiple users. The
concurrency control (CC)
concerns maintaining
the
consistency
of the database during
concurrent access. We also studied
that if
concurrent
access is not controlled
properly then database may
encounter three
different
problems which may turn
database into an inconsistence
state. One of these
problems
we discussed in the previous
lecture, now we will discuss
the remaining two.
Uncommitted
Update Problem
It
reflects a situation when a
transaction updates an object and
another transaction
reads
this updated value but
later the first transaction
is aborted. The problem in
this
situation
is that the second
transaction reads the value
updated by an aborted
transaction.
This situation is shown in
the table below:
310
Database
Management System
(CS403)
VU
TIME
TA
TB
BAL
t1
Read
(BAL)
.......
1000
t2
BAL=BAL+100
........
1000
t3
Write
(BAL)
........
1100
t4
.......
Read
(BAL)
1100
t5
.......
BAL=BAL*1.5
1100
t6
.......
Write
(BAL)
2650
t7
ROLLBACK
........
?
Table 1:
Uncommitted update
problem
As is
shown in the table, at time
t2, transaction TA updates the
value of object `BAL'
by adding
100 in its initial value
1000 and at time t3 it
writes this updated value
to
database.
So `BAL' is written as 1100. Now at
time t4, TB reads the
value of `BAL'
and it
gets 1100 then TB updates
the value multiplying it by
1.5 and writes it at
time
t6. So
the new value written
for `BAL' is 2650. However at
time t7 transaction TA is
rolled
back as a result of this
rollback the changes made by
TA stand cancelled, but
TB has
already performed certain processing on
the value set by TA. This is
an
inconsistent
state of the
database.
Inconsistent
Analysis
This
problem of concurrent access
occurs when one transaction
operates on many
records,
meanwhile another modifies
some of them effecting
result of first. This
problem
is expressed in the following
table:
311
Database
Management System
(CS403)
VU
TIME
TA
TB
t1
Read
(BAL_A) (5000)
.......
t2
INT = BAL_A *
.05
........
t3
TOT =
TOT + INT
........
t4
.......
.......
t5
.......
Read
(BAL_A)
t6
.......
BAL_A=BAL_A
1000
t7
.......
Write
(BAL_A)
t8
.......
Read
(BAL_E) (5000)
t9
.......
BAL_E =
BAL_E + 1000
t10
.......
Write
(BAL_E)
t11
Read
(BAL_E) (6000)
........
t12
INT = BAS_E *
.05
........
t13
TOT =
TOT + INT
........
Table 2:
Inconsistent analysis
problem
Suppose
Tr-A computes interest on all
accounts balances. For this, TA
reads the
balance
of each account multiplies it with
0.05 and adds this
interest amount into
a
variable
TOT. Now by time t5, TA
has read the balance of account `A'
(BAL_A) and
computed
interest on it, it has to
perform same process on all
the accounts. In the
meanwhile
from time t5 to t10 another
transaction TB subtracts a value from
balance
of account `A'
and adds this value to
the balance of account `E'.
When transaction TA
reaches
the account `E' to compute
interest on it, the value
added in it by TB is also
considered
which as a matter of fact is
being considered second
time. First time
from
account `A'
and now from account `E'.
The analysis obtained
through transaction TA
will be
wrong at the end.
We have
discussed three problems of
concurrent access; the CC mechanism
has to
make
sure that these problems do
not occur in the database.
In the following, we will
discuss
how it is done. We will start by studying
some basic concepts.
Serial
Execution
Serial
execution is an execution where
transactions are executed in a
sequential order,
that
is, one after another. A
transaction may consist of many
operations. Serial
execution
means that all the
operations of one transaction
are executer first,
followed
by all
the operations of the next
transaction and like that. A
Schedule or History is a
list of
operations from one or more
transactions. A schedule represents the
order of
execution
of operations. The order of
operations in a schedule should be the
same as
in the
transaction. Schedule for a
serial execution is called a
serial schedule, so in a
serial
schedule all operations of one
transactions are listed followed by
all the
312
Database
Management System
(CS403)
VU
operations
of another transactions and so
on. With a given set of
transactions, we can
have
different serial schedules. For
example, if we have two
transactions, then we
can
have
two different serial
schedules as is explained in the
table below:
BAL
Serial
Sched 1
Serial
Sched 2
BAL
TA,
TB
TB,
TA
Read
(BAL)A
Read
(BAL)B
50
50
(BAL =
BAL 10)A
50
(BAL=BAL *
2)B
50
Write
(BAL)A
40
Write
(BAL)B
100
40
Read
(BAL)B
Read
(BAL)A
100
(BAL=BAL *
2)B
80
(BAL =
BAL 10)A
100
Write
(BAL)B
80
Write
(BAL)A
90
Table 3:
Two different serial
schedules, TA, TB and TB,
TA
The
table shows two different
schedules of two transactions TA and
TB. The
subscript
with each operation shows
the transaction to which the
operation belongs.
For
example, Read (BAL)A means
the read operation of TA to read the
object `BAL'.
By
looking at the table that in
each serial schedule; all
the operations of one
transaction
are executed first followed by those of
the other transaction. An
important
point to
be noted here is that different
serial schedules of the same
transactions may
result in
different final state of the
database. As we can see in
the table 3, final
value
of the
object `BAL' is 80 with the
serial schedule TA, TB and it is 90 with
serial
schedule
TB, TA. However, it is guaranteed
that a serial schedule always leaves
the
database
in a consistent state. In other words,
the three problems of
concurrency that
we studies
earlier will not occur if a
serial schedule is followed.
The
serial schedule ensures the
consistency of the database, so
should we prefer
serial
schedule?
The answer is no. Because
serial execution is badly
under utilization of
resources
and users will not like it
at all since they will have to
wait a lot for
the
transactions'
execution. Suppose we are following a
serial execution and a
transaction
Tn has to
executed completely before
any other transaction may
start. Lets say Tn
needs an
input from the user
who initiated it, and by
chance users gets
stuck
somewhere or is
thinking, unless and until
the users does not
enter a value,
transaction
Tn cannot
proceed and all other
transactions are waiting for
their turn to execute.
If
313
Database
Management System
(CS403)
VU
allowed
to happen, this will be a much
disliked situation. Therefore
serial schedules
are
not preferred for
execution.
Contrary
to serial schedule is an interleaved schedule in
which transactions'
execution
is
interleaved, that is,
operations of different transactions are
intermix with each
other.
However,
the internal order of the
operations is not changed during
this intermixing.
Interleave
schedule provides more efficient
execution since control is switched
among
transaction,
so on one side it makes a
better use of the resources
and on the other
hand
if one
transaction is stuck or halted
due to some reasons, the
processing can go on
with
other transactions. Following figure
presents serial and
interleaved schedules
among
two transactions TA and TB
TA = {Read(X),
Write(X), commit}
TB = {Read(Y),
Read(X), Write(Y),
commit}
S1 = {RA(X), WA(X), CA, RB(Y), RB(X),
WB(Y), CB}
S2 = {RB(Y), RB(X), WB(Y),
CB, RA(X),
WA(X), CA}
S3 = {RB(Y), RB(X), RA(X),
WB(Y), CB, WA(X), CA}
S4 = {RA(X), RB(Y), RB(X),
WA(X), CA, WB(Y), CB}
Fig. 1:
Two transactions and
different schedules
In figure
1, we have two transactions TA and TB
consisting of different
operations.
S1,
S2, S3 and S4 are four
different schedules of these
transactions. S1 and S2 are
two
serial
schedules where as S3 and S4
are two interleaved schedules.
Obviously we can
have
just two serial schedules of
two transactions, but we can
have many other
interleaved
schedules of these
transactions.
Interleaved
schedules are preferred but
they may generate three of
the concurrent
access
problems if not controlled
properly. Before discussing the
solution to this
problem,
lets see what is the
actual problem in concurrent
access.
There
are different situations
during concurrent access of
the data:
·
Different
transactions accessing (reading or
writing) different
objects
·
Different
transactions reading same
object
314
Database
Management System
(CS403)
VU
·
Different
transactions accessing same object
and one or all writing
it
The
first two of these
situations do not create any
problem during concurrent
access,
so we do
not need to worry in these
situations. However, third
situation is precisely
the
one that creates the
concurrency problems and we
have to control this
situation
properly.
Third situation introduces
the concept of conflicting
operations; the
operations
that are accessing the
same object and one of
them writing the object.
The
transactions
that contain conflicting
operations are called
conflicting transactions.
Basically,
in concurrency control we have to
take care of the conflicting
transactions
and
for that we have to study
the concept of serializability.
Serializability
An
interleaved schedule is said to be serializable if
its final state is
equivalent to some
serial
schedule. Serializable schedule can also
be defined as a schedule in which
conflicting
operations are in a serial order.
The serializable schedule ensures
the
consistency
of the database. However,
this is worthwhile to mention here
again that
serializability
takes care of the inconsistencies
only due to the interleaving
of
transactions.
The serializability concerns
generating serializable schedules or we
can
say
that the purpose of the CC
mechanism is to ensure serializability of
the schedules.
Generating
serializable schedule involves taking
care of only conflicting
operations.
We have
to adopt a particular serial schedule
among the conflicting
operations; the
non-conflicting
operations can be interleaved to
any order, it will not
create any
problem
at all. There are two major
approaches to implement the
serializability;
Locking
and Timestamping. We will discuss
both in detail.
Locking
The basic
idea of locking is that an
object is locked prior to
performing any
operation.
When
the operation has been
performed the lock is released.
Locking is maintained by
the
transaction manager or more precisely
lock manager. Transactions apply
locks on
objects.
During the time when an
object is locked it is available
for operations only
to
the
transaction that has got
the lock on that object
through lock manager. When
an
item is
locked and during that
time another transaction
needs to perform some
315
Database
Management System
(CS403)
VU
operation
on that object, the
transaction applies for the
lock on the object but since
the
object is
already locked, this
transaction will have to wait. So it
enters into a wait
state.
When
first transaction finishes
its job, it releases locks
on items then the second
item
gets
the lock on the object
and then it can
proceed.
Transactions
perform two types of
operations on objects; read or write.
That is, a
transaction
either performs a read operation on an
object or it writes it.
Accordingly
there
are two types of locks as
well. A read or shared lock
and a write or
exclusive
lock a
transaction applies lock
according to the nature of
operation that it wants to
perform
on a particular object. If a transaction
TB applies a lock on an object O,
the
lock
manager first checks if O is free, it is
free then TB gets the
desired lock on O.
However,
if O is already locked by another
transaction say TA then lock
manager
checks
the compatibility of the
locks; the one that
has already been assigned and
the
one
that has been applied now.
The compatibility of locks
means that if the two
locks
from
two different transactions
may exist at the same
time. The compatibility of
locks
is checked
according to the following
table:
Transaction
A
Read
Write
Transaction
B
Read
Yes
No
Write
No
No
Table
shows that if a transaction A
has got a read lock on an
object and transaction
B
applies
for the read lock, B will be
granted this lock. It means
two read locks are
compatible
with each other, that
is, they can exist at
the same time. Therefore
two or
even
more than two transactions
can read an object at the
same time. Other
cells
contain
`No'; it means these locks
are not compatible. So if a
transaction has got a
`write'
lock on an object and
another transaction applies
for a `read' or `write'
lock,
the
lock will not be granted and
the transaction applying for
the lock later, will
have
to
wait.
That is
all for today's lecture.
The locking mechanism will be discussed
in detail in
the
next lecture.
Summary
In this
lecture we discussed two of
the three CC problems. Then
we discussed that
there
are different types of
schedules that determine the
sequence of execution of
316
Database
Management System
(CS403)
VU
operations
form different transactions. A
serial schedule maintains the
consistency of
the
database for sure, but
serial schedule is not much
useful. Interleaved schedule is
preferred
but if not controlled
properly an interleaved schedule may
cause consistency
problems.
So CC is based on the serializability
theory that controls the
order of
conflicting
operations. The serializability is
applied using locking or
Timestamping.
Locking
is based on two types of
locks; shared and exclusive.
Only two shared
locks
are
compatible with each other,
none of the remaining
combinations are
compatible.
317
Table of Contents:
|
|||||