|
|||||
Advanced Computer
Architecture-CS501
Advanced
Computer Architecture
Lecture
No. 20
Reading
Material
Vincent
P. Heuring & Harry F. Jordan
Chapter
5
Computer
Systems Design and Architecture
5.1.5,
5.1.6
Summary
·
Structural
RTL for Pipeline
Stages
·
Instruction
Propagation Through the
Pipeline
·
Pipeline
Hazards
·
Data
Dependence Distance
·
Data
Forwarding
·
Compiler
Solution to Hazards
·
SRC
Hazard Detection and
Correction
·
RTL for
Hazard Detection and Pipeline
Stall
Structural
RTL for Pipeline
Stages
The
Register Transfer Language
for each phase is given as
follows:
Instruction
Fetch
IR2
←
M
[PC];
PC2
←
PC+4;
Instruction
Decode & Operand fetch
X3←l-s2:(rel2:PC2,disp2:(rb=0):?,(rb!=0):R[rb]),brl2:PC2,alu2:R[rb],
Y3 ← l-s2:(rel2:c1,disp2:c2),alu2:(imm2:c2,!imm2:R[rc]),
MD3
←store2:R[ra],IR3
←
IR2,stop2:Run
←
0,
PC ← !branch2:PC+4,branch2:(cond(IR2,R[rc]):R[rb],!cond(IR2,R[rc]):PC+4;
ALU
operation
Z4 ← (I-s3:
X3+Y3, brl3: X3, Alu3: X3 op
Y3,
MD4
←
MD3,
IR4
←
IR3;
Memory
access
Z5 ← (load4: M
[Z4], ladr4~branch4~alu4:Z4),
store4:
(M [Z4] ← MD4),
Page
214
Last
Modified: 01-Nov-06
Advanced Computer
Architecture-CS501
IR5
←IR4;
Write
back
regwrite5:
(R[ra] ← Z5);
Instruction
Propagation through the
Pipeline
Consider
the following SRC code segment
flowing through the
pipeline. The
instructions
along
with their addresses
are
200: add
r1, r2, r3
204: ld
r5, [4(r7)
208: br
r6
212:
str r4, 56
...
400
We shall
review how this chunk of
code is executed.
First
Clock Cycle
Add
instruction enters the pipeline in
the first cycle. The
value in PC is
incremented
from 200 to 204.
Second
Clock Cycle
Add
moves to decode stage. Its
operands are fetched from
the register file and
moved to
X3 and Y3 at the end of clock cycle,
meanwhile the Instruction ld
r5,
[4+r7] is
fetched in the first stage
and the PC value is incremented
from 204 to
208.
Third
Clock Cycle
Add
instruction moves to the
execute stage, the results
are written to Z4 on
the
trailing
edge of the clock. Ld
instruction moves to decode
stage. The operands
are
fetched to calculate the
displacement address. Br instruction
enters the
pipeline.
The value in PC is incremented
from 208 to 212.
Fourth
Clock Cycle
Add
does not access memory.
The result is written to Z5 at
the trailing edge of
clock.
The address is being
calculated here for ld. The
results are written to
Z4.
Br is in
the decode stage. Since this
branch is always true, the
contents of PC are
modified
to new address. Str
instruction enters the pipeline.
The value in PC is
incremented
from 212 to 216.
Fifth
Clock Cycle
Page
215
Last
Modified: 01-Nov-06
Advanced Computer
Architecture-CS501
The
result of addition is written
into register r1. Add
instruction completes. Ld
accesses
data memory at the address
specified in Z4 and result stored in Z5
at
falling
edge of clock. Br instruction
just propagates through this
stage without
any
calculation. Str is in the
decode stage. The operands
are being fetched
for
address
calculation to X3 and Y3. The instruction
at address 400 enters the
pipeline.
The value in PC is incremented
from 400 to 404.
Pipeline
Hazards
The
instructions in the pipeline at
any given time are
being executed in parallel.
This
parallel
execution leads to the
problem of instruction dependence. A
hazard occurs when
an
instruction depends on the
result of previous instruction
that is not yet
complete.
Classification of
Hazards
There
are three categories of
hazards
1. Branch
Hazard
2.
Structural Hazard
3. Data
Hazard
Branch
hazards
The
instruction following a branch is
always executed whether or
not the branch is
taken.
This is
called the branch delay
slot. The compiler might
issue a nop instruction in
the
branch
delay slot. Branch delays
cannot be avoided by forwarding
schemes.
Structural
hazards
Page
216
Last
Modified: 01-Nov-06
Advanced Computer
Architecture-CS501
A
structural hazard occurs
when attempting to access
the same resource in different
ways
at the
same time. It occurs when
the hardware is not enough
to implement pipelining
properly
e.g. when the machine does
not support separate data
and instruction memories.
Data
hazards
Data
hazard occur when an
instruction attempts to access
some data value that
has not yet
been
updated by the previous instruction. An
example of this RAW (read after
write) data
hazard
is;
200:
add r2, r3, r4
204: sub
r7, r2, r6
The
register r2 is written in clock
cycle 5 hence the sub instruction
cannot proceed
beyond
stage 2 until the add
instruction leaves the
pipeline.
Data
Hazard Detection &
Correction
Data hazards can be
detected easily as they
occur when the destination
register of an
instruction
is the same as the source
register of another instruction in close
proximity. To
remedy
this situation, dependent instructions
may be delayed or stalled
until the ones
ahead
complete. Data can also be forwarded to
the next instruction before
the current
instruction
completes, however this
requires forwarding hardware and
logic. Data can be
forwarded
to the next instruction from
the stage where it is
available without waiting
for
the
completion of the instruction. Data is
normally required at stage 2
(operand fetch)
however
data is earliest available at
stage 3 output (ALU result) or
stage 4 output
(memory
access result). Hence the
forwarding logic should be able to
transfer data from
stage 3
to stage 2 or from stage 4 to
stage 2.
Data
Dependence Distance
Designing
a data forwarding unit
requires the study of
dependence distances. Without
forwarding,
the minimum spacing required
between two data dependent
instructions to
avoid
hazard is four. The load
instruction has a minimum distance of
two from all
other
instructions
except branch. Branch delays
cannot be removed even with
forwarding.
Table
5.1 of the text shows
numbers related to dependence
distances with respect to
some
important
instruction categories.
Compiler
Solution to Hazards
Hazards
can be detected by the compiler, by
analyzing the instruction
sequences and
dependencies.
The compiler can inserts
bubbles (nop instruction)
between two
instructions
that form a hazard, or it
could reorder instructions so as to
put sufficient
distance
between dependent instructions. The
compiler solution to hazards is
complex,
expensive
and not very efficient as compared to
the hardware solution
Page
217
Last
Modified: 01-Nov-06
Advanced Computer
Architecture-CS501
SRC
Hazard Detection and
Correction
The SRC
uses a hazard detection
unit. The hazard can be
resolved using either
pipeline
stalls or
by data forwarding.
Pipeline
stalls
Consider
the following sequence of
instructions going
through
the SRC pipeline
200:
shl r6, r3, 2
204:
str r3, 32
208: sub
r2, r4,r5
212:
add r1,r2,r3
216: ld
r7, 48
There is
a data hazard between
instruction three and
four
that can
be resolved by using pipeline
stalls or bubbles
When
using pipeline stalls, nop
instructions are placed in between
dependent instructions.
The
logic behind this scheme is
that if opcode in stage 2 and 3
are both alu, and if ra
in
stage 3
is the same as rb or rc in stage 2,
then a pause signal is
issued to insert a
bubble
between
stage 3 and 2. Similar logic is
used for detecting hazards
between stage 2 and 4
and stage
4 and 5.
Data
Forwarding
By adding
data forwarding mechanism to
the SRC data path, the
stalls can be completely
eliminated
at least for the ALU instructions.
The hazard detection is
required between
stages 3
and 4, and between stages 3 and 5. The
testing and forwarding circuits
employ
wider
IRs to store the data
required in later stages.
The logic behind this
method is that if
the ALU
is activated for both 3 and 5 and ra in 5
is the same as rb in 3 then Z5
which
hold
the currently loaded or calculated
result is directly forwarded to
X3. Similarly, if
both
are ALU operations and instruction in
stage 3 does not employ
immediate operands
then
value of Z5 is transferred to Y3. Similar
logic is used to forward
data between stage
3 and
4.
RTL
for Hazard Detection and
Pipeline Stall
The
following RTL expression detects
data hazard between stage 2
and 3, then stalls
stage 1
and 2 by inserting a bubble in stage
3
alu3&alu2&((ra3=rb2)~((ra3=rc2)&!imm2)):
(pause2,
pause1, op3←0)
Meaning:
If opcode
in stage 2 and 3 are both ALU, and if ra
in stage 3 is same as rb or rc in stage
2,
issue a
pause signal to insert a
bubble between stage 3 and
2
Page
218
Last
Modified: 01-Nov-06
Advanced Computer
Architecture-CS501
Following
is the complete RTL for
detecting hazards among ALU instructions
in
different
stages of the
pipeline
Data
Hazard
RTL
for detection and
stalling
between
Stage 2
and 3
alu3&alu2&((ra3=rb2)~((ra3=rc2)&!imm2)):
(pause2,
pause1, op3←0)
Stage 2
and 4
alu4&alu2&((ra4=rb2)~((ra4=rc2)&!imm2)):
(pause2,
pause1, op3←0)
Stage 2
and 5
alu5&alu2&((ra5=rb2)~((ra5=rc2)&!imm2)):
(pause2,
pause1, op3←0)
Page
219
Last
Modified: 01-Nov-06
Advanced Computer
Architecture-CS501
Advanced
Computer Architecture
Lecture
21
Reading
Material
Vincent
P. Heuring&Harry F. Jordan
Chapter
5
Computer
Systems Design and Architecture
5.2
Summary
·
Data
Forwarding Hardware
·
Instruction
Level Parallelism
·
Difference
between Pipelining and Instruction-Level
Parallelism
·
Superscalar
Architecture
·
Superscalar
Design
·
VLIW
Architecture
Maximum
Distance between two
instructions
Example
Read
page no. 219 of Computer
System Design and Architecture
(Vincent
P.Heuring,
Harry F. Jordan)
Data
forwarding Hardware
The
concept of data forwarding was
introduced in the previous
lecture.
RTL
for
data
Page
220
Last
Modified: 01-Nov-06
Table of Contents:
|
|||||