Adapting SRC Instructions for Pipelined Execution, Control Signals

Advanced Computer Architecture-CS501

Advanced Computer Architecture

Lecture 19

Reading Material

Vincent P. Heuring & Harry F. Jordan, Computer Systems Design and Architecture

Chapter 5, Section 5.1.3

Summary

• Pipelined Version of the SRC

• Adapting SRC Instructions for Pipelined Execution

• Control Signals for Pipelined SRC

Pipelined Version of the SRC

In this lecture, a pipelined version of the SRC is presented. The SRC uses a five-stage

pipeline. Those five stages are given below:

1. Instruction Fetch

2. Instruction decode/operand fetch

3. ALU operation

4. Memory access

5. Register write
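
To illustrate how the five stages listed above overlap in time, the following minimal Python sketch computes which stage an instruction occupies in a given clock cycle. It assumes an ideal pipeline in which one instruction enters per cycle and nothing stalls; the names STAGES, stage_at and total_cycles are hypothetical and are not part of the SRC definition.

```python
# Stage names as listed above; list index 0 corresponds to stage 1.
STAGES = [
    "Instruction fetch",
    "Instruction decode/operand fetch",
    "ALU operation",
    "Memory access",
    "Register write",
]


def stage_at(instr_index, cycle):
    """Return the stage number (1..5) that instruction `instr_index` occupies
    in clock cycle `cycle` (both zero-based), or None if the instruction is
    not in the pipeline during that cycle.
    Assumption: one instruction enters the pipeline per cycle, no stalls."""
    offset = cycle - instr_index
    if 0 <= offset < len(STAGES):
        return offset + 1
    return None


def total_cycles(n_instructions):
    """Cycles needed to complete n instructions in the ideal (no-stall) case."""
    if n_instructions <= 0:
        return 0
    return n_instructions + len(STAGES) - 1


if __name__ == "__main__":
    # Print one row per instruction; each column is a clock cycle.
    for i in range(3):
        row = [str(stage_at(i, c) or "-") for c in range(total_cycles(3))]
        print(f"instruction {i}: " + " ".join(row))
```

Under these assumptions, each instruction is one stage behind its predecessor, so several instructions are in flight at once.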

As shown in the next diagram, there are several registers between each pair of adjacent stages.

After the instruction has been fetched, it is stored in IR2 and the incremented value of the

program counter is held in PC2. When the register values have been read, the first

operand is held in X3 and the second operand in Y3, while IR3 holds the op-code and ra.

If it is a store to memory instruction, MD3 holds the register value to be stored.

After the instruction has been executed in the ALU, the register Z4 holds the result. The

op-code and ra are passed on to IR4. During the write back stage, the register Z5 holds the

value to be stored back into the register, while the op-code and ra are passed into IR5.
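
As a rough aid to keeping track of the pipeline registers named in this section, they can be collected in a small Python sketch. The class name PipelineRegisters and the helper functions are hypothetical, and the field comments paraphrase the roles given in the text; the trailing digit of each name is treated as its stage number.

```python
from dataclasses import dataclass, fields


@dataclass
class PipelineRegisters:
    """Registers that sit between pipeline stages (all start at zero)."""
    IR2: int = 0   # fetched instruction
    PC2: int = 0   # incremented program counter
    IR3: int = 0   # op-code and ra
    X3: int = 0    # operand passed to the ALU
    Y3: int = 0    # operand passed to the ALU
    MD3: int = 0   # register value to be stored (store instructions)
    IR4: int = 0   # op-code and ra, passed along
    Z4: int = 0    # ALU result
    IR5: int = 0   # op-code and ra, passed along
    Z5: int = 0    # value to be written back to the register file


def stage_of(register_name):
    """Stage number encoded in the trailing digit of a register name."""
    return int(register_name[-1])


def registers_for_stage(stage):
    """Names of all pipeline registers whose trailing digit equals `stage`."""
    return [f.name for f in fields(PipelineRegisters) if stage_of(f.name) == stage]
```

For example, registers_for_stage(3) lists the registers loaded by the decode/operand fetch stage.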

There are also two separate memories and several multiplexers involved in the pipeline

operation. These will be shown at appropriate places in later figures.

The number after a particular register name indicates the stage where the value of this register is used.

Last Modified: 01-Nov-06

Adapting SRC Instructions for Pipelined Execution

As mentioned earlier, the SRC instructions fall into the following three categories:

1. ALU Instructions

2. Load/Store instructions

3. Branch Instructions

We will now discuss how to design a common pipeline for all three categories of

instructions.

1. ALU instructions

ALU instructions are usually of the form:

op-code ra, rb, rc

op-code ra, rb, constant.

In the diagram shown, X3 and Y3 are temporary registers to hold the values between

pipeline stages. X3 is loaded with operand value from the register file. Y3 is loaded with

either a register value from the register file or a constant from the instruction. The

operands are then available to the ALU. The ALU function is determined by decoding the

op-code bits. The result of the ALU operation is stored in register Z4, and then stored in

the destination register in the register write back stage. There is no activity in the memory

access stage for ALU instructions. Note that Z5, IR3, IR4, and IR5 are not shown

explicitly in the figure. The purpose of not including these registers is to keep the

drawing simple. However, these registers will transfer values as instructions progress

through the pipeline. This comment also applies to some other figures in this discussion.
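
The path of a single ALU instruction through the stages described above can be sketched in Python. This is a simplified, sequential model of one instruction, not a model of overlapping instructions. It assumes 32-bit registers, and the function name run_alu_instruction and the small ALU_OPS table (which covers only a few operations) are illustrative.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers

# A few ALU functions selected by the op-code; illustrative subset only.
ALU_OPS = {
    "add": lambda x, y: (x + y) & MASK32,
    "sub": lambda x, y: (x - y) & MASK32,
    "and": lambda x, y: x & y,
    "or": lambda x, y: x | y,
}


def run_alu_instruction(regs, op, ra, rb, rc=None, constant=None):
    """Carry one ALU instruction through stages 2 to 5.
    Exactly one of `rc` (register form) or `constant` (immediate form)
    must be given. Returns the intermediate register values."""
    if (rc is None) == (constant is None):
        raise ValueError("give exactly one of rc or constant")

    # Stage 2: operand fetch. X3 gets a register value; Y3 gets either a
    # register value or the constant from the instruction.
    x3 = regs[rb]
    y3 = regs[rc] if rc is not None else constant & MASK32

    # Stage 3: the ALU function is chosen by the op-code; result goes to Z4.
    z4 = ALU_OPS[op](x3, y3)

    # Stage 4: no memory activity for ALU instructions; the result moves on.
    z5 = z4

    # Stage 5: the result is written to the destination register.
    regs[ra] = z5
    return {"X3": x3, "Y3": y3, "Z4": z4, "Z5": z5}
```

A call such as run_alu_instruction(regs, "add", ra=1, rb=2, rc=3) corresponds to the register form, and passing constant instead of rc corresponds to the constant form.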

2. Load/Store instructions

Load/Store instructions are usually of the form:

op-code ra, constant(rb)

The instruction is loaded into IR2 and the incremented value of the PC is loaded in PC2.

In the next stage, X3 is loaded with the value in PC2 if the relative addressing mode is

used, or the value in rb if the displacement addressing mode is used. Similarly, c1 is

transferred to Y3 for the relative addressing mode, and c2 is transferred to Y3 for the

displacement addressing mode. The store instruction is completed once memory access

has been made and the memory location has been written to. The load instruction is

completed once the loaded value is transferred back to the register file. The following

figure shows the schematic for a load instruction. A similar schematic can be drawn for

the store instruction.
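
The operand selection for load/store instructions described above can be sketched in Python. The sketch assumes that the ALU stage forms the memory address by adding X3 and Y3, that c1 and c2 are already sign-extended integers, and that memory can be modelled as a dictionary from address to value; all function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit addresses


def select_address_operands(relative, pc2, r_rb, c1, c2):
    """Return (X3, Y3) for a load/store instruction.
    Relative addressing uses PC2 and c1; displacement addressing uses
    R[rb] and c2, as described in the text."""
    if relative:
        return pc2, c1
    return r_rb, c2


def effective_address(relative, pc2, r_rb, c1, c2):
    """Assumption: the ALU stage adds X3 and Y3 to form the address."""
    x3, y3 = select_address_operands(relative, pc2, r_rb, c1, c2)
    return (x3 + y3) & MASK32


def complete_store(memory, regs, ra, address):
    """A store finishes when the memory location has been written."""
    memory[address] = regs[ra]


def complete_load(memory, regs, ra, address):
    """A load finishes when the value is written back to the register file.
    Assumption: locations that were never written read as zero."""
    regs[ra] = memory.get(address, 0)
```

For instance, with R[rb] = 0x1000 and c2 = 16, displacement addressing yields address 0x1010, while relative addressing with PC2 = 100 and c1 = 8 yields 108.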

3. Branch Instructions

Branch Instructions usually involve calculating the target address and evaluating a

condition. The condition is evaluated based on the c2 field of the IR and by using the

value in R[rc]. If the condition is true, the PC is loaded with the value in R[rb], otherwise

it is incremented by 4 as usual. The following figure shows these details.
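
The branch decision can be sketched in Python as follows. The text does not list the condition codes, so the encoding used here (0 never, 1 always, 2 if zero, 3 if nonzero, 4 if non-negative, 5 if negative) is an assumption based on the usual SRC definition, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers


def branch_condition(cond_code, r_rc):
    """Evaluate the branch condition from a 3-bit condition code and R[rc].
    The code-to-test mapping below is an assumed encoding."""
    value = r_rc & MASK32
    negative = (value >> 31) == 1  # sign bit of the 32-bit value
    if cond_code == 0:
        return False          # never branch
    if cond_code == 1:
        return True           # always branch
    if cond_code == 2:
        return value == 0     # branch if R[rc] is zero
    if cond_code == 3:
        return value != 0     # branch if R[rc] is nonzero
    if cond_code == 4:
        return not negative   # branch if R[rc] is non-negative
    if cond_code == 5:
        return negative       # branch if R[rc] is negative
    raise ValueError(f"unsupported condition code: {cond_code}")


def next_pc(pc, branch, cond, r_rb):
    """PC is loaded with R[rb] for a taken branch, otherwise PC + 4."""
    if branch and cond:
        return r_rb & MASK32
    return (pc + 4) & MASK32
```

A taken branch therefore needs both a branch instruction and a true condition; in every other case the PC simply advances by 4.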

The complete pipelined data path

The pipelined data path implementation diagrams shown earlier for the three SRC

instruction categories must be combined and refined to get a working system. These

details get complicated very quickly. A detailed combined diagram is shown in Figure

5.7 of the text book.

Control Signals for the Pipelined SRC

We define the following signals for the SRC by grouping similar op-codes:

In most cases, the signals defined above are used in the same stage where they are

generated. If that is not the case, a number used after the signal name indicates the stage

where the signal is generated.
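
As an illustration of grouping op-codes into signals, the following Python sketch derives the signals named in this section (branch, load, store, dsp, rl, alu) from an instruction mnemonic. The exact groupings are an assumption based on the SRC mnemonics and are not copied from the definitions in the text book. The cond signal is omitted because it depends on a register value rather than on the op-code alone.

```python
# Assumed grouping of SRC mnemonics; illustrative, not authoritative.
ALU_MNEMONICS = frozenset({
    "add", "addi", "sub", "neg", "and", "andi", "or", "ori", "not",
    "shr", "shra", "shl", "shc",
})


def control_signals(mnemonic):
    """Return a dict of boolean control signals for one instruction."""
    return {
        "branch": mnemonic in {"br", "brl"},
        "load": mnemonic in {"ld", "ldr"},
        "store": mnemonic in {"st", "str"},
        # dsp: displacement addressing; rl: relative addressing (assumed sets)
        "dsp": mnemonic in {"ld", "st", "la"},
        "rl": mnemonic in {"ldr", "str", "lar"},
        "alu": mnemonic in ALU_MNEMONICS,
    }
```

For example, control_signals("str") activates store and rl, which matches a store that uses the relative addressing mode.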

Using these definitions, we can develop RTL statements for describing the pipeline

activity as well as the equations for the multiplexer select signals for different stages of

the pipeline. This is shown in the next diagram.

Control Signals for Different Pipeline Stages

Consider the RTL description of the Mp1 signal, which controls the input to the PC. It

simply means that unless both the branch and cond signals are activated, the PC is

incremented by 4; if both are activated, the value of R1 is copied into the PC.

The multiplexer Mp2 is used to decide which register is read from the register file. If

the store signal is activated, then R[ra] from the instruction bits is read from the register

file so that its value may be stored into memory; otherwise, R[rc] is read from the register

file.

The multiplexer Mp3 is used to select the input for X3. If either rl or branch is

activated, then the updated value of PC2 is transferred to X3; otherwise, if dsp or alu is

activated, the value of R[rb] from the register file is transferred to X3.

In the same way, multiplexer Mp4 is used to select an input for Y3.

The multiplexer Mp5 is used to decide which value is transferred to be written back to

the register file. If the load signal is activated, data from memory is transferred to Z5;

if the load signal is not activated, data from Z4 (the result of the ALU operation)

is transferred to Z5, which is then written back to the register file.
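
The Mp5 selection just described amounts to a two-way choice, which can be sketched in Python. The function names are hypothetical, and write_back assumes the instruction is one that actually writes a register.

```python
def mp5_select(load, memory_data, z4):
    """Value routed into Z5: memory data for a load, otherwise Z4."""
    return memory_data if load else z4


def write_back(regs, ra, load, memory_data, z4):
    """Load Z5 through Mp5, then write it to the destination register.
    Assumption: the instruction is one that writes a register."""
    z5 = mp5_select(load, memory_data, z4)
    regs[ra] = z5
    return z5
```

With load set, the ALU result in Z4 (the memory address) is ignored at this point and the fetched memory word is written instead.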
