RTL to Describe the SRC, Register Transfer using Digital Logic Circuits

Advanced Computer Architecture-CS501

Lecture Handouts

Computer Architecture

Appendix

Reading Material

Handouts

Summary

Introduction to FALSIM

Preparing source files for FALSIM

Using FALSIM

FALCON-A assembly language techniques

FALSIM

1. Introduction to FALSIM:

FALSIM is the name of the software application which consists of the

FALCON-A assembler and the FALCON-A simulator. It runs under

Windows XP.

FALCON-A Assembler:

Figure 1 shows a snapshot of the FALCON-A Assembler. This tool loads a

FALCON-A assembly file with a (.asmfa) extension and parses it. It shows

the parse results in an error log, lets the user view the assembled file's

contents in the file listing and also provides the features of printing the

machine code, an Instruction Table and a Symbol Table to a FALCON-A

listing file. It also allows the user to run the FALCON-A Simulator.

The FALCON-A Assembler has two main modules, the 1st-pass and the

2nd-pass. The 1st-pass module takes an assembly file with a (.asmfa)

extension and processes the file contents. It then creates a Symbol Table

Last Modified: 01-Nov-06

which corresponds to the storage of all program variables, labels and data

values in a data structure at the implementation level. If the 1st-pass

completes successfully a Symbol Table is produced as an output, which is

used by the 2nd-pass module. Failures of the 1st-pass are handled by the

assembler using its exception handling mechanism.

The 2nd-pass module sequentially processes the .asmfa file to interpret the
instruction opcodes, register opcodes and constants using the symbol table.
It then produces a list file with a .lstfa extension, regardless of whether the
pass succeeds or fails. If the pass is successful, a binary file with a .binfa
extension is produced, which contains the machine code for the program in the
assembly file.
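The division of labour between the two passes can be illustrated with a toy example. The following Python sketch is not FALSIM's implementation: the function names first_pass and second_pass, the assumption that every statement occupies 2 bytes, and the restriction to labels, comments and [label] operands are all simplifications made for illustration.

```python
# Toy two-pass assembler sketch (hypothetical; not FALSIM's real parser).
# Assumptions: labels end in ':', ';' starts a comment, every statement is 2 bytes.

def strip_comment(raw):
    """Return the statement text without any trailing ';' comment."""
    return raw.split(";", 1)[0].strip()


def first_pass(lines):
    """Build the symbol table: label -> address of the statement it marks."""
    symbols = {}
    address = 0
    for raw in lines:
        text = strip_comment(raw)
        if not text:
            continue
        if ":" in text:
            label, text = text.split(":", 1)
            label = label.strip()
            if label in symbols:
                raise ValueError(f"duplicate label: {label}")
            symbols[label] = address
            text = text.strip()
        if text:
            address += 2
    return symbols


def second_pass(lines, symbols):
    """Resolve [label] operands using the symbol table from the first pass."""
    listing = []
    address = 0
    for raw in lines:
        text = strip_comment(raw)
        if ":" in text:
            text = text.split(":", 1)[1].strip()
        if not text:
            continue
        for label, value in symbols.items():
            text = text.replace(f"[{label}]", f"[{value}]")
        listing.append((address, text))
        address += 2
    return listing


source = [
    "jump [main]      ; forward reference, resolved in the second pass",
    "nop",
    "main: movi r1, 5",
    "halt",
]
symbols = first_pass(source)
listing = second_pass(source, symbols)
```

The forward reference to main in the first statement is the reason two passes are needed: the address of main is not known until the whole file has been scanned once.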

FALCON-A Simulator:

Figure 6 shows a snapshot of the FALCON-A Simulator. This tool loads a
FALCON-A binary file with a (.binfa) extension and presents its contents
in different areas of the simulator. It allows the user to execute the
program up to a specific point within a time frame, or to execute it one
instruction at a time. It also allows the user to view the registers, I/O port
values and memory contents as the instructions execute.

FALSIM Features:

The FALCON-A Assembler provides its user with the following features:

Select Assembly File: Labeled as "1" in Figure 1, this feature enables the

user to choose a FALCON-A assembly file and open it for processing by the

assembler.

Assembler Options: Labeled as "2" in Figure 1.

· Print Symbol Table

This feature, if selected, writes the Symbol Table (produced after the
execution of the 1st-pass of the assembler) to a FALCON-A list file with an
extension of (.lstfa). The Symbol Table includes data members, data
addresses and labels with their respective values.

· Print Instruction Table

This feature, if selected, writes the Instruction Table to a FALCON-A list file
with an extension of (.lstfa).

List File: Labeled as "3" in Figure 1, the List File feature gives a detailed
view of the FALCON-A listing file, which is produced as a result of the
execution of the 1st and 2nd passes. It shows the Program Counter value in
hexadecimal and decimal formats along with the machine code generated for
every line of assembly code. These values are printed when the 2nd-pass is
completed.

Error Log: The Error Log is labeled as "4" in Figure 1. It informs the user
about the errors, and their respective details, that occur in any of the
passes of the assembler.

Search: Search is labeled as "5" in Figure 1 and helps the user to search for

a certain input with the options of searching with "match whole" and

"match any" parts of the string. The search also has the option of checking

with/without considering "case-sensitivity". It searches the List File area

and highlights the search results using the yellow color. It also indicates the

total number of matches found.

Start Simulator: This feature is labeled as "6" in Figure 1. The FALCON-A

Simulator is run using the FALCON-A Assembler's Start Simulator option.

The FALCON-A Simulator is invoked by the user from the FALCON-A

Assembler. Its features are detailed as follows:

Load Binary File: The button labeled as "11" in Figure 6 allows the user to
choose and open a FALCON-A binary file with a (.binfa) extension. When a
file is loaded into the simulator, all the register, constant (if any) and
memory values are set.

Registers: The area labeled as "12" in Figure 6 enables the user to see the
values present in different registers before, during and after execution.

Instruction: This area is labeled as "13" in Figure 6 and contains the value of

PC, address of an instruction, its representation in Assembly, the Register

Transfer Language, the op-code and the instruction type.

I/O Ports: I/O ports are labeled as "14" in Figure 6. These ports are available

for the user to enter input operation values and visualize output operation

values whenever an I/O operation takes place in the program. The input

value for an input operation is given by the user before an instruction

executes. The output values are visible in the I/O port area once the

instruction has successfully executed.

Memory: The memory is divided into 2 areas and is labeled as "15" in

Figure 6, to facilitate the view of data stored at different memory locations

before, during and after program execution.

Processor's State: Labeled as "16" in Figure 6, this area shows the current

values of the Instruction register and the Program Counter while the program

executes.

Search: The search option for the FALCON-A simulator is labeled as "17"

in Figure 6. This feature is similar to the way the search feature of the

FALCON-A Assembler works. It offers to highlight the search string which

goes as an input, with the "All " and " Part " option. The results of the search

are highlighted in the color yellow. It also indicates the total number of

matches.

The following is a description of the options available on the button panel

labeled as "18" in Figure 6.

Single Step: "Single Step" lets the user execute the program, one instruction

at a time. The next instruction is not executed unless the user does a "single

step" again. By default, the instruction to be executed will be the one next in

the sequence. It changes if the user specifies a different PC value using the

Change PC option (explained below).

Change PC: This option lets the user change the value of PC

(Program Counter). By changing the PC the user can execute the

instruction to which the specified PC points.

Execute: By choosing this button the user is able to execute the

instructions with the options of execution with/without breakpoint

insertion (refer to Fig. 5). In case of breakpoint insertion, the user has

the option to choose from a list of valid breakpoint values. It also has

the option to set a limit on the time for execution. This "Max

Execution Time" option restricts the program execution to a time

frame specified by the user, and helps the simulator in exception

handling.

Change Register: Using the Change Register feature, the user can

change the value present in a particular register.

Change Memory Word: This feature enables the user to change values

present at a particular memory location.

Display Memory: Display Memory shows an updated memory area once the
user specifies a memory location other than those currently displayed.

Change I/O: Allows the user to supply an I/O port value if the
instruction to be executed requires an I/O operation. Entering a value
in one of the I/O port areas before the instruction executes indicates
that an I/O operation will be part of the program and that it will take
its input from some source. The value given by the user indicates the
input type and source.

Display I/O: Display I/O works in a manner similar to Display
Memory. Here the user specifies the starting index of an I/O port. This
feature displays the I/O ports starting from the index specified.

2. Preparing source files for FALSIM:

In order to use the FALCON-A assembler and simulator, FALSIM,

the source file containing assembly language statements and directives

should be prepared according to the following guidelines:

· The source file should contain ASCII text only. Each line should be

terminated by a carriage return. The extension .asmfa should be used

with each file name. After assembly, a list file with the original

filename and an extension .lstfa, and a binary file with an extension

.binfa will be generated by FALSIM.

· Comments are indicated by a semicolon (;) and can be placed anywhere

in the source file. The FALSIM assembler ignores any text after the

semicolon.

· Names in the source file can be of one of the following types:

· Variables: These are defined using the .equ directive. A value must

also be assigned to variables when they are defined.

· Addresses in the "data and pointer area" within the memory: These

can be defined using the .dw or the .sw directive. The difference

between these two directives is that when .dw is used, it is not

possible to store any value in the memory. The integer after .dw

identifies the number of memory words to be reserved starting at the

current address. (The directive .db can be used to reserve bytes in

memory.) Using the .sw directive, it is possible to store a constant or

the value of a name in the memory. It is also possible to use pointers

with this directive to specify addresses larger than 127. Data tables

and jump tables can also be set up in the memory using this directive.

· Labels: An assembly language statement can have a unique label
associated with it. Two assembly language statements cannot have the
same label. Every label should have a colon (:) after it.

· Use the .org 0 directive as the first line in the program. Although the use
of this line is optional, its use will make sure that FALSIM will start
simulation by picking up the first instruction stored at address 0 of the
memory. (Address 0 is called the reset address of the processor.) A jump
[first] instruction can be placed at address 0, so that control is transferred
to the first executable statement of the main program. Thus, the label
first serves as the identifier of the "entry point" in the source file. The
.org directive can also be used anywhere in the source file to force code
at a particular address in the memory.

· Address 2 in the memory is reserved for the pointer to the Interrupt
Service Routine (ISR). The .sw directive can be used to store the address
of the first instruction in the ISR at this location.

· Addresses 4 to 125 can be used for addresses of data and pointers (footnote 1).
However, the main program must start at address 126 or less (footnote 2),
otherwise FALSIM will generate an error at the jump [first] instruction.

· The main program should be followed by any subprograms or
procedures. Each procedure should be terminated with a ret instruction.
The ISR, if any, should be placed after the procedures and should be
terminated with the iret instruction.

· The last line in the source file should be the .end directive.

· The .equ directive can be used anywhere in the source file to assign
values to variables.

· It is the responsibility of the programmer to make sure that code does not
overwrite data when the assembly process is performed, or vice versa. As
an example, this can happen if care is not exercised during the use of the
.org directive in the source file.

Footnote 1: Any address between 4 and 14 can be used in place of the displacement field
in load or store instructions. Recall that the displacement field is just 5 bits in the
instruction word.

Footnote 2: This restriction is because of the fact that the immediate operand in the movi
instruction must fit an 8-bit field in the instruction word.

3. Using FALSIM:

· To start FALSIM (the FALCON-A assembler and simulator), double
click on the FALSIM icon. This will display the assembler window,
as shown in Figure 1.

· Select one or both assembler options shown on the top right corner of
the assembler window labeled as "2". If no option is selected, the
symbol table and the instruction table will not be generated in the list
(.lstfa) file.

· Click on the select assembly file button labeled as "1". This will open
the dialog box as shown in Figure 2.

· Select the path and file containing the source program that is to be

assembled.

· Click on the open button. FALSIM will assemble the program and

generate two files with the same filename, but with different

extensions. A list file will be generated with an extension .lstfa, and a

binary (executable) file will be generated with an extension .binfa.

FALSIM will also display the list file and any error messages in two

separate panes, as shown in Figure 3.

· Double clicking on any error message highlights and displays the

corresponding erroneous line in the program listing window pane for

the user. This is shown in Figure 4. The highlight feature can also be

used to display any text string, including statements with errors in

them. If the assembler reported any errors in the source file, then these

errors should be corrected and the program should be assembled again

before simulation can be done. Additionally, if the source file had

been assembled correctly at an earlier occasion, and a correct binary

(.binfa) file exists, the simulator can be started directly without

performing the assembly process.

· To start the simulator, click on the start simulation button labeled as

"6". This will open the dialog box shown in Figure 6.

· Select the binary file to be simulated, and click open as shown in

Figure 7.

· This will open the simulation window with the executable program

loaded in it as shown in Figure 8. The details of the different panes in

this window were given in section 1 earlier. Notice that the first

instruction at address 0 is ready for execution. All registers are

initialized to 0. The memory contains the address of the ISR (i.e., 64h

which is 100 decimal) at location 2 and the address of the printer

driver at location 4. These two addresses are determined at assembly

time in our case. In a real situation, these addresses will be

determined at execution time by the operating system, and thus the

ISR and the printer driver will be located in the memory by the

operating system (called re-locatable code). Subsequent memory

locations contain constants defined in the program.

· Click single step button labeled as "19". FALSIM will execute the

jump [main] instruction at address 0 and the PC will change to 20h

(32 decimal), which is the address of the first instruction in the main

program (i.e., the value of main).

· Although in a real situation, there will be many instructions in the

main program, those instructions are not present in the dummy calling

program. The first useful instruction is shown next. It loads the

address of the printer driver in r6 from the pointer area in the memory.

The registers r5 and r7 are also set up for passing the starting address
of the print buffer and the number of bytes to be printed. In our
dummy program, we bring these values into these registers from the
data area in the memory, and then pass these values to the printer
driver using these two registers. Clicking on the single step button twice
executes these two instructions.

· The execution of the call instruction simulates the event of a print

request by the user. This transfers control to the printer driver. Thus,

when the call r4, r6 instruction is single stepped, the PC changes to

32h (50 decimal) for executing the first instruction in the printer

driver.

· Double click on memory location 000A, which is being used for

holding the PB (printer busy) flag. Enter a 1 and click the change

memory button. This will store a 0001 in this location, indicating that

a previous print job is in progress. Now click single step and note that
this value is brought from memory location 000A into register r1.

Clicking single step again will cause the jnz r1, [message] instruction

to execute, and control will transfer to the message routine at address

0046h. The nop instruction is used here as a place holder.

· Click again on the single step button. Note that when the ret r4

instruction executes, the value in r4 (i.e., 28h) is brought into the PC.

The blue highlight bar is placed on the next instruction after the call

r4, r6 instruction in the main program. In case of the dummy calling

program, this is the halt instruction.

· Double click on the value of the PC labeled as "20". This will open a
dialog box. Enter a value of the PC (i.e., 26h) corresponding to the
call r4, r6 instruction, so that it can be executed again. A list of
possible PC values can also be pulled down, and 0026h can be selected
from there as well.

· Click single step again to enter the printer driver again.

· Change memory location 000A to a 0, and then single step the first

instruction in the printer driver. This will bring a 0 in r1, so that when

the next jnz r1, [message] instruction is executed, the branch will not

be taken and control will transfer to the next instruction after this
instruction. This is movi r1, 1 at address 0036h.

· Continue single stepping.

· Notice that a 1 has been stored in memory location 000A, and r1

contains 11h, which is then transferred to the output port at address

3Ch (60 decimal) when the out r1, controlp instruction executes.

This can be verified by double clicking on the top left corner of the

I/O port pane, and changing the address to 3Ch. Another way to

display the value of an I/O port is to scroll the I/O window pane to

the desired position.

· Continue single stepping till the int instruction and note the changes

in different panes of the simulation window at each step.

· When the int instruction executes, the PC changes to 64h, which is the

address of the first instruction in the ISR. Clicking single step executes

this instruction, and loads the address of temp (i.e., 0010h) which is a

temporary memory area for storing the environment. The five store
instructions in the ISR save the CPU environment (working registers)
before the ISR changes them.

· Single step through the ISR while noting the effects on various registers,

memory locations, and I/O ports till the iret instruction executes. This will

pass control back to the printer driver by changing the PC to the address of

the jump [finish] instruction, which is the next instruction after the int

instruction.

· Double click on the value of the PC. Change it to point to the int

instruction and click single step to execute it again. Continue to single step

till the in r1, statusp instruction is ready for execution.

· Change the I/O port at address 3Ah (which represents the status port at

address 58) to 80 and then single step the in r1, statusp instruction. The

value in r1 should be 0080.

· Single step twice and notice that control is transferred to the movi r7,
FFFF (footnote 3) instruction, which stores an error code of -1 in r7.

Footnote 3: The instruction was originally movi r7, -1. Since it was converted to
machine language by the assembler, and then reverse assembled by the simulator, it
became movi r7, FFFF. This is because the machine code stores the number in 16 bits
after sign-extension. The result will be the same in both cases.
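The change from -1 to FFFF described above is ordinary two's-complement sign extension. A minimal Python sketch, assuming an 8-bit immediate field and a 16-bit displayed value as stated in the text; the helper name sign_extend is illustrative and not part of FALSIM:

```python
def sign_extend(value, from_bits, to_bits=16):
    """Sign-extend a from_bits-wide two's-complement field to to_bits bits."""
    value &= (1 << from_bits) - 1            # keep only the field
    if value & (1 << (from_bits - 1)):       # sign bit set: value is negative
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)      # re-encode in the wider width


encoded = -1 & 0xFF                # the 8-bit field that holds -1, i.e. 0xFF
shown = sign_extend(encoded, 8)    # the 16-bit value a reverse assembler would show
print(f"{shown:04X}")              # prints FFFF
```

Positive immediates are unaffected: sign_extend(0x7F, 8) stays 0x007F, which is why only negative operands look different after reverse assembly.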

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

Figure 7

Figure 8

4. FALCON-A assembly language programming techniques:

· If a signed value, x, cannot fit in 5 bits (i.e., it is outside the range -16 to

+15), FALSIM will report an error with a load r1, [x] or a store r1, [x]

instruction. To overcome this problem, use movi r2, x followed by load

r1, [r2].

· If a signed value, x, cannot fit in 8 bits (i.e., it is outside the range
-128 to +127), even the previous scheme will not work. FALSIM will

report an error with the movi r2, x instruction. The following instruction

sequence should be used to overcome this limitation of the FALCON-A.

First store the 16-bit address in the memory using the .sw directive. Then

use two load instructions as shown below:

a: .sw x

load r2, [a]

load r1, [r2]

This is essentially a "memory-register-indirect" addressing. It has been

made possible by the .sw directive. The value of a should be less than 15.

· A similar technique can be used with immediate ALU instructions for

large values of the immediate data, and with the transfer of control (call

and jump) instructions for large values of the target address.

· Large values (16-bit values) can also be stored in registers using the mul

instruction combined with the addi instruction. The following

instructions bring a 201 in register r1.

movi r2, 10

movi r3, 20

mul r1, r2, r3

; r1 contains 200 after this instruction

addi r1, r1, 1

; r1 now contains 201

· Moving from one register to another can be done by using the instruction

addi r2, r1, 0.

· Bit setting and clearing can be done using the logical (and, or, not, etc.)
instructions.

· Using shift instructions (shiftl, asr, etc.) is faster than mul and div, if the
multiplier or divisor is a power of 2.
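Two of the techniques above can be checked numerically. The Python sketch below is only an illustration: the function names, the returned scheme descriptions and the 16-bit word size are assumptions made here, and nothing in it is taken from FALSIM itself. The first part picks an addressing scheme from the 5-bit and 8-bit signed ranges quoted above; the second part shows that shifting by k matches multiplying or dividing by 2 to the power k.

```python
WORD_BITS = 16                      # assumed register width for this sketch
WORD_MASK = (1 << WORD_BITS) - 1


def fits_signed(x, bits):
    """True if x is representable as a bits-wide two's-complement number."""
    return -(1 << (bits - 1)) <= x <= (1 << (bits - 1)) - 1


def addressing_scheme(x):
    """Pick the load scheme for address x following the bullets above."""
    if fits_signed(x, 5):           # -16 .. +15: fits the displacement field
        return "direct"
    if fits_signed(x, 8):           # -128 .. +127: fits the movi immediate
        return "movi + register indirect"
    return "pointer via .sw + two loads"


def shiftl(x, k):
    """Logical shift left by k, truncated to the word size."""
    return (x << k) & WORD_MASK


def asr(x, k):
    """Arithmetic shift right by k: the sign bit is preserved."""
    x &= WORD_MASK
    if x & (1 << (WORD_BITS - 1)):  # reinterpret the word as a negative number
        x -= 1 << WORD_BITS
    return (x >> k) & WORD_MASK


print(addressing_scheme(14), addressing_scheme(100), addressing_scheme(300))
print(shiftl(25, 3) == 25 * 8, asr(200, 3) == 200 // 8)
```

Note that asr rounds toward minus infinity for negative operands, so it agrees with floor division rather than with truncating division.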

Lecture Handout

Computer Architecture

Lecture No. 1

Reading Material

Vincent P. Heuring & Harry F. Jordan, Computer Systems Design and Architecture

Chapter 1: Sections 1.1, 1.2, 1.3, 1.4, 1.5

Summary

Distinction between computer architecture, organization and design

Levels of abstraction in digital design

Introduction to the course topics

Perspectives of different people about computers

General operation of a stored program digital computer

The Fetch-Execute process

Concept of an ISA(Instruction Set Architecture)

Introduction

This course is about Computer Architecture. We start by explaining a few key terms.

The General Purpose Digital Computer

How can we define a `computer'? There are several kinds of devices that can be termed

"computers": from desktop machines to the microcontrollers used in appliances such as a

microwave oven, from the Abacus to the cluster of tiny chips used in parallel processors,

etc. For the purpose of this course, we will use the following definition of a computer:

"an electronic device, operating under the control of instructions stored in its own
memory unit, that can accept data (input), process data arithmetically and logically,
produce output from the processing, and store the results for future use." [1]

Thus, when we use the term computer, we actually mean a digital computer. There are
many digital computers, which have dedicated purposes, for example, a computer used
in an automobile that controls the spark

Advanced Computer Architecture

Lecture No. 6

Reading Material

Handouts

Slides

Summary

Using Behavioral RTL to Describe the SRC (continued)

Implementing Register Transfer using Digital Logic Circuits

Using behavioral RTL to Describe the SRC (continued)

Once the instruction is fetched and the PC is incremented, execution of the instruction

starts. In the following discussion, we denote instruction fetch by "iF" and instruction

execution by "iE".

iE:= (

(op<4..0>= 1) : R [ra] ← M [disp],

(op<4..0>= 2) : R [ra] ← M [rel],

...

(op<4..0>=31) : Run ← 0,); iF);

As shown above, instruction execution can be described by using a long list of

conditional operations, which are inherently "disjoint". Only one of these statements is

executed, depending on the condition met, and then the instruction fetch statement (iF) is

invoked again at the end of the list of concurrent statements. Thus, instruction fetch (iF)

and instruction execution statements invoke each other in a loop. This is the fetch-execute

cycle of the SRC.
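The way iF and iE invoke each other can be pictured as a simple loop. The Python sketch below is a simplified illustration and not a model of the real SRC: instructions are assumed to be pre-decoded tuples, the PC counts instruction slots rather than byte addresses, disp is treated as a ready-made value, and only opcodes 0, 5, 12 and 31 (as numbered in this lecture) are handled. The function name run and the state dictionary are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def run(program, max_steps=1000):
    """Fetch-execute loop over a list of pre-decoded instruction tuples."""
    state = {"PC": 0, "Run": 1, "R": [0] * 32}
    steps = 0
    while state["Run"] and steps < max_steps:
        # iF: fetch the instruction and increment the PC
        instr = program[state["PC"]]
        state["PC"] += 1
        # iE: the conditions are disjoint, so exactly one branch executes
        op = instr[0]
        if op == 0:                      # nop
            pass
        elif op == 5:                    # R[ra] <- disp
            _, ra, disp = instr
            state["R"][ra] = disp & MASK32
        elif op == 12:                   # R[ra] <- R[rb] + R[rc]
            _, ra, rb, rc = instr
            state["R"][ra] = (state["R"][rb] + state["R"][rc]) & MASK32
        elif op == 31:                   # Run <- 0
            state["Run"] = 0
        else:
            raise ValueError(f"opcode {op} is not handled in this sketch")
        steps += 1                       # control returns to iF at the loop top
    return state


program = [
    (5, 1, 7),        # R[1] <- 7
    (5, 2, 35),       # R[2] <- 35
    (12, 3, 1, 2),    # R[3] <- R[1] + R[2]
    (0,),             # nop
    (31,),            # stop
]
final = run(program)
print(final["R"][3], final["PC"], final["Run"])
```

The loop ends only when the stop instruction clears Run, which mirrors the role of Run in the RTL description.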

Concurrent Statements

The long list of concurrent, disjoint instructions of the instruction execution (iE) is

basically the complete instruction set of the processor. A brief overview of these

instructions is given below:

Load-Store Instructions

(op<4..0>= 1) : R [ra] ← M [disp], load register (ld)

This instruction is to load a register using a displacement address specified by the

instruction, i.e., the contents of the memory at the address `disp' are placed in the register

R [ra].

(op<4..0>= 2) : R [ra] ← M [rel], load register relative (ldr)

If the operation field `op' of the decoded instruction is 2, the instruction loads a
register (specified by the field ra) with the memory contents at a relative address,
`rel'. The relative address calculation has been explained earlier in this section.

(op<4..0>= 3) : M [disp] ← R [ra], store register (st)

If the op-code is 3, the contents of the register specified by address ra, are stored back to

the memory, at a displacement location `disp'.

(op<4..0>= 4) : M[rel] ← R[ra], store register relative (str)

If the op-code is 4, the contents of the register specified by the target register address ra,

are stored back to the memory, at a relative address location `rel'.

(op<4..0>= 5) : R [ra] ← disp, load displacement address (la)

For op-code 5, the displacement address disp is loaded into the register R (specified by
the target register address ra).

(op<4..0>= 6) : R [ra] ← rel, load relative address (lar)

For op-code 6, the relative address rel is loaded into the register R (specified by the
target register address ra).

Branch Instructions

(op<4..0>= 8) : (cond : PC ← R [rb]), conditional branch (br)

If the op-code is 8, a conditional branch is taken, that is, the program counter is set to the

target instruction address specified by rb, if the condition `cond' is true.

(op<4..0>= 9) : (R [ra] ← PC,

cond : (PC ← R [rb]) ), branch and link (brl)

If the op field is 9, the branch and link instruction is executed, i.e., the contents of the
program counter are stored in a register specified by the ra field (so control can be
returned to it later), and then the conditional branch is taken to a branch target address
specified by rb. The branch and link instruction is useful for returning control to the
calling program after a procedure call returns.

The conditions that these `conditional' branches depend on, are specified by the field c3

that has 3 bits. This simply means that when c3<2..0> is equal to one of these six values,

we substitute the expression on the right hand side of the : in place of cond.

These conditions are explained here briefly.

cond := (

c3<2..0>=0 : 0, never
If the c3 field is 0, the branch is never taken.

c3<2..0>=1 : 1, always
If the c3 field is 1, the branch is always taken.

c3<2..0>=2 : R [rc]=0, if register is zero
If c3 = 2, the branch is taken if the content of register rc is 0.

c3<2..0>=3 : R [rc] ≠ 0, if register is nonzero
If c3 = 3, the branch is taken if the content of register rc is not equal to 0.

c3<2..0>=4 : R [rc]<31>=0, if positive or zero
If c3 = 4, the branch is taken if the value in the register specified by rc is greater
than or equal to 0.

c3<2..0>=5 : R [rc]<31>=1 ), if negative
If c3 = 5, the branch is taken if the value stored in the register specified by rc is
negative.
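The condition table and the two branch instructions can be expressed compactly in code. The following Python sketch assumes 32-bit registers held as non-negative integers in a list R; the function names cond, br and brl simply mirror the RTL names and are not taken from any existing tool.

```python
MASK32 = 0xFFFFFFFF


def cond(c3, rc_value):
    """Evaluate the branch condition selected by the 3-bit field c3."""
    rc_value &= MASK32
    sign = (rc_value >> 31) & 1          # R[rc]<31>
    if c3 == 0:
        return False                     # never
    if c3 == 1:
        return True                      # always
    if c3 == 2:
        return rc_value == 0             # register is zero
    if c3 == 3:
        return rc_value != 0             # register is nonzero
    if c3 == 4:
        return sign == 0                 # positive or zero
    if c3 == 5:
        return sign == 1                 # negative
    raise ValueError("c3 values 6 and 7 are not defined in the table")


def br(pc, R, rb, rc, c3):
    """op 8: return the new PC for a conditional branch."""
    return R[rb] if cond(c3, R[rc]) else pc


def brl(pc, R, ra, rb, rc, c3):
    """op 9: save the PC in R[ra], then branch if the condition holds."""
    R[ra] = pc                           # the link is stored unconditionally
    return R[rb] if cond(c3, R[rc]) else pc


R = [0] * 32
R[2] = 100                               # branch target held in R[2]
new_pc = brl(8, R, ra=1, rb=2, rc=3, c3=1)
print(new_pc, R[1])
```

In brl the link register is written whether or not the branch is taken, which follows the RTL, where R [ra] ← PC is not guarded by cond.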

Arithmetic and Logical instructions

(op<4..0>=12) : R [ra] ← R [rb] + R [rc],

If the op-code is 12, the contents of the registers rb and rc are added and the result is

stored in the register ra.

(op<4..0>=13) : R [ra] ← R [rb] + c2<16..0> {sign extended},

If the op-code is 13, the content of the register rb is added with the immediate data in the

field c2, and the result is stored in the register ra.

(op<4..0>=14) : R [ra] ← R [rb] - R [rc],

If the op-code is 14, the content of the register rc is subtracted from that of rb, and the

result is stored in ra.

(op<4..0>=15) : R [ra] ← -R [rc],

If the op-code is 15, the content of the register rc is negated, and the result is stored in ra.

(op<4..0>=20) : R [ra] ← R [rb] & R [rc],

If the op field equals 20, logical AND of the contents of the registers rb and rc is obtained

and the result is stored in register ra.

(op<4..0>=21) : R [ra] ← R [rb] & c2<16..0> {sign extended},

If the op field equals 21, logical AND of the content of the registers rb and the immediate

data in the field c2 is obtained and the result is stored in register ra.

(op<4..0>=22) : R [ra] ← R [rb] | R [rc],

If the op field equals 22, logical OR of the contents of the registers rb and rc is obtained
and the result is stored in register ra.

(op<4..0>=23) : R [ra] ← R [rb] | c2<16..0> {sign extended},

If the op field equals 23, logical OR of the content of the register rb and the immediate
data in the field c2 is obtained and the result is stored in register ra.

(op<4..0>=24) : R [ra] ← !R [rc],

If the op-code equals 24, the logical NOT of the content of the register rc is obtained, and
the result is stored in ra.
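The effect of the {sign extended} annotation on the 17-bit field c2<16..0> is easiest to see with concrete numbers. The Python sketch below assumes 32-bit registers; the helper names (sext17, addi, andi, ori, neg, not_) are chosen here for illustration and are not defined in the text above.

```python
MASK32 = 0xFFFFFFFF


def sext17(c2):
    """Sign-extend the 17-bit field c2<16..0> to a Python integer."""
    c2 &= 0x1FFFF
    return c2 - (1 << 17) if c2 & (1 << 16) else c2


def addi(rb_val, c2):        # op 13: R[ra] <- R[rb] + c2 {sign extended}
    return (rb_val + sext17(c2)) & MASK32


def andi(rb_val, c2):        # op 21: R[ra] <- R[rb] AND c2 {sign extended}
    return rb_val & sext17(c2) & MASK32


def ori(rb_val, c2):         # op 23: R[ra] <- R[rb] OR c2 {sign extended}
    return (rb_val | sext17(c2)) & MASK32


def neg(rc_val):             # op 15: two's-complement negation
    return (-rc_val) & MASK32


def not_(rc_val):            # op 24: bitwise NOT
    return ~rc_val & MASK32


# 0x1FFFF is the 17-bit encoding of -1, so adding it subtracts one.
print(addi(10, 0x1FFFF))                 # 9
print(hex(andi(0xFFFF00FF, 0x1FF00)))    # 0xffff0000
```

Because the immediate is sign extended before the AND, a negative c2 keeps the upper 15 bits of R [rb] intact instead of clearing them.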

Shift instructions

(op<4..0>=26) : R [ra]<31..0> ← (n α 0) # R [rb]<31..n>,

If the op-code is 26, the contents of the register rb are shifted right by n bit positions.
The bits that are shifted out of the register are discarded, and 0s are added in their
place, i.e., n 0s are concatenated with the remaining register contents. The result is
copied to the register ra.

(op<4..0>=27) : R [ra]<31..0> ← (n α R [rb]<31>) # R [rb]<31..n>,

For op-code 27, the shift arithmetic operation is carried out. In this operation, the
contents of the register rb are shifted right by n bit positions, with the most significant
bit, i.e., bit 31, of the register rb replicated into the vacated positions. The result is
copied to the register ra.

(op<4..0>=28) : R [ra]<31..0> ← R [rb]<31-n..0> # (n α 0),

For op-code 28, the contents of the register rb are shifted left by n bit positions, similar
to the shift right instruction. The result is copied to the register ra.

(op<4..0>=29) : R [ra]<31..0> ← R [rb]<31-n..0> # R [rb]<31..32-n>,

The instruction corresponding to op-code 29 is the shift circular instruction. The contents
of the register rb are shifted left by n bit positions; however, the bits that move out of
the register in the shift process are not discarded; instead, they are shifted in from the
other end (a circular shift). The result is stored in register ra.

where

n := ( (c3<4..0>=0) : R [rc],
(c3<4..0>!=0) : c3<4..0> ),

Notation:

α means replication

# means concatenation
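The four shift operations and the rule for choosing n can be tried out with a short Python sketch. It assumes 32-bit registers and a shift count between 0 and 31; taking only the low five bits of R [rc] in shift_count is an assumption of this sketch to keep n in that range. The function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def shift_count(c3, rc_val):
    """n is c3<4..0> if that field is nonzero, otherwise it comes from R[rc]."""
    count = c3 & 0x1F
    return count if count != 0 else rc_val & 0x1F   # low 5 bits: sketch assumption


def shr(rb, n):
    """op 26: shift right, filling the top n bits with 0s."""
    return (rb & MASK32) >> n


def shra(rb, n):
    """op 27: shift right arithmetic, replicating bit 31 into the top n bits."""
    rb &= MASK32
    sign = rb >> 31
    fill = (MASK32 << (32 - n)) & MASK32 if sign and n else 0
    return (rb >> n) | fill


def shl(rb, n):
    """op 28: shift left, filling the low n bits with 0s."""
    return (rb << n) & MASK32


def shc(rb, n):
    """op 29: shift circular, bits leaving the left re-enter on the right."""
    rb &= MASK32
    n %= 32
    return ((rb << n) | (rb >> (32 - n))) & MASK32


print(hex(shr(0x80000000, 4)))    # 0x8000000
print(hex(shra(0x80000000, 4)))   # 0xf8000000
print(hex(shc(0x80000001, 1)))    # 0x3
```

Comparing shr and shra on the same operand shows the only difference between them: the value replicated into the vacated high-order bits.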

Miscellaneous instructions

(op<4..0>= 0) , No operation (nop)

If the op-code is 0, no operation is carried out for that clock period. This instruction is

used as a stall in pipelining.

(op<4..0>= 31) : Run ← 0, Halt the processor (Stop)

); iF );

If the op-code is 31, run is set to 0, that is, the processor stops execution.

After one of these disjoint instructions is executed, iF, i.e. instruction Fetch is carried out

once again, and so the fetch-execute cycle continues.

Implementing Register Transfers using Digital Logic Circuits

We have studied the register transfers in the previous sections, and how they help in

implementing assembly language. In this section we will review how the basic digital

logic circuits are used to implement these register transfers. The topics we will

cover in this section include:

1. A brief (and necessary) review of logic circuits

2. Implementing simple register transfers

3. Register file implementation using a bus

4. Implementing register transfers with mathematical operations

5. The Barrel Shifter

6. Implementing shift operations

Review of logic circuits

Before we study the implementation of register transfers using logic circuits, a brief

overview of some of the important logic circuits will prove helpful. The topics we review

in this section include

1. The basic D flip flop

2. The n-bit register

3. The n-to-1 multiplexer

4. Tri-state buffers

The basic D flip flop

A flip-flop is a bi-stable device, capable of storing one bit of information. Therefore, flip-flops

are used as the building blocks of a computer's memory as well as CPU registers.

There are various types of flip-flops; the most common type, the D flip-flop, is shown in the

figure given. The given truth table for this positive-edge triggered D flip-flop shows that

the flip-flop is set (i.e. stores a 1) when the data input is high on the leading (also called

the positive) edge of the clock; it is reset (i.e., the flip-flop stores a 0) when the data input

is low on the leading edge of the clock. The clear input is active low: it resets the flip-flop

when a low input is applied to it.
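The behaviour described by the truth table can be modelled in a few lines of Python. This is a minimal sketch, not a circuit description: the class and method names are hypothetical, and treating the clear input as asynchronous (acting regardless of the clock) is an assumption, since the text only states that a low clear input resets the flip-flop.

```python
class DFlipFlop:
    """Positive-edge-triggered D flip-flop with an active-low clear."""

    def __init__(self):
        self.q = 0          # stored bit
        self._prev_clk = 0  # last clock level seen, used to detect a rising edge

    def tick(self, d, clk, clr_n=1):
        """Apply one set of input levels and return the stored bit Q."""
        if clr_n == 0:
            # Assumption: clear is asynchronous and overrides the clock.
            self.q = 0
        elif self._prev_clk == 0 and clk == 1:
            self.q = d  # capture D on the leading (positive) edge only
        self._prev_clk = clk
        return self.q
```

Calling `tick(d=1, clk=0)` and then `tick(d=1, clk=1)` stores a 1; changing d while clk stays high has no effect until the next leading edge.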

The n-bit register

An n-bit register can be formed by grouping n flip-flops together, so that the group of

flip-flops operates synchronously.

A register is useful for storing binary data, as each flip-flop can store one bit. The clock

inputs of the flip-flops are tied together, as are the enable inputs. As shown in the figure,

using the input lines a binary number can be stored in the register by applying the

corresponding logic level to each of the flip-flops simultaneously at the positive edge of the clock.

The next figure shows the

symbol of a 4-bit register used

for an integrated circuit. In0

through In3 are the four input

lines, Out0 through Out3 are the

four output lines, Clk is the

clock input, and En is the enable

line.

To get a better understanding of this register, consider the situation where we want to store the

binary number 1000 in the register. We apply this number to the input lines, as shown in the figure given.

On the leading edge of the clock, the number will be stored in the register. The enable

input has to be high if the number is to be stored into the register.
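The role of the shared clock and enable lines can be illustrated with a small Python model. It is a sketch under simple assumptions: the class name and its methods are hypothetical, each call to `clock_edge` stands for one leading edge of Clk, and list index i corresponds to the lines In_i and Out_i.

```python
class Register:
    """n-bit register: n D flip-flops sharing one clock and one enable line."""

    def __init__(self, width=4):
        self.width = width
        self.bits = [0] * width  # bits[i] models the flip-flop behind Out_i

    def clock_edge(self, inputs, en):
        """Model one leading clock edge; inputs[i] is the level on In_i."""
        if en:
            self.bits = list(inputs)  # every flip-flop loads at the same edge
        return self.bits

    def value(self):
        """Return the stored bits as a string, most significant bit first."""
        return "".join(str(b) for b in reversed(self.bits))
```

Storing 1000 means applying a high level to In3 and low levels to In2 through In0: after `reg.clock_edge([0, 0, 0, 1], en=1)`, `reg.value()` is "1000". A later edge with `en=0` leaves the contents unchanged.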

Waveform/Timing diagram

The n-to-1 multiplexer

A multiplexer is a device, constructed

through combinational logic, which

takes n inputs and transfers one of

them as the output at a time. The input

that is selected as the output depends

on the selection lines, also called the

control input lines. For an n-to-1

multiplexer, there are n input lines, log2(n) control lines, and 1 output line. The given

figure shows a 4-to-1 multiplexer. There are 4 input lines; we number these lines as line 0

through line 3. Consequently, there are 2 select lines (as log2(4) = 2).

For a better understanding, let us consider a case where we want to transfer the input of

line 3 to the output of the multiplexer. We will need to apply the binary number 11 on the

select lines (as the binary number 11 represents the decimal number 3). By doing so, the

output of the multiplexer will be the input on line 3, as shown in the test circuit given.
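The selection rule can be written as a short Python function. This is an illustrative sketch only; the function name and the convention that the select lines are listed most significant first are choices made here, not something fixed by the handout.

```python
def mux(inputs, select):
    """n-to-1 multiplexer.

    inputs[i] is the level on input line i; select lists the select
    lines, most significant first.
    """
    if len(inputs) != 2 ** len(select):
        raise ValueError("n inputs need log2(n) select lines")
    index = 0
    for bit in select:
        index = (index << 1) | bit  # read the select lines as a binary number
    return inputs[index]
```

With a high level on line 3 only, `mux([0, 0, 0, 1], [1, 1])` returns 1, matching the test circuit in which 11 on the select lines passes line 3 to the output.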

Timing waveform

Tri-state buffers

The tri-state buffer, also called the three-

state buffer, is another important

component in the digital logic domain. It

has a single input, a single output, and

an enable line. The input is connected

to the output only if the buffer is enabled through

the enable line; otherwise it gives a high

impedance output, i.e. it is tri-stated, or

electrically disconnected from the input.

These buffers are available both in the

inverting and the non-inverting form. The

inverting tri-state buffers output the

`inverted' input when they are enabled,

as opposed to their non-inverting

counterparts that simply output the input

when enabled. The circuit symbol of the

tri-state buffers is shown. The truth table

further clarifies the working of a non-inverting tri-state buffer.

We can see that when the enable input (or the control input) c is low (0), the output is

high impedance Z. The symbol of a 4-bit tri-state buffer unit is shown in the figure. There

are four input lines, an equal number of

output lines, and an enable line in this

unit. If we apply a high on inputs 3

and 2, and a low on inputs 1 and 0, we

get the output 1100 only when the

enable input is high, as shown in the

given figure.
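The truth table of the tri-state buffer can be mimicked in Python by using a marker value for the high-impedance state. This is a sketch with hypothetical names; the string "Z" is simply a stand-in chosen here for the disconnected output.

```python
Z = "Z"  # stands for the high-impedance (disconnected) state

def tri_state(data, enable, inverting=False):
    """One tri-state buffer: pass (or invert) data when enabled, else output Z."""
    if not enable:
        return Z
    return 1 - data if inverting else data

def tri_state_unit(inputs, enable):
    """Non-inverting multi-bit unit in which all buffers share one enable line."""
    return [tri_state(bit, enable) for bit in inputs]
```

Listing input 3 first, `tri_state_unit([1, 1, 0, 0], enable=1)` returns `[1, 1, 0, 0]`, while the same call with `enable=0` returns four "Z" values.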

Implementing simple register transfers

We now build on our knowledge of the primitive logic circuits to understand how register

transfers are implemented. In this section we will study the implementation of the

following

· Simple conditional transfer

· Concept of control signals

· Two-way transfers

· Connecting multiple registers

· Buses

· Bus implementations

Simple conditional transfer

In a simple conditional transfer, a condition is checked, and if it is true, the register

transfer takes place. Formally, a conditional transfer is represented as

Cond: RD ← RS

This means that if the condition `Cond' is true, the contents of the register named RS (the

source register) are copied to the register RD (the destination register). The following

figure shows how the registers may be interconnected to achieve a conditional transfer. In

this circuit, the output of the source register RS is connected to the input of the

destination register RD. However, notice that the transfer will not take place unless the

enable input of the destination register is activated. We may say that the `transfer' is

being controlled by the enable line (or the control signal). We are therefore able to control the

transfer by selectively activating the control signal, through the use of other combinational

logic that may be the equivalent of our condition. The condition is, in general, a Boolean

expression, and in this example, the condition is equivalent to LRD = 1.

Two-way transfers

In the above example, only one-way transfer was possible, i.e., we could only copy the

contents of RS to RD if the condition was met. In order to be able to achieve two-way

transfers, we must also provide a path from the output of the register RD to the input of

the register RS. Each direction of transfer then has its own condition:

Cond1: RD ← RS

Cond2: RS ← RD
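The two conditional transfers can be sketched in Python as follows. The function name and the dictionary holding the register contents are hypothetical; the sketch also assumes edge-triggered registers, so that both registers sample their inputs at the same clock edge.

```python
def clock_edge(regs, cond1, cond2):
    """One clock edge for the register pair RD, RS.

    cond1 drives the load enable of RD (RD <- RS);
    cond2 drives the load enable of RS (RS <- RD).
    """
    rd, rs = regs["RD"], regs["RS"]  # both registers sample their inputs first
    if cond1:
        regs["RD"] = rs
    if cond2:
        regs["RS"] = rd
    return regs
```

With neither condition true, nothing changes. Under the stated assumption, making both conditions true at the same edge exchanges the contents of the two registers.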

Connecting multiple registers

We have seen how two registers can be connected. However, in a computer we need to

connect more than just two registers. In order to connect these registers, one may argue

that a connection between the input and output of each be provided. This solution is

shown for a scenario where there are 5 registers that need to be interconnected.

We can see that in this solution, each pair of m-bit registers requires two connections (one in

each direction) of m wires each. Hence five m-bit registers in a "point-to-point" scheme require

20 connections, each with m wires. In general, n registers in a point-to-point scheme require n(n-1)

connections. It is quite obvious that this solution is not going to scale well for a large

number of registers, as is the case in real machines. The solution to this problem is the

use of a bus architecture, which is explained in the following sections.
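The growth of the point-to-point scheme is easy to tabulate. The helper below is a hypothetical illustration of the n(n-1) count given above; the register widths used in the example are arbitrary.

```python
def point_to_point(n, m):
    """Return (connections, wires) for n m-bit registers wired pairwise."""
    connections = n * (n - 1)  # one m-wire path per ordered pair of registers
    return connections, connections * m
```

For five 4-bit registers, `point_to_point(5, 4)` returns (20, 80), i.e. 20 connections and 80 wires; for 32 registers of 32 bits each the count rises to 992 connections and 31744 wires, whereas a single shared m-bit path needs only m wires.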

Buses

A bus is a device that provides a shared data path to a number of devices that are connected

to it, via a `set of wires' or a `set of conductors'. Modern computer systems

make extensive use of the bus architecture. Control signals are needed to decide which two

entities communicate using the shared medium, i.e. the bus, at any given time. A bus can be

implemented using open-collector gates, tri-state buffers, or multiplexers.

Register file implementation

using the bus architecture

A number of registers can be interconnected through a bus to form a register file. The given

diagram shows eight 4-bit registers (R0, R1, ..., R7) interconnected through a 4-bit bus using

4-bit tri-state buffer units (labeled AA_TS4). The contents of a particular register can be

transferred onto the bus by applying a logical high input on the enable of the corresponding

tri-state buffer. For instance, R1out can be used to enable the tri-state buffers of the register

R1, and in turn transfer the contents of the register onto the bus. Once the contents of a

particular register are on the bus, the contents may be transferred, or read, into any other

register. More than one register may be written in this manner; however, only one register

can write its value on the bus at a given time.
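A bus-based register file of this kind can be modelled in a few lines of Python. The class and its `step` method are hypothetical names used for illustration; `out` plays the role of the single active Rout line and `load` lists the registers whose load enables are high during that clock period.

```python
class BusRegisterFile:
    """Eight 4-bit registers sharing one bus through tri-state buffers."""

    def __init__(self, count=8, width=4):
        self.mask = (1 << width) - 1
        self.regs = [0] * count

    def step(self, out=None, load=()):
        """One clock period: register `out` drives the bus and every
        register listed in `load` latches the bus value."""
        if out is None:
            return None  # no driver enabled: the bus floats (high impedance)
        bus = self.regs[out] & self.mask  # only one register drives the bus
        for i in load:
            self.regs[i] = bus
        return bus
```

Because `out` names a single register, the rule that only one register may drive the bus at a time is built into the interface, while `load` may name several destinations, e.g. `rf.step(out=1, load=(4, 6))` copies R1 into both R4 and R6.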

Implementing register transfers with mathematical operations

We have studied the implementation of simple register transfers; however, we frequently

encounter register transfers with mathematical operations. An example is

(opc=1): R4← R3 + R2;

These mathematical operations may be achieved by introducing appropriate

combinational logic; the above operation can be implemented in hardware by including a

4-bit adder with the register file connected through the bus. There are two more registers

in this configuration, one for holding one of the operands, and the other for holding the

result before it is transferred to the destination register. This is shown in the figure below.

We now take a look at the steps taken for the (conditional, mathematical) transfer

(opc=1): R4← R3 + R2.

First of all, if the condition opc = 1 is met, the contents of the first operand register, R3, are

transferred to the temporary register A through the bus. This is done by activating

R3out, which lets the contents of the register R3 be loaded on the bus. At the same time,

applying a logical high input to LA enables the load for the register A. This lets the

binary number on the bus (the contents of register R3) be loaded into the register A.

The next step is to enable R2out to load the contents of the register R2 onto the bus. As

can be observed from the figure, the output of the register A is one of the inputs to the

4-bit adder; the other input to the adder is the bus itself. Therefore, as the contents of

register R2 are placed on the bus, both operands are available to the adder, and its

output can then be stored to the register RC by enabling its write. So a high input is

applied to LC to store the result in register RC.

The third and final step is to store (transfer) the resultant number in the destination

register R4. This is done by activating Cout to place the contents of the register RC on the bus, and

then enabling the load of the register R4 by activating the control signal LR4. These

steps are summarized in the given table.
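The three control steps can be traced with a short Python sketch. The function name and the dictionary of register contents are hypothetical, the result register is called C here, and discarding the carry out of the 4-bit adder is an assumption, since the text does not say what happens to it.

```python
def add_via_bus(regs, opc):
    """Carry out (opc=1): R4 <- R3 + R2 in three bus steps on 4-bit registers."""
    if opc != 1:
        return regs                      # condition not met: nothing happens
    bus = regs["R3"]                     # step 1: R3out puts R3 on the bus
    regs["A"] = bus                      #         LA loads the bus into A
    bus = regs["R2"]                     # step 2: R2out puts R2 on the bus
    regs["C"] = (regs["A"] + bus) & 0xF  #         adder output, LC loads it into C
    bus = regs["C"]                      # step 3: Cout puts C on the bus
    regs["R4"] = bus                     #         LR4 loads the bus into R4
    return regs
```

Starting with R3 = 4 and R2 = 3, the call leaves 7 in R4; the temporary registers A and C end up holding 4 and 7 respectively.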

The barrel shifter

Shift operations are frequently used, as shifts can be used for the

implementation of multiplication, division, etc. A bi-directional shift register with a

parallel load capability can be used to perform shift operations. However, the delays in

such structures are dependent on the number of shifts that are to be performed, e.g., a

9-bit shift requires nine clock periods, as one shift is performed per clock cycle. This is not

an optimal solution. The barrel shifter is an alternative, with any number of shifts

accomplished during a single clock period. Barrel shifters are constructed by using

multiplexers. An n-bit barrel shifter is a combinational circuit implemented using n

multiplexers. The barrel shifter provides a shifted copy of the input data at its output. Control

inputs are provided to specify the number of times the input data is to be shifted. The

shift process can be a simple one with 0s used as fillers, or it can be a rotation of the input

data. The corresponding figure shows a barrel shifter that shifts the input data right; the

number of shifts depends on the bit pattern applied on the control inputs S0, S1.

The function table for the barrel shifter is given. We see from the table that in order to

apply a single shift to the input number, the control signal is 01 on (S1, S0), which is the

binary equivalent of the decimal number 1. Similarly, to apply 2 shifts, control signal 10

(on S1, S0) is applied; 10 is the binary

equivalent of the decimal number 2. A

control input of 11 shifts the number 3

places to the right.
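The multiplexer-based construction can be sketched in Python for the 4-bit case, with one 4-to-1 multiplexer per output bit. The function names are hypothetical, and the exact wiring of the figure is not reproduced here; the `rotate` flag is an addition that switches between 0 fill and rotation, the two variants mentioned in the text.

```python
def mux4(inputs, s1, s0):
    """4-to-1 multiplexer selecting inputs[2*s1 + s0]."""
    return inputs[(s1 << 1) | s0]

def barrel_shift_right(bits, s1, s0, rotate=False):
    """4-bit right barrel shifter; bits[i] is input bit i (bit 0 is the LSB).

    Output bit i comes from input bit i + k, where k = 2*s1 + s0.
    """
    n = len(bits)
    out = []
    for i in range(n):
        # Candidate sources for output bit i, one per shift count 0..3.
        choices = []
        for k in range(4):
            src = i + k
            if src < n:
                choices.append(bits[src])
            else:
                choices.append(bits[src - n] if rotate else 0)
        out.append(mux4(choices, s1, s0))
    return out
```

Taking the input 1011 (written most significant bit first, so `bits = [1, 1, 0, 1]`), a control input of 01 with 0 fill gives 0101, and a control input of 10 with rotation gives 1110. All four output bits are produced by one pass through the multiplexers, which is why the shift completes in a single clock period.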

Now we take a look at an example of

the shift operation being implemented

through the use of the barrel shifter:

R4← ror R3 (2 times);

The shift functionality can be

incorporated into the register file

circuit with the bus architecture we

have been building, by introducing the

barrel shifter, as shown in the given

figure.

To perform the operation,

R4← ror R3 (2 times),

the first step is to activate R3out, nb1

and LC. Activating R3out will load the

contents of the register R3 onto the bus.

Since the bus is directly connected to

the input of the barrel shifter, this

number is applied to the input side. nb1

and nb0 are the barrel shifter's control

lines for specifying the number of shifts

to be applied. Applying a high input to

nb1 and a low input to nb0 will shift the

number two places to the right.

Activating LC will load the shifted

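output of the barrel shifter into the register C. The second step is to transfer the contents of register C to the destination register R4.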

This is done by activating the control Cout, which will load the contents of register C

onto the data bus, and by activating the control LR4, which will let the contents of the

bus be written to the register R4. This will complete the conditional shift-and-store

operation. These steps are summarized in the table shown below.
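The two control steps of the rotate transfer can be traced in Python. This is a sketch with hypothetical function names; the barrel shifter is represented by a plain 4-bit rotate-right helper, on the assumption that the shifter in the figure is wired for rotation, as the ror example implies.

```python
def ror4(value, n):
    """Rotate a 4-bit value right by n places."""
    n %= 4
    return ((value >> n) | (value << (4 - n))) & 0xF

def rotate_via_bus(regs, nb1, nb0):
    """R4 <- ror R3, with the count given by the control lines nb1, nb0."""
    bus = regs["R3"]                         # step 1: R3out drives the bus
    regs["C"] = ror4(bus, (nb1 << 1) | nb0)  #         LC latches the shifter output
    bus = regs["C"]                          # step 2: Cout drives the bus
    regs["R4"] = bus                         #         LR4 loads the bus into R4
    return regs
```

With R3 = 1011 and nb1 = 1, nb0 = 0, register C and then R4 receive 1110.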

Table of Contents: