Advanced Computer Architecture-CS501
________________________________________________________
Advanced Computer Architecture
Lecture No. 24
Reading Material: Handouts, Slides
Summary
· Designing Parallel I/O Ports
· Practical Implementation of the SAD
· NUXI Problem
· Variation in the Implementation of the Address Decoder
· Estimating the Delay Interval
Designing Parallel I/O Ports
This section is about designing parallel input and output ports. As you already know from the previous discussion, an interface that connects the computer bus with I/O devices is called an I/O port. An I/O port can be connected directly to the computer bus (also called the system bus) or through an intermediate bus called the I/O bus, which is also called the expansion bus or the peripheral bus. In either case, the following general facts about I/O bus cycles on a typical CPU should be kept in mind. At the start of a bus cycle (an I/O bus cycle in this case), the CPU places an address on its address bus. This address identifies the I/O device involved in the transfer. After some time, the CPU activates certain control signals, which indicate whether the bus cycle is an I/O read or an I/O write cycle. In an I/O read cycle, the CPU expects data from the selected input device over the data bus; in an I/O write cycle, the CPU provides data to the selected device over the data bus. At the end of the bus cycle, the address (and data) information is removed from the buses and the control signals are reset. It follows that the timing requirements of the I/O ports to be designed must match the timing parameters of the given CPU.
Additionally, the voltage and current requirements of the I/O ports must match the voltage and current specifications of the CPU. For simplicity, we ignore voltage and current matching in this discussion and focus only on the logic levels and timing aspects of the design; voltage and current issues are the topic of an electronics course.
Thus, there are two important functions that should be built into I/O ports:
1. Address decoding
2. Data isolation for input ports, or data capturing for output ports
Page 245, Last Modified: 01-Nov-06
1. Address decoding: Every I/O port has a unique identifier associated with it, called its address; no other port in the system should have the same address. By monitoring the system address bus, the I/O port knows when it is its turn to participate in a transfer. At that time, the address decoder within the I/O port generates an asserted output, which can be applied to the enable input of the tri-state buffers in input ports or to the latch enable input of the latches in output ports.
Our definition of an address decoder:
An "Address Decoder" is a combinational (logic) circuit with n + r inputs and a single output, where
n = the number of address lines into the decoder, and
r = the number of control lines into the decoder.
The output fD is active only when the corresponding address is present on the n address lines and the r control lines hold their "proper" (active or inactive) values. fD is inactive in all other situations.
Suggestions for address decoder design:
1.1 Start by thinking of the address decoder as a "big AND gate". We will call this a "skeleton address decoder" or SAD. The output of the SAD is active only when the correct address is present on the system address bus and the relevant control bus signals hold their proper values. At all other times, the output of the SAD should be inactive.
1.2 Always write the address of the port to be designed in binary, and associate one of the CPU's address lines with each bit. Address lines whose bit is 0 are inverted before being fed into the "big AND gate"; the other address lines are not inverted.
1.3 List the relevant control signals for the system to which the port is to be attached. If the "proper" value of a signal is 0, it is inverted before being applied to the SAD; otherwise, it is fed directly into the SAD.
1.4 Determine whether the decoder output should be active high or active low. This depends on the type of latch or buffer used in the design. If an active-low decoder output is needed, invert the output of the "big AND gate".
1.5 Once the logic for the address decoder is established, the SAD can be implemented using any of the available methods of logic design. For example, HDL code in Verilog or VHDL can be written and the address decoder implemented in a PLD. Alternatively, the SAD can be implemented using SSI building blocks.
2. Data isolation or capturing: For input ports, the incoming data should be placed on the data bus only during the I/O read bus cycle. At all other times, this data must be isolated from the data bus; otherwise it will cause "bus contention". Tri-state buffers are used for this purpose. Their input lines are connected to the peripheral device supplying the data, and their output lines are connected to the data bus. The common enable line of these buffers is driven by the output of the SAD. If this enable is active low, the output of the big AND gate in the SAD should be inverted, as described earlier.
For output ports, the processor places the data intended for the peripheral device on the data bus only during the I/O write bus cycle; during other bus cycles, this data is no longer present on the bus. Latches (or registers) are therefore used to capture it. Their input lines are connected to the system data bus, and their output lines are connected to the peripheral device receiving the data. The common clock (or latch enable) line of these latches is driven by the output of the SAD. If this clock is active low, the output of the big AND gate in the SAD should be inverted.
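Steps 1.2 to 1.4 are mechanical enough to express in a few lines of code. The Python sketch below builds a SAD as a function that returns 0 or 1; the names make_sad, port_address, address_bits and control_levels are illustrative assumptions (not part of any FALCON-A tool), and the 4-bit port at 05h with an active-low WR# line is a hypothetical example.

```python
from typing import Callable, Dict

Decoder = Callable[[int, Dict[str, int]], int]


def make_sad(port_address: int, address_bits: int,
             control_levels: Dict[str, int],
             active_low_output: bool = False) -> Decoder:
    """Return a skeleton address decoder (a 'big AND gate') as a function.

    port_address      -- port address, examined bit by bit (step 1.2)
    address_bits      -- number of low-order address lines that are decoded
    control_levels    -- 'proper' level (0 or 1) of each control line (step 1.3)
    active_low_output -- invert the AND output for active-low enables (step 1.4)
    """
    def decoder(address: int, controls: Dict[str, int]) -> int:
        gate_inputs = []
        for i in range(address_bits):
            line = (address >> i) & 1
            # Address lines whose port-address bit is 0 pass through an inverter.
            gate_inputs.append(line if (port_address >> i) & 1 else 1 - line)
        for name, proper in control_levels.items():
            level = controls[name]
            # Control lines whose proper value is 0 are inverted as well.
            gate_inputs.append(level if proper else 1 - level)
        f_d = int(all(gate_inputs))  # the big AND gate
        return 1 - f_d if active_low_output else f_d
    return decoder


# Hypothetical port at 05h, decoded on A3..A0, written when WR# = 0.
sad = make_sad(0x5, 4, {"WR#": 0})
print(sad(0x5, {"WR#": 0}), sad(0x5, {"WR#": 1}), sad(0x4, {"WR#": 0}))  # 1 0 0
```

Passing active_low_output=True models step 1.4 for a latch or buffer whose enable input is active low.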
Example # 1
Problem Statement:
Design a 16-bit parallel output port mapped on address DEh of the I/O space of the FALCON-A CPU.
Solution:
Using the guidelines mentioned above, we start with a "big AND gate" (SAD) and write the address to be decoded (DEh) in binary: DEh → 1101 1110 b. Associating one CPU address line with each bit, we get A0 = 0, A1 = 1, and so on, as shown in the table below. Because the I/O space of the FALCON-A is only 256 bytes, address lines A15..A8 are don't cares and will not be used in this design.
A7  A6  A5  A4  A3  A2  A1  A0
1   1   0   1   1   1   1   0
Thus, A0 and A5 will be applied to the "big AND gate" after inversion. The remaining address lines will be connected directly to the inputs of the SAD.
Next, we look at the relevant control signals. The only signal to be used in this case is IOW#. A logic 0 on this line indicates that it is active, so it is inverted before being applied to the input of the SAD.
We can easily see that our SAD conforms to the way we defined an address decoder. Its output is 1 only when the address xxxx xxxx 1101 1110 b is present on the FALCON-A's address bus during an I/O write cycle. (This happens when the instruction out reg, addr with addr = DEh, or 222d, executes on the FALCON-A.) At all other times, its output is inactive.
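To check that the SAD of Example #1 behaves as described, the Python sketch below models it gate by gate and tries every value of A7..A0 with both levels of IOW#. Representing signals as 0/1 integers and the function name example1_sad are illustrative assumptions.

```python
def example1_sad(address: int, iow_n: int) -> int:
    """Active-high SAD of Example #1: port DEh, enabled when IOW# = 0.

    Only A7..A0 are decoded; A15..A8 are don't cares because the
    FALCON-A I/O space is 256 bytes.
    """
    a = [(address >> i) & 1 for i in range(8)]  # a[0] is A0
    gate_inputs = [
        1 - a[0],            # A0 must be 0: inverted (bubble)
        a[1], a[2], a[3], a[4],
        1 - a[5],            # A5 must be 0: inverted (bubble)
        a[6], a[7],
        1 - iow_n,           # IOW# is active low: inverted
    ]
    return int(all(gate_inputs))


# Exhaustive check over the low address byte and both IOW# levels.
hits = [(addr, iow) for addr in range(256) for iow in (0, 1)
        if example1_sad(addr, iow)]
print(hits)                       # [(222, 0)]
print(example1_sad(0xABDE, 0))    # 1 (A15..A8 are not decoded)
```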
To keep the diagram simple, we use a circle (or bubble) to indicate an inverter, as shown in the figure. Since this is a 16-bit output port, we use two 8-bit registers to capture data from the FALCON-A's data bus. The output of the SAD is connected to the enable inputs of both registers. The D inputs of the registers are connected to the data bus, and their Q outputs are connected to the peripheral device.
Practical implementation of the SAD
Our SAD in this design is an AND gate with 9 inputs (eight address lines and IOW#). Using SSI chips, we can implement this SAD with an 8-input AND gate and a 2-input AND gate, as shown in the figure below.
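The two-gate implementation is equivalent to a single 9-input gate because AND is associative. The Python sketch below checks this exhaustively over all 512 input combinations; feeding the inverted IOW# to the 2-input gate is an assumption about the figure, and the result would be the same for any choice of input.

```python
from itertools import product


def and_gate(*inputs: int) -> int:
    """An n-input AND gate on 0/1 values."""
    return int(all(inputs))


def sad_ideal(x: tuple) -> int:
    # The SAD as drawn on paper: a single 9-input AND gate.
    return and_gate(*x)


def sad_ssi(x: tuple) -> int:
    # SSI version: an 8-input AND gate on x[0..7] (the address inputs after any
    # inversion), followed by a 2-input AND gate with x[8] (the inverted IOW#).
    return and_gate(and_gate(*x[:8]), x[8])


combos = list(product((0, 1), repeat=9))
mismatches = [x for x in combos if sad_ideal(x) != sad_ssi(x)]
print(len(combos), len(mismatches))  # 512 0
```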
Displaying output data using LED branches:
An "LED branch" is a resistor and a light-emitting diode (LED) connected in series. Sixteen LED branches can be used to display the output data captured by the registers, as shown in the figure below.
Example # 2
Problem statement:
A 16-bit parallel output port is attached to the FALCON-A CPU as shown in the figure. The port is mapped onto address DEh of the FALCON-A's I/O space. Sixteen LED branches are used to display the data received from the FALCON-A's data bus. Every LED branch is wired so that a 1 on its data bus bit turns the LED on and a 0 turns it off. Which LEDs will be ON when the instruction
out r2, 222 [13]
executes on the CPU? Assume r2 contains 1234h.
[13] Depending on how the assembler is written, the syntax of the out instruction may allow only the decimal form of the port address, only the hexadecimal form, or both. Our version of the FALCON-A assembler accepts the decimal form only. It also requires port addresses to be aligned on 16-bit "word boundaries", which means that every port address must be divisible by 2.

Solution:
Since r2 contains 1234h, the bit pattern of this value is sent to the output port at address 222 (DEh), which is the address of the output port in this example. Writing the bit pattern in binary helps us determine which LEDs will be ON. 1234h gives the following bit associations with the data bus:
D15 D14 D13 D12 D11 D10 D9  D8  | D7  D6  D5  D4  D3  D2  D1  D0
0   0   0   1   0   0   1   0   | 0   0   1   1   0   1   0   0
MSB (12h) at address DEh        | LSB (34h) at address DFh
Note that the 8-bit register connected to lines D15..D8 of the FALCON-A's data bus is actually mapped onto address DEh of the I/O space. This is because the architect of the FALCON-A chose a byte-wide (i.e., x8) organization of the address space, a 16-bit data bus, and the big-endian data format at the ISA design stage. As a result, data bus lines D15..D8 transfer the more significant byte (MSB) using address DEh, and D7..D0 transfer the less significant byte (LSB) using address DFh. Thus, LEDs L12, L9, L5, L4 and L2 will turn on.
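The arithmetic of Example #2 can be reproduced with a short Python sketch. It assumes the LED wiring stated in the problem (LED Li follows data bit Di) and the big-endian byte mapping just described; the helper names leds_on and big_endian_bytes are illustrative.

```python
def leds_on(word: int, width: int = 16) -> list:
    """LED Li is lit when data bus bit Di is 1 (wiring of Example #2)."""
    return [i for i in range(width - 1, -1, -1) if (word >> i) & 1]


def big_endian_bytes(word: int, address: int) -> dict:
    """Byte at each address when a 16-bit word is stored big-endian at an even address."""
    return {
        address: (word >> 8) & 0xFF,   # more significant byte, lines D15..D8
        address + 1: word & 0xFF,      # less significant byte, lines D7..D0
    }


r2 = 0x1234
print(format(r2, "016b"))  # 0001001000110100
print(leds_on(r2))         # [12, 9, 5, 4, 2]
print({hex(a): hex(b) for a, b in big_endian_bytes(r2, 0xDE).items()})
# {'0xde': '0x12', '0xdf': '0x34'}
```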
The NUXI Problem
It is easy to see from the previous example that, with the big-endian format, the byte at the lower (even) address, which holds the most significant byte of the word, is transferred over the most significant side of the data bus (D15..D8), while the byte at the higher address is transferred over the least significant side (D7..D0). The situation is exactly the opposite when the little-endian format is used: the byte at the lower address holds the least significant byte and is transferred over the least significant side of the data bus. Now imagine a computer using the little-endian format exchanging data with a computer using the big-endian format over a 16-bit parallel port (this may be the case, for example, in a network of different types of computers). The data transmitted by one is received in "swapped" form by the other: the string "UN" is received as "NU" and the string "IX" is received as "XI". So UNIX changes to NUXI, hence the name NUXI problem. Special software, which swaps the bytes within each word, is used to resolve this problem.
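The swap can be reproduced with Python's standard struct module: the sketch below packs text into 16-bit words in big-endian order (format prefix '>') and reinterprets the same words in little-endian order ('<'). The function names nuxi and swap_bytes are illustrative; swap_bytes is one hypothetical form of the corrective software, not a specific product.

```python
import struct


def nuxi(text: str) -> str:
    """Words built by a big-endian machine, reread as little-endian words."""
    raw = text.encode("ascii")
    if len(raw) % 2:
        raise ValueError("text must fill whole 16-bit words")
    count = len(raw) // 2
    # Big-endian sender: first character of each pair in the high byte (D15..D8).
    words = struct.unpack(f">{count}H", raw)
    # Little-endian receiver: stores each word with its low byte first.
    return struct.pack(f"<{count}H", *words).decode("ascii")


def swap_bytes(text: str) -> str:
    """Hypothetical fix-up routine: swap the two bytes of every 16-bit word."""
    raw = bytearray(text.encode("ascii"))
    if len(raw) % 2:
        raise ValueError("text must fill whole 16-bit words")
    raw[0::2], raw[1::2] = raw[1::2], raw[0::2]
    return raw.decode("ascii")


received = nuxi("UNIX")
print(received)              # NUXI
print(swap_bytes(received))  # UNIX
```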
Variation in the Implementation of the Address Decoder
The implementation of the address decoder shown in Example #1 assumes that the FALCON-A does not allow only part of its data bus to be used during an I/O (or memory) transfer. Another restriction, imposed by the assembler, is that all port addresses must be divisible by 2. This implies that address line A0 is always zero. If the FALCON-A architect had allowed part of the data bus (e.g., 8 bits) to be used during a transfer, the situation would be different.
The logic diagram in the next figure shows a 16-bit parallel output port at the same address (DEh) for the FALCON-A, assuming that either part of its data bus (D15..D8 or D7..D0) can be used independently during an I/O transfer. Note that the enable inputs of the two 8-bit registers are not connected together in this case. Moreover, since the 16-bit port uses two addresses, address line A0 is at logic 0 for address DEh and at logic 1 for address DFh. This means that A0 can no longer be an input of the big AND gate.
Instead, A0 is applied to two 2-input AND gates, each of which also receives the output of the big AND gate. The 2-input AND gate where A0 is applied after inversion generates a 1 at its output when A0 = 0 (and the big AND gate output is 1), so this output enables the 8-bit register mapped on the even address DEh. In the other AND gate, A0 is not inverted, so the corresponding 8-bit register is mapped on the odd address DFh. The input freed by removing A0 from the big AND gate can be used for the IOW# control signal. The rest of the circuit is the same as in the previous figure.
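A gate-level sketch of this variant in Python: a shared big AND gate decodes A7..A1 and the inverted IOW#, and A0 steers its output to one of two register enables. The function name variant_decoder and the 0/1 signal representation are illustrative assumptions.

```python
def variant_decoder(address: int, iow_n: int) -> tuple:
    """Return (enable_even, enable_odd) for the variant 16-bit port at DEh/DFh."""
    a = [(address >> i) & 1 for i in range(8)]  # a[0] is A0
    # Big AND gate: A7..A1 of DEh/DFh (A5 inverted) plus the inverted IOW#.
    big_and = int(all([a[1], a[2], a[3], a[4], 1 - a[5], a[6], a[7], 1 - iow_n]))
    enable_even = big_and & (1 - a[0])  # register at DEh (A0 = 0)
    enable_odd = big_and & a[0]         # register at DFh (A0 = 1)
    return enable_even, enable_odd


for addr in (0xDE, 0xDF, 0xE0):
    print(hex(addr), variant_decoder(addr, 0))
# 0xde (1, 0)
# 0xdf (0, 1)
# 0xe0 (0, 0)
```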
We can see from the above discussion that decisions made at the time of ISA design have a strong bearing on the implementation details and the working of the computer. Suppose the assembler developer had decided not to restrict port addresses to even values. What would the implications be? As an example, consider the execution of the instruction out r2, 223, assuming r2 contains 1234h. This is a 16-bit transfer at addresses 223 (DFh) and 224 (E0h).
For the output port shown in the first figure, where the CPU does not allow part of its data bus to be used in a transfer, neither register is enabled by this instruction, because the output of the 8-input AND gate is 0 for both addresses DFh and E0h. Thus, that output port cannot be used.
In the second figure, where the CPU allows a portion of its data bus to be used in an I/O transfer, the register at address DEh is not enabled. The CPU sends the high data byte (12h) over data lines D7..D0 to the register at address DFh, which is enabled at that time because address DFh is on the bus. The fact that data lines D7..D0 must be used to transfer the high byte is taken care of by hardware internal to the CPU. Now the question is where the low data byte (34h), present on data lines D15..D8, would be placed. If an output port exists at address E0h, 34h will be placed there (in the next bus cycle); otherwise it will be lost. Again, it is the CPU's responsibility to check whether the next address exists in the system and, if it does, to enable that port so that the low byte of data can be placed there.
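The byte placement described above can be sketched as follows. This is a hypothetical model, not FALCON-A hardware: it assumes that a word written to an odd address is split into two byte cycles, that even addresses use D15..D8 and odd addresses use D7..D0 (the byte-lane mapping of Example #2), and that a byte is lost when no port exists at its address. The function name misaligned_word_write and the ports set are illustrative.

```python
def misaligned_word_write(address: int, value: int, ports: set) -> list:
    """Hypothetical model: split a 16-bit big-endian write to an odd address
    into two byte cycles and report where each byte goes.

    Returns (bus_cycle, address, byte, data_lines, stored) tuples.
    """
    if address % 2 == 0:
        raise ValueError("this sketch only models odd (misaligned) addresses")
    high, low = (value >> 8) & 0xFF, value & 0xFF
    result = []
    # Big-endian: the high byte belongs at the lower address.
    for cycle, (addr, byte) in enumerate([(address, high), (address + 1, low)], start=1):
        lines = "D15..D8" if addr % 2 == 0 else "D7..D0"  # even address -> upper lane
        result.append((cycle, addr, byte, lines, addr in ports))
    return result


# The port of the second figure occupies DEh and DFh; nothing exists at E0h.
for cycle, addr, byte, lines, stored in misaligned_word_write(0xDF, 0x1234, {0xDE, 0xDF}):
    print(cycle, hex(addr), hex(byte), lines, "stored" if stored else "lost")
# 1 0xdf 0x12 D7..D0 stored
# 2 0xe0 0x34 D15..D8 lost
```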
A possible option for the architect in this case would be to revisit the design steps and allow part of the CPU registers (or at least some of them) to be used for I/O transfers.
The logic diagram below shows an 8-bit parallel output port at address FEF2h of the Pentium's I/O address space. Since the Pentium allows part of its data bus to be used during a transfer, we can use the BE2# signal in the address decoder to enable the 8-bit register. The following instructions access this output port:
        mov  dx, 0FEF2h   ; port address; above FFh, so it must be held in DX
        mov  al, 12h      ; data byte to send
        out  dx, al       ; 8-bit write to the port addressed by DX
The Pentium does allow part of its 32-bit accumulator register EAX to be used. When only 8 bits are to be transferred, register AL can be used, as shown in the program fragment above. The data byte 12h is sent to the 8-bit register over lines D23..D16. Since 12h corresponds to 0001 0010 in binary, LEDs L4 and L1 will turn on.
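The choice of BE2# and lines D23..D16 follows from the low-order bits of the port address. The Python sketch below assumes the Pentium's 64-bit data bus with byte enables BE0# to BE7#, where the enabled lane is the address modulo 8; a 32-bit model (modulo 4) gives the same lane for FEF2h. The helper names byte_lane and lit_leds are illustrative.

```python
def byte_lane(port_address: int, bus_bytes: int = 8) -> tuple:
    """Return (n, high_line, low_line): BEn# is asserted and the byte travels
    on Dhigh..Dlow. Assumes lane = address mod data-bus width in bytes."""
    lane = port_address % bus_bytes
    return lane, 8 * lane + 7, 8 * lane


def lit_leds(byte: int) -> list:
    """LEDs on the 8-bit register, LED Li following bit i of the byte."""
    return [i for i in range(7, -1, -1) if (byte >> i) & 1]


n, hi, lo = byte_lane(0xFEF2)
print(f"BE{n}# asserted, data on D{hi}..D{lo}")  # BE2# asserted, data on D23..D16
print(lit_leds(0x12))                            # [4, 1]
```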
Example # 3
Problem statement:
Write an assembly language program to turn on the 16 LEDs on the output port of Example #1 one by one. Each LED should stay on for a noticeable duration of time. Repeat from the first LED after the last LED is turned on.
Solution:
The solution is shown in the listing below, which has the filename Example_3.asmfa.

; filename: Example_3.asmfa
;
; ALL LEDS ARE turned Off initially
;
        movi   r1,0
        out    r1,222
;
; First LED will be turned on each time
;
start:  movi   r1,1
        out    r1,222
;
        movi   r5,16
;
; DELAY LOOP
;
delay1: movi   r2,0
again1: subi   r2,r2,1
        jnz    r2,[again1]
;
        movi   r3,0         ; TURN OFF ALL LEDS
        out    r3,222
;
delay2: movi   r2,0
again2: subi   r2,r2,1
        jnz    r2,[again2]
;
        shiftl r1,r1,1      ; next LED ON
        out    r1,222
        subi   r5,r5,1
        jnz    r5,[delay1]
        jump   [start]
        halt

The working of this program is explained below.

The first two instructions turn all the LEDs off by sending a 0 to each bit of the output port at address 222:

        movi   r1,0
        out    r1,222

Then a 1 is sent to L0, turning it on, and the program enters a loop that executes 16 times, turning on the other LEDs (L1 through L15) one by one in sequence. Register r5 is used as the loop counter. The following three instructions introduce a delay between successive bit patterns sent to the output port, so that each LED stays on for a noticeable duration of time:

delay1: movi   r2,0
again1: subi   r2,r2,1
        jnz    r2,[again1]

Starting with a value of 0 in r2 [14], the first subi decrements it to FFFFh when the again1 loop is entered. The jnz instruction keeps decrementing r2 until it returns to 0, so the loop body executes 65,536 times. An estimate of the delay interval is presented at the end of this section. After this delay, all the LEDs are turned off and a second delay loop executes. Finally, the next LED to the left in the sequence is turned on by the following two instructions:

        shiftl r1,r1,1
        out    r1,222

Loading r5 with 16 rather than 15 ensures that L15, like the other LEDs, is held on through delay1 and then turned off: on the 16th pass, shiftl moves the 1 out of the 16-bit register r1, so a 0 is written to the port. The process then starts all over again because of the last jump instruction, and the outermost loop executes indefinitely (the halt instruction is never reached).
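Whether every LED really stays on for a delay interval depends on the loop count loaded into r5. The Python sketch below replays the port writes of one pass from start: to jump [start], abstracting each delay loop into a "held" flag and assuming 16-bit registers (so shiftl moves the 1 out of r1 after L15). With a count of 15, L15 is replaced by L0 a few instructions after it lights; with 16, every LED is held. The function names port_writes and leds_not_held are illustrative.

```python
def port_writes(r5_initial: int) -> list:
    """Replay the out instructions of one pass of Example_3.asmfa, from
    start: to jump [start], with the delay loops abstracted away.

    Returns (value, held) pairs; held is True when a delay loop runs
    before the next out instruction.
    """
    r1 = 1
    writes = [(r1, True)]                  # start: out r1,222, then delay1
    r5 = r5_initial                        # movi r5,...
    while True:
        writes.append((0x0000, True))      # movi r3,0 / out r3,222, then delay2
        r1 = (r1 << 1) & 0xFFFF            # shiftl r1,r1,1 on a 16-bit register
        r5 -= 1                            # subi r5,r5,1 decides whether delay1
        writes.append((r1, r5 != 0))       # follows out r1,222
        if r5 == 0:                        # jnz r5 falls through -> jump [start]
            return writes


def leds_not_held(r5_initial: int) -> list:
    """LEDs that are lit at some point but never held through a delay loop."""
    writes = port_writes(r5_initial)
    lit = {v.bit_length() - 1 for v, _ in writes if v}
    held = {v.bit_length() - 1 for v, h in writes if v and h}
    return sorted(lit - held)


print(leds_not_held(15))  # [15]
print(leds_not_held(16))  # []
```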
[14] This is necessary because the immediate operand of the FALCON-A's movi instruction has a range of 0h to FFh, which does not give the large loop count needed here. So we use the above software trick: starting from 0, the first decrement wraps r2 around to FFFFh. An alternative would be to use nested loops, but that would tie up additional CPU registers.

Estimating the Delay Interval
To keep things simple, assume that the FALCON-A operates at a clock frequency of 1 MHz. Also assume that the subi and jnz instructions take 3 and 4 clock periods, respectively, to execute. Since these two instructions each execute 65,536 times, we can use the following formula to compute the execution time (ET) of this loop:

$$ET = CPI \times IC \times T = \frac{CPI \times IC}{f}$$

where
CPI = clocks per instruction (here, the clocks per pass of the loop, 3 + 4 = 7),
IC = instruction count (here, the number of passes, 65,536),
T = time period of the clock, and
f = frequency of the clock.

Using the assumed values, we get

$$ET = \frac{(3 + 4) \times 65536}{1 \times 10^{6}\ \text{Hz}} = 0.458752\ \text{s} \approx 0.459\ \text{s}$$

Since the movi r2, 0 instruction executes only once, its execution time is negligible and has been ignored in this calculation.
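The iteration count and the estimate can be checked with a short Python sketch that replays the subi/jnz loop on a 16-bit register and applies the execution-time formula. The 3- and 4-clock instruction timings and the 1 MHz clock are the assumptions stated in the text; the function name delay_loop_iterations is illustrative. Counting from 0 gives 65,536 passes of the loop body, because the first subi produces FFFFh and the loop ends only when r2 returns to 0.

```python
def delay_loop_iterations(start: int = 0, width: int = 16) -> int:
    """Count passes through 'again: subi r2,r2,1 / jnz r2,[again]'."""
    mask = (1 << width) - 1
    r2, passes = start, 0
    while True:
        r2 = (r2 - 1) & mask   # subi wraps 0000h -> FFFFh on a 16-bit register
        passes += 1
        if r2 == 0:            # jnz falls through only when r2 == 0
            return passes


n = delay_loop_iterations()
cycles = (3 + 4) * n           # assumed: subi = 3 clocks, jnz = 4 clocks
et = cycles / 1e6              # assumed clock frequency: 1 MHz
print(n, cycles, round(et, 3))  # 65536 458752 0.459
```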