EVALUATION


Human Computer Interaction (CS408)

Lecture

Lecture 31. Evaluation Part VII

Learning Goals

The aim of this lecture is to understand how to perform evaluation through usability

testing.

What is Usability Testing?

While there can be wide variations in where and how you conduct a usability test,

every usability test shares these five characteristics:

1. The primary goal is to improve the usability of a product. For each test, you also

have more specific goals and concerns that you articulate when planning the test.

2. The participants represent real users.

3. The participants do real tasks.

4. You observe and record what participants do and say.

5. You analyze the data, diagnose the real problems, and recommend changes to fix

those problems.

The Goal is to Improve the Usability of a Product

The primary goal of a usability test is to improve the usability of the product that is

being tested. Another goal, as we will discuss in detail later, is to improve the process

by which products are designed and developed, so that you avoid having the same

problems again in other products.

This characteristic distinguishes a usability test from a research study, in which the

goal is to investigate the existence of some phenomenon. Although the same facility

might be used for both, they have different purposes. This characteristic also

distinguishes a usability test from a quality assurance or function test, which has a

goal of assessing whether the product works according to its specifications.

Within the general goal of improving the product, you will have more specific goals

and concerns that differ from one test to another.

You might be particularly concerned about how easy it is for users to navigate

through the menus. You could test that concern before coding the product, by creating

an interactive prototype of the menus, or by giving users paper versions of each

screen.

You might be particularly concerned about whether the interface that you have

developed for novice users will also be easy for and acceptable to experienced users.

For one test, you might be concerned about how easily the customer representatives

who do installations will be able to install the product. For another test, you might be

concerned about how easily the client's nontechnical staff will be able to operate and

maintain the product.


These more specific goals and concerns help determine which users are appropriate

participants for each test and which tasks are appropriate to have them do during the

test.

The Participants Represent Real Users

The people who come to test the product must be members of the group of people

who now use or who will use the product. A test that uses programmers when the

product is intended for legal secretaries is not a usability test.

The quality assurance people who conduct function tests may also find usability

problems, and the problems they find should not be ignored, but they are not

conducting a usability test. They are not real users, unless it is a product about

function testing. They are acting more like expert reviewers.

If the participants are more experienced than actual users, you may miss problems that

will cause the product to fail in the marketplace. If the participants are less

experienced than actual users, you may be led to make changes that aren't

improvements for the real users.

The Participants Do Real Tasks

The tasks that you have users do in the test must be ones that they will do with the

product on their jobs or in their homes. This means that you have to understand users'

jobs and the tasks for which this product is relevant.

In many usability tests, particularly of functionally rich and complex software

products, you can only test some of the many tasks that users will be able to do with

the product. In addition to being realistic and relevant for users, the tasks that you

include in a test should relate to your goals and concerns and have a high probability

of uncovering a usability problem.

Observe and Record What the Participants Do and Say

In a usability test, you usually have several people come, one at a time, to work with

the product. You observe the participant, recording both performance and comments.

You also ask the participant for opinions about the product. A usability test includes

both times when participants are doing tasks with the product and times when they are

filling out questionnaires about the product.

Observing and recording individual participants' behavior distinguishes a usability

test from focus groups, surveys, and beta testing.

A typical focus group is a discussion among 8 to 10 real users, led by a professional

moderator. Focus groups provide information about users' opinions, attitudes,

preferences, and their self-report about their performance, but focus groups do not

usually let you see how users actually behave with the product.

Surveys, by telephone or mail, let you collect information about users' opinions,

attitudes, preferences, and their self-report of behavior, but you cannot use a survey to

observe and record what users actually do with a product.

A typical beta test (field test, clinical trial, user acceptance test) is an early release of a

product to a few users. A beta test has ecological validity, that is, real people are using

the product in real environments to do real tasks. However, beta testing seldom yields

any useful information about usability. Most companies have found beta testing to be

too little, too unsystematic, and much too late to be the primary test of usability.


Analyze the Data, Diagnose the Real Problems, and Recommend Changes to Fix

Those Problems

Collecting the data is necessary, but not sufficient, for a usability test. After the test

itself, you still need to analyze the data. You consider the quantitative and qualitative

data from the participants together with your own observations and users' comments.

You use all of that to diagnose and document the product's usability problems and to

recommend solutions to those problems.

The Results Are Used to Change the Product - and the Process

We would also add another point. It may not be part of the definition of the usability

test itself, as the previous five points were, but it is crucial, nonetheless.

A usability test is not successful if it is used only to mark off a milestone on the

development schedule. A usability test is successful only if it helps to improve the

product that was tested and the process by which it was developed.

What Is Not Required for a Usability Test?

Our definition leaves out some features you may have been expecting

to see, such as:

. a laboratory with one-way mirror

. data-logging software

. videotape

. a formal test report

Each of these is useful, but not necessary, for a successful usability test. For example,

a memorandum of findings and recommendations or a meeting about the test results,

rather than a formal test report, may be appropriate in your situation.

Each of these features has advantages in usability testing that we discuss in detail

later, but none is an absolute requirement. Throughout the book, we discuss methods

that you can use when you have only a shoestring budget, limited staff, and limited

testing equipment.

When is a Usability Test Appropriate?

Nothing in our definition of a usability test limits it to a single, summative test at the

end of a project. The five points in our definition are relevant no matter where you are

in the design and development process. They apply to both informal and formal

testing. When testing a prototype, you may have fewer participants and fewer tasks,

take fewer measures, and have a less formal reporting procedure than in a later test,

but the critical factors we outline here and the general process we describe in this

book still apply. Usability testing is appropriate iteratively from predesign (test a

similar product or earlier version), through early design (test prototypes), and

throughout development (test different aspects, retest changes).

Questions that Remain in Defining Usability Testing

We recognize that our definition of usability testing still has some fuzzy edges.

. Would a test with only one participant be called a usability test? Probably not.

You probably need at least two or three people representing a subgroup of

users to feel comfortable that you are not seeing idiosyncratic behavior.


. Would a test in which there were no quantitative measures qualify as a

usability test? Probably not. To substantiate the problems that you report, we

assume that you will take at least some basic measures, such as number of

participants who had the problem, or number of wrong choices, or time to

complete a task. The actual measures will depend on your specific concerns

and the stage of design or development at which you are testing. The measures

could come from observations, from recording with a data-logging program,

or from a review of the videotape after the test. The issue is not which

measures or how you collect them, but whether you need to have some

quantitative data to have a usability test.
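The basic measures named above (number of participants who had the problem, number of wrong choices, time to complete a task) can be tallied with very little tooling. The following is a minimal sketch, not a prescribed method: the `TaskResult` record layout, the `summarize` helper, and the sample data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskResult:
    """One participant's attempt at one task (hypothetical record layout)."""
    participant: str
    task: str
    seconds: float       # time to complete the task
    wrong_choices: int   # for example, wrong menu selections
    had_problem: bool    # observer judged that the participant hit the problem


def summarize(results, task):
    """Return the basic measures for one task; needs at least one result."""
    rows = [r for r in results if r.task == task]
    return {
        "participants": len(rows),
        "participants_with_problem": sum(r.had_problem for r in rows),
        "total_wrong_choices": sum(r.wrong_choices for r in rows),
        "median_seconds": median(r.seconds for r in rows),
    }


# Invented sample data for three participants doing the same task.
results = [
    TaskResult("P1", "set alarm", 95.0, 2, True),
    TaskResult("P2", "set alarm", 40.0, 0, False),
    TaskResult("P3", "set alarm", 120.0, 3, True),
]
summary = summarize(results, "set alarm")
```

The median is used here only because a handful of participants can produce one very long time that would distort a mean; which statistic you report depends on your own goals and concerns.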

Usability testing is still a relatively new development; its definition is still emerging.

You may have other questions about what counts as a usability test. Our discussion of

usability testing and of other usability engineering methods, in this chapter and the

next three chapters, may help clarify your own thinking about how to define usability

testing.

Testing Applies to All Types of Products

If you read the literature on usability testing, you might think that it is

only about testing software for personal computers. Not so. Usability testing works

for all types of products. In the last several years, we've been involved in usability

testing of all these products:

Consumer products

Regular TVs

High-definition TVs

VCRs

Cordless telephones

Telephone/answering machines

Business telephones

Medical products

Bedside terminal

Anesthesiologist's workstation

Patient monitor

Blood gas analyzer

Integrated communication system for wards

Nurse's workstation for intensive care units

Engineering devices

Digital oscilloscope

Network protocol analyzer (for maintaining computer networks)

Application software for microcomputers, minicomputers, and mainframes

Electronic mail

Database management software

Spreadsheets

Time management software

Compilers and debuggers for programming languages

Operating system software


Other

Voice response systems (menus on the telephone)

Automobile navigation systems (in-car information about how to

get where you want to go)

The procedures for the test may vary somewhat depending on what you are testing

and the questions you are asking. We give you hints and tips, where appropriate, on

special concerns when you are focusing the testing on hardware or documentation;

but, in general, we don't find that you need to change the approach much at all.

Most of the examples in this book are about testing some type of hardware or

software and the documentation that goes with it. In some cases, the hardware used to

be just a machine and is now a special purpose computer. For usability testing,

however, the product doesn't even have to involve any hardware or software. You can

use the techniques in this book to develop usable

. application or reporting forms

. instructions for noncomputer products, like bicycles

. interviewing techniques

. nonautomated procedures

. questionnaires

Testing All Types of Interfaces

Any product that people have to use, whether it is computer-based or not, has a user

interface. Norman, in his marvelous book The Design of Everyday Things (1988),

points out problems with doors, showers, light switches, coffee pots, and many other

objects that we come into contact with in our daily lives. With creativity, you can plan

a test of any type of interface.

Consider an elevator. The buttons in the elevator are an interface: the way that you,

the user, talk to the computer that now drives the machine. Have you ever been

frustrated by the way the buttons in an elevator are arranged? Do you search for the

one you want? Do you press the wrong one by mistake?

You might ask: How could you test the interface to an elevator in a usability

laboratory? How could the developers find the problems with an elevator interface

before building the elevator, at which point it would be too expensive to change?

In fact, an elevator interface could be tested before it is built. You could create a

simulation of the proposed control panel on a touchscreen computer (a prototype).

You could even program the computer to make the alarm sound and to make the

doors seem to open and close, based on which buttons users touch. Then you could

bring in users one at a time, give them realistic situations, and have them use the

touchscreen as they would the panel in the elevator.
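The logic behind such a simulated control panel can be very small. The sketch below is one possible shape for it, under stated assumptions: the class name `ElevatorPanelPrototype`, the button labels, and the five-floor layout are all hypothetical, and a real prototype would add the touchscreen drawing and sound on top of this state.

```python
class ElevatorPanelPrototype:
    """Simulated control panel; no real elevator is driven (illustrative only)."""

    CONTROLS = ("open", "close", "alarm")  # hypothetical non-floor buttons

    def __init__(self, floors):
        self.floors = [str(f) for f in floors]
        self.doors_open = True
        self.alarm_on = False
        self.target_floor = None
        self.presses = []  # every touch, in order, for later analysis

    def press(self, button):
        """Update the simulated state and record the touch."""
        button = str(button)
        if button == "open":
            self.doors_open = True
        elif button == "close":
            self.doors_open = False
        elif button == "alarm":
            self.alarm_on = True
        elif button in self.floors:
            self.target_floor = button
            self.doors_open = False  # doors seem to close before travel
        recognized = button in self.floors or button in self.CONTROLS
        self.presses.append((button, recognized))
        return recognized


# One participant's touches in a made-up scenario.
panel = ElevatorPanelPrototype(floors=range(1, 6))
for touch in ["close", "3", "alarm", "9"]:
    panel.press(touch)
```

Keeping the ordered list of touches is the point of the exercise: it lets you review afterwards which buttons a participant tried, including touches that matched no button on the proposed panel.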

Testing All Parts of the Product

Depending on where in the development process you are and what you are

particularly concerned about, you may want to focus the usability test on a specific

part of the product, such as

. installing hardware

. operating hardware

. cleaning and maintaining hardware


. understanding messages about the hardware

. installing software

. navigating through menus

. filling out fields

. recovering from errors

. learning from online or printed tutorials

. finding and following instructions in a user's guide

. finding and following instructions in the online help

Testing Different Aspects of the Documentation

When you include documentation in the test, you have to decide if you are more

interested in whether users go to the documentation or in how well the documentation

works for them when they do go to it. It is difficult to get answers to both of those

concerns at the same time.

If you want to find out how much people learn from a tutorial when they use it, you

can set up a test in which you ask people to go through the tutorial. Your test

participants will do as you ask, and you will get useful information about the design,

content, organization, and language of the tutorial.

You will, however, not have any indication of whether anyone will actually open the

tutorial when they get the product. To test that, you have to set up your test

differently.

Instead of instructing people to use the tutorial, you have to give them tasks and let

them know the tutorial is available. In this second type of test, you will find out which

types of users are likely to try the tutorial, but if few participants use it, you won't get

much useful information for revising the tutorial.

Giving people instructions that encourage them to use the manual or tutorial may be

unrealistic in terms of what happens in the world outside the test laboratory, but it is

necessary if your concern is the usability of the documentation. At some point in the

process of developing the product, you should be testing the usability of the various

types of documentation that users will get with the product.

At other points, however, you should be testing the usability of the product in the

situation in which most people will receive it. Here's an example:

A major company was planning to put a new software product on its internal network.

The product has online help and a printed manual, but, in reality, few users will get a

copy of the manual.

The company planned to maintain a help desk, and a major concern for the usability

test was that if people don't get the manual, they would have to use the online help,

call the help desk, or ask a co-worker. The company wanted to keep calls to the help

desk to a minimum, and the testers knew that when one worker asks another for help,

two people are being unproductive for the company.

When they tested the product, therefore, this test team did not include the manual.

Participants were told that the product includes online help, and they were given the

phone number of the help desk to call if they were really stuck. The test team focused

on where people got stuck, how helpful the online help was, and at what points people

called the help desk.


This test gave the product team a lot of information to improve the interface and the

online help to satisfy the concern that drove the test. However, this test yielded no

information to improve the printed manual. That would require a different test.

Testing with Different Techniques

In most usability tests, you have one participant at a time working with the product.

You usually leave that person alone and observe from a corner of the room or from

behind a one-way mirror. You intervene only when the person "calls the help desk,"

which you record as a need for assistance.

You do it this way because you want to simulate what will happen when individual

users get the products in their offices or homes. They'll be working on their own, and

you won't be right there in their rooms to help them.
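If observers note each "help desk call" as an event, counting the needs for assistance per task afterwards is straightforward. This is only a sketch under assumptions: the tuple layout `(participant, task, kind)`, the event labels, and the sample log are hypothetical, not a format the text prescribes.

```python
from collections import Counter

HELP_DESK_CALL = "help_desk_call"  # hypothetical label for a need for assistance


def assistance_by_task(events):
    """Count help desk calls per task from (participant, task, kind) tuples."""
    return Counter(task for _, task, kind in events if kind == HELP_DESK_CALL)


# Invented observer log for two participants.
events = [
    ("P1", "install", "task_start"),
    ("P1", "install", HELP_DESK_CALL),
    ("P1", "install", "task_end"),
    ("P2", "install", HELP_DESK_CALL),
    ("P2", "print report", "task_start"),
    ("P2", "print report", "task_end"),
]
calls = assistance_by_task(events)
```

A task with no calls simply reports zero, so tasks can be compared directly across participants.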

Sometimes, however, you may want to change these techniques. Two ideas that many

teams have found useful are:

. co-discovery, having two participants work together

. active intervention, taking a more active role in the test

Co-discovery

Co-discovery is a technique in which you have two participants work together to

perform the tasks (Kennedy, 1989). You encourage the participants to talk to each

other as they work.

Talking to another person is more natural than thinking out loud alone. Thus,

co-discovery tests often yield more information about what the users are thinking and

what strategies they are using to solve their problems than you get by asking

individual participants to think out loud.

Hackman and Biers (1992) have investigated this technique. They confirmed that

co-discovery participants make useful comments that provide insight into the design.

They also found that having two people work together does not distort other results.

Participants who worked together did not differ in their performance or preferences

from participants who worked alone.

Co-discovery is more expensive than single participant testing, because you have to

pay two people for each session. In addition, it may be more difficult to watch two

people working with each other and the product than to watch just one person at a

time. Co-discovery may be used anytime you conduct a usability test, but it is

especially useful early in design because of the insights that the participants provide

as they talk with each other.

Active Intervention

Active intervention is a technique in which a member of the test team sits in the room

with the participant and actively probes the participant's understanding of whatever is

being tested. For example, you might ask participants to explain what they would do

next and why as they work through a task. When they choose a particular menu

option, you might ask them to describe their understanding of the menu structure at

that moment. By asking probing questions throughout the test, rather than in one

interview at the end, you can get insights into participants' evolving mental model of

the product.


You can get a better understanding of problems that participants are having than by

just watching them and hoping they'll think out loud.

Active intervention is particularly useful early in design. It is an

excellent technique to use with prototypes, because it provides a wealth of diagnostic

information. It is not the technique to use, however, if your primary concern is to

measure time to complete tasks or to find out how often users will call the help desk.

To do a useful active intervention test, you have to define your

goals and concerns, plan the questions you will use as probes, and be careful not to

bias participants by asking leading questions.

Additional Benefits of Usability Testing

Usability testing contributes to all the benefits of focusing on usability that we gave in

Chapter 1. In addition, the process of usability testing has two specific benefits that

may not be as strong or obvious from other usability techniques. Usability testing

helps

. change people's attitudes about users

. change the design and development process

Changing People's Attitudes About Users

Watching users is both inspiring and humbling. Even after watching hundreds of

people participate in usability tests, we are still amazed at the insights they give us

about the assumptions we make.

When designers, developers, writers, and managers attend a usability test or watch

videotapes from a usability test for the first time, there is often a dramatic

transformation in the way that they view users and usability issues. Watching just a

few people struggle with a product has a much greater impact on attitudes than many

hours of discussion about the importance of usability or of understanding users.

After an initial refusal to believe that the users in the test really do represent the

people for whom the product is meant, many observers become instant converts to

usability. They become interested not only in changing this product, but in improving

all future products, and in bringing this and other products back for more testing.

Changing the Design and Development Process

In addition to helping to improve a specific product, usability testing can help

improve the process that an organization uses to design and develop products (Dumas,

1989). The specific instances that you see in a usability test are most often symptoms

of broader and deeper global problems with both the product and the process.

Comparing Usability Testing to Beta Testing

Despite the surge in interest in usability testing, many companies still do not think

about usability until the product is almost ready to be

released. Their usability approach is to give some customers an early-release (almost

ready) version of the product and wait for feedback. Depending on the industry and

situation, these early-release trials may be called beta testing, field testing, clinical

trials, or user acceptance

testing.

In beta testing, real users do real tasks in their real environments. However, many

companies find that they get very little feedback from beta testers, and beta testing

seldom yields useful information about usability problems for these reasons:

. The beta test site does not even have to use the product.


. The feedback is unsystematic. Users may report, after the fact, what they remember

and choose to report. They may get so busy that they forget to report even when

things go wrong.

. In most cases, no one observes the beta test users and records their behavior.

Because users are focused on doing their work, not on testing the product, they may

not be able to recall the actions they took that resulted in the problems. In a usability

test, you get to see the actions, hear the users talk as they do the actions, and record

the actions on videotape so that you can go back later and review them, if you aren't

sure what the user did.

. In a beta test, you do not choose the tasks. The tasks that get tested are whatever

users happen to do in the time they are working with the product. A situation that you

are concerned about may not arise. Even if it does arise, you may not hear about it. In

a usability test, you choose the tasks that participants do with the product. That way,

you can be sure that you get information about aspects of the product that relate to

your goals and concerns. That way, you also get comparable data across participants.

If beta testers do try the product and have major problems that keep them from

completing their work, they may report those problems. The unwanted by-product of

that situation, however, may be embarrassment at having released a product with

major problems, even to beta testers.

Even though beta testers know that they are working with an unfinished and possibly

buggy product, they may be using it to do real work where problems may have serious

consequences. They want to do their work easily and effectively. Your company's

reputation and sales may suffer if beta testers find the product frustrating to use. A

bad experience when beta testing your product may make the beta testers less willing

to buy the product and less willing to consider other products from your company.

You can improve the chances of getting useful information from beta test sites. Some

companies include observations and interviews with beta testing, going out to visit

beta test sites after people have been working with the product for a while. Another

idea would be to give tape recorders to selected people at beta test sites and ask them

to talk on tape while they use the product or to record observations and problems as

they occur.

Even these techniques, however, won't overcome the most significant disadvantage of

beta testing: that it comes too late in the process. Beta testing typically takes place

only very close to the end of development, with a fully coded product. Critical

functional bugs may get fixed after beta testing, but limits on time and money

generally mean that usability problems can't be addressed.

Usability testing, unlike beta testing, can be done throughout the design and

development process. You can observe and record users as they work with prototypes

and partially developed products. People are more tolerant of the fact that the product

is still under development when they come to a usability test than when they beta test

it. If you follow the usability engineering approach, you can do usability testing early

enough to change the product and retest the changes.
