ZeePedia

EVALUATION

Human Computer Interaction (CS408)
VU
Lecture 31. Evaluation - Part VII
Learning Goals
The aim of this lecture is to understand how to perform evaluation through usability
testing.
What is Usability Testing?
While there can be wide variations in where and how you conduct a usability test,
every usability test shares these five characteristics:
1. The primary goal is to improve the usability of a product. For each test, you also
have more specific goals and concerns that you articulate when planning the test.
2. The participants represent real users.
3. The participants do real tasks.
4. You observe and record what participants do and say.
5. You analyze the data, diagnose the real problems, and recommend changes to fix
those problems.
The Goal is to Improve the Usability of a Product
The primary goal of a usability test is to improve the usability of the product that is
being tested. Another goal, as we will discuss in detail later, is to improve the process
by which products are designed and developed, so that you avoid having the same
problems again in other products.
This characteristic distinguishes a usability test from a research study, in which the
goal is to investigate the existence of some phenomenon. Although the same facility
might be used for both, they have different purposes. This characteristic also
distinguishes a usability test from a quality assurance or function test, which has a
goal of assessing whether the product works according to its specifications.
Within the general goal of improving the product, you will have more specific goals
and concerns that differ from one test to another.
You might be particularly concerned about how easy it is for users to navigate
through the menus. You could test that concern before coding the product, by creating
an interactive prototype of the menus, or by giving users paper versions of each
screen.
You might be particularly concerned about whether the interface that you have
developed for novice users will also be easy for and acceptable to experienced users.
For one test, you might be concerned about how easily the customer representatives
who do installations will be able to install the product. For another test, you might be
concerned about how easily the client's nontechnical staff will be able to operate and
maintain the product.
These more specific goals and concerns help determine which users are appropriate
participants for each test and which tasks are appropriate to have them do during the
test.
The Participants Represent Real Users
The people who come to test the product must be members of the group of people
who now use or who will use the product. A test that uses programmers when the
product is intended for legal secretaries is not a usability test.
The quality assurance people who conduct function tests may also find usability
problems, and the problems they find should not be ignored, but they are not
conducting a usability test. They are not real users, unless it is a product about
function testing. They are acting more like expert reviewers.
If the participants are more experienced than actual users, you may miss problems that
will cause the product to fail in the marketplace. If the participants are less
experienced than actual users, you may be led to make changes that aren't
improvements for the real users.
The Participants Do Real Tasks
The tasks that you have users do in the test must be ones that they will do with the
product on their jobs or in their homes. This means that you have to understand users'
jobs and the tasks for which this product is relevant.
In many usability tests, particularly of functionally rich and complex software
products, you can only test some of the many tasks that users will be able to do with
the product. In addition to being realistic and relevant for users, the tasks that you
include in a test should relate to your goals and concerns and have a high probability
of uncovering a usability problem.
Observe and Record What the Participants Do and Say
In a usability test, you usually have several people come, one at a time, to work with
the product. You observe the participant, recording both performance and comments.
You also ask the participant for opinions about the product. A usability test includes
both times when participants are doing tasks with the product and times when they are
filling out questionnaires about the product.
Observing and recording individual participants' behavior distinguishes a usability
test from focus groups, surveys, and beta testing.
A typical focus group is a discussion among 8 to 10 real users, led by a professional
moderator. Focus groups provide information about users' opinions, attitudes,
preferences, and their self-report about their performance, but focus groups do not
usually let you see how users actually behave with the product.
Surveys, by telephone or mail, let you collect information about users' opinions,
attitudes, preferences, and their self-report of behavior, but you cannot use a survey to
observe and record what users actually do with a product.
A typical beta test (field test, clinical trial, user acceptance test) is an early release of a
product to a few users. A beta test has ecological validity, that is, real people are using
the product in real environments to do real tasks. However, beta testing seldom yields
any useful information about usability. Most companies have found beta testing to be
too little, too unsystematic, and much too late to be the primary test of usability.
Analyze the Data, Diagnose the Real Problems, and Recommend Changes to Fix Those Problems
Collecting the data is necessary, but not sufficient, for a usability test. After the test
itself, you still need to analyze the data. You consider the quantitative and qualitative
data from the participants together with your own observations and users' comments.
You use all of that to diagnose and document the product's usability problems and to
recommend solutions to those problems.
The Results Are Used to Change the Product - and the Process
We would also add another point. It may not be part of the definition of the usability
test itself, as the previous five points were, but it is crucial, nonetheless.
A usability test is not successful if it is used only to mark off a milestone on the
development schedule. A usability test is successful only if it helps to improve the
product that was tested and the process by which it was developed.
What Is Not Required for a Usability Test?
Our definition leaves out some features you may have been expecting
to see, such as:
· a laboratory with one-way mirror
· data-logging software
· videotape
· a formal test report
Each of these is useful, but not necessary, for a successful usability test. For example,
a memorandum of findings and recommendations or a meeting about the test results,
rather than a formal test report, may be appropriate in your situation.
Each of these features has advantages in usability testing that we discuss in detail
later, but none is an absolute requirement. Throughout the book, we discuss methods
that you can use when you have only a shoestring budget, limited staff, and limited
testing equipment.
When is a Usability Test Appropriate?
Nothing in our definition of a usability test limits it to a single, summative test at the
end of a project. The five points in our definition are relevant no matter where you are
in the design and development process. They apply to both informal and formal
testing. When testing a prototype, you may have fewer participants and fewer tasks,
take fewer measures, and have a less formal reporting procedure than in a later test,
but the critical factors we outline here and the general process we describe in this
book still apply. Usability testing is appropriate iteratively from predesign (test a
similar product or earlier version), through early design (test prototypes), and
throughout development (test different aspects, retest changes).
Questions that Remain in Defining Usability Testing
We recognize that our definition of usability testing still has some fuzzy edges.
·  Would a test with only one participant be called a usability test? Probably not.
You probably need at least two or three people representing a subgroup of
users to feel comfortable that you are not seeing idiosyncratic behavior.
·  Would a test in which there were no quantitative measures qualify as a
usability test? Probably not. To substantiate the problems that you report, we
assume that you will take at least some basic measures, such as number of
participants who had the problem, or number of wrong choices, or time to
complete a task. The actual measures will depend on your specific concerns
and the stage of design or development at which you are testing. The measures
could come from observations, from recording with a data-logging program,
or from a review of the videotape after the test. The issue is not which
measures or how you collect them, but whether you need to have some
quantitative data to have a usability test.
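The basic measures named above (number of participants who had a problem, number of wrong choices, time to complete a task) can be tallied with very little tooling. The following is a minimal sketch only: the record layout, the field names, and the sample values are assumptions made for illustration and do not come from any real test.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRecord:
    """One participant's attempt at one task (hypothetical record layout)."""
    participant: str
    task: str
    seconds: float       # time to complete the task
    wrong_choices: int   # e.g. wrong menu selections
    had_problem: bool    # observer judged that the participant hit the problem


def summarize(records):
    """Return the basic measures for each task."""
    by_task = {}
    for r in records:
        by_task.setdefault(r.task, []).append(r)
    summary = {}
    for task, rows in by_task.items():
        summary[task] = {
            "participants": len(rows),
            "with_problem": sum(r.had_problem for r in rows),
            "wrong_choices": sum(r.wrong_choices for r in rows),
            "mean_seconds": mean(r.seconds for r in rows),
        }
    return summary


# Made-up data for illustration only.
records = [
    TaskRecord("P1", "find print menu", 40.0, 2, True),
    TaskRecord("P2", "find print menu", 20.0, 0, False),
    TaskRecord("P3", "find print menu", 60.0, 3, True),
]
print(summarize(records))
```

The same records could be filled in by hand from observation notes or from a review of the videotape; the sketch does not depend on how the data were collected.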
Usability testing is still a relatively new development; its definition is still emerging.
You may have other questions about what counts as a usability test. Our discussion of
usability testing and of other usability engineering methods, in this chapter and the
next three chapters, may help clarify your own thinking about how to define usability
testing.
Testing Applies to All Types of Products
If you read the literature on usability testing, you might think that it is
only about testing software for personal computers. Not so. Usability testing works
for all types of products. In the last several years, we've been involved in usability
testing of all these products:
Consumer products
· Regular TVs
· High-definition TVs
· VCRs
· Cordless telephones
· Telephone/answering machines
· Business telephones
Medical products
· Bedside terminal
· Anesthesiologist's workstation
· Patient monitor
· Blood gas analyzer
· Integrated communication system for wards
· Nurse's workstation for intensive care units
Engineering devices
· Digital oscilloscope
· Network protocol analyzer (for maintaining computer networks)
Application software for microcomputers, minicomputers, and mainframes
· Electronic mail
· Database management software
· Spreadsheets
· Time management software
· Compilers and debuggers for programming languages
· Operating system software
Other
· Voice response systems (menus on the telephone)
· Automobile navigation systems (in-car information about how to get where you want to go)
The procedures for the test may vary somewhat depending on what you are testing
and the questions you are asking. We give you hints and tips, where appropriate, on
special concerns when you are focusing the testing on hardware or documentation;
but, in general, we don't find that you need to change the approach much at all.
Most of the examples in this book are about testing some type of hardware or
software and the documentation that goes with it. In some cases, the hardware used to
be just a machine and is now a special purpose computer. For usability testing,
however, the product doesn't even have to involve any hardware or software. You can
use the techniques in this book to develop usable
· application or reporting forms
· instructions for noncomputer products, like bicycles
· interviewing techniques
· nonautomated procedures
· questionnaires
Testing All Types of Interfaces
Any product that people have to use, whether it is computer-based or not, has a user
interface. Norman, in his marvelous book The Design of Everyday Things (1988),
points out problems with doors, showers, light switches, coffee pots, and many other
objects that we come into contact with in our daily lives. With creativity, you can plan
a test of any type of interface.
Consider an elevator. The buttons in the elevator are an interface, the way that you,
the user, talk to the computer that now drives the machine. Have you ever been
frustrated by the way the buttons in an elevator are arranged? Do you search for the
one you want? Do you press the wrong one by mistake?
You might ask: How could you test the interface to an elevator in a usability
laboratory? How could the developers find the problems with an elevator interface
before building the elevator, at which point it would be too expensive to change?
In fact, an elevator interface could be tested before it is built. You could create a
simulation of the proposed control panel on a touchscreen computer (a prototype).
You could even program the computer to make the alarm sound and to make the
doors seem to open and close, based on which buttons users touch. Then you could
bring in users one at a time, give them realistic situations, and have them use the
touchscreen as they would the panel in the elevator.
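A simulated panel of this kind does not need a finished user interface to be useful; what matters is that every touch is recorded and that the simulation responds. The following is a rough, text-only sketch under stated assumptions: the class name, the button names, and the rule that choosing a floor closes the doors are all hypothetical choices made for illustration, not a description of any real prototype.

```python
import time


class ElevatorPanelPrototype:
    """Text-only stand-in for a simulated elevator control panel."""

    def __init__(self, floors, clock=time.monotonic):
        self.floors = [str(f) for f in floors]
        self.controls = ["open", "close", "alarm"]
        self.clock = clock
        self.start = clock()
        self.doors_open = True
        self.alarm_on = False
        self.log = []  # one (seconds_since_start, button) tuple per touch

    def touch(self, button):
        """Log the touch, then update the simulated doors and alarm."""
        if button not in self.floors + self.controls:
            raise ValueError(f"no such button: {button}")
        self.log.append((self.clock() - self.start, button))
        if button == "open":
            self.doors_open = True
        elif button == "alarm":
            self.alarm_on = True
        else:
            # "close" or any floor button closes the doors (assumed behavior)
            self.doors_open = False

    def wrong_floor_touches(self, target):
        """Count floor buttons touched other than the target floor."""
        return sum(1 for _, b in self.log if b in self.floors and b != str(target))


# Hypothetical session: the participant is asked to go to floor 4.
panel = ElevatorPanelPrototype(floors=range(1, 6))
for button in ["3", "4", "close"]:
    panel.touch(button)
print(panel.wrong_floor_touches(4), panel.doors_open)
```

In a real test the calls to touch() would come from the touchscreen, and the log would give the observer a timed record of wrong presses to review alongside what participants said.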
Testing All Parts of the Product
Depending on where in the development process you are and what you are
particularly concerned about, you may want to focus the usability test on a specific
part of the product, such as
· installing hardware
· operating hardware
· cleaning and maintaining hardware
· understanding messages about the hardware
· installing software
· navigating through menus
· filling out fields
· recovering from errors
· learning from online or printed tutorials
· finding and following instructions in a user's guide
· finding and following instructions in the online help
Testing Different Aspects of the Documentation
When you include documentation in the test, you have to decide if you are more
interested in whether users go to the documentation or in how well the documentation
works for them when they do go to it. It is difficult to get answers to both of those
concerns at the same time.
If you want to find out how much people learn from a tutorial when they use it, you
can set up a test in which you ask people to go through the tutorial. Your test
participants will do as you ask, and you will get useful information about the design,
content, organization, and language of the tutorial.
You will, however, not have any indication of whether anyone will actually open the
tutorial when they get the product. To test that, you have to set up your test
differently.
Instead of instructing people to use the tutorial, you have to give them tasks and let
them know the tutorial is available. In this second type of test, you will find out which
types of users are likely to try the tutorial, but if few participants use it, you won't get
much useful information for revising the tutorial.
Giving people instructions that encourage them to use the manual or tutorial may be
unrealistic in terms of what happens in the world outside the test laboratory, but it is
necessary if your concern is the usability of the documentation. At some point in the
process of developing the product, you should be testing the usability of the various
types of documentation that users will get with the product.
At other points, however, you should be testing the usability of the product in the
situation in which most people will receive it. Here's an example:
A major company was planning to put a new software product on its internal network.
The product had online help and a printed manual, but, in reality, few users would get a
copy of the manual.
The company planned to maintain a help desk, and a major concern for the usability
test was that people who did not get the manual would have to use the online help,
call the help desk, or ask a co-worker. The company wanted to keep calls to the help
desk to a minimum, and the testers knew that when one worker asks another for help,
two people are being unproductive for the company.
When they tested the product, therefore, this test team did not include the manual.
Participants were told that the product included online help, and they were given the
phone number of the help desk to call if they were really stuck. The test team focused
on where people got stuck, how helpful the online help was, and at what points people
called the help desk.
This test gave the product team a lot of information to improve the interface and the
online help to satisfy the concern that drove the test. However, this test yielded no
information to improve the printed manual. That would require a different test.
Testing with Different Techniques
In most usability tests, you have one participant at a time working with the product.
You usually leave that person alone and observe from a corner of the room or from
behind a one-way mirror. You intervene only when the person "calls the help desk,"
which you record as a need for assistance.
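One simple way to keep such a record is a time-stamped observation log in which each entry is tagged by kind, so that needs for assistance can be counted afterward. This is only a sketch: the entry kinds ("action", "comment", "assistance"), the field names, and the sample entries are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Observation:
    """One line of the observer's log (hypothetical layout)."""
    participant: str
    minute: float   # minutes since the session started
    kind: str       # "action", "comment", or "assistance"
    note: str


def assistance_counts(observations):
    """Count recorded needs for assistance per participant."""
    return Counter(o.participant for o in observations if o.kind == "assistance")


# Made-up log entries for illustration only.
observations = [
    Observation("P1", 2.5, "action", "opened the File menu"),
    Observation("P1", 4.0, "assistance", "called the help desk about saving"),
    Observation("P2", 3.0, "comment", "said the icons were unclear"),
    Observation("P1", 9.5, "assistance", "called the help desk about printing"),
]
print(assistance_counts(observations))
```

Because the counter returns zero for participants with no assistance entries, the result can be compared directly across all participants in the test.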
You do it this way because you want to simulate what will happen when individual
users get the products in their offices or homes. They'll be working on their own, and
you won't be right there in their rooms to help them.
Sometimes, however, you may want to change these techniques. Two ideas that many
teams have found useful are:
· co-discovery, having two participants work together
· active intervention, taking a more active role in the test
Co-discovery
Co-discovery is a technique in which you have two participants work together to
perform the tasks (Kennedy, 1989). You encourage the participants to talk to each
other as they work.
Talking to another person is more natural than thinking out loud alone. Thus,
co-discovery tests often yield more information about what the users are thinking and
what strategies they are using to solve their problems than you get by asking
individual participants to think out loud.
Hackman and Biers (1992) have investigated this technique. They confirmed that
co-discovery participants make useful comments that provide insight into the design.
They also found that having two people work together does not distort other results.
Participants who worked together did not differ in their performance or preferences
from participants who worked alone.
Co-discovery is more expensive than single participant testing, because you have to
pay two people for each session. In addition, it may be more difficult to watch two
people working with each other and the product than to watch just one person at a
time. Co-discovery may be used anytime you conduct a usability test, but it is
especially useful early in design because of the insights that the participants provide
as they talk with each other.
Active Intervention
Active intervention is a technique in which a member of the test team sits in the room
with the participant and actively probes the participant's understanding of whatever is
being tested. For example, you might ask participants to explain what they would do
next and why as they work through a task. When they choose a particular menu
option, you might ask them to describe their understanding of the menu structure at
that moment. By asking probing questions throughout the test, rather than in one
interview at the end, you can get insights into participants' evolving mental model of
the product.
You can get a better understanding of problems that participants are having than by
just watching them and hoping they'll think out loud.
Active intervention is particularly useful early in design. It is an
excellent technique to use with prototypes, because it provides a wealth of diagnostic
information. It is not the technique to use, however, if your primary concern is to
measure time to complete tasks or to find out how often users will call the help desk.
To do a useful active intervention test, you have to define your
goals and concerns, plan the questions you will use as probes, and be careful not to
bias participants by asking leading questions.
Additional Benefits of Usability Testing
Usability testing contributes to all the benefits of focusing on usability that we gave in
Chapter 1. In addition, the process of usability testing has two specific benefits that
may not be as strong or obvious from other usability techniques. Usability testing
helps
· change people's attitudes about users
· change the design and development process
Changing People's Attitudes About Users
Watching users is both inspiring and humbling. Even after watching hundreds of
people participate in usability tests, we are still amazed at the insights they give us
about the assumptions we make.
When designers, developers, writers, and managers attend a usability test or watch
videotapes from a usability test for the first time, there is often a dramatic
transformation in the way that they view users and usability issues. Watching just a
few people struggle with a product has a much greater impact on attitudes than many
hours of discussion about the importance of usability or of understanding users.
After an initial refusal to believe that the users in the test really do represent the
people for whom the product is meant, many observers become instant converts to
usability. They become interested not only in changing this product, but in improving
all future products, and in bringing this and other products back for more testing.
Changing the Design and Development Process
In addition to helping to improve a specific product, usability testing can help
improve the process that an organization uses to design and develop products (Dumas,
1989). The specific instances that you see in a usability test are most often symptoms
of broader and deeper global problems with both the product and the process.
Comparing Usability Testing to Beta Testing
Despite the surge in interest in usability testing, many companies still do not think
about usability until the product is almost ready to be released. Their usability
approach is to give some customers an early-release (almost ready) version of the
product and wait for feedback. Depending on the industry and situation, these
early-release trials may be called beta testing, field testing, clinical trials, or
user acceptance testing.
In beta testing, real users do real tasks in their real environments. However, many
companies find that they get very little feedback from beta testers, and beta testing
seldom yields useful information about usability problems for these reasons:
· The beta test site does not even have to use the product.
· The feedback is unsystematic. Users may report, after the fact, what they remember
and choose to report. They may get so busy that they forget to report even when
things go wrong.
· In most cases, no one observes the beta test users and records their behavior.
Because users are focused on doing their work, not on testing the product, they may
not be able to recall the actions they took that resulted in the problems. In a usability
test, you get to see the actions, hear the users talk as they do the actions, and record
the actions on videotape so that you can go back later and review them, if you aren't
sure what the user did.
· In a beta test, you do not choose the tasks. The tasks that get tested are whatever
users happen to do in the time they are working with the product. A situation that you
are concerned about may not arise. Even if it does arise, you may not hear about it. In
a usability test, you choose the tasks that participants do with the product. That way,
you can be sure that you get information about aspects of the product that relate to
your goals and concerns. That way, you also get comparable data across participants.
If beta testers do try the product and have major problems that keep them from
completing their work, they may report those problems. The unwanted by-product of
that situation, however, may be embarrassment at having released a product with
major problems, even to beta testers.
Even though beta testers know that they are working with an unfinished and possibly
buggy product, they may be using it to do real work where problems may have serious
consequences. They want to do their work easily and effectively. Your company's
reputation and sales may suffer if beta testers find the product frustrating to use. A
bad experience when beta testing your product may make the beta testers less willing
to buy the product and less willing to consider other products from your company.
You can improve the chances of getting useful information from beta test sites. Some
companies include observations and interviews with beta testing, going out to visit
beta test sites after people have been working with the product for a while. Another
idea would be to give tape recorders to selected people at beta test sites and ask them
to talk on tape while they use the product or to record observations and problems as
they occur.
Even these techniques, however, won't overcome the most significant disadvantage of
beta testing: that it comes too late in the process. Beta testing typically takes place
only very close to the end of development, with a fully coded product. Critical
functional bugs may get fixed after beta testing, but time and money generally mean
that usability problems can't be addressed.
Usability testing, unlike beta testing, can be done throughout the design and
development process. You can observe and record users as they work with prototypes
and partially developed products. People are more tolerant of the fact that the product
is still under development when they come to a usability test than when they beta test
it. If you follow the usability engineering approach, you can do usability testing early
enough to change the product and retest the changes.