Human Computer Interaction (CS408)    VU
Lecture 30. Evaluation Part II
Learning
Goals
The aim of this lecture is to introduce you to the study of Human Computer Interaction, so that after studying this you will be able to:
· Understand
the DECIDE evaluation
framework
30.1 DECIDE: A framework to guide evaluation
Well-planned
evaluations are driven by
clear goals and appropriate
questions (Basili
et al.,
1994). To guide our
evaluations we use the
DECIDE framework,
which
provides
the following checklist to
help novice
evaluators:
1.
Determine the overall goals
that
the evaluation
addresses.
2.
Explore the specific questions
to be
answered.
3. Choose
the evaluation
paradigm and
techniques
to answer
the questions.
4.
Identify the practical
issues that must
be addressed, such as selecting
participants.
5. Decide
how to deal with the
ethical
issues.
6.
Evaluate, interpret, and present
the data.
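The checklist can also be kept as a simple planning record so that no step is overlooked before a study begins. The sketch below (in Python) is only an illustration of such a record; the field names and the missing_steps helper are my own invention, not part of the DECIDE framework itself.

    from dataclasses import dataclass, field

    @dataclass
    class EvaluationPlan:
        """One field per DECIDE step; names are illustrative, not canonical."""
        goals: list = field(default_factory=list)            # Determine the goals
        questions: list = field(default_factory=list)        # Explore the questions
        paradigm: str = ""                                    # Choose the paradigm...
        techniques: list = field(default_factory=list)       # ...and techniques
        practical_issues: list = field(default_factory=list) # Identify practical issues
        ethical_issues: list = field(default_factory=list)   # Decide how to handle ethics
        data_plan: str = ""                                   # Evaluate, interpret, present

        def missing_steps(self):
            """Return the DECIDE steps that have not yet been filled in."""
            checks = {
                "Determine the goals": bool(self.goals),
                "Explore the questions": bool(self.questions),
                "Choose the paradigm and techniques": bool(self.paradigm and self.techniques),
                "Identify the practical issues": bool(self.practical_issues),
                "Decide how to deal with ethical issues": bool(self.ethical_issues),
                "Evaluate, interpret, and present the data": bool(self.data_plan),
            }
            return [step for step, done in checks.items() if not done]

For example, a plan with only goals and questions filled in would report the remaining four steps when missing_steps() is called.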
Determine
the goals
What
are the high-level goals of
the evaluation? Who wants it
and why? An
evaluation
to help clarify user needs
has different goals from an
evaluation to
determine
the best metaphor for a
conceptual design, or to fine-tune an
interface, or to
examine
how technology changes
working practices, or to inform
how the next
version
of a product should be
changed.
Goals
should guide an evaluation, so
determining what these goals
are is the first
step
in
planning an evaluation. For
example, we can restate the
general goal
statements
just
mentioned more clearly
as:
· Check
that the evaluators have
understood the users'
needs.
· Identify
the metaphor on which to
base the design.
· Check
to ensure that the final
interface is consistent.
· Investigate
the degree to which technology
influences working practices.
· Identify how the interface of an existing product could be engineered to improve its usability.
These
goals influence the
evaluation approach, that
is, which evaluation
paradigm
guides
the study. For example,
engineering a user interface
involves a quantitative
engineering
style of working in which
measurements are used to
judge the quality of
the
interface. Hence usability
testing would be appropriate.
Exploring how
children
talk
together in order to see if an
innovative new groupware
product would help
them
to be
more engaged would probably be
better informed by a field
study.
Explore
the questions
In order
to make goals operational,
questions that must be answered to
satisfy them
have to
be identified. For example,
the goal of finding out
why many customers
prefer
to
purchase paper airline
tickets over the counter
rather than e-tickets can be
broken
down
into a number of relevant
questions for investigation.
What are customers'
attitudes
to these new tickets?
Perhaps they don't trust
the system and are
not sure that
they will
actually get on the flight
without a ticket in their
hand. Do customers
have
adequate
access to computers to make bookings?
Are they concerned about
security?
Does this
electronic system have a bad
reputation? Is the user
interface to the
ticketing
system so
poor that they can't
use it? Maybe very
few people managed to
complete
the
transaction.
Questions
can be broken down into
very specific sub-questions to make
the evaluation
even
more specific. For example,
what does it mean to ask, "Is
the user interface
poor?":
Is the system difficult to
navigate? Is the terminology
confusing because it is
inconsistent?
Is response time too slow?
Is the feedback confusing or
maybe
insufficient?
Sub-questions can, in turn, be further
decomposed into even
finer-grained questions, and so on.
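Because each question simply branches into finer-grained ones, the decomposition can be pictured as a small tree. The sketch below restates the e-ticket questions from this section as nested data; the structure and the helper function are illustrative assumptions, not part of the framework.

    # Each key is a question; its value is a list of finer-grained sub-questions.
    question_tree = {
        "Why do customers prefer paper tickets over e-tickets?": [
            "Do customers trust that they will get on the flight without a paper ticket?",
            "Do customers have adequate access to computers to make bookings?",
            "Are customers concerned about security?",
            "Is the user interface to the ticketing system too poor to use?",
        ],
        "Is the user interface to the ticketing system poor?": [
            "Is the system difficult to navigate?",
            "Is the terminology confusing because it is inconsistent?",
            "Is response time too slow?",
            "Is the feedback confusing or insufficient?",
        ],
    }

    def print_questions(tree):
        """Print each question followed by its indented sub-questions."""
        for question, sub_questions in tree.items():
            print(question)
            for sub in sub_questions:
                print("  -", sub)

    print_questions(question_tree)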
Choose
the evaluation paradigm and
techniques
Having
identified the goals and
main questions, the next
step is to choose the
evaluation paradigm and techniques. As
discussed in the previous
section, the
evaluation
paradigm
determines the kinds of
techniques that are used.
Practical and ethical
issues
(discussed
next) must also be considered and
trade-offs made. For example,
what
seems to
be the most appropriate set of
techniques may be too
expensive, or may
take
too
long, or may require
equipment or expertise that is
not available, so compromises
are
needed.
Identify
the practical issues
There
are many practical issues to
consider when doing any
kind of evaluation and
it
is
important to identify them before
starting.
Some issues that should be
considered
include
users, facilities and
equipment, schedules and
budgets, and
evaluators'
expertise.
Depending on the availability of
resources, compromises may
involve
adapting
or substituting techniques.
Users
It goes
without saying that a key
aspect of an evaluation is involving
appropriate
users.
For laboratory studies,
users must be found and
screened to ensure that
they
represent
the user population to which
the product is targeted. For
example, usability
tests
often need to involve users
with a particular level of
experience, e.g., novices
or
experts,
or users with a range of expertise.
The number of men and women
within a
particular
age range, cultural
diversity, educational experience,
and personality
differences
may also need to be taken
into account, depending on
the kind of product
being
evaluated. In usability tests
participants are typically
screened to ensure
that
they meet
some predetermined characteristic.
For example, they might be
tested to
ensure
that they have attained a
certain skill level or fall
within a particular
demographic
range. Questionnaire surveys require
large numbers of participants so
ways of
identifying and reaching a
representative sample of participants are
needed.
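Screening against predetermined characteristics is essentially a filtering step over the pool of volunteers. The sketch below assumes a small, invented pool and invented criteria; it only illustrates the idea of selecting participants who match a required profile.

    # Hypothetical pool of volunteers; the field names are assumptions for this sketch.
    pool = [
        {"id": "P1", "age": 34, "experience": "novice"},
        {"id": "P2", "age": 22, "experience": "expert"},
        {"id": "P3", "age": 45, "experience": "novice"},
        {"id": "P4", "age": 61, "experience": "expert"},
    ]

    def screen(pool, experience, age_range):
        """Keep only volunteers with the required experience level and age range."""
        low, high = age_range
        return [p for p in pool if p["experience"] == experience and low <= p["age"] <= high]

    # Example: recruit novice users aged 25 to 50 for a usability test.
    selected = screen(pool, experience="novice", age_range=(25, 50))
    print([p["id"] for p in selected])   # ['P1', 'P3']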
For
field studies to be successful, an
appropriate and accessible
site must be found
where
the evaluator can work
with the users in their
natural setting.
Another
issue to consider is how the
users will be involved. The
tasks used in a
laboratory
study should be representative of
those for which the
product is designed.
However,
there are no written rules
about the length of time
that a user should be
expected
to spend on an evaluation task.
Ten minutes is too short
for most tasks and
two
hours is a long time, but
what is reasonable? Task times will
vary according to
the
type of evaluation, but when
tasks go on for more than 20
minutes, consider
offering
breaks. It is accepted that people using
computers should stop, move
around
and
change their position regularly
after every 20 minutes spent at
the keyboard to
avoid
repetitive strain injury.
Evaluators also need to put
users at ease so they are
not
anxious
and will perform normally.
Even when users are paid to
participate, it is
important
to treat them courteously. At no
time should users be
treated
condescendingly
or made to feel uncomfortable when
they make mistakes.
Greeting
users,
explaining that it is the
system that is being tested
and not them, and
planning
an
activity to familiarize them
with the system before
starting the task all help
to put
users at
ease.
Facilities
and equipment
There
are many practical issues
concerned with using
equipment in an evaluation. For example,
when using video you need to
think about how you will do
the recording:
how
many cameras and where do
you put them? Some
people are disturbed by
having
a camera
pointed at them and will not
perform normally, so how can
you avoid
making
them feel uncomfortable?
Spare film and batteries
may also be needed.
Schedule
and budget
constraints
Time
and budget constraints are
important considerations to keep in
mind. It might
seem
ideal to have 20 users test
your interface, but if you
need to pay them, then
it
could
get costly. Planning
evaluations that can be
completed on schedule is also
important,
particularly in commercial settings.
There is never enough time
to do
evaluations
as you would ideally like,
so you have to compromise
and plan to do a
good
job with the resources
and time available.
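A quick back-of-the-envelope calculation often makes the schedule and budget trade-off concrete. All figures below are invented for illustration; only the arithmetic matters.

    # Rough cost estimate for a small usability test (all numbers are assumptions).
    participants = 20         # users you would ideally like to test
    payment_per_user = 25.0   # payment per session
    session_hours = 1.5       # length of one session, including briefing
    evaluator_rate = 40.0     # evaluator's hourly cost

    participant_cost = participants * payment_per_user
    evaluator_cost = participants * session_hours * evaluator_rate
    total = participant_cost + evaluator_cost

    print(f"Participant payments: {participant_cost:.2f}")
    print(f"Evaluator time:       {evaluator_cost:.2f}")
    print(f"Estimated total:      {total:.2f}")

Halving the number of participants, or replacing some sessions with a cheaper technique, shows up immediately in such an estimate, which is exactly the kind of compromise discussed above.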
Expertise
Does the
evaluation team have the
expertise needed to do the evaluation?
For example, if
no one has used models to
evaluate systems before,
then basing an evaluation on
this approach is not
sensible. It is no use planning to
use experts to review
an
interface if none are
available. Similarly, running
usability tests requires
expertise.
Analyzing
video can take many
hours, so someone with appropriate
expertise and
equipment
must be available to do it. If statistics
are to be used, then a
statistician
should be
consulted before starting
the evaluation and then
again later for analysis,
if
appropriate.
Decide
how to deal with the ethical
issues
The
Association for Computing
Machinery (ACM) and many
other professional
organizations
provide ethical codes that
they expect their members to
uphold,
particularly
if their activities involve
other human beings. For
example, people's
privacy
should be protected, which
means that their name
should not be
associated
with data
collected about them or
disclosed in written reports (unless
they give
permission).
Personal records containing details about
health, employment,
education,
financial
status, and where
participants live should be
confidential. Similarly, it
should
not be possible to identify
individuals from comments written in
reports. For
example,
if a focus group involves
nine men and one
woman, the pronoun
"she"
should
not be used in the report
because it will be obvious to
whom it refers.
Most
professional societies, universities,
government and other
research offices
require
researchers to provide information
about activities in which
human
participants
will be involved. This documentation is
reviewed by a panel and the
researchers
are notified whether their
plan of work, particularly
the details about
how
human
participants will be treated, is
acceptable.
People
give their time and
their trust when they agree
to participate in an evaluation
study
and both should be respected.
But what does it mean to be
respectful to users?
What
should participants be told
about the evaluation? What
are participants' rights?
Many
institutions and project
managers require participants to read
and sign an
informed
consent. This form explains
the aim of the tests or
research and promises
participants
that their personal details
and performance will not be
made public and
will be
used only for the purpose
stated. It is an agreement between the
evaluator and
the
evaluation participants that
helps to confirm the
professional relationship
that
exists
between them. If your
university or organization does
not provide such a
form,
it is
advisable to develop one,
partly to protect yourself in
the unhappy event of
litigation
and partly because the act
of constructing it will remind you
what you
should
consider.
The
following guidelines will help
ensure that evaluations are
done ethically and
that
adequate
steps to protect users'
rights have been
taken.
· Tell participants the goals of the study and exactly what they should expect if they participate. The information given to them should include an outline of the process, the approximate amount of time the study will take, the kind of data that will be collected, and how that data will be analyzed. The form of the final report should be described and, if possible, a copy offered to them. Any payment offered should also be clearly stated.
· Be sure to explain that demographic, financial, health, or other sensitive information that users disclose or that is discovered from the tests is confidential. A coding system should be used to record each user and, if a user must be identified for a follow-up interview, the code and the person's demographic details should be stored separately from the data (a minimal sketch of this separation appears after this list). Anonymity should also be promised if audio and video are used.
· Make sure users know that they are free to stop the evaluation at any time if they feel uncomfortable with the procedure.
· Pay users when possible because this creates a formal relationship in which mutual commitment and responsibility are expected.
· Avoid including quotes or descriptions that inadvertently reveal a person's identity, as in the example mentioned above of avoiding use of the pronoun "she" in the focus group. If quotes need to be reported, e.g., to justify conclusions, then it is convention to replace words that would reveal the source with representative words, in square brackets. Ask users' permission in advance to quote them, promise them anonymity, and offer to show them a copy of the report before it is distributed.
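One common way to honor the confidentiality points above is to record performance data only against a participant code and keep the code-to-person mapping in a separate, restricted store. The sketch below is a minimal illustration of that separation; the data, field names, and file name are invented.

    import csv

    # Mapping from participant code to identifying details, kept separately
    # (for example, in a locked file) from the performance data.
    identities = {
        "U01": {"name": "Participant One", "age": 29},
        "U02": {"name": "Participant Two", "age": 41},
    }

    # Performance data recorded only against the code, never the name.
    results = [
        {"code": "U01", "task": "book flight", "time_s": 312, "errors": 2},
        {"code": "U02", "task": "book flight", "time_s": 457, "errors": 5},
    ]

    # Write the anonymous results; the identities mapping is stored elsewhere.
    with open("results.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["code", "task", "time_s", "errors"])
        writer.writeheader()
        writer.writerows(results)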
The
general rule to remember
when doing evaluations is do
unto others only what
you
would
not mind being done to
you.
The
recent explosion in Internet
and web usage has
resulted in more research on
how
people
use these technologies and
their effects on everyday
life. Consequently,
there
are many
projects in which developers
and researchers are logging
users' interactions,
analyzing
web traffic, or examining
conversations in chat rooms, bulletin
boards, or
on email.
Unlike most previous evaluations in
human-computer interaction,
these
studies
can be done without users
knowing that they are
being studied. This
raises
ethical
concerns, chief among which are
issues of privacy, confidentiality,
informed
consent,
and appropriation of others'
personal stories (Sharf, 1999).
People often say
things
online that they would
not say face to face.
Furthermore, many people
are
unaware
that personal information
they share online can be
read by someone with
technical
know-how years later, even
after they have deleted it
from their personal
mailbox
(Erickson et al., 1999).
Evaluate,
interpret, and present the
data
Choosing
the evaluation paradigm and
techniques to answer the questions
that satisfy
the
evaluation goal is an important
step. So is identifying the
practical and ethical
issues to
be resolved. However, decisions are
also needed about what data
to
collect,
how to analyze it, and
how to present the findings to
the development team.
To a
great extent the technique
used determines the type of
data collected, but
there
are still
some choices. For example,
should the data be treated
statistically? If
qualitative
data is collected, how should it be
analyzed and represented? Some
general
questions
also need to be asked (Preece et
al., 1994): Is the technique
reliable? Will
the
approach measure what is
intended, i.e., what is its
validity? Are biases
creeping
in that
will distort the results? Are
the results generalizable, i.e.,
what is their scope?
Is the
evaluation ecologically valid or is
the fundamental nature of
the process being
changed
by studying it?
Reliability
The
reliability or consistency of a technique
is how well it produces the same
results
on
separate occasions under the
same
circumstances.
Different evaluation
processes
have
different degrees of reliability.
For example, a carefully
controlled experiment
will have
high reliability. Another
evaluator or researcher who
follows exactly the
same
procedure should get similar
results. In contrast, an informal,
unstructured
interview
will have low reliability: it
would be difficult if not
impossible to repeat
exactly
the same discussion.
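To get a feel for consistency in practice, one crude check (not a standard reliability statistic) is to repeat the same procedure with comparable participants and see how far the paired results drift apart; small differences suggest the technique is consistent. The numbers below are invented.

    # Task-completion times (in seconds) from two runs of the same test procedure.
    run_1 = [210, 185, 342, 251, 198]
    run_2 = [225, 190, 330, 260, 205]

    # Average absolute difference between paired results across the two runs.
    diffs = [abs(a - b) for a, b in zip(run_1, run_2)]
    mean_abs_diff = sum(diffs) / len(diffs)
    print(f"Mean absolute difference between runs: {mean_abs_diff:.1f} s")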
Validity
Validity
is concerned with whether the
evaluation technique measures
what it is
supposed
to measure. This encompasses
both the technique itself
and the way it is
performed.
If for example, the goal of
an evaluation is to find out
how users use a
new
product
in their homes, then it is not
appropriate to plan a laboratory
experiment. An
ethnographic
study in users' homes would
be more appropriate. If the
goal is to find
average
performance times for
completing a task, then
counting only the number
of
user
errors would be
invalid.
Biases
Bias
occurs when the results are
distorted. For example,
expert evaluators
performing
a
heuristic evaluation may be
much more sensitive to
certain kinds of design
flaws
than
others. Evaluators collecting
observational data may
consistently fail to
notice
certain
types of behavior because
they do not deem them
important.
Put
another way, they may
selectively gather data that
they think is
important.
Interviewers
may unconsciously influence
responses from interviewees by
their tone
of voice,
their facial expressions, or the
way questions are phrased, so it is
important
to be
sensitive to the possibility of
biases.
Scope
The
scope of an evaluation study
refers to how much its
findings can be
generalized.
For
example, some modeling
techniques, like the
keystroke model, have a
narrow,
precise
scope. The model predicts
expert, error-free behavior
so, for example,
the
results
cannot be used to describe novices
learning to use the
system.
Ecological
validity
Ecological
validity concerns how the
environment in which an evaluation
is
conducted
influences or even distorts
the results. For example,
laboratory experiments
are
strongly controlled and are
quite different from
workplace, home, or
leisure
environments.
Laboratory experiments therefore
have low ecological validity
because
the
results are unlikely to represent what
happens in the real world. In
contrast,
ethnographic
studies do not impact the
environment, so they have
high ecological
validity.
Ecological
validity is also affected
when participants are aware
of being studied.
This
is sometimes
called the Hawthorne
effect after a
series of experiments at the
Western
Electric
Company's Hawthorne factory in
the US in the 1920s and
1930s. The studies
investigated
changes in length of working
day, heating, lighting etc.,
but eventually it
was
discovered that the workers
were reacting positively to
being given special
treatment
rather than just to the
experimental conditions.