Chapter 9

Software Quality: The Key to Successful Software Engineering
Introduction
The overall software quality averages for the United States have scarcely changed since 1979. Although national data is flat for quality, a few companies have made major improvements. These happen to be companies that measure quality because they define quality in such a way that both prediction and measurement are possible.

The same companies also use full sets of defect removal activities that include inspections and static analysis as well as testing. Defect prevention methods such as joint application design (JAD) and development methods that focus on quality such as Team Software Process (TSP) are also used, once the importance of quality to successful software engineering is realized.

Historically, large software projects spend more time and effort on finding and fixing bugs than on any other activity. Because software defect removal efficiency only averages about 85 percent, the major costs of software maintenance are finding and fixing bugs accidentally released to customers.

When development defect removal is added to maintenance defect removal, the major cost driver for total cost of ownership (TCO) is that of defect removal. Between 30 percent and 50 percent of every dollar ever spent on software has gone to finding and fixing bugs.

When software projects run late and exceed their budgets, a main reason is excessive defect levels, which slow down testing and force applications into delays and costly overruns.
When software projects are cancelled and end up in court for breach of contract, excessive defect levels, inadequate defect removal, and poor quality measures are associated with every case.

Given the fact that software defect removal costs have been the primary cost driver for all major software projects for the past 50 years, it is surprising that so little is known about software quality.

There are dozens of books about software quality and testing, but very few of these books actually contain solid and reliable quantified data about basic topics such as:
1. How many bugs are going to be present in specific new software applications?
2. How many bugs are likely to be present in legacy software applications?
3. How can software quality be predicted and measured?
4. How effective are ISO standards in improving quality?
5. How effective are software quality assurance organizations in improving quality?
6. How effective is software quality assurance certification for improving quality?
7. How effective is Six Sigma for improving quality?
8. How effective is quality function deployment (QFD) for improving quality?
9. How effective are the higher levels of the CMMI in improving quality?
10. How effective are the forms of Agile development in improving quality?
11. How effective is the Rational Unified Process (RUP) in improving quality?
12. How effective is the Team Software Process (TSP) in improving quality?
13. How effective are the ITIL methods in improving quality?
14. How effective is service-oriented architecture (SOA) for improving quality?
15. How effective are certified reusable components for improving quality?
16. How many bugs can be eliminated by inspections?
17. How many bugs can be eliminated by static analysis?
18. How many bugs can be eliminated by testing?
19. How many different kinds of testing are needed?
20. How many test personnel are needed?
21. How effective are test specialists compared with developers?
22. How effective is automated testing?
23. How many test cases are needed for applications of various sizes?
24. How effective is test certification in improving performance?
25. How many bug repairs will themselves include new bugs?
26. How many bugs will get delivered to users?
27. How much does it cost to improve software quality?
28. How long does it take to improve software quality?
29. How much will we save from improving software quality?
30. How much is the return on investment (ROI) for better software quality?
The purpose of this chapter is to show the quantified results of every major form of quality assurance activity, inspection stage, static analysis, and testing stage on the delivered defect levels of software applications.

Defect removal comes in "private" and "public" forms. The private forms of defect removal include desk checking, static analysis, and unit testing. They are also covered in Chapter 8, because they concentrate on code defects, and that chapter deals with programming and code development.

The public forms of defect removal include formal inspections, static analysis if run by someone other than the software engineer who wrote the code, and many kinds of testing carried out by test specialists rather than the developers.

Both private and public forms of defect removal are important, but it is harder to get data on the private forms because they usually occur with no one else being present other than the person who is doing the desk checking or unit testing. As pointed out in Chapter 8, IBM used volunteers to record defects found via private removal activities. Some development methods such as Watts Humphrey's Team Software Process (TSP) and Personal Software Process (PSP) also record private defect removal.

This chapter will also explain how to predict the number of bugs or defects that might occur, and how to predict defect removal efficiency levels. Not only code bugs, but also bugs or defects in requirements, design, and documents need to be predicted. In addition, new bugs accidentally included in bug repairs need to be predicted. These are called "bad fixes." Finally, there are also bugs or errors in test cases themselves, and these need to be predicted, too.
This chapter will discuss the best ways of measuring quality and will caution against hazardous metrics such as "cost per defect" and "lines of code," which distort results and conceal the real facts of software quality. In this chapter, several critical software quality topics will be discussed:
■ Defining Software Quality
■ Predicting Software Quality
■ Measuring Software Quality
■ Software Defect Prevention
■ Software Defect Removal
■ Specialists in Software Quality
■ The Economic Value of Software Quality
Software quality is the key to successful software engineering. Software has long been troubled by excessive numbers of software defects both during development and after release. Technologies are available that can reduce software defects and improve quality by significant amounts.

Carefully planning and selecting an effective combination of defect prevention and defect removal activities can shorten software development schedules, lower software development costs, significantly reduce maintenance and customer support costs, and improve both customer satisfaction and employee morale at the same time. Improving software quality has the highest return on investment of any current form of software process improvement.

As the recession continues, every company is anxious to lower both software development and software maintenance costs. Improving software quality will assist in improving software economics more than any other available technology.
Defining Software Quality
A good definition for software quality is fairly difficult to achieve. There are many different definitions published in the software literature. Unfortunately, some of the published definitions for quality are either abstract or off the mark. A workable definition of software quality needs to have six fundamental features:
1. Quality should be predictable before a software application starts.
2. Quality needs to encompass all deliverables and not just the code.
3. Quality should be measurable during development.
4. Quality should be measurable after release to customers.
5. Quality should be apparent to customers and recognized by them.
6. Quality should continue after release, during maintenance.

Here are some of the published definitions for quality, and explanations of why some of them don't seem to conform to the six criteria just listed.
Quality Definition 1: "Quality means conformance to requirements."
There are several problems with this definition, but the major problem is that requirements errors or bugs are numerous and severe. Errors in requirements constitute about 20 percent of total software defects and are responsible for more than 35 percent of high-severity defects.

Defining quality as conformance to a major source of error is circular reasoning, and therefore this must be considered to be a flawed and unworkable definition. Obviously, a workable definition for quality has to include errors in requirements themselves.

Don't forget that the famous Y2K problem originated as a specific user requirement and not as a coding bug. Many software engineers warned clients and managers that limiting date fields to two digits would cause problems, but their warnings were ignored or rejected outright.

The author once worked (briefly) as an expert witness in a lawsuit where a company attempted to sue an outsource vendor for using two-digit date fields in a software application developed under contract. During the discovery phase, it was revealed that the vendor cautioned the client that two-digit date fields were hazardous, but the client rejected the advice and insisted that the Y2K problem be included in the application. In fact, the client's own internal standards mandated two-digit date fields. Needless to say, the client dropped the suit when it became evident that they themselves were the cause of the problem. The case illustrates that "user requirements" are often wrong and sometimes even dangerous or "toxic."

It also illustrates another point. Neither the corporate executives nor the legal department of the plaintiff knew that the Y2K problem had been caused by their own policies and practices. Obviously, there is a need for better governance of software from the top when problems such as this are not understood by corporate executives.

Using modern terminology from the recession, it is necessary to remove "toxic requirements" before conformance can be safe. The definition of quality as "conformance to requirements" does not lead to any significant quality improvements over time. No more requirements are being met in 2009 than in 1979.
If software engineering is to become a true profession rather than an art form, software engineers have a responsibility to help customers define requirements in a thorough and effective manner. It is the job of a professional software engineer to insist on effective requirements methods such as joint application design (JAD), quality function deployment (QFD), and requirements inspections.

Far too often the literature on software quality is passive and makes the incorrect assumption that users will be 100 percent effective in identifying requirements. This is a dangerous assumption. User requirements are never complete, and they are often wrong. For a software project to succeed, requirements need to be gathered and analyzed in a professional manner, and software engineering is the profession that should know how to do this well.

It should be the responsibility of the software engineers to insist that proper requirements methods be used. These include joint application design (JAD), quality function deployment (QFD), and requirements inspections. Other methods that benefit requirements, such as embedded users or use-cases, might also be recommended. The users themselves are not software engineers and cannot be expected to know optimal ways of expressing and analyzing requirements. Ensuring that requirements collection and analysis are at state-of-the-art levels devolves to the software engineering team.

Once user requirements have been collected and analyzed, then conformance to them should of course occur. However, before conformance can be safe and effective, dangerous or toxic requirements have to be weeded out, excess and superfluous requirements should be pointed out to the users, and potential gaps that will cause creeping requirements should be identified and also quantified. The users themselves will need professional assistance from the software engineering team, who should not be passive bystanders for requirements gathering and analysis.

Unfortunately, requirements bugs cannot be removed by ordinary testing. If requirements bugs are not prevented from occurring, or not removed via formal inspections, test cases that are constructed from the requirements will confirm the errors and not find them. (This is why years of software testing never found and removed the Y2K problem.)

A second problem with this definition is that it is not predictable during development. Conformance to requirements can be measured after the fact, but that is too late for cost-effective recovery.

A third problem with this definition is that for brand-new kinds of innovative applications, there may not be any users other than the original inventor. Consider the history of successful software innovation such as the APL programming language, the first spreadsheet, and the early web search engine that later became Google.
These innovative applications were all created by inventors to solve problems that they themselves wanted to solve. They were not created based on the normal concept of "user requirements." Until prototypes were developed, other people seldom even realized how valuable the inventions would be. Therefore, "user requirements" are not completely relevant to brand-new inventions until after they have been revealed to the public.

Given the fact that software requirements grow and change at measured rates of 1 percent to more than 2 percent every calendar month during the subsequent design and coding phases, it is apparent that achieving a full understanding of requirements is a difficult task.

Software requirements are important, but the combination of toxic requirements, missing requirements, and excess requirements makes simplistic definitions such as "quality means conformance to requirements" hazardous to the software industry.

Quality Definition 2: "Quality means reliability, portability, and many other -ilities."
The problem with defining quality as a set of words ending with -ility is that many of these factors are neither predictable before they occur nor easily measurable when they do occur.

While most of the -ility words are useful properties for software applications, some don't seem to have much to do with quality as we would consider the term for a physical device such as an automobile or a toaster. For example, "portability" may be useful for a software vendor, but it does not seem to have much relevance to quality in the eyes of a majority of users.

The use of -ility words to define quality does not lead to quality improvements over time. In 2009, the software industry is no better in terms of many of these -ilities than it was in 1979. Using modern language from the recession, many of the -ilities are "subprime" definitions that don't prevent serious quality failures. In fact, using -ilities rather than focusing on defect prevention and removal slows down progress on software quality control.

Among the many words that are cited when using this definition can be found (in alphabetical order):
1. Augmentability
2. Compatibility
3. Expandability
4. Flexibility
5. Interoperability
6. Maintainability
7. Manageability
8. Modifiability
9. Operability
10. Portability
11. Reliability
12. Scalability
13. Survivability
14. Understandability
15. Usability
16. Testability
17. Traceability
18. Verifiability
Of the words on this list, only a few such as "reliability" and "testability" seem to be relevant to quality as viewed by users. The other terms range from being obscure (such as "survivability") to useful but irrelevant (such as "portability"). Other terms may be of interest to the vendor or development team, but not to customers (such as "maintainability").

The -ility words seem to have an academic origin because they don't really address some of the real-world quality issues that bother customers. For example, none of these terms addresses ease or difficulty of reaching customer support to get help when a bug is noted or the software misbehaves. None of the terms deals with the speed of fixing bugs and providing the fix to users in a timely manner.
The new Information Technology Infrastructure Library (ITIL) does a much better job of dealing with issues of quality in the eyes of users, such as customer support, incident management, and defect repair intervals, than does the standard literature dealing with software quality.
More seriously, the list of -ility words ignores two of the main topics that have a major impact on software quality when the software is finally released to customers: (1) defect potentials and (2) defect removal efficiency levels.

The term defect potential refers to the total quantity of defects that will likely occur when designing and building a software application. Defect potentials include bugs or defects in requirements, design, code, user documents, and bad fixes or secondary defects. The term defect removal efficiency refers to the percentage of defects found by any sequence of inspection, static analysis, and test stages.
To reach acceptable levels of quality in the view of customers, a combination of low defect potentials and high defect removal efficiency rates (greater than 95 percent) is needed. The current U.S. average for software quality is a defect potential of about 5.0 bugs per function point coupled with 85 percent defect removal efficiency. This combination yields a total of delivered defects of about 0.75 per function point, which the author regards as unprofessional and unacceptable.

Defect potentials need to drop below 2.5 per function point and defect removal efficiency needs to average greater than 95 percent for software engineering to be taken seriously as a true engineering discipline. This combination would result in a delivered defect total of only 0.125 defect per function point, or about one-sixth of today's averages. Achieving or exceeding this level of quality is possible today in 2009, but seldom achieved.
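To make the arithmetic explicit: delivered defects per function point are simply the defect potential multiplied by the fraction of defects that escape removal. The following minimal sketch (the function name is invented for this illustration; the numbers are the averages and targets just cited) shows both the current U.S. average and the suggested target:

def delivered_defects_per_fp(defect_potential, removal_efficiency):
    """Delivered defects per function point.

    defect_potential   -- total defects created per function point
    removal_efficiency -- fraction of those defects removed before release
    """
    return defect_potential * (1.0 - removal_efficiency)

# Approximate 2009 U.S. average cited above: 5.0 defects per function
# point with 85 percent cumulative defect removal efficiency.
print(round(delivered_defects_per_fp(5.0, 0.85), 3))   # 0.75 delivered per function point

# Suggested professional target: below 2.5 per function point with
# greater than 95 percent removal efficiency.
print(round(delivered_defects_per_fp(2.5, 0.95), 3))   # 0.125 delivered per function point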
One of the reasons that good quality is not achieved as widely as it might be is that concentrating on the -ility topics rather than measuring defects and defect removal efficiency leads to gaps and failures in defect removal activities. In other words, the -ilities definitions of quality are a distraction from serious study of software defect causes and the best methods of preventing and removing software defects.

Specific levels of defect potentials and defect removal efficiency levels could be included in outsource agreements. These would probably be more effective than current contracting practices for quality, which are often nonexistent or merely insist on a certain CMMI level.

If software is released with excessive quantities of defects so that it stops, behaves erratically, or runs slowly, it will soon be discovered that most of the -ility words fall by the wayside.

Defect quantities in released software tend to be the paramount quality issue with users of software applications, coupled with what kinds of corrective actions the software vendor will take once defects are reported. This brings up a third and more relevant definition of software quality.

Quality Definition 3: "Quality is the absence of defects that would cause an application to stop working or to produce incorrect results."
A software defect is a bug or error that causes software to either stop operating or to produce invalid or unacceptable results. Using IBM's severity scale, defects have four levels of severity:

■ Severity 1 means that the software application does not work at all.
■ Severity 2 means that major functions are disabled or produce incorrect results.
■ Severity 3 means that there are minor issues or minor functions are not working.
■ Severity 4 means a cosmetic problem that does not affect operation.
There is some subjectivity with these defect severity levels because they are assigned by human beings. Under the IBM model, the initial severity level is assigned when the bug is first reported, based on symptoms described by the customer or user who reported the defect. However, a final severity level is assigned by the change team when the defect is repaired.
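As a purely illustrative sketch of this two-step assignment (the class and field names below are assumptions made for this example, not part of the IBM model itself), a defect-tracking record might carry both an initial severity and a final severity:

from dataclasses import dataclass
from enum import IntEnum
from typing import Optional

class Severity(IntEnum):
    SEV1 = 1   # software application does not work at all
    SEV2 = 2   # major functions disabled or producing incorrect results
    SEV3 = 3   # minor issues, or minor functions not working
    SEV4 = 4   # cosmetic problem that does not affect operation

@dataclass
class DefectReport:
    description: str
    initial_severity: Severity                  # assigned when the bug is first reported
    final_severity: Optional[Severity] = None   # assigned by the change team at repair time

report = DefectReport("application will not start", Severity.SEV1)
report.final_severity = Severity.SEV2           # revised after diagnosis and repair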
This definition of quality is one favored by the author for several reasons. First, defects can be predicted before they occur and measured when they do occur. Second, customer satisfaction surveys for many software applications appear to correlate more closely to delivered defect levels than to any other factor. Third, many of the -ility factors also correlate to defects, or to the absence of defects. For example, reliability correlates exactly to the number of defects found in software. Usability, testability, traceability, and verifiability also have indirect correlations to software defect levels.

Measuring defect volumes and defect severity levels and then taking effective steps to reduce those volumes via a combination of defect prevention and defect removal activities is the key to successful software engineering.
This definition of software quality does lead to quality improvements over time. The companies that measure defect potentials, defect removal efficiency levels, and delivered defects have improved these factors by significant amounts. This definition of quality supports process improvements, predicting quality, measuring quality, and customer satisfaction as measured by surveys.
Therefore, companies that measure quality, such as IBM, Dovél Technologies, and AT&T, have made progress in quality control. Also, methods that integrate defect tracking and reporting, such as Team Software Process (TSP), have made significant progress in reducing delivered defects. This is also true for some open-source applications that have added static analysis to their suite of defect removal tools.

Defect and removal efficiency measures have been used to validate the effectiveness of formal inspections, show the impact of static analysis, and fine-tune more than 15 kinds of testing. The subjective measures have no ability to deal with such issues.

Every software engineer and every software project manager should be trained in methods for predicting software defects, measuring software defects, preventing software defects, and removing software defects. Without knowledge of effective quality and defect control, software engineering is a hoax.

The full definition of quality suggested by the author includes these nine factors:
1. Quality implies low levels of defects when software is deployed, ideally approaching zero defects.
2. Quality implies high reliability, or being able to run without stoppage or strange and unexpected results or sluggish performance.
3. Quality implies high levels of user satisfaction when users are surveyed about software applications and their features.
4. Quality implies a feature set that meets the normal operational needs of a majority of customers or users.
5. Quality implies a code structure and comment density that minimize bad fixes, or accidentally inserting new bugs when attempting to repair old bugs. This same structure will facilitate adding new features.
6. Quality implies effective customer support when problems do occur, with minimal difficulty for customers in contacting the support team and getting assistance.
7. Quality implies rapid repairs of known defects, and especially so for high-severity defects.
8. Quality should be supported by meaningful guarantees and warranties offered by software developers to software users.
9. Effective definitions of quality should lead to quality improvements. This means that quality needs to be defined rigorously enough so that both improvements and degradations can be identified, and also averages. If a definition for quality cannot show changes or improvements, then it is of very limited value.
The 6th, 7th, 8th, and 9th of these quality issues tend to be sparsely covered by the literature on software quality, other than the new ITIL books. Unfortunately, the ITIL coverage is used only for internal software applications and is essentially ignored by commercial software vendors.

The definition of quality as an absence of defects, combined with supplemental topics such as ease of customer support and maintenance speed, captures the essence of quality in the view of many software users and customers.

Consider how the three definitions of quality discussed in this chapter might relate to a well-known software product such as Microsoft Vista. Vista has been selected as an example because it is one of the best-known large software applications in the world, and therefore a good test bed for trying out various quality definitions.

Applying Definition 1 to Vista: "Quality means conformance to requirements."

The first definition would be hard to use for Vista, since no ordinary customers were asked what features they wanted in the operating system, although focus groups were probably used at some point.
If you compare Vista with XP, Leopard, or Linux, it seems to include a superabundance of features and functions, many of which were neither requested nor ever used by a majority of users. One topic that the software engineering literature does not cover well, or at all, is that of overstuffing applications with unnecessary and useless features.

Most people know that ordinary requirements usually omit about 20 percent of functions that users want. However, not many people know that for commercial software put out by companies such as Microsoft, Symantec, Computer Associates, and the like, applications may have more than 40 percent features that customers don't want and never use.

Feature stuffing is essentially a competitive move to either imitate what competitors do, or to attempt to pull ahead of smaller competitors by providing hundreds of costly but marginal features that small competitors could not imitate. In either case, feature stuffing is not a satisfactory conformance to user requirements.

Further, certain basic features such as security and performance, which users of operating systems do appreciate, are not particularly well embodied in Vista.

The bottom line is that defining quality as conformance to requirements is almost useless for applications with greater than 1 million users such as Vista, because it is impossible to know what such a large group will want or not want.

Also, users seldom are able to articulate requirements in an effective manner, so it is the job of professional software engineers to help users in defining requirements with care and accuracy. Too often the software literature assumes that software engineers are only passive observers of user requirements, when in fact, software engineers should be playing the role of physicians who are diagnosing medical conditions in order to prescribe effective therapies.

Physicians don't just passively ask patients what the problem is and what kind of medicine they want to take. Our job as software engineers is to have professional knowledge about effective requirement gathering and analysis methods (i.e., like medical diagnostic tests) and to also know what kinds of applications might provide effective "therapies" for user needs.

Passively waiting for users to define requirements without assisting them in using joint application design (JAD) or quality function deployment (QFD) or data mining of legacy applications is unprofessional on the part of the software engineering community. Users are not trained in requirements definition, so we need to step up to the task of assisting them.
Applying Definition 2 to Vista: "Quality means adherence to -ility terms."

When Vista is judged by matching its features against the list of -ility terms shown earlier, it can be seen how abstract and difficult to apply such a list really is:
1. Augmentability: Ambiguous and difficult to apply to Vista
2. Compatibility: Poor for Vista; many old applications don't work
3. Expandability: Applicable to Vista and fairly good
4. Flexibility: Ambiguous and difficult to apply to Vista
5. Interoperability: Ambiguous and difficult to apply to Vista
6. Maintainability: Unknown to users but probably poor for Vista
7. Manageability: Ambiguous and difficult to apply to Vista
8. Modifiability: Unknown to users but probably poor for Vista
9. Operability: Ambiguous and difficult to apply to Vista
10. Portability: Poor for Vista
11. Reliability: Originally poor for Vista but improving
12. Scalability: Marginal for Vista
13. Survivability: Ambiguous and difficult to apply to Vista
14. Understandability: Poor for Vista
15. Usability: Asserted to be good for Vista, but questionable
16. Testability: Poor for Vista; complexity far too high
17. Traceability: Poor for Vista; complexity far too high
18. Verifiability: Ambiguous and difficult to apply to Vista
The bottom line is that more than half of the -ility words are difficult or ambiguous to apply to Vista or any other commercial software application. Of the ones that can be applied to Vista, the application does not seem to have satisfied any of them but expandability and usability.

Many of the -ility words cannot be predicted, nor can they be measured. Worse, even if they could be predicted and measured, they are of marginal interest in terms of serious quality control.
Applying Definition 3 to Vista: "Quality means an absence of defects, plus corollary factors."

Released defects can and should be counted for every software application. Other related topics such as ease of reporting defects and speed of repairing defects should also be measured.
Unfortunately, for commercial software, not all of these nine topics can be evaluated. Microsoft, together with many other software vendors, does not publish data on bad-fix injections or even on total numbers of bugs reported. However, six of the eight factors can be evaluated by means of journal articles and limited Microsoft data.
1. Vista was released with hundreds or thousands of defects, although Microsoft will not provide the exact number of defects found and reported by users.
2. At first Vista was not very reliable, but achieved acceptable reliability after about a year of usage. Microsoft does not report data on mean time to failure or other measures of reliability.
3. Vista never achieved high levels of user satisfaction compared with XP. The major sources of dissatisfaction include lack of printer drivers, poor compatibility with older applications, excessive resource usage, and sluggish performance on anything short of high-end computer chips and lots of memory.
4. The feature set of Vista has been noted as adequate in customer surveys, other than excessive security vulnerabilities.
5. Microsoft does not release statistics on bad-fix injections or on numbers of defect reports, so this factor cannot be known by the general public.
6. Microsoft customer support is marginal and troublesome to access and use. This is a common failing of many software vendors.
7. Some known bugs have remained in Microsoft Vista for several years. Microsoft is marginally adequate in defect repair speed.
8. There is no effective warranty for Vista (or for other commercial applications). Microsoft's end-user license agreement (EULA) absolves Microsoft of any liabilities other than replacing a defective disk.
9. Microsoft's new operating system is not yet available as this book is published, so it is not possible to know if Microsoft has used methods that will yield better quality than Vista. However, since Microsoft does have substantial internal defect tracking and quality assurance methods, hopefully quality will be better. Microsoft has shown some improvements in quality over time.
Based on this pattern of analysis for the nine factors, it cannot be said that Vista is a high-quality application under any of the definitions. Of the three major definitions, defining quality as conformance to requirements is almost impossible to use with Vista because with millions of users, nobody can define what everybody wants.

The second definition of quality as a string of -ility words is difficult to apply, and many are irrelevant. These words might be marginally useful for small internal applications, but are not particularly helpful for commercial software. Also, many key quality issues such as customer support and maintenance repair times are not found in any of the -ility words.

The third definition, which centers on defects, customer support, defect repairs, and better warranties, seems to be the most relevant. The third also has the advantage of being both predictable and measurable, which the first two lack.
Given the high costs of commercial software, the marginal or useless warranties of commercial software, and the poor customer support offered by commercial software vendors, the author would favor mandatory defect reporting that required commercial vendors such as Microsoft to produce data on defects reported by customers, sorted by severity levels.

Mandatory defect reporting is already a requirement for many products that affect human life or safety, such as medicines, aircraft engines, automobiles, and many other consumer products. Mandatory reporting of business and financial information is also required. Software affects human life and safety in critical ways, and it affects business operations in critical ways, but to date software has been exempt from serious study due to the lack of any mandate for measuring and reporting released defect levels.

Somewhat surprisingly, the open-source software community appears to be pulling ahead of old-line commercial software vendors in terms of measuring and reporting defects. Many open-source companies have added defect tracking and static-analysis tools to their quality arsenal, and are making data available to customers that is not available from many commercial software vendors.

The author would also favor a "lemon law" for commercial software similar to the lemon law for automobiles. If serious defects occur that users cannot get repaired when making a good-faith effort to resolve the situation with vendors, vendors should be required to return the full purchase or lease price of the offending software application.

A form of lemon law might also be applied to outsource contracts, except that litigation already provides relief for outsource failures, relief that cannot be used against commercial software vendors due to their one-sided EULA agreements, which disclaim any responsibility for quality other than media replacement.

No doubt software vendors would object to both mandatory defect tracking and also to a lemon law. But shrewd and farsighted vendors would soon perceive that both topics offer significant competitive advantages to software companies that know how to control quality. Since high-quality software is also cheaper and faster to develop and has lower maintenance costs than buggy software, there are even more important economic advantages for shrewd vendors.
The author hypothesizes that a combination of mandatory defect reporting by software vendors plus a lemon law would have the effect of improving software quality by about 50 percent every five years for perhaps a 20-year period.

Software quality needs to be taken much more seriously than it has been. Now that the recession is expanding, better software quality control is one of the most effective strategies for lowering software costs. But effective quality control depends on better measures of quality and on proven combinations of defect prevention and defect removal activities.

Quality prediction, quality measurement, better defect prevention, and better defect removal are on the critical path for advancing software engineering to the status of a true engineering discipline instead of a craft or art form, as it is today in 2009.
Defining and Predicting Software Defects
If delivered defects are the main quality problem for software, it is important to know what causes these defects, so that they can be prevented from occurring or removed before delivery.

The software quality literature includes a great deal of pedantic bickering about various terms such as "fault," "error," "bug," "defect," and many other terms. For this book, if software stops working, won't load, operates erratically, or produces incorrect results due to mistakes in its own code, then that is called a "defect." (This same definition has been used in 14 of the author's previous books and also in more than 30 journal articles. The author's first use of this definition started in 1978.)

However, in the modern world, the same set of problems can occur without the developers or the code being the cause. Software infected by a virus or spyware can also stop working, refuse to load, operate erratically, and produce incorrect results. In today's world, some defect reports may well be caused by outside attacks.

Attacks on software from hackers are not the same as self-inflicted defects, although successful attacks do imply security vulnerabilities.

In this book and the author's previous books, software defects have five main points of origin:

1. Requirements
2. Design
3. Code
4. User documents
5. Bad fixes (new defects due to repairs of older defects)
Because the author worked for IBM when starting research on quality, the IBM severity scale for classifying defect severity levels is used in this book and the author's previous books. There are four severity levels:

■ Severity 1: Software does not operate at all
■ Severity 2: Major features disabled or incorrect
■ Severity 3: Minor features disabled or incorrect
■ Severity 4: Cosmetic error that does not affect operation
There are other methods of classifying severity levels, but these four are the most common due to IBM introducing them in the 1960s, so they became a de facto standard.

Software defects have seven kinds of causes, with the major causes including:

Errors of omission: Something needed was accidentally left out
Errors of commission: Something needed is incorrect
Errors of ambiguity: Something is interpreted in several ways
Errors of performance: Some routines are too slow to be useful
Errors of security: Security vulnerabilities allow attacks from outside
Errors of excess: Irrelevant code and unneeded features are included
Errors of poor removal: Defects that should easily have been found
These seven causes occur with different frequencies for different deliverables. For paper documents such as requirements and design, errors of ambiguity are most common, followed by errors of omission. For source code, errors of commission are most common, followed by errors of performance and security.

The seventh category, "errors of poor removal," would require root-cause analysis for identification. The implication is that the defect was neither subtle nor hard to find, but was missed because test cases did not cover the code segment or because of partial inspections that overlooked the defect.

In a sense, all delivered defects might be viewed as errors of poor removal, but it is important to find out why various kinds of inspection, static analysis, or testing missed obvious bugs. This category should not be assigned for subtle defects, but rather for obvious defects that should have been found but for some reason escaped to the outside world.

The main reason for including errors of poor removal is to encourage more study and research on the effectiveness of various kinds of defect removal operations. More solid data is needed on the removal efficiency levels of inspections, static analysis, automatic testing, and all forms of manual testing.
The combination of defect origins, defect severity, and defect causes provides a useful taxonomy for classifying defects for statistical analysis or root-cause analysis. For example, the Y2K problem was cited earlier in this chapter. In its most common manifestation, the Y2K problem might have this description using the taxonomy just discussed:

Y2K origin: Requirements
Y2K severity: Severity 2 (major features disabled)
Y2K primary cause: Error of commission
Y2K secondary cause: Error of poor removal
Note that this taxonomy allows the use of primary and secondary factors, since sometimes more than one problem is behind having a defect in software.

Note also that the Y2K problem did not have the same severity for every application. An approximate distribution of Y2K severity levels for several hundred applications noted that the software stopped in about 15 percent of instances, which are severity 1 problems; it created severity 2 problems in about 50 percent; it created severity 3 problems in about 25 percent; and had no operational consequences in about 10 percent of the applications in the sample.

To know the origin of a defect, some research is required. Most defects are initially found because the code stops working or produces erratic results. But it is important to know if upstream problems such as requirements or design issues are the true cause. Root-cause analysis can find the true causes of software defects.

Several other factors should be included in a taxonomy for tracking defects. These include whether a reported defect is valid or invalid. (Invalid defects are common and fairly expensive, since they still require analysis and a response.) Another factor is whether a defect report is new and unique, or merely a duplicate of a prior defect report.

For testing and static analysis, the category of "false positives" needs to be included. A false positive is the mistaken identification of a code segment that initially seems to be incorrect, but which later research reveals is actually correct.

A third factor deals with whether the repair team can make the same problem occur on their own systems, or whether the defect was caused by a unique configuration on the client's system. When defects cannot be duplicated, they were termed abeyant defects by IBM, since additional information needed to be collected to solve the problem.

Adding these additional topics to the Y2K example would result in an expanded taxonomy:
Y2K origin: Requirements
Y2K validity: Valid defect report
Y2K uniqueness: Duplicate (this problem was reported millions of times)
Y2K severity: Severity 2 (major features disabled)
Y2K primary cause: Error of commission
Y2K secondary cause: Error of poor removal
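As a rough sketch of how such a taxonomy might be recorded in a defect-tracking tool (the record layout and field names are assumptions for this illustration, not a prescribed format), the expanded Y2K example could be expressed as:

from dataclasses import dataclass

@dataclass
class DefectRecord:
    origin: str           # requirements, design, code, user documents, or bad fix
    validity: str         # valid or invalid defect report
    uniqueness: str       # unique report or duplicate of a prior report
    severity: int         # IBM scale: 1 (total failure) through 4 (cosmetic)
    primary_cause: str    # omission, commission, ambiguity, performance, etc.
    secondary_cause: str  # optional second contributing cause

# The expanded Y2K example from the text, expressed with this record:
y2k = DefectRecord(
    origin="Requirements",
    validity="Valid defect report",
    uniqueness="Duplicate (reported millions of times)",
    severity=2,
    primary_cause="Error of commission",
    secondary_cause="Error of poor removal",
)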
When defects are being counted or predicted, it is useful to have a standard metric for normalizing the results. As discussed in Chapter 5, there are at least ten candidates for such a normalizing metric, including function points, story points, use-case points, lines of code, and so on.

In this book and also in the author's previous books, the function point metric defined by the International Function Point Users Group (IFPUG) is used to quantify and normalize data for both defects and productivity.

There are several reasons for using IFPUG function points. The most important reason in terms of measuring software defects is that noncode defects in requirements, design, and documents are major defect sources and cannot be measured using the older "lines of code" metric. Another important reason is that all of the major benchmark data collections for productivity and quality use function point metrics, and data expressed via IFPUG function points composes about 85 percent of all known benchmarks.

It is not impossible to use other metrics for normalization, but if results are to be compared against industry benchmarks such as those published by the International Software Benchmarking Standards Group (ISBSG), then IFPUG function points are the most convenient.

Later in the discussion of defect prediction, examples will be given of using other metrics in addition to IFPUG function points.
It is interesting to combine the origin, severity, and cause factors to examine the approximate frequency of each.

Table 9-1 shows the combination of these factors for software applications during development. Therefore, Table 9-1 shows defect potentials, or the probable numbers of defects that will be encountered during development and after release. Only severity 1 and severity 2 defects are shown in Table 9-1.

Data on defect potentials is based on long-range studies of defects and defect removal efficiency carried out by organizations such as the IBM Software Quality Assurance groups, which have been studying software quality for more than 35 years.
TABLE 9-1   Overview of Software Defect Potentials

Defect Origins    Defects per Function Point    Severity 1 Defects    Severity 2 Defects    Most Frequent Defect Cause
Requirements      1.00                           11.00%                15.00%               Omission
Design            1.25                           15.00%                20.00%               Omission
Code              1.75                           70.00%                57.00%               Commission
Documents         0.60                            1.00%                 1.00%               Ambiguity
Bad fixes         0.40                            3.00%                 7.00%               Commission
TOTAL             5.00                          100.00%               100.00%               Omission
Other corporations such as AT&T, Coverity, Computer Aid Inc. (CAI), Dovél Technologies, Motorola, Software Productivity Research (SPR), Galorath Associates, the David Consulting Group, the Quality and Productivity Management Group (QPMG), Unisys, Microsoft, and the like also carry out long-range studies of defects and removal efficiency levels.

Most such studies are carried out by corporations rather than universities because academia is not really set up to carry out longitudinal studies that may last more than ten years.

While coding bugs or coding defects are the most numerous during development, they are also the easiest to find and to get rid of. A combination of inspections, static analysis, and testing can wipe out more than 95 percent of coding defects and sometimes top 99 percent. Requirements defects and bad fixes are the toughest categories of defect to eliminate.

Table 9-2 uses Table 9-1 as a starting point, but shows the latent defects that will still be present when the software application is delivered to users. Table 9-2 shows approximate U.S. averages circa 2009. Note the variations in defect removal efficiency by origin.

It is interesting that when the software is delivered to clients, requirements defects are the most numerous, primarily because they are the most difficult to prevent and also the most difficult to find. Only formal requirements-gathering methods combined with formal requirements inspections can improve the situation for finding and removing requirements defects.

If not prevented or removed, both requirements bugs and design bugs eventually find their way into the code. These are not coding bugs per se, such as branching to a wrong address, but more serious and deep-seated kinds of bugs or defects.

It was noted earlier in this chapter that requirements defects cannot be found and removed by means of testing. If a requirements defect is not prevented or removed via inspection, all test cases created using the requirements will confirm the defect and not identify it.
TABLE 9-2   Overview of Delivered Software Defects

Defect Origins    Defects per Function Point    Removal Efficiency    Delivered Defects per Function Point    Most Frequent Defect Cause
Requirements      1.00                           70.00%                0.30                                    Commission
Design            1.25                           85.00%                0.19                                    Commission
Code              1.75                           95.00%                0.09                                    Commission
Documents         0.60                           91.00%                0.05                                    Omission
Bad fixes         0.40                           70.00%                0.12                                    Commission
TOTAL             5.00                           85.02%                0.75                                    Commission
Since Table 9-2 reflects approximate U.S. averages, the methods assumed are those of fairly careless requirements gathering: waterfall development; CMMI level 1; no formal inspections of requirements, design, or code; no static analysis; and only five forms of testing: (1) unit test, (2) new function test, (3) regression test, (4) system test, and (5) acceptance test.
Note also that during development, requirements will continue to grow and change at rates of 1 percent to 2 percent every calendar month. These changing requirements have higher defect potentials than the original requirements and lower levels of defect removal efficiency. This is yet another reason why requirements defects cause more problems than any other defect origin point.

Software requirements are the most intractable source of software defects. However, methods such as joint application design (JAD), quality function deployment (QFD), Six Sigma analysis, root-cause analysis, embedding users with the development team as practiced by Agile development, prototypes, and the use of formal requirements inspections can assist in bringing requirements defects under control.

Table 9-3 shows what quality might look like if an optimal combination of defect prevention and defect removal activities were utilized. Table 9-3 assumes formal requirements methods, rigorous development such as practiced using the Team Software Process (TSP) or the higher CMMI levels, prototypes and JAD, formal inspections of all deliverables, static analysis of code, and a full set of eight testing stages: (1) unit test, (2) new function test, (3) regression test, (4) performance test, (5) security test, (6) usability test, (7) system test, and (8) acceptance test.

Table 9-3 also assumes a software quality assurance (SQA) group and rigorous reporting of software defects starting with requirements, continuing through inspections, static analysis, and testing, and out into the field with multiple years of customer-reported defects, maintenance, and enhancements. Accumulating data such as that shown in Tables 9-1 through 9-3 requires longitudinal data collection that runs for many years.
TABLE 9-3   Optimal Defect Prevention and Defect Removal Activities

Defect Origins    Defects per Function Point    Removal Efficiency    Delivered Defects per Function Point    Most Frequent Defect Cause
Requirements      0.50                           95.00%                0.03                                    Omission
Design            0.75                           97.00%                0.02                                    Omission
Code              0.50                           99.00%                0.01                                    Commission
Documents         0.40                           96.00%                0.02                                    Omission
Bad fixes         0.20                           92.00%                0.02                                    Commission
TOTAL             2.35                           96.40%                0.08                                    Omission
This combination has the effect of cutting defect potentials by more than 50 percent and of raising cumulative defect removal efficiency from today's average of 85 percent up to more than 96 percent.
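The totals in Tables 9-2 and 9-3 follow directly from multiplying each origin's defect potential by the fraction of defects that escape removal. A minimal sketch of that arithmetic, using only the numbers in the two tables (the variable and function names are invented for this illustration):

# Defect potential (defects per function point) and removal efficiency by
# origin, taken from Table 9-2 (U.S. average) and Table 9-3 (optimal).
average = {
    "Requirements": (1.00, 0.70),
    "Design":       (1.25, 0.85),
    "Code":         (1.75, 0.95),
    "Documents":    (0.60, 0.91),
    "Bad fixes":    (0.40, 0.70),
}
optimal = {
    "Requirements": (0.50, 0.95),
    "Design":       (0.75, 0.97),
    "Code":         (0.50, 0.99),
    "Documents":    (0.40, 0.96),
    "Bad fixes":    (0.20, 0.92),
}

def totals(table):
    potential = sum(p for p, _ in table.values())
    delivered = sum(p * (1.0 - e) for p, e in table.values())
    cumulative_efficiency = 1.0 - delivered / potential
    return potential, round(delivered, 2), round(cumulative_efficiency, 3)

print(totals(average))   # approximately (5.00, 0.75, 0.85)
print(totals(optimal))   # approximately (2.35, 0.08, 0.964)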
It might be possible to even exceed the results shown in Table 9-3, but doing so would require additional methods such as the availability of a full suite of certified reusable materials.

Tables 9-2 and 9-3 are oversimplifications of real-life results. Defect potentials vary with the size of the application and with other factors. Defect removal efficiency levels also vary with application size. Bad-fix injections also vary by defect origins. Both defect potentials and defect removal efficiency levels vary by methodology, by CMMI levels, and by other factors as well. These will be discussed later in the section of this chapter dealing with defect prediction.

Because of the many definitions of quality used by the industry, it is best to start by showing what is predictable and measurable and what is not. To sort out the relevance of the many quality definitions, the author has developed a 10-point scoring method for software quality factors.
■ If a factor leads to improvement in quality, its maximum score is 3.
■ If a factor leads to improvement in customer satisfaction, its maximum score is 3.
■ If a factor leads to improvement in team morale, its maximum score is 2.
■ If a factor is predictable, its maximum score is 1.
■ If a factor is measurable, its maximum score is 1.
■ The total maximum score is 10.
■ The lowest possible score is 0.
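A minimal sketch of this scoring scheme is shown below (the function and parameter names are invented for the illustration; it awards only full or zero credit for each rule, whereas Table 9-4 also assigns fractional scores):

def quality_factor_score(improves_quality, improves_satisfaction,
                         improves_morale, predictable, measurable):
    """Return a 0-10 score for a quality factor, granting maximum credit per rule."""
    score = 0
    score += 3 if improves_quality else 0        # leads to improvement in quality
    score += 3 if improves_satisfaction else 0   # leads to improvement in customer satisfaction
    score += 2 if improves_morale else 0         # leads to improvement in team morale
    score += 1 if predictable else 0             # the factor is predictable
    score += 1 if measurable else 0              # the factor is measurable
    return score

# Defect removal efficiency satisfies every rule and earns the maximum score:
print(quality_factor_score(True, True, True, True, True))   # 10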
Table 9-4 lists all of the quality factors discussed in this chapter in rank order by using the scoring factor just outlined. Table 9-4 shows whether a specific quality factor is measurable and predictable, and also the relevance of the factor to quality as based on surveys of software customers. It also includes a weighted judgment as to whether the factor has led to improvements in quality among the organizations that use it.

The quality definitions with a score of 10 have been the most effective in leading to quality improvements over time. As a rule, the quality definitions scoring higher than 7 are useful. However, the quality definitions that score below 5 have no empirical data available that shows any quality improvement at all.
While Table 9-4 is somewhat subjective, at least it provides a mathematical basis for scoring the relevance and importance of the rather vague and ambiguous collection of quality factors used by the software industry.
TABLE 9-4   Rank Order of Quality Factors by Importance to Quality

                              Measurable    Predictable    Relevance
                              Property?     Property?      to Quality    Score
Best Quality Definitions
Defect potentials             Yes           Yes            Very high     10.00
Defect removal efficiency     Yes           Yes            Very high     10.00
Defect severity levels        Yes           Yes            Very high     10.00
Defect origins                Yes           Yes            Very high     10.00
Reliability                   Yes           Yes            Very high     10.00
Good Quality Definitions
Toxic requirements            Yes           No             Very high      9.50
Missing requirements          Yes           No             Very high      9.50
Requirements conformance      Yes           No             Very high      9.00
Excess requirements           Yes           No             Medium         9.00
Usability                     Yes           Yes            Very high      8.00
Testability                   Yes           Yes            High           8.00
Defect causes                 Yes           No             Very high      8.00
Fair Quality Definitions
Maintainability               Yes           Yes            High           7.00
Understandability             Yes           Yes            Medium         6.00
Traceability                  Yes           No             Low            6.00
Modifiability                 Yes           No             Medium         5.00
Verifiability                 Yes           No             Medium         5.00
Poor Quality Definitions
Portability                   Yes           Yes            Low            4.00
Expandability                 Yes           No             Low            3.00
Scalability                   Yes           No             Low            2.00
Interoperability              Yes           No             Low            1.00
Survivability                 Yes           No             Low            1.00
Augmentability                No            No             Low            0.00
Flexibility                   No            No             Low            0.00
Manageability                 No            No             Low            0.00
Operability                   No            No             Low            0.00
In essence, Table 9-4 makes these points:
1. Conformance to requirements is hazardous unless incorrect, toxic, or dangerous requirements are weeded out. This definition has not demonstrated any improvements in quality for more than 30 years.
2. Most of the -ility quality definitions are hard to measure, and many are of marginal significance. Some are not measurable either. None of the -ility words tend to lead to tangible quality gains.
3. Quantification of defect potentials and defect removal efficiency levels has had the greatest impact on improving quality and also the greatest impact on customer satisfaction levels.
If software engineering is to evolve from a craft or art form into a true engineering field, it is necessary to put quality on a firm quantitative basis and to move away from vague and subjective quality definitions. These will still have a place, of course, but they should not be the primary definitions for software quality.
Predicting Software Defect Potentials
To predict software quality, it is necessary to measure software quality. Since companies such as IBM have been doing this for more than 40 years, the best available data comes from companies that have full life-cycle quality measurement programs that start with requirements, continue through development, and then extend out to customer-reported defects for as long as the software is used, which may be 25 years or more. The next best source of data comes from benchmark and commercial software estimating tool companies, since they collect historical data on quality as well as on productivity.

Because software defects come from five different sources, the quickest way to get a useful approximation of software defect potentials is to use IFPUG function point metrics.
The basic sizing rule for predicting defect potentials with function points is: take the size of a software application in function points and raise it to the 1.25 power. The result will be a useful approximation of software defect potentials for applications between a low of about 10 function points and a high of about 5000 function points.
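A minimal sketch of this rule of thumb (the function name is invented for this illustration; the 1.25 exponent is the value stated above and, as noted next, should be tuned to local data and methods):

def approximate_defect_potential(function_points, exponent=1.25):
    """Rule-of-thumb defect potential: function points raised to the 1.25 power.

    Intended only as an early, rough approximation for applications of
    roughly 10 to 5000 function points; the exponent should be adjusted
    downward for higher CMMI levels, Agile, RUP, or TSP.
    """
    return function_points ** exponent

print(round(approximate_defect_potential(100)))    # about 316 total defects
print(round(approximate_defect_potential(1000)))   # about 5,623 total defects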
The
exponent for this rule of
thumb would need to be
adjusted down-
wards
for the higher CMMI levels,
Agile, RUP, and the
Team Software
Process
(TSP). But since the rule is
intended to be applied early,
before
any
costs are expended, it still
provides a useful starting
point. Readers
might
want to experiment with local
data and find an exponent
that
gives
useful results against local
quality and defect
data.
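As a purely illustrative sketch of this rule of thumb, the short Python fragment below raises application size in function points to an adjustable exponent that defaults to 1.25; the function name and sample sizes are not from the original text, and the rule is only an early approximation that will not match the averages in Table 9-5 exactly.

    # Rule-of-thumb defect potential: size in function points raised to the
    # 1.25 power. The exponent is left adjustable so that readers can
    # calibrate it against local quality and defect data, as suggested above.
    def approximate_defect_potential(function_points, exponent=1.25):
        return function_points ** exponent

    for size in (10, 100, 1000, 5000):
        total = approximate_defect_potential(size)
        print(f"{size:>5} function points -> roughly {total:,.0f} total defects")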
Table
9-5 shows approximate U.S.
averages for defect
potentials.
Recall
that defect potentials are
the sum of five defect
origins: require-
ments
defects, design defects,
code defects, document
defects, and bad-
fix
injections.
As
can be seen from Table 9-5,
defect potentials increase with
applica-
tion
size. Of course, other
factors can reduce or
increase the
potentials,
as
will be discussed later in the
section on defect
prevention.
While the total defect potential is useful, it is also useful to know the distribution of defects among the five origins or sources. Table 9-6 illustrates typical defect distribution percentages using approximate average values.
TABLE 9-5  U.S. Averages for Software Defect Potentials

Size in FP     Defects per Function Point    Defect Potentials
        1                1.50                            2
       10                2.34                           23
      100                3.04                          304
    1,000                4.62                        4,621
   10,000                6.16                       61,643
  100,000                7.77                      777,143
1,000,000                8.56                    8,557,143
Average                  4.86                    1,342,983
Applying
the distribution shown in
Table 9-6 to a sample
application
of
1500 function points, Table
9-7 illustrates the
approximate defect
potential,
or the total number of
defects that might be found
during
development
and by customers.
These
simple overall examples are
not intended as substitutes
for
commercial
quality estimation tools
such as KnowledgePlan and
SEER,
which
can adjust their predictions
based on CMMI levels;
development
methods
such as Agile, TSP, or RUP;
use of inspections; use of
static
analysis;
and other factors which
would cause defect
potentials to vary
and
also which cause defect
removal efficiency levels to
vary.
Rules
of thumb are never very
accurate, but their
convenience and
ease
of use provide value for
rough estimates and early
sizing. However,
such
rules should not be used
for contracts or serious
estimates.
Predicting
Code Defects
Using
function point metrics as an
overall tool for quality
prediction is
useful
because noncoding defects outnumber
code defects. That
being
said,
there are more coding
defects than any other
single source.
TABLE 9-6  Percentages of Defects by Origin

Defect Origins     Defects per Function Point    Percent of Total Defects
Requirements                 1.00                        20.00%
Design                       1.25                        25.00%
Source code                  1.75                        35.00%
User documents               0.60                        12.00%
Bad fixes                    0.40                         8.00%
TOTAL                        5.00                       100.00%
TABLE 9-7  Defect Potentials for a Sample Application
(Application size = 1500 function points)

Defect Origins     Defects per Function Point    Defect Potentials    Percent of Total Defects
Requirements                 1.00                      1,500                  20.00%
Design                       1.25                      1,875                  25.00%
Source code                  1.75                      2,625                  35.00%
User documents               0.60                        900                  12.00%
Bad fixes                    0.40                        600                   8.00%
TOTAL                        5.00                      7,500                 100.00%
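The arithmetic behind Table 9-7 is simply the application size multiplied by the per-origin averages of Table 9-6; the small Python sketch below reproduces it for the 1500-function-point sample (the dictionary and function names are illustrative only).

    # Defect potentials by origin, using the approximate U.S. averages
    # of defects per function point from Table 9-6.
    DEFECTS_PER_FUNCTION_POINT = {
        "Requirements": 1.00,
        "Design": 1.25,
        "Source code": 1.75,
        "User documents": 0.60,
        "Bad fixes": 0.40,
    }

    def defect_potentials_by_origin(size_in_function_points):
        return {origin: rate * size_in_function_points
                for origin, rate in DEFECTS_PER_FUNCTION_POINT.items()}

    potentials = defect_potentials_by_origin(1500)
    total = sum(potentials.values())
    for origin, count in potentials.items():
        print(f"{origin:<15} {count:>7,.0f}  ({count / total:.0%} of total)")
    print(f"{'TOTAL':<15} {total:>7,.0f}")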
Predicting
code defects is fairly
tricky for six
reasons:
1. More than 2,500 programming languages are in existence, and they are not equal as sources of defects.
2. A majority of modern software applications use more than one language, and some use as many as 15 different programming languages.
3. The measured range of performance by a sample of programmers using the same language for the same test application varies by more than 10 to 1. Individual skills and programming styles create significant variations in the amount of code written for the same problem, in defect potentials, and also in productivity.
4. Lines of code can be counted using either physical lines or logical statements. For some languages, the two counts are identical, but for others, there may be as much as a 500 percent variance between physical and logical counts.
5. For a number of languages starting with Visual Basic, some programming is done by means of buttons or pull-down menus. Therefore, programming is done without using procedural source code. There are no effective rules for counting source code with such languages.
6. Reuse of source code from older applications or from libraries of reusable code is quite common. If the reused code is certified, it will have very few defects compared with new custom code.
To
predict coding defects, it is
necessary to know the
level
of a
pro-
gramming
language. The concept of the
level of a language is often
used
informally
in phrases such as "high-level" or
"low-level" languages.
Within
IBM in the 1970s, when
research was first carried
out on
predicting
code defects, it was
necessary to give a formal
mathematical
definition
to language levels. Within IBM
the level
was
defined as the
number
of statements in basic assembly
language needed to equal
the
functionality
of 1 statement in a higher-level
language.
Using
this definition, COBOL was a
level 3 language, because it
took
3
basic assembly statements to
equal 1 COBOL statement.
Using the
same
rule, SMALLTALK is a level 15
language.
(For
several years before
function points were
invented, IBM used
"equivalent
assembly statements" as the
basis for estimating
non-code
work
such as user manuals. Thus,
instead of basing a publication
budget
on
10 percent of the effort for
writing a program in PL/S,
the budget
would
be based on 10 percent of the
effort if the code were
basic assem-
bly
language. This method was
crude but reasonably
effective.)
Dissatisfaction
with the equivalent assembler
method for estimation
was
one of the reasons IBM assigned
Allan Albrecht and his
colleagues
to
develop function point
metrics.
Additional
programming languages such as
APL, Forth, Jovial,
and
others
were starting to appear, and
IBM wanted both a metric and
esti-
mating
methods that could deal with
both noncoding and coding
work
in
an accurate fashion. IBM also
wanted to predict coding
defects.
The
use of macro-assembly language
had introduced reuse, and
this
caused
measurement problems, too. It
raised the issue of how to
count
reused
code in software applications or
any other reused material.
The
solution
here was to separate
productivity and quality
into two topics:
(1)
development and (2)
delivery.
The
former dealt with the code
and materials that had to be
constructed
from
scratch. The latter dealt
with the final application as
delivered,
including
reused material. For
example, using macro-assembly
language
a
productivity rate for
development
productivity might
be 300 lines of
code
per month. But due to
reusing code in the form of
macro expansions,
delivery
productivity might
be as high as 750 lines of
code per month.
The
same distinction affects
quality, too. Assume a
program had 1000
lines
of new code and 1000
lines of reused code. There
might be 15 bugs
per
KLOC in the new code
but 0 bugs per KLOC in
the reused code.
This
is an important business distinction
that is not well
understood
even
in 2009. The true goal of
software engineering is to improve
the
rate
of delivery productivity and
quality rather than
development pro-
ductivity
and quality.
After
function point metrics were
developed circa 1975, the
defini-
tion
of "language level" was
expanded to include the
number of logical
code
statements equivalent to 1 function
point. COBOL, for
example,
requires
about 105 statements per
function point in the
procedure and
data
divisions. (This expansion is
the mathematical basis for
backfiring,
or
direct conversion from
source code to function
points.)
Table
9-8 illustrates how code
size and coding defects
would vary if
15
different programming languages
were used for the
same applica-
tion,
which is 1000 function
points. Table 9-8 assumes a
constant value
of
15 potential coding defects
per KLOC for all languages.
However,
TABLE 9-8  Examples of Defects per KLOC and Function Point for 15 Languages
(Assumes a constant of 15 defects per KLOC for all languages)

Language Level   Sample Languages   Source Code per Function Point   Source Code per 1000 FP   Coding Defects   Defects per Function Point
 1.              Assembly                       320                         320,000                 4,800                  4.80
 2.              C                              160                         160,000                 2,400                  2.40
 3.              COBOL                          107                         106,667                 1,600                  1.60
 4.              PL/I                            80                          80,000                 1,200                  1.20
 5.              Ada95                           64                          64,000                   960                  0.96
 6.              Java                            53                          53,333                   800                  0.80
 7.              Ruby                            46                          45,714                   686                  0.69
 8.              E                               40                          40,000                   600                  0.60
 9.              Perl                            36                          35,556                   533                  0.53
10.              C++                             32                          32,000                   480                  0.48
11.              C#                              29                          29,091                   436                  0.44
12.              Visual Basic                    27                          26,667                   400                  0.40
13.              ASP.NET                         25                          24,615                   369                  0.37
14.              Objective C                     23                          22,857                   343                  0.34
15.              Smalltalk                       21                          21,333                   320                  0.32
the
15 languages have levels
that vary from 1 to 15, so
very different
quantities
of code will be created for the
same 1000 function
points.
Note:
Language levels are variable
and change based on volumes
of
reused
code or calls to external
functions. The levels shown
in Table 9-8
are
only approximate and are
not constants.
As
can be seen from Table 9-8,
in order to predict coding
defects, it is
critical
to know the programming
language (or languages) that
will be
used
and also the size of
the application using both
function point and
lines
of code metrics.
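As a minimal sketch of that calculation for a single-language application, the fragment below combines the approximate levels of Table 9-8 with the chapter's simplifying constant of 15 coding defects per KLOC; the function name and the three-language subset are illustrative, and the rounded levels mean the results differ slightly from the table.

    # Approximate code size and coding defect potential for one application,
    # given its size in function points and the language's logical source
    # statements per function point (values are approximate, not constants).
    LOC_PER_FUNCTION_POINT = {"C": 160, "Java": 53, "Smalltalk": 21}
    DEFECTS_PER_KLOC = 15   # simplifying constant; real values range from under 5 to over 25

    def code_size_and_defects(function_points, language):
        loc = function_points * LOC_PER_FUNCTION_POINT[language]
        defects = (loc / 1000) * DEFECTS_PER_KLOC
        return loc, defects

    for language in ("C", "Java", "Smalltalk"):
        loc, defects = code_size_and_defects(1000, language)
        print(f"{language:<10} {loc:>8,.0f} LOC  about {defects:>5,.0f} coding defects "
              f"({defects / 1000:.2f} per function point)")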
The
situation is even more
tricky when combinations of
two or more
languages
are used within the
same application. However,
this prob-
lem
is handled by commercial software
cost-estimating tools such
as
KnowledgePlan,
which include multilanguage
estimating capabilities.
Reused
code also adds to the
complexity of predicting coding
defects.
To
show the results of multiple
languages in the same
application, let
us
consider two case
studies.
In
Case Study A, there are
three different languages
and each lan-
guage
has 1000 lines of code,
counted using logical
statements. In Case
Study
B, we have the same three
languages, but now each
language
comprises
exactly 25 function points
each.
For
Case A, the total volume of
source code is 3000 lines of
code; total
function
points are 73; and
total code defect potentials
are 45.
Case A: Three Languages with 1000 Lines of Code Each

Languages    Levels    Lines of Code (LOC)    LOC per Function Point    Function Points    Defect Potential
C             2.00           1,000                     160                       6                 15
Java          6.00           1,000                      53                      19                 15
Smalltalk    15.00           1,000                      21                      48                 15
TOTAL                        3,000                                              73                 45
AVERAGE       7.76                                      41
When
we change the assumptions to
Case B and use a constant
value
of
25 function points for each
language, the total number
of function
points
only changes from 73 to 75.
But the volume of source
code almost
doubles,
as do numbers of defects. This is because
of the much greater
impact
of the lowest-level language,
the C programming
language.
When considering either Case A or Case B, it is easily seen that predicting either size or quality for a multilanguage application is a great deal more complicated than for a single-language application.
Case B: Three Languages with 25 Function Points Each

Languages    Levels    Lines of Code (LOC)    LOC per Function Point    Function Points    Defect Potential
C             2.00           4,000                     160                      25                 60
Java          6.00           1,325                      53                      25                 20
Smalltalk    15.00             525                      21                      25                  8
TOTAL                        5,850                                              75                 88
AVERAGE       4.10                                      78
It
is interesting to look at Case A
and Case B in a side-by-side
format
to
highlight the differences.
Note that in Case B the
influence of the
lowest-level
language, the C programming
language, increases
both
code
volumes and defect
potentials:
Source Code (Logical statements)      Case A      Case B
C                                      1,000       4,000
Java                                   1,000       1,325
Smalltalk                              1,000         525
Total lines of code                    3,000       5,850
Total KLOC                              3.00        5.85
Function Points                           73          75
Code Defects                              45          88
Defects per KLOC                          15          15
Defects per Function Point              0.62        1.17
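The side-by-side figures can be approximated with a few lines of code, as in the sketch below; because the published LOC-per-function-point levels are rounded, the computed function point totals come out near, rather than exactly at, 73 and 75 (the helper names are illustrative only).

    # Case A sizes each language at 1,000 logical LOC; Case B sizes each
    # language at 25 function points. Both assume 15 coding defects per KLOC.
    LOC_PER_FP = {"C": 160, "Java": 53, "Smalltalk": 21}
    DEFECTS_PER_KLOC = 15

    def summarize(loc_by_language):
        total_loc = sum(loc_by_language.values())
        total_fp = sum(loc / LOC_PER_FP[lang] for lang, loc in loc_by_language.items())
        defects = (total_loc / 1000) * DEFECTS_PER_KLOC
        return total_loc, total_fp, defects

    case_a = {lang: 1000 for lang in LOC_PER_FP}                   # 1,000 LOC each
    case_b = {lang: 25 * LOC_PER_FP[lang] for lang in LOC_PER_FP}  # 25 FP each

    for name, case in (("Case A", case_a), ("Case B", case_b)):
        loc, fp, defects = summarize(case)
        print(f"{name}: {loc:,} LOC, about {fp:.0f} function points, "
              f"about {defects:.0f} code defects, {defects / fp:.2f} defects per FP")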
Cases
A and B oversimplify real-life
problems because each case
study
uses
constants for data items
that in real-life are
variable. For
example,
the
constant of 15 defects per
KLOC for code defects is
really a variable
that
can range from less
than 5 to more than 25
defects per KLOC.
The
number of source code
statements per function
point is also a vari-
able,
and each language can vary
by perhaps a range of 2 to 1 around
the
average
values shown by the nominal
language "level" default
values.
These
variables illustrate why predicting
quality and defect
levels
depends
so heavily upon measuring
quality and defect levels.
The exam-
ples
also illustrate why definitions of
quality need to be both
measurable
and
predictable.
Other
variables can affect the
ability to predict size and
defects as well.
Suppose,
for example, that reused
code composed 50 percent of
the code
volume
in Case A. Suppose also that
the reused code is certified
and has
zero
defects. Now the calculations
for defect predictions need
to include
reuse,
which in this example lowers
defect potentials by 50
percent.
When
the size of the application
is used for productivity
calculations,
it
is necessary to decide whether
development productivity or
delivery
productivity,
or both, are the figures of
interest.
Predicting
software defects is possible to
accomplish with fairly
good
accuracy, but the
calculations are not
trivial, and they need
to
include
a number of variables that
can only be determined by
careful
measurements.
The
Quality Impacts of Creeping
Requirements
Function
point analysis at the end of
the requirements phase and
then
again
at application delivery shows
that requirements grow and
change
at
rates in excess of 1 percent
per calendar month during
the design
and
coding phases. The total
growth in creeping requirements
ranges
from
a low of less than 10
percent of total requirements to a
high of
more
than 50 percent. (One unique
project had requirements
growth in
excess
of 200 percent.)
As
an example, if an application is sized at
1000 function points
when
the
initial requirements phase is
over, then every month at
least 10 new
function
points will be added in the
form of new requirements.
This
growth
might continue for perhaps
six months, and so increase
the size
of
the application from 1000 to
1060 function points. For
small projects,
the
growth of creeping requirements is
more of an inconvenience
than
a
serious issue.
Larger
applications have longer
schedules and usually higher
rates of
requirements
change as well. For an
application initially sized at
10,000
function
points, new requirements
might lead to monthly growth
rates of
125
function points for perhaps
20 calendar months. The
delivered applica-
tion
might be 12,500 function
points rather than 10,000
function points.
As
this example illustrates,
creeping requirements growth of a
full 25
percent
can have a major impact on
development schedules, costs,
and
also
on quality and delivered
defects.
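Expressed as a simple linear model of monthly growth applied to the initially sized requirements, the two examples above work out as in the sketch below (the function name is illustrative; the growth rates are those quoted in the text).

    # Creeping requirements modeled as a fixed monthly percentage of the
    # initially sized requirements, added linearly until delivery.
    def grown_size(initial_function_points, monthly_creep_rate, months):
        return initial_function_points * (1 + monthly_creep_rate * months)

    print(grown_size(1000, 0.01, 6))      # small project: 1,060 function points
    print(grown_size(10000, 0.0125, 20))  # large project: 12,500 function points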
Because
new and changing
requirements are occurring
later in devel-
opment
than the original
requirements, they are often
rushed. As a
result,
defect potentials for
creeping requirements are
about 10 percent
greater
than for the original
requirements. This is true
for toxic require-
ments
and design errors. Code
bugs may or may not
increase, based
upon
the schedule pressure
applied to the software
engineering team.
Creeping
requirements also tend to
bypass formal inspections
and
also
have fewer test cases
created for them. As a
result, defect
removal
efficiency
is lower against both toxic
requirements and also
design
errors
by at least 5 percent. This
seems to be true for code
errors as well,
with
the exception that
applications coded in C or Java that
use static
analysis
tools will still achieve
high levels of defect
removal efficiency
against
code bugs.
The
combined results of higher
defect potentials and lower
levels
of
defect removal for creeping
requirements result in a much
greater
percentage
of delivered defects stemming
from changed
requirements
than
any other source of error.
This has been a chronic
problem for the
software
industry.
The
bottom line is that creeping
requirements combined with
below
optimum
levels of defect prevention
and defect removal are a
primary
cause
of cancelled projects, schedule
delays, and cost
overruns.
As
will be discussed later in the
sections on defect prevention
and
defect
removal, there are
technologies available for
minimizing the harm
from
creeping requirements. However,
these effective methods,
such as
formal
requirements and design
inspections, are not widely
used.
Measuring
Software Quality
In spite of the fact that defect removal efficiency is a critical topic for successful software projects, measuring defect removal efficiency or software quality in general is seldom done.
From visiting over 300
companies
in
the United States, Europe,
and Asia, the author
found the following
distribution
of the frequency of various
kinds of quality
measures:
No quality measures at all                                                      44%
Measuring only customer-reported defects                                        30%
Measuring test and customer-reported defects                                    18%
Measuring inspection, static analysis, test, and customer-reported defects       7%
Using volunteers for measuring personal defect removal                           1%
Overall Distribution                                                           100%
The
mathematics of measuring defect
removal efficiency is not
com-
plicated.
Twelve steps in the sequence of
data collection and
calculations
are
needed to quantify defect
removal efficiency
levels:
1. Accumulate data on every defect that occurs, starting with requirements.
2. Assign severity levels to each reported defect as it is fixed.
3. Measure how many defects are removed by every defect removal activity.
4. Use root-cause analysis to identify origins of high-severity defects.
5. Measure invalid defects, duplicates, and false positives, too.
6. After the software is released, measure customer-reported defects.
7. Record hours worked for defect prevention, removal, and repairs.
8. Select a fixed point such as 90 days after release for the calculations.
9. Use volunteers to record private defect removal such as desk checking.
10. Calculate cumulative defect removal efficiency for the entire series.
11. Calculate the defect removal efficiency for each step in the series.
12. Use the data to improve both defect prevention and defect removal.
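One lightweight way to support steps 1 through 9 is to capture each defect with enough indicative data to run the later calculations; the record layout below is only a hypothetical sketch, and none of the field names come from the original text or from any specific defect tracking tool.

    # Hypothetical defect record holding the data needed for the
    # twelve-step defect removal efficiency calculation above.
    from dataclasses import dataclass
    from datetime import date
    from typing import Optional

    @dataclass
    class DefectRecord:
        defect_id: str
        origin: str                  # e.g., "requirements", "design", "code", "bad fix"
        discovery_activity: str      # e.g., "design inspection", "static analysis", "system test"
        discovered_on: date
        severity: int                # 1 = critical ... 4 = cosmetic
        is_valid: bool = True        # False for invalid reports, duplicates, false positives
        customer_reported: bool = False
        repair_hours: float = 0.0
        origin_date: Optional[date] = None   # filled in by root-cause analysis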
The
effort and costs required to
measure defect removal
efficiency
levels
are trivial compared with
the value of such
information. The
total
effort
required to measure each
defect and its associated
repair work
amounts
to only about an hour. Of
this time, probably half is
expended
on
customer-reported defects, and
the other half is expended
on internal
defect
reports.
However,
step 4, root-cause analysis,
can take several
additional
hours
based on how well
requirements and design are
handled by the
development
team.
The
value of measuring defect
removal efficiency encompasses
the
following
benefits:
■ Finding and fixing bugs is the most expensive activity in all of software, so reducing these costs yields a very large return on investment.
■ Excessive numbers of bugs constitute the main reason for schedule slippage, so reducing defects in all deliverables will shorten development schedules.
■ Delivered defects are the major cost driver of software maintenance for the first two years after release, so improving removal efficiency lowers maintenance costs.
■ Customer satisfaction correlates inversely to numbers of delivered defects, so reducing delivered defects will result in happier customers.
■ Team morale correlates with both effective defect prevention and effective defect removal.
Later
in the section on the
economics of quality, these
benefits will
be
quantified to show the
overall value of defect
prevention and defect
removal.
Many
companies and government
organizations track
software
defects
found during static
analysis, testing, and also
defects reported
by
customers. In fact, a number of
commercial software defect
tracking
tools
are available.
These tools normally track defect symptoms, applications containing defects, hardware and software platforms, and other kinds of indicative data such as release number, build number, and so on.
However,
more sophisticated organizations
also utilize formal
inspec-
tions
of requirements, design, and
other materials. Such
companies
often
utilize static analysis in
addition to testing and
therefore measure
a
wider range of defects than
just those found in source
code by ordinary
testing.
Some
additional information is needed in order
to use expanded
defect
data
for root-cause analysis and
other forms of defect
prevention. These
additional
topics include
Defect discovery point   It is important to record information on the point at which any specific defect is found. Since requirements defects cannot normally be found via testing, it is important to try and identify noncode defect discovery points.
Collectively, noncode defects in requirements and design are more numerous than coding defects, and also may be high in severity levels. Defect repair costs for noncode defects are often higher than for coding defects. Note that there are more than 17 kinds of software testing, and companies do not use the same names for various test stages.
Date of defect discovery: ________________
Defect Discovery Point:
■ Customer defect report
■ Quality assurance defect report
■ Test stage _________________ defect report
■ Static analysis defect report
■ Code inspection defect report
■ Document inspection defect report
■ Design inspection defect report
■ Architecture inspection defect report
■ Requirements inspection defect report
■ Other ____________________ defect report
Defect origin point   It is also important to record information on where software defects originate. This information requires careful analysis on the part of the change team, so many companies limit defect origin research to high-severity defects such as Severity 1 and Severity 2.
Date of defect origination: ____________________
Defect Origin Point:
■ Application name
■ Release number
■ Build number
■ Source code (internal)
■ Source code (reused from legacy application)
■ Source code (reused from commercial source)
■ Source code (commercial software package)
■ Source code (bad-fix or previous defect repair)
■ User manual
■ Design document
■ Architecture document
■ Requirement document
■ Other _____________________ origination point
Ideally,
the lag time between
defect origins and defect
discovery will
be
less than a month and
hopefully less than a week.
It is very impor-
tant
that defects that originate
within a phase such as the
requirements
or
design phases should also be
discovered and fixed during
the same
phase.
When
there is a long gap between
origins and discovery, such
as not
finding
a design problem until
system test, it is a sign
that software
development
and quality control processes
need to improve.
The
best solution for shortening
the gap between defect
origination
and
defect discovery is that of
formal inspections of
requirements,
design,
and other deliverables. Both
static analysis and code
inspections
are
also valuable for shortening
the intervals between defect
origination
and
defect discovery.
TABLE 9-9  Best-Case Defect Discovery Points

Defect Origins    Optimal Defect Discovery
Requirements      Requirements inspection
Design            Design inspection
Code              Static analysis
Bad fixes         Static analysis
Documentation     Editing
Test cases        Test case inspection
Table
9-9 shows the best-case
scenario for defect
discovery methods
for
various defect
origins.
Inspections are best at finding subtle and complex bugs and problems that are difficult to find via testing because sometimes no test cases are created for them. The example of the Y2K problem is illustrative of a problem that could not be found via testing so long as two-digit dates were mistakenly believed to be acceptable. Code inspections are useful for finding subtle problems such as security vulnerabilities that may escape both testing and even static analysis.
Static
analysis is best at finding
common coding errors such
as
branches
to incorrect locations, overflow
conditions, poor error
handling,
and
the like. Static analysis
prior to testing or as an adjunct to
testing
will
lower testing costs.
Testing
is best at finding problems
that only show up when
the code is
operating,
such as performance problems,
usability problems,
interface
problems,
and other issues such as
mathematical errors or format
errors
for
screens and reports.
Given
the diverse nature of
software bugs and defects,
it is obvious
that
all three defect removal
methods are important for
success: inspec-
tions,
static analysis, and
testing.
Table
9-10 illustrates the fact
that long delays between
defect origins
and
defect discovery lead to
very troubling situations.
Long gaps also
raise
bad-fix injections, accidentally
including new defects in
attempts
to
repair older defects.
TABLE 9-10  Worst-Case Defect Discovery Points

Defect Origins    Latest Defect Discovery
Requirements      Deployment
Design            System testing
Code              New function testing
Bad fixes         Regression testing
Documentation     Deployment
Test cases        Not discovered
In
the worst-case scenario,
requirements defects are not
found until
deployment,
while design defects are
not found until system
test, when
it
is difficult to fix them without
extending the overall
schedule for
the
project. Note that in the
worst-case scenario, bugs or
errors in
test
cases themselves are never
discovered, so they fester on
for many
releases.
Defect
prevention and early defect
removal are far more
cost-effective
than
depending on testing
alone.
Other
quality measures include
some or all of the
following:
Earned quality value (EQV)   Since it is possible to predict defect potentials and also to predict defect removal efficiency levels, some companies such as IBM have used a form of "earned value" where predictions of defects that would probably be found via inspections, static analysis, and testing are compared with actual defect discovery rates. Predicted and actual defect removal costs are also compared.
If
fewer defects are found
than predicted, then
root-cause analysis
can
be
applied to discover if quality is
really better than planned
or if defect
removal
is lax. (Usually quality is
better when this
happens.)
If
more defects are found
than predicted, then
root-cause analysis
can
be applied to discover if quality is
worse than planned or if
defect
removal
is more effective than
anticipated. (Usually, quality is
worse
when
this happens.)
Cost of quality (COQ)   Collectively, the costs of finding and fixing bugs are the most expensive known activity in the history of software. Therefore, it is important to gather effort and cost data in such a fashion that cost of quality (COQ) calculations can be performed.
However,
for software, normal COQ
calculations need to be
tailored
to
match the specifics of
software engineering. Usually,
data is recorded
in
terms of hours and then
converted into costs by
applying salaries,
burden
rates, and other cost
items.
■ Defect discovery activity: ___________________
■ Defect prevention activities: ___________________
■ Defect effort reported by users
■ Defect damages reported by users
■ Preparation hours for inspections
■ Preparation hours for static analysis
■ Preparation hours for testing
■ Defect discovery hours
■ Defect reporting hours
■ Defect analysis hours
■ Defect repair hours
■ Defect inspection hours
■ Defect static analysis hours
■ Test stages used for defect
■ Test cases created for defect
■ Defect test hours
The
software industry has long
used the "cost per
defect" metric
without
actually analyzing how this
metric works. Indeed,
hundreds
of
articles and books parrot
similar phrases such as "it
costs 100 times
as
much to fix a bug after
release than during coding"
or some minor
variation
on this phrase. The gist of
these dogmatic statements is
that
the
cost per defect rises
steadily as the later
defects are found.
What
few people realize is that
cost per defect is always
cheapest
where
the most bugs are
found and is most expensive
where the fewest
bugs
are found. In fact, as
normally calculated, this
metric violates stan-
dard
economic assumptions because it ignores
fixed costs. The cost
per
defect
metric actually penalizes
quality and achieves the
lowest results
for
the buggiest
applications!
Following
is an analysis of why cost per
defect penalizes quality
and
achieves
its best results for
the buggiest applications.
The same math-
ematical
analysis also shows why
defects seem to be cheaper if
found
early
rather than found
later.
Furthermore,
when zero-defect applications
are reached, there
are
still substantial appraisal
and testing activities that
need to be
accounted
for. Obviously, the cost
per defect metric is useless
for zero-
defect
applications.
Because
of the way cost per
defect is normally measured, as
quality
improves,
cost per defect steadily
increases until zero-defect
software is
achieved,
at which point the metric
cannot be used at
all.
As
with the errors in KLOC
metrics, the main source of
error is that
of
ignoring fixed costs. Three
examples will illustrate how
cost per defect
behaves
as quality improves.
In
all three cases, A, B, and
C, we can assume that test
personnel work
40
hours per week and
are compensated at a rate of
$2500 per week or
$62.50
per hour. Assume that
all three software features
that are being
tested
are 100 function
points.
Case
A: Poor Quality
Assume
that a tester spent 15 hours
writing test cases, 10 hours
run-
ning
them, and 15 hours fixing 10
bugs. The total hours
spent was 40,
and
the total cost was
$2500. Since 10 bugs were
found, the cost
per
defect
was $250. The cost
per function point for
the week of testing
would
be $25.00.
Case
B: Good Quality
In
this second case, assume
that a tester spent 15 hours
writing test
cases,
10 hours running them, and 5
hours fixing one bug, which
was
the
only bug discovered.
However, since no other
assignments were
waiting
and the tester worked a
full week, 40 hours were
charged to
the
project.
The
total cost for the
week was still $2500, so
the cost per defect
has
jumped
to $2500. If the 10 hours of
slack time are backed
out, leaving
30
hours for actual testing
and bug repairs, the
cost per defect would
be
$1875.
As quality improves, cost
per defect rises
sharply.
Let
us now consider cost per
function point. With the
slack removed,
the
cost per function point
would be $18.75. As can
easily be seen, cost
per
defect goes up as quality
improves, thus violating the
assumptions
of
standard economic
measures.
However,
as can also be seen, testing
cost per function point
declines
as
quality improves. This
matches the assumptions of
standard econom-
ics.
The 10 hours of slack time
illustrate another issue:
when quality
improves,
defects can decline faster
than personnel can be
reassigned.
Case
C: Zero Defects
In
this third case, assume that
a tester spent 15 hours
writing test
cases
and 10 hours running them.
No bugs or defects were
discovered.
Because
no defects were found, the
cost per defect metric
cannot be
used
at all.
But
25 hours of actual effort
were expended writing and
running test
cases.
If the tester had no other
assignments, he or she would
still have
worked
a 40-hour week, and the
costs would have been
$2500. If the 15
hours
of slack time are backed
out, leaving 25 hours for
actual testing,
the
costs would have been
$1562.
With
slack time removed, the
cost per function point
would be $15.63.
As
can be seen again, testing
cost per function point
declines as quality
improves.
Here, too, the decline in
cost per function point
matches the
assumptions
of standard economics.
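The three cases reduce to a few lines of arithmetic, as in the sketch below, which uses the stated hourly rate, hours, and 100-function-point size, treats idle time as excluded (matching the slack-adjusted figures above), and leaves cost per defect undefined in the zero-defect case; the function name is illustrative.

    # Testing cost per defect versus cost per function point for Cases A, B, C.
    HOURLY_RATE = 62.50       # $2,500 per 40-hour week
    FUNCTION_POINTS = 100

    def testing_costs(writing_hours, running_hours, repair_hours, defects):
        cost = (writing_hours + running_hours + repair_hours) * HOURLY_RATE
        per_defect = cost / defects if defects else None   # undefined at zero defects
        return cost, per_defect, cost / FUNCTION_POINTS

    for label, inputs in (("Case A", (15, 10, 15, 10)),
                          ("Case B", (15, 10, 5, 1)),
                          ("Case C", (15, 10, 0, 0))):
        cost, per_defect, per_fp = testing_costs(*inputs)
        per_defect_text = f"${per_defect:,.2f}" if per_defect else "undefined"
        print(f"{label}: total ${cost:,.2f}, per defect {per_defect_text}, "
              f"per function point ${per_fp:.2f}")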
Time
and motion studies of defect
repairs do not support the
aphorism
that
it costs 100 times as much
to fix a bug after release as
before. Bugs
typically
require between 15 minutes
and 4 hours to
repair.
Some
bugs are expensive; these
are called abeyant
defects by
IBM.
Abeyant
defects are customer-reported
defects that the repair
center
cannot
re-create, due to some
special combination of hardware
and
software
at the client site. Abeyant
defects constitute less than
5 percent
of
customer-reported defects.
Because
of the fixed or inelastic
costs associated with defect
removal
operations,
cost per defect always
increases as numbers of
defects
decline.
Because more defects are
found at the beginning of a
testing
cycle
than after release, this
explains why cost per defect
always goes
up
later in the cycle. It is because
the costs of writing test
cases, running
them,
and having maintenance
personnel available act as
fixed costs.
In
any manufacturing cycle with a
high percentage of fixed
costs, the
cost
per unit will go up as the
number of units goes down.
This basic fact
of
manufacturing economics is why cost
per defect metrics are
hazard-
ous
and invalid for economic
analysis of software
applications.
What
would be more effective is to
record the hours spent
for all forms
of
defect removal activity.
Once the hours are
recorded, the data
could
be
converted into cost data,
and also normalized by
converting hours
into
standard units such as hours
per function point.
Table
9-11 shows a sample of the
kinds of data that are
useful in
assessing
cost of quality and also
doing economic studies and
effective-
ness
studies.
Of
course, knowing defect
removal hours implies that
data is also
collected
on defect volumes and
severity levels. Table 9-12
shows the
same
set of activities as Table
9-11, but switches from
hours to defects.
Both
Tables 9-11 and 9-12
could also be combined into
a single large
spreadsheet.
However, defect counts and
defect effort
accumulation
tend
to come from different
sources and may not be
simultaneously
available.
Defect
effort and discovered defect
counts are important data
ele-
ments
for long-range quality
improvements. In fact, without
such data,
quality
improvement is likely to be minimal or
not even occur at
all.
Failure
to record defect volumes and
repair effort is a chronic
weak-
ness
of the software engineering
domain. However, several
software
development
methods such as Team
Software Process (TSP) and
the
Rational
Unified Process (RUP) do
include careful defect
measures.
The
Agile method, on the other
hand, is neither strong nor
consistent
on
software quality
measures.
For
software engineering to become a
true engineering discipline
and
not
just a craft as it is in 2009,
defect measurements, defect
prediction,
defect
prevention, and defect
removal need to become a
major focus of
software
engineering.
Measuring
Defect Removal
Efficiency
One of the most effective metrics for demonstrating and improving software quality is that of defect removal efficiency. This metric is simple in concept but somewhat tricky to apply. The basic idea of this metric is to calculate the percentage of software defects found by means of defect removal operations such as inspections, static analysis, and testing.
TABLE 9-11  Software Defect Removal Effort Accumulation

                        Defect Removal Effort (Hours Worked)
Removal Stage           Preparation Hours   Execution Hours   Repair Hours   TOTAL HOURS
Inspections:
  Requirements
  Architecture
  Design
  Source code
  Documents
Static analysis
Test stages:
  Unit
  New function
  Regression
  Performance
  Usability
  Security
  System
  Independent
  Beta
  Acceptance
  Supply chain
Maintenance:
  Customers
  Internal
  SQA
What makes the calculations for defect removal efficiency tricky is that it includes noncode defects found in requirements, design, and other paper deliverables, as well as coding defects.
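In its simplest form the calculation divides defects removed before release by the total of prerelease defects plus defects reported by customers in the first 90 days of use; the sketch below makes that explicit (the function name is illustrative, and the two sample calls anticipate the totals used in Tables 9-13 and 9-14).

    # Cumulative defect removal efficiency: defects removed before release
    # divided by all defects, including customer-reported defects from the
    # first 90 days after release.
    def removal_efficiency(prerelease_defects, customer_defects_90_days):
        total = prerelease_defects + customer_defects_90_days
        return prerelease_defects / total if total else 1.0

    print(f"{removal_efficiency(4650, 350):.2%}")    # about 93.00%
    print(f"{removal_efficiency(3100, 1900):.2%}")   # about 62.00%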
Table
9-13 illustrates an example of
defect removal efficiency
levels
for
a full suite of removal
operations starting with requirements
inspec-
tions
and ending with Acceptance
testing.
Table
9-13 makes a number of
simplifying assumptions. One of
these
is
the assumption that all
delivered defects will be found by
customers
in
the first 90 days of usage.
In real life, of course,
many latent defects
in
delivered
software will stay hidden
for months or even years.
However,
after
90 days, new releases will
usually occur, and they
make it difficult
to
measure defects for prior
releases.
TABLE 9-12  Software Defect Severity Level Accumulation

                        Defect Severity Levels
Removal Stage           Severity 1 (Critical)   Severity 2 (Serious)   Severity 3 (Minor)   Severity 4 (Cosmetic)   TOTAL DEFECTS
Inspections:
  Requirements
  Architecture
  Design
  Source code
  Documents
Static analysis
Test stages:
  Unit
  New function
  Regression
  Performance
  Usability
  Security
  System
  Independent
  Beta
  Acceptance
  Supply chain
Maintenance:
  Customers
  Internal
  SQA
It
is interesting to see what
kind of defect removal
efficiency levels
occur
with less sophisticated series of
defect removal steps that do
not
include
either formal inspections or
static analysis.
Since
noncode defects that originate in
requirements and design
even-
tually
find their way into
the code, the overall
removal efficiency
levels
of
testing by itself without
any precursor inspections or
static analysis
are
seriously degraded, as shown in
Table 9-14.
When
comparing Tables 9-13 and
9-14, it can easily be seen
that a
full
suite of defect removal
activities is more efficient
and effective than
testing
alone in finding and
removing software defects
that originate
outside
of the source code. In fact,
inspections and static
analysis are
also
very efficient in finding
coding defects and have
the additional
property
of raising testing efficiency
and lowering testing
costs.
TABLE 9-13  Software Defect Removal Efficiency Levels
(Assumes inspections, static analysis, and normal testing)

Application size (function points)      1,000
Language                                    C
Code size                             125,000
Noncode defects                         3,000
Code defects                            2,000
TOTAL DEFECTS                           5,000

Defect Removal Efficiency by Removal Stage

Removal Stage            Noncode Defects   Code Defects   Total Defects   Removal Efficiency
Inspections:
  Requirements                 750               0              750
  Architecture                 200               0              200
  Design                     1,250               0            1,250
  Source code                  100             800              900
  Documents                    250               0              250
  Subtotal                   2,550             800            3,350            67.00%
Static analysis                  0             800              800            66.67%
Test stages:
  Unit                           0              50               50
  New function                  50             100              150
  Regression                     0              25               25
  Performance                    0              10               10
  Usability                     50               0               50
  Security                       0              20               20
  System                        25              50               75
  Independent                    0               5                5
  Beta                          25              15               40
  Acceptance                    25              15               40
  Supply chain                  25              10               35
  Subtotal                     200             300              500            58.82%
Prerelease Defects           2,750           1,900            4,650            93.00%
Maintenance:
  Customers (90 days)          250             100              350           100.00%
TOTAL                        3,000           2,000            5,000
Removal Efficiency          91.67%          95.00%           93.00%
TABLE 9-14  Software Defect Removal Efficiency Levels
(Assumes normal testing without inspections or static analysis)

Application size (function points)      1,000
Language                                    C
Code size                             125,000
Noncode defects                         3,000
Code defects                            2,000
TOTAL DEFECTS                           5,000

Defect Removal Efficiency by Removal Stage

Removal Stage            Noncode Defects   Code Defects   Total Defects   Removal Efficiency
Inspections:
  Requirements                   0               0                0
  Architecture                   0               0                0
  Design                         0               0                0
  Source code                    0               0                0
  Documents                      0               0                0
  Subtotal                       0               0                0             0.00%
Static analysis                  0               0                0             0.00%
Test stages:
  Unit                         200             350              550
  New function                 450             600            1,050
  Regression                     0             100              100
  Performance                    0              50               50
  Usability                    200              75              275
  Security                       0              50               50
  System                       300             200              500
  Independent                   50              10               60
  Beta                         150              25              175
  Acceptance                   175              20              195
  Supply chain                  75              20               95
  Subtotal                   1,600           1,500            3,100            62.00%
Prerelease Defects           1,600           1,500            3,100            62.00%
Maintenance:
  Customers (90 days)        1,400             500            1,900           100.00%
TOTAL                        3,000           2,000            5,000
Removal Efficiency          53.33%          75.00%           62.00%
Without
pretest inspections and
static analysis, testing will
find hun-
dreds
of bugs, but the overall
defect removal efficiency of
the full suite of
test
activities will be lower than if
inspections and static
analysis were
part
of the suite of removal
activities.
In
addition to elevating defect
removal efficiency levels,
adding formal
inspections
and static analysis to the
suite of defect removal
opera-
tions
also lowers development and
maintenance costs.
Development
schedules
are also shortened, because
traditional lengthy test
cycles are
usually
the dominant part of
software development schedules.
Indeed,
poor
quality tends to stretch out
test schedules by significant
amounts
because
the software does not
work well enough to be
released.
Table
9-15 shows a side-by-side
comparison of cost structures
for the
two
examples discussed in this
section. Case X is derived
from Table
9-13
and uses a sophisticated
combination of formal inspections,
static
analysis,
and normal testing.
Case
Y is derived from Table 9-14
and uses only normal
testing, with-
out
any inspections or static
analysis being
performed.
The
costs in Table 9-15 assume a
fully burdened compensation
struc-
ture
of $10,000 per month. The
defect-removal costs assume
prepara-
tion,
execution, and defect
repairs for all defects
found and identified.
In
addition to the cost
advantages, excellence in quality
control also
correlates
with customer satisfaction and with
reliability. Reliability
and
customer satisfaction both
correlate inversely with levels of
deliv-
ered
defects.
The
more defects there are at
delivery, the more unhappy
custom-
ers
are. In addition, mean time
to failure (MTTF) goes up as
delivered
defects
go down. The reliability
correlation is based on
high-severity
defects
in the Severity 1 and
Severity 2 classes.
Table
9-16 shows the approximate
relationship between
delivered
defects,
reliability in terms of mean
time to failure (MTTF)
hours, and
customer
satisfaction with software
applications.
Table
9-16 uses integer values, so
interpolation between these
dis-
crete
values would be necessary.
Also, the reliability levels
are only
approximate.
Table 9-13 deals only with
the C programming
language,
so
adjustments in defects per
function point would be
needed for the
700
other languages that exist.
Additional research is needed on
the
topics
of reliability and customer
satisfaction and their
correlations
with
delivered defect
levels.
However,
not only do excessive levels
of delivered defects
generate
negative
scores on customer satisfaction surveys,
but they also show
up
in
many lawsuits against
outsource contractors and
commercial soft-
ware
developers. In fact, one lawsuit
was even filed by
shareholders of
a
major software corporation
who claimed that excessive
defect levels
were
lowering the value of their
stock.
TABLE 9-15  Comparison of Software Defect Removal Efficiency Costs
(Case X = inspections, static analysis, normal testing)
(Case Y = normal testing only)

Application size (function points)      1,000
Language                                    C
Code size                             125,000
Noncode defects                         3,000
Code defects                            2,000
TOTAL DEFECTS                           5,000

Defect Removal Costs by Activity

Removal Stage                             Case X Removal $   Case Y Removal $   Difference
Inspections:
  Requirements
  Architecture
  Design
  Source code
  Documents
  Subtotal                                      $168,750                 $0     $168,750
Static analysis                                  $81,250                 $0      $81,250
Test stages:
  Unit
  New function
  Regression
  Performance
  Usability
  Security
  System
  Independent
  Beta
  Acceptance
  Supply chain
  Subtotal                                      $150,000           $775,000     $625,000
Prerelease Defects                              $400,000           $775,000     $375,000
Maintenance:
  Customers (90 days)                           $175,000           $950,000     $775,000
TOTAL COSTS                                     $575,000         $1,725,000   $1,150,000
Cost per Defect                                  $115.00            $345.00      $230.00
Cost per Function Pt.                            $575.00          $1,725.00    $1,150.00
Cost per LOC                                       $4.60             $13.80        $9.20
ROI from inspections, static analysis              $3.00
Development Schedule (calendar months)             12.00              16.00         4.00
TABLE 9-16  Delivered Defects, Reliability, Customer Satisfaction
(Note 1: Assumes the C programming language)
(Note 2: Assumes 125 LOC per function point)
(Note 3: Assumes severity 1 and 2 delivered defects)

Delivered Defects per KLOC   Defects per Function Point   Mean Time to Failure (MTTF hours)   Customer Satisfaction
          0.00                         0.00                          Infinite                    Excellent
          1.00                         0.13                               303                    Very good
          2.00                         0.25                               223                    Good
          3.00                         0.38                               157                    Fair
          4.00                         0.50                               105                    Poor
          5.00                         0.63                                66                    Very poor
          6.00                         0.75                                37                    Very poor
          7.00                         0.88                                17                    Very poor
          8.00                         1.00                                 6                    Litigation
          9.00                         1.13                                 1                    Litigation
         10.00                         1.25                                 0                    Malpractice
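Since Table 9-16 uses integer values of delivered defects per KLOC, intermediate values must be interpolated; a simple linear interpolation over the MTTF column, as sketched below, is one plausible way to do so (the approach and the function name are illustrative, not from the original text).

    # Linear interpolation of approximate MTTF hours between the integer
    # defects-per-KLOC rows of Table 9-16. Zero delivered defects is treated
    # by the table as effectively unlimited MTTF, so it is excluded here.
    MTTF_BY_DEFECTS_PER_KLOC = {1: 303, 2: 223, 3: 157, 4: 105, 5: 66,
                                6: 37, 7: 17, 8: 6, 9: 1, 10: 0}

    def interpolated_mttf(defects_per_kloc):
        clamped = min(max(defects_per_kloc, 1), 10)
        lower = int(clamped)
        upper = min(lower + 1, 10)
        fraction = clamped - lower
        low, high = MTTF_BY_DEFECTS_PER_KLOC[lower], MTTF_BY_DEFECTS_PER_KLOC[upper]
        return low + (high - low) * fraction

    print(f"{interpolated_mttf(2.5):.0f} hours")   # roughly 190 hours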
Better
quality control is the key
to successful software
engineering.
Software
quality needs to be definable,
predictable, measurable,
and
improvable
in order for software
engineering to become a true
engineer-
ing
discipline.
Defect
Prevention
The
phrase "defect prevention"
refers to methods and
techniques that
lower
the odds of certain kinds of
defects occurring at all.
The liter-
ature
of defect prevention is very
sparse, and academic
research is
even
sparser. The reason for
this is that studying defect
prevention is
extremely
difficult and also somewhat
ambiguous at best.
Defect
prevention is analogous to vaccination
against serious
illness
such
as pneumonia or flu. There is
statistical evidence that
vaccination
will
lower the odds of patients
contracting the diseases for
which they
are
vaccinated. However, there is no
proof that any specific
patient
would
catch the disease whether
receiving a vaccination or not.
Also,
a
few patients who are
vaccinated might contract
the disease anyway,
because
vaccines are not 100
percent effective. In addition,
some vac-
cines
may have serious and
unexpected side-effects.
All
of these issues can occur
with software defect prevention,
too.
While
there is statistical evidence
that certain methods such as
pro-
totypes,
joint application design
(JAD), quality function
deployment
(QFD),
and participation in inspections
prevent certain kinds of
defects
from
occurring, it is hard to prove
that those defects would
definitely
occur
in the absence of the preventive
methodologies.
The
way defect prevention is
studied experimentally is to have
two
versions
of similar or identical applications
developed, with one version
using
a particular defect prevention
method while the other
version
did
not. Obviously, experimental
studies such as this must be
small in
scale.
The
easiest experiments in defect
prevention are those dealing
with
formal
inspections of requirements, design,
and code. Because
inspec-
tions
record all defects,
companies that utilize
formal inspections soon
accumulate
enough data to analyze both
defect prevention and
defect
removal.
Formal
inspections are so effective in
terms of defect prevention
that
they
reduce defect potentials by
more than 25 percent per
year. In fact,
one
issue with inspections is that
after about three years of
continuous
usage,
so few defects occur that
inspections become
boring.
The
more common method for
studying defect prevention is to
exam-
ine
the results of large samples
of applications and note
differences in
the
defect potentials among
them. In other words, if 100
applications
that
used prototypes are compared
with 100 similar applications
that
did
not use prototypes, are
requirements defects lower
for the prototype
sample?
Are creeping requirements
lower for the prototype
sample?
This
kind of study can only be
carried out internally by
rather sophis-
ticated
companies that have very
sophisticated defect and
quality mea-
surement
programs; that is, companies
such as IBM, AT&T, Microsoft,
Raytheon,
Lockheed, and the like.
(Consultants who work for a
number
of
companies in the same
industry can often observe
the effects of defect
prevention
by noting similar applications in
different companies.)
However,
the results of such
large-scale statistical studies
are some-
times
published from benchmark
collections by organizations
such
as
the International Software
Benchmarking Standards Group
(ISBSG),
the
David Consulting Group,
Software Productivity Research
(SPR),
and
the Quality and Productivity
Management Group
(QPMG).
In
addition, consultants such as
the author who work as
expert wit-
nesses
in software litigation may
have access to data that is
not oth-
erwise
available. This data shows
the negative effects of
failing to use
defect
prevention on projects that
ended up in court.
Table
9-17 illustrates a large
sample of 30 methods and
techniques
that
have been observed to
prevent software defects
from occurring.
Although
the table shows specific
percentages of defect prevention
effi-
ciency,
the actual data is too
sparse to support the
results. The percent-
ages
are only approximate and
merely serve to show the
general order
of
effectiveness.
TABLE 9-17  Methods and Techniques that Prevent Defects

Activities Observed to Prevent Software Defects    Defect Prevention Efficiency
 1. Reuse (certified sources)                               80.00%
 2. Inspection participation                                60.00%
 3. Prototyping-functional                                  55.00%
 4. PSP/TSP                                                 53.00%
 5. Six Sigma for software                                  53.00%
 6. Risk analysis (automated)                               50.00%
 7. Joint application design (JAD)                          45.00%
 8. Test-driven development (TDD)                           45.00%
 9. Defect origin measurements                              44.00%
10. Root cause analysis                                     43.00%
11. Quality function deployment (QFD)                       40.00%
12. CMM 5                                                   37.00%
13. Agile embedded users                                    35.00%
14. Risk analysis (manual)                                  32.00%
15. CMM 4                                                   27.00%
16. Poka-yoke                                               23.00%
17. CMM 3                                                   23.00%
18. Scrum sessions (daily)                                  20.00%
19. Code complexity analysis                                19.00%
20. Use-cases                                               18.00%
21. Reuse (uncertified sources)                             17.00%
22. Security plans                                          15.00%
23. Rational Unified Process (RUP)                          15.00%
24. Six Sigma (generic)                                     12.50%
25. Clean-room development                                  12.50%
26. Software Quality Assurance (SQA)                        12.50%
27. CMM 2                                                   12.00%
28. Total Quality Management (TQM)                          10.00%
29. No use of CMM                                            0.00%
30. CMM 1                                                    5.00%
Average                                                     30.12%
Note that because defect prevention deals with reducing defect potentials, the percentages in Table 9-17 show the approximate reduction in defect potentials attributable to each method; higher percentages indicate stronger prevention.
The
two top-ranked items deserve
comment. The phrase "reuse
from
certified
sources" implies formal
reusability where specifications,
source
code,
test cases, and the
like have gone through
rigorous inspection
and
test
stages, and have proven
themselves to be reliable in field
trials.
Certified
reusable components may
approach zero defects, and
in any
case
contain very few defects.
Reuse of uncertified material is
somewhat
hazardous
by comparison.
The
second method, or participation in formal
inspections, has more
than
40 years of empirical data.
Inspections of requirements,
design,
and
other deliverables are very
effective and efficient in
terms of defect
removal
efficiency. But in addition, participants
in formal inspections
become
aware of defect patterns and
categories, and
spontaneously
avoid
them in their own
work.
One
emerging form of risk
analysis is so new that it
lacks empirical
data.
This new method consists of
performing very early sizing
and risk
analysis
prior to starting a software
application or spending any
money
on
development.
If
the risks for the
project are significantly
higher than its value,
not
doing
it at all will obviously prevent
100 percent of potential
defects. The
Victorian
state government in Australia
has started such a
program,
and
by eliminating hazardous software
applications before they
start,
they
have saved millions of
dollars.
New
sizing methods based on
pattern matching can shift
the point
at
which risk analysis can be
performed about six months
earlier than
previously
possible. This new approach
is promising and needs
addi-
tional
study.
There
are other things that
also have some impact in
terms of defect
prevention.
One of these is certification of
personnel either for
testing
or
for software quality
assurance knowledge. Certification
also has an
effect on defect removal. The defect prevention effects and the defect removal effects are both shown in Table 9-18 as approximate percentage benefits.
Here
too the data is only
approximate, and the
specific percentages
are
derived from very sparse
sources and should not be
depended upon.
Table
9-18 is sorted in terms of
defect prevention.
The
data in Table 9-18 should
not be viewed as accurate,
but only
approximate.
A great deal more research
is needed on the
effectiveness
of
various kinds of certification.
Also, the software industry
circa 2009
has
overlapping and redundant
forms of certification. There
are mul-
tiple
testing and quality
associations that offer
certification, but
these
separate
groups certify using
different methods and are
not coordinated.
In
the absence of a single association or
certification body, these
various
nonprofit
and for-profit test and
quality assurance associations
offer
rival
certificates that use very
different criteria.
Yet
another set of factors that
has an effect in terms of
defect pre-
vention
are various kinds of metrics
and measurements, as
discussed
earlier
in this book.
For metrics and measurements to have an effect, they need to be capable of demonstrating quality levels and measuring changes against quality baselines. Therefore, many of the -ility measures and metrics cannot even be included because they are not measurable.
TABLE 9-18  Influence of Certification on Defect Prevention and Removal

Certificate                                                   Defect Prevention Benefit   Defect Removal Benefit
31. Six Sigma black belt                                              12.50%                     10.00%
32. International Software Testing Quality Board (ISTQB)              12.00%                     10.00%
33. Certified Software Quality Engineer (CSQE)-ASQ                    10.00%                     10.00%
34. Certified Software Quality Analyst (CSQA)                         10.00%                     10.00%
35. Certified Software Test Manager (CSTM)                             7.00%                      7.00%
36. Six Sigma green belt                                               6.00%                      5.00%
37. Microsoft certification (testing)                                  6.00%                      6.00%
38. Certified Software Test Professional (CSTP)                        5.00%                     12.00%
39. Certified Software Tester (CSTE)                                   5.00%                     12.00%
40. Certified Software Project Manager (CSPM)                          3.00%                      3.00%
Average                                                                7.65%                      8.50%
Table 9-19 shows the approximate impacts of various measurements and metrics on defect prevention and defect removal. IFPUG function points are top-ranked because they can be used to quantify defects in requirements and design as well as code. IFPUG function points can also be used to measure software defect removal costs and quality economics.
TABLE 9-19  Software Metrics, Measures, and Defect Prevention and Removal

Metric                                      Defect Prevention Benefit   Defect Removal Benefit
41. IFPUG function points                           30.00%                     15.00%
42. Six Sigma                                       25.00%                     20.00%
43. Cost of quality (COQ)                           22.00%                     15.00%
44. Root cause analysis                             20.00%                     12.00%
45. TSP/PSP                                         20.00%                     18.00%
46. Monthly rate of requirements change             17.00%                      5.00%
47. Goal-question metrics                           15.00%                     10.00%
48. Defect removal efficiency                       12.00%                     35.00%
49. Use-case points                                 12.00%                      5.00%
50. COSMIC function points                          10.00%                     10.00%
51. Cyclomatic complexity                           10.00%                      7.00%
52. Test coverage percent                           10.00%                     22.00%
53. Percent of requirements missed                   7.00%                      3.00%
54. Story points                                     5.00%                      5.00%
55. Cost per defect                                -10.00%                    -15.00%
56. Lines of code (LOC)                            -15.00%                    -12.00%
Average                                             11.25%                      9.06%
Note
that the bottom two
metrics, cost per defect
and lines of code,
are
shown
as harmful metrics rather
than beneficial because they
violate
the
assumptions of standard
economics.
Note that the two bottom-ranked measurements from Table 9-19 have a negative impact; that is, they make quality worse rather than better.
As
commonly used in the
software literature, both
cost per defect
and
lines
of code are close to being
professional malpractice, because
they
violate
the canons of standard
economics and distort
results.
The lines of code metric penalizes high-level languages and makes both the quality and productivity of low-level languages look better than they really are. In addition, this metric cannot even be used to measure requirements and design defects or any other form of noncode defect.
The
cost per defect metric
penalizes quality and makes
buggy applica-
tions
look better than
applications with few defects.
This metric cannot
even
be used for zero-defect
applications. A nominal quality
metric that
penalizes
quality and can't even be
used to show the highest
level of
quality
is a good candidate for
being professional
malpractice.
The
final aspect of defect
prevention discussed in this
chapter is that
of
the effectiveness of various
international standards.
Unfortunately,
the
effectiveness of international standards
has very little
empirical
data
available.
There
are no known controlled
studies that demonstrate if
adherence
to
standards improves quality.
There is some anecdotal
evidence that at
least
some standards, such as ISO
9001-9004, degrade quality
because
some
companies that did not
use these standards had
higher quality on
similar
applications than companies
that had been certified.
Table 9-20
shows
approximate results, but the
table has two flaws. It
only shows a
small
sample of standards, and the
rankings are based on very
sparse
and
imperfect information.
In
fields outside of software
such as medical practice,
standards are
normally
validated by field trials,
controlled studies, and
extensive anal-
ysis.
For software, standards are
not validated and are
based on the
subjective
views of the standards
committees. While some of
these com-
mittees
are staffed by noted experts
and the standards may be
useful,
the
lack of validation and field
trials prior to publication is a
sign that
software
engineering needs additional
evolution before being
classified
as
a full engineering
discipline.
Tables 9-17 through 9-20 illustrate a total of 65 defect prevention methods and practices. These are not all used at the same time. Table 9-21 shows the approximate usage patterns observed in several hundred U.S. companies (and in about 50 overseas companies).
TABLE 9-20  International Standards, Defect Prevention and Removal

                                                           Defect Prevention   Defect Removal
     Standard or Government Mandate                        Benefit             Benefit
57.  ISO/IEC 10181 Security Frameworks                     25.00%              25.00%
58.  ISO 17799 Security                                    15.00%              15.00%
59.  Sarbanes-Oxley                                        12.00%               6.00%
60.  ISO/IEC 25030 Software Product Quality Requirements   10.00%               5.00%
61.  ISO/IEC 9126-1 Software Engineering Product Quality   10.00%               5.00%
62.  IEEE 730-1998 Software Quality Assurance Plans         8.00%               5.00%
63.  IEEE 1061-1992 Software Metrics                        7.00%               2.00%
64.  ISO 9000-9003 Quality Management                       6.00%               5.00%
65.  ISO 9001:2000 Quality Management System                4.00%               7.00%
     Average                                               10.78%               8.33%
Table
9-21 is somewhat troubling because
the three top-ranked
meth-
ods
have been demonstrated to be
harmful and make quality
worse
rather
than better. In fact, of the
really beneficial defect
prevention
methods,
only a handful such as
prototyping, measuring test
coverage,
and
joint application design
(JAD) have more than 50
percent usage in
the
United States.
Many of the most powerful and effective methods, such as inspections and measuring cost of quality (COQ), have less than 33 percent usage or penetration. The data shown in Table 9-21 is not precise, since much larger samples would be needed. However, it does illustrate a severe disconnect between effective methods of defect prevention and day-to-day usage in the United States.
Part of the reason for these dismaying usage patterns is the difficulty of actually measuring and studying defect prevention methods.
Only a few large and
sophisticated corporations are
able to
carry
out studies of defect
prevention. Most universities
cannot study
defect
prevention because they lack
sufficient contacts with
corpora-
tions
and therefore have little
data available.
In
conclusion, defect prevention is
sparsely covered in the
software
literature.
There is very little
empirical data available,
and a great deal
more
research is needed on this
topic.
One
way to improve defect
prevention and defect
removal would be to
create
a nonprofit foundation or association
that studied a wide
range
of
quality topics. Both defect
prevention and defect
removal would be
included.
Following is the hypothetical
structure and functions of a
pro-
posed
nonprofit International Software
Quality Foundation
(ISQF).
TABLE 9-21  Usage Patterns of Software Defect Prevention Methods

     Defect Prevention Method                               Percent of U.S. Projects
 1.  Reuse (uncertified sources)                            90.00%
 2.  Cost per defect                                        75.00%
 3.  Lines of code (LOC)                                    72.00%
 4.  Prototyping-functional                                 70.00%
 5.  Test coverage percent                                  67.00%
 6.  No use of CMM                                          50.00%
 7.  Joint application design (JAD)                         45.00%
 8.  Percent of requirements missed                         38.00%
 9.  Software quality assurance (SQA)                       36.00%
10.  Use-cases                                              33.00%
11.  IFPUG function points                                  33.00%
12.  Test-driven development (TDD)                          30.00%
13.  Cost of quality (COQ)                                  29.00%
14.  Scrum sessions (daily)                                 28.00%
15.  CMM 3                                                  28.00%
16.  Agile embedded users                                   27.00%
17.  Six Sigma                                              24.00%
18.  Risk analysis (manual)                                 22.00%
19.  Rational Unified Process (RUP)                         22.00%
20.  Cyclomatic complexity                                  21.00%
21.  CMM 1                                                  20.00%
22.  Monthly rate of requirements change                    20.00%
23.  Code complexity analysis                               19.00%
24.  ISO 9001:2000 Quality Management System                19.00%
25.  Microsoft certification (testing)                      18.00%
26.  ISO 9000-9003 Quality Management                       18.00%
27.  Root cause analysis                                    17.00%
28.  ISO/IEC 9126-1 Software Engineering Product Quality    17.00%
29.  TSP/PSP                                                16.00%
30.  ISO/IEC 25030 Software Product Quality Requirements    16.00%
31.  IEEE 1061-1992 Software Metrics                        16.00%
32.  Defect origin measurements                             15.00%
33.  Root cause analysis                                    15.00%
34.  IEEE 730-1998 Software Quality Assurance Plans         15.00%
35.  PSP/TSP                                                14.00%
36.  Six Sigma for software                                 13.00%
37.  Six Sigma (generic)                                    13.00%
38.  Story points                                           13.00%
39.  Inspection participation                               12.00%
40.  CMM 2                                                  12.00%
41.  Sarbanes-Oxley                                         12.00%
42.  Six Sigma green belt                                   11.00%
43.  ISO/IEC 10181 Security Frameworks                      11.00%
44.  Six Sigma black belt                                   10.00%
45.  Defect removal efficiency                              10.00%
46.  Use-case points                                        10.00%
47.  ISO 17799 Security                                     10.00%
48.  Goal-Question Metrics                                   9.00%
49.  CMM 4                                                   8.00%
50.  Certified Software Test Professional (CSTP)             8.00%
51.  Security plans                                          7.00%
52.  Quality function deployment (QFD)                       6.00%
53.  Total quality management (TQM)                          6.00%
54.  Certified Software Project Manager (CSPM)               6.00%
55.  International Software Testing Quality Board (ISTQB)    4.00%
56.  Certified Software Quality Analyst (CSQA)               4.00%
57.  Certified Software Tester (CSTE)                        4.00%
58.  COSMIC function points                                  4.00%
59.  Certified Software Quality Engineer (CSQE) ASQ          3.00%
60.  Risk analysis (automated)                               2.00%
61.  Certified Software Test Manager (CSTM)                  2.00%
62.  Reuse (certified sources)                               1.00%
63.  CMM 5                                                   1.00%
64.  Poka-yoke                                               0.10%
65.  Clean-room development                                  0.10%
Proposal
for a Nonprofit International
Software Quality Foundation
(ISQF)
The
ISQF will be a nonprofit foundation
that is dedicated to
improv-
ing
the quality and economic
value of software applications.
The form
of
incorporation is to be decided by the
initial board of directors.
The
intent
is to incorporate under section
501(c) of the Internal
Revenue
Code
and thereby be a tax-exempt
organization that is authorized
to
receive
donations.
The
fundamental principles of ISQF
are the following:
1.
Poor quality has been
and is damaging the
professional reputation
of
the software
community.
2.
Poor quality has been
and is causing significant
litigation between
clients
and software development
corporations.
3.
Significant software quality
improvements are technically
possible.
4.
Improved software quality
has substantial economic
benefits in
reducing
software costs and
schedules.
5.
Improved software quality
depends upon accurate
measurement
of
quality in many forms,
including, but not limited
to, measuring
software
defects, software defect
origins, software defect
severity
levels,
methods of defect prevention,
methods of defect
removal,
customer
satisfaction, and software
team morale.
6.
The major cost of software
development and maintenance is
that
of
eliminating defects. ISQF will
mount major studies on
measur-
ing
the economic value of defect
prevention, defect removal,
and
customer
satisfaction.
7.
Measurement and estimation
are synergistic technologies.
ISQF
will
evaluate software quality
and reliability estimation
methods,
and
will publish the results of
their evaluations. No fees from
esti-
mation
tool vendors will be accepted.
The evaluations will be
inde-
pendent
and based on standard
benchmarks and test
cases.
8.
Software defects can
originate in requirements, design,
coding, user
documents,
and also in test plans
and test cases themselves.
In
addition,
there are secondary defects
that are introduced
while
attempting
to repair earlier defects.
ISQF will study all sources
of
software
problems and attempt to
improve all sources of
software
defects
and user
dissatisfaction.
9.
ISQF will sponsor research in
technical topics that may
include, but
are
not limited to,
inspections, static analysis,
test case design,
test
coverage analysis, test
tools, defect reporting,
defect tracking
tools,
bad-fix injections, error-prone
module removal,
complexity
analysis,
defect prevention, formal
inspections, quality
measure-
ments,
and quality metrics.
10.
The ISQF will also sponsor
research to quantify the
effects of all
social
factors that influence
software quality, including
the effective-
ness
of software quality assurance
organizations (SQA),
separate
test
organizations, separate maintenance
organizations, interna-
tional
standards, and the value of
certification. Methods of
studying
software
customer satisfaction will also be
supported.
11.
The service metrics defined
in the Information
Technology
Infrastructure
Library (ITIL) are all
dependent upon
achieving
satisfactory
levels of quality. ISQF will
incorporate principles
from
the
ITIL library, and will also
sponsor research studies to
show the
correlations
between reliability and
availability and quality
levels
in
terms of delivered
defects.
12.
As new technologies appear in
the software industry, it is
impor-
tant
to stay current with their
quality impacts. ISQF will
perform
or
commission studies on the
quality results of a variety of
new
approaches
including but not limited to
Agile development,
cloud
computing,
crystal development, extreme
programming, open-
source
development, and service-oriented
architecture (SOA).
13.
ISQF will provide model
curricula for university
training in soft-
ware
measurement, metrics, defect
prevention, defect removal,
cus-
tomer
support, customer satisfaction,
and the economic value
of
software
quality.
14.
ISQF will provide model
curricula for MBA programs
that deal with
the
economics of software and
the principles of software
manage-
ment.
The economics of quality will be a
major subtopic.
15.
ISQF will provide model
curricula for corporate and
in-house train-
ing
in software measurement, metrics,
defect prevention,
defect
removal,
customer support, customer
satisfaction, and the
economic
value
of software quality.
16.
ISQF will provide recommended
skill profiles for the
occupations of
software
quality assurance (SQA),
software testing, software
cus-
tomer
support, and software
quality measurement.
17.
ISQF will offer examinations
and licensing certificates
for the
occupations
of software quality assurance
(SQA), software
testing,
software
customer support, and
software quality measurement.
Of
these,
software quality measurement
has no current
certification.
18.
ISQF will establish boards of
competence to administer
examina-
tions
and define the state of
the art for software
quality assurance
(SQA),
software testing, and
software quality measurement.
Other
boards
and specialties may be added
at future times.
19.
ISQF will define the
conditions of professional malpractice as
they
apply
to inadequate methods of software
quality control.
Examples
of
such conditions may include
failing to keep adequate
records of
software
defects, failing to utilize sufficient
test stages and test
cases,
and
failing to perform adequate
inspections of critical
materials.
20.
ISQF will cooperate with other
nonprofit organizations that
are
concerned
with similar issues. These
organizations include but
are
not
limited to the Global
Association for Software
Quality (GASQ)
in
Belgium, the World Quality
Conference, the IEEE, the
ISO, ANSI,
IFPUG,
SPIN, and the SEI.
ISQF will also cooperate with
other
organizations
such as universities, the
Information Technology
Metrics
and Productivity Institute
(ITMPI), the Project
Management
Institute
(PMI), the Quality Assurance
Institute (QAI),
software
testing
societies, and relevant
engineering, benchmarking, and
pro-
fessional
organizations such as the
ISBSG benchmarking
group.
ISQF
will also cooperate with similar
quality organizations
abroad
such
as those in China, Japan,
India, and Russia. This
cooperation
might
include reciprocal memberships if
other organizations
are
willing
to participate in that
fashion.
21.
ISQF will be governed by a board of
five directors, to be elected
by
the
membership. The board of
directors will appoint a
president
or
chief executive officer. The
president will appoint a
treasurer,
secretary,
and such additional officers as
may be required by
the
terms
and place of incorporation.
Initially, the board,
president,
and
officers will serve as volunteers on a
pro bono basis. To
ensure
inclusion
of fresh information, the
term of president will be
two
calendar
years.
22.
Funding for the ISQF will be
a combination of dues,
donations,
grants,
and possible fund-raising
activities such as conferences
and
events.
23.
The ISQF will also have a
technical advisory board of
five members
to
be appointed by the president.
The advisory board will
assist
ISQF
in staying at the leading
edge of research into topics
such as
testing,
inspections, quality metrics,
and also availability and
reli-
ability
and other ITIL
metrics.
24.
The ISQF will use modern
communication methods to expand
the
distribution
of information on quality topics.
These methods will
include
an ISQF web site, webinars,
a possible quality
Wikipedia,
Twitter,
blogs, and online
newsletters.
25.
The ISQF will have several
subcommittees that deal with
topics
such
as membership, grants and
donations, press liaison,
university
liaison,
and liaison with other
nonprofit organizations such as
the
Global
Association of Software Quality in
Belgium.
26.
To raise awareness of the
importance of quality, the
ISQF will
produce
a quarterly journal, with a tentative
name of Software
Quality
Progress. This
will be a refereed journal, with the
referees
all
coming from the ISQF
membership.
27.
To raise awareness of the
importance of quality, the
ISQF will spon-
sor
an annual conference and will
solicit nominations for a
series
of
"outstanding quality awards."
The initial set of awards
will be
organized
by type of software (information
systems, commercial
applications,
military software, outsourced
applications, systems
and
embedded software, web
applications). The awards will be
for
lowest
numbers of delivered defects,
highest levels of defect
removal
efficiency,
best customer service, and
highest rankings of
customer
satisfaction.
28.
To raise awareness of the
importance of software quality,
ISQF
members
will be encouraged to write and
review articles and
books
on software quality topics.
Both technical journals such
as
CrossTalk
and
mainstream business journals
such as the Harvard
Business
Review will be
journals of choice.
29.
To raise awareness of the
importance of software quality,
ISQF will
begin
the collection of a major
library of books, journal
articles, and
monographs
on topics and issues
associated with software
quality.
30.
To raise awareness of the
importance of software quality,
ISQF will
sponsor
benchmark studies of software defects,
defect severity
levels,
defect
removal efficiency, test coverage,
inspection efficiency,
inspec-
tion
and test costs, cost of
quality (COQ), and software
litigation where
poor
quality was one of the
principal complaints by the
plaintiffs.
31.
To raise awareness of the
economic consequences of poor
quality,
the
ISQF will sponsor research on
consequential damages,
deaths,
and
property losses associated with
poor software
quality.
32.
To raise awareness of the
economic consequences of poor
quality,
the
ISQF will collect public
information on the results of
software
litigation
where poor quality was
part of the plaintiff's
claims. Such
litigation
includes breach of contract
cases, fraud cases, and
cases
where
poor quality damaged
plaintiff business
operations.
33.
To raise awareness of the
importance of software quality,
ISQF
chapters
will be encouraged at state and
local levels, such as
Rhode
Island
Software Quality Association or a
Boston Software
Quality
Association.
34.
To ensure high standards of
quality education, the ISQF
will review
and
certify specific courses on
software quality matters
offered by
universities
and private corporations as
well. Courses will be
sub-
mitted
for certification on a voluntary
basis. Minimal fees will be
charged
for certification in order to
defray expenses. Fees will
be
based
on time and material charges
and will be levied whether
or
not
a specific course passes certification or
is denied certification.
35.
To ensure that quality
topics are included and
are properly defined
in
contracts and outsource
agreements, the ISQF will
cooperate
with
the American Bar
Association, state bar
associations, the
American
Arbitration Society, and
various law schools on the
legal
status
of software quality and on
contractual issues.
36.
ISQF members will be asked to
subscribe to a code of ethics
that
will
be fully defined by the ISQF
technical advisory board.
The
code
of ethics will include topics
such as providing full and
honest
information
about quality to all who
ask, avoiding conflicts of
inter-
est,
and basing recommendations
about quality on solid
empirical
information.
37.
Because security and quality
are closely related, the
ISQF will also
include security attack prevention and recovery from security attacks as part of the overall mission. However,
security is
highly
specialized and requires
additional skills outside
the normal
training
of quality assurance and
test personnel.
38.
Because of the serious
global recession, the ISQF
will attempt to
rapidly
disseminate empirical data on
the economic value of
quality.
High
quality for software has
been proven to shorten
development
schedules,
lower development costs, improve
customer support, and
reduce
maintenance costs. But few
managers and executives
have
access
to the data that supports
such claims.
Software
engineering and software
quality need to be more
closely
coupled
than has been the
norm in the past. Better
prediction of quality,
better
measurement of quality, more
widespread usage of effective
defect
prevention
methods and defect removal
methods are all congruent
with
advancing
software engineering to the
status of a true
engineering
discipline.
Software
Defect Removal
Although
both defect prevention and
defect removal are
important, it
is
easier to study and measure
defect removal. This is because
counts
of
defects found by means of
inspections, static analysis,
and testing
provide
a quantitative basis for
calculating defect removal
efficiency
levels.
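As a minimal illustration, defect removal efficiency is usually computed by comparing defects removed before release with defects reported by users during an initial period of production use. The sketch below assumes a 90-day post-release window and uses hypothetical defect counts.

```python
# Minimal sketch of a defect removal efficiency (DRE) calculation.
# Assumption: DRE is defects removed before release divided by the total of
# pre-release defects plus defects reported by users in an initial
# post-release window (a 90-day window is assumed here).

def defect_removal_efficiency(defects_removed_pre_release, defects_found_post_release):
    total = defects_removed_pre_release + defects_found_post_release
    if total == 0:
        return None  # no defects observed; DRE is undefined
    return defects_removed_pre_release / total

# Hypothetical project: inspections, static analysis, and testing removed
# 850 defects; customers reported 150 more in the first 90 days.
dre = defect_removal_efficiency(850, 150)
print(f"Defect removal efficiency: {dre:.0%}")   # -> 85%
```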
In
spite of the fact that
defect removal is theoretically
easy to study,
the
literature remains distressingly
sparse. For example, testing
has
an
extensive literature with hundreds of
books, thousands of
journal
articles,
many professional associations,
and numerous
conferences.
Yet
hardly any of the testing
literature contains empirical
data on
the
measured numbers of test cases
created, actual counts of
defects
found
and removed, data on bad-fix
injection rates, or other
tangible
data
points.
Several
important topics have almost
no citations at all in the
test-
ing
literature. For example, a
study done at IBM found more
errors in
test
cases than in the software
that was being tested.
The same study
found
about 35 percent duplicate or
redundant test cases. Yet
neither
test
case errors nor redundant
test cases are discussed in
the software
testing
literature.
Another
gap in the literature of
both testing and other
forms of defect
removal
concerns bad-fix injections.
About 7 percent of attempts
to
repair
software defects contain new
defects in the repairs
themselves.
In
fact, sometimes there are
secondary and even tertiary
bad fixes; that
is,
three consecutive attempts to fix a
bug may fail to fix the
original
bug
and introduce new bugs
that were not there
before!
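A rough sketch of the compounding effect, assuming a flat 7 percent bad-fix injection rate and that each repair round addresses all defects known at that point (both are simplifications):

```python
# Sketch of how bad-fix injection compounds. Each round of repairs is assumed
# to fix every defect known at that point, while injecting new defects at a
# fixed rate (7 percent is the approximate figure cited in the text).

def open_defects_after_repairs(initial_defects, bad_fix_rate=0.07, rounds=3):
    """Expected defects still open after a given number of repair rounds."""
    open_defects = float(initial_defects)
    for _ in range(rounds):
        # the defects left open are the ones injected by the current round of fixes
        open_defects = open_defects * bad_fix_rate
    return open_defects

# Hypothetical example: 1,000 defects entering repair.
for rounds in (1, 2, 3):
    remaining = open_defects_after_repairs(1000, rounds=rounds)
    print(f"after {rounds} round(s): ~{remaining:.0f} open defects")
# -> roughly 70, 5, and 0: secondary and tertiary bad fixes shrink quickly
#    but are rarely zero, which is why bad-fix injection deserves measurement.
```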
Another
problem with the software
engineering literature and
also
with
software professional associations is a
very narrow focus.
Most
testing
organizations tend to ignore
static analysis and
inspections.
As
a result of this narrow
focus, the synergies among
various kinds of
defect
removal operations are not
well covered in the quality
or software
engineering
literature. For example,
carrying out formal
inspections
of
requirements and design not
only finds defects, but
also raises the
defect
removal efficiency levels of
subsequent test stages by at
least
5
percent by providing better
and more complete source
material for
constructing
test cases.
Running
automated static analysis
prior to testing will find
numerous
defects
having to do with limits, boundary
conditions, and
structural
problems,
and therefore speed up subsequent
testing.
Formal
inspections are best at
finding very complicated and
subtle
problems
that require human
intelligence and insight.
Formal inspec-
tions
are also best at finding
errors of omission and
errors of ambiguity.
Static
analysis is best at finding
structural and mechanical
problems
such
as boundary conditions, duplications,
failures of error-handling,
and
branches to incorrect routines.
Static analysis can also
find security
flaws.
Testing
is best at finding problems
that occur when software is
execut-
ing,
such as performance issues,
usability issues, and
security issues.
Individually,
these three methods are
useful but incomplete.
When
used
together, their synergies
can elevate defect removal
efficiency
levels
and also reduce the
effort and costs associated with
defect removal
activities.
Table
9-22 provides an overview of 80
different forms of software
defect
removal:
static analysis, inspections,
many kinds of testing, and
some
special
forms of defect removal
associated with software
litigation.
Although
Table 9-22 shows overall
values for defect removal
efficiency,
the
data really deals with
removal efficiency against
selected defect cat-
egories.
For example, automated
static analysis might find
87 percent
of
structural code problems,
but it can't find
requirements omissions or
problems
such as the Y2K problem that
originate in requirements.
TABLE 9-22  Overview of 80 Varieties of Software Defect Removal Activities

                                              Number of      Defect        Bad-Fix
                                              Test Cases     Removal       Injection
     Defect Removal Activity                  per FP         Efficiency    Percent
STATIC ANALYSIS
 1.  Automated static analysis                0.00           87.00%         2.00%
 2.  Requirements inspections                 0.00           85.00%         6.00%
 3.  External design inspection               0.00           85.00%         6.00%
 4.  Use-case inspection                      0.00           85.00%         4.00%
 5.  Internal design inspection               0.00           85.00%         4.00%
 6.  New code inspections                     0.00           85.00%         4.00%
 7.  Reuse certification inspection           0.00           84.00%         2.00%
 8.  Test case inspection                     0.00           83.00%         5.00%
 9.  Automated document analysis              0.00           83.00%         6.00%
10.  Legacy code inspections                  0.00           83.00%         6.00%
11.  Quality function deployment              0.00           82.00%         3.00%
12.  Document proof reading                   0.00           82.00%         1.00%
13.  Nationalization inspection               0.00           81.00%         3.00%
14.  Architecture inspections                 0.00           80.00%         3.00%
15.  Test plan inspection                     0.00           80.00%         5.00%
16.  Test script inspection                   0.00           78.00%         4.00%
17.  Test coverage analysis                   0.00           77.00%         3.00%
18.  Document editing                         0.00           77.00%         2.50%
19.  Pair programming review                  0.00           75.00%         5.00%
20.  Six Sigma analysis                       0.00           75.00%         3.00%
21.  Bug repair inspection                    0.00           70.00%         3.00%
22.  Business plan inspections                0.00           70.00%         8.00%
23.  Root-cause analysis                      0.00           65.00%         4.00%
24.  Governance reviews                       0.00           63.00%         5.00%
25.  Refactoring of code                      0.00           62.00%         5.00%
26.  Error-prone module analysis              0.00           60.00%        10.00%
27.  Independent audits                       0.00           55.00%        10.00%
28.  Internal audits                          0.00           52.00%        10.00%
29.  Scrum sessions (daily)                   0.00           50.00%         2.00%
30.  Quality assurance review                 0.00           45.00%         7.00%
31.  Sarbanes-Oxley review                    0.00           45.00%        10.00%
32.  User story reviews                       0.00           40.00%        10.00%
33.  Informal peer reviews                    0.00           40.00%        10.00%
34.  Independent verification and validation  0.00           35.00%        12.00%
35.  Private desk checking                    0.00           35.00%         7.00%
36.  Phase reviews                            0.00           30.00%        15.00%
37.  Correctness proofs                       0.00           27.00%        20.00%
     Average                                  0.00           66.92%         6.09%
GENERAL TESTING
38.  PSP/TSP unit testing                     3.50           52.00%         2.00%
39.  Subroutine testing                       0.25           50.00%         2.00%
40.  XP testing                               2.00           40.00%         3.00%
41.  Component testing                        1.75           40.00%         3.00%
42.  System testing                           1.50           40.00%         7.00%
43.  New function testing                     2.50           35.00%         5.00%
44.  Regression testing                       2.00           30.00%         7.00%
45.  Unit testing                             3.00           25.00%         4.00%
     Average                                  2.06           41.00%         4.13%
     Sum                                     16.50
AUTOMATIC TESTING
46.  Virus/spyware test                       3.50           80.00%         4.00%
47.  System test                              2.00           40.00%         8.00%
48.  Regression test                          2.00           37.00%         7.00%
49.  Unit test                                0.05           35.00%         4.00%
50.  New function test                        3.00           35.00%         5.00%
     Average                                  2.11           45.40%         5.60%
     Sum                                     10.55
SPECIALIZED TESTING
51.  Virus testing                            0.70           98.00%         2.00%
52.  Spyware testing                          1.00           98.00%         2.00%
53.  Security testing                         0.40           90.00%         4.00%
54.  Limits/capacity testing                  0.50           90.00%         5.00%
55.  Penetration testing                      4.00           90.00%         4.00%
56.  Reusability testing                      4.00           88.00%         0.25%
57.  Firewall testing                         2.00           87.00%         3.00%
58.  Performance testing                      0.50           80.00%         7.00%
59.  Nationalization testing                  0.30           75.00%        10.00%
60.  Scalability testing                      0.40           65.00%         6.00%
61.  Platform testing                         0.20           55.00%         5.00%
62.  Clean-room testing                       3.00           45.00%         7.00%
63.  Supply chain testing                     0.30           35.00%        10.00%
64.  SOA orchestration                        0.20           30.00%         5.00%
65.  Independent testing                      0.20           25.00%        12.00%
     Average                                  1.18           70.07%         5.48%
     Sum                                     17.70
USER TESTING
66.  Usability testing                        0.25           65.00%         4.00%
67.  Local nationalization testing            0.40           60.00%         3.00%
68.  Lab testing                              1.25           45.00%         5.00%
69.  External beta testing                    1.00           40.00%         7.00%
70.  Internal acceptance testing              0.30           30.00%         8.00%
71.  Outsource acceptance testing             0.05           30.00%         6.00%
72.  COTS acceptance testing                  0.10           25.00%         8.00%
     Average                                  0.48           42.14%         5.86%
     Sum                                      3.35
LITIGATION ANALYSIS, TESTING
73.  Intellectual property testing            2.00           80.00%         1.00%
74.  Intellectual property review             0.00           80.00%         3.00%
75.  Breach of contract review                0.00           80.00%         2.00%
76.  Breach of contract testing               2.00           70.00%         2.00%
77.  Tax litigation review                    0.00           80.00%         4.00%
78.  Tax litigation testing                   1.00           70.00%         4.00%
79.  Fraud code review                        0.00           80.00%         2.00%
80.  Embezzlement code review                 0.00           80.00%         2.00%
     Average                                  2.35           77.14%         2.71%
     Sum                                      5.00
TOTAL TEST CASES PER FUNCTION POINT          53.10
Table 9-22 is sorted in descending order of defect removal efficiency. However, the results shown are maximum values. In real life, measured defect removal efficiency can be less than half of the nominal maximum values shown in Table 9-22.
Although
Table 9-22 lists 80
different kinds of software
defect removal
activities,
that does not imply
that all of them are
used at the same
time.
In
fact, the U.S. average
for defect removal
activities includes only
six
kinds
of testing:
U.S.
Average Sequence of Defect
Removal
1.
Unit test
2.
New function test
3.
Performance test
4.
Regression test
5.
System test
6.
Acceptance or beta
test
These
six forms of testing,
collectively, range between
about 70 per-
cent
and 85 percent in cumulative
defect removal efficiency
levels: far
below
what is needed to achieve
high levels of reliability
and customer
satisfaction.
The bottom line is that
testing, by itself, is insufficient
to
achieve
professional levels of
quality.
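The arithmetic behind that statement is straightforward. The sketch below assumes a hypothetical 1,000-function-point application with a defect potential of 5.0 defects per function point (illustrative values, not measurements from this chapter) and shows how many defects reach customers at several cumulative defect removal efficiency levels.

```python
# Sketch of why cumulative defect removal efficiency (DRE) matters so much.
# Assumptions (hypothetical): a 1,000-function-point application and a
# defect potential of 5.0 defects per function point.

def delivered_defects(function_points, defect_potential_per_fp, cumulative_dre):
    """Defects expected to reach customers after all removal activities."""
    total_potential = function_points * defect_potential_per_fp
    return total_potential * (1.0 - cumulative_dre)

size_fp = 1_000
potential = 5.0
for dre in (0.70, 0.85, 0.95, 0.99):
    print(f"DRE {dre:.0%}: ~{delivered_defects(size_fp, potential, dre):.0f} delivered defects")

# -> roughly 1,500 delivered defects at 70% DRE versus 50 at 99% DRE, which is
#    the gap between testing alone and a fuller removal sequence.
```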
An
optimum sequence of defect
removal activities would
include sev-
eral
kinds of pretest inspections,
static analysis, and at
least eight forms
of
testing:
Optimal
Sequence of Software
Defect
Removal
Pretest
Defect Removal
1.
Requirements inspection
2.
Architecture inspection
3.
Design inspection
4.
Code inspection
5.
Test case inspection
6.
Automated static
analysis
Testing
Defect Removal
7.
Subroutine test
8.
Unit test
9.
New function test
10.
Security test
11.
Performance test
12.
Usability test
13.
System test
14.
Acceptance or beta
test
This
combination of synergistic forms of
defect removal will
achieve
cumulative
defect removal efficiency
levels in excess of 95 percent
for
every
software project and can
achieve 99 percent for some
projects.
When
the most effective forms of
defect removal are combined
with
the
most effective forms of
defect prevention, then
software engineering
should
be able to achieve consistent
levels of excellent quality. If
this
combination
can occur widely enough to
become the norm, then
software
engineering
can be considered a true
engineering discipline.
Software
Quality Specialists
As
noted earlier in the book,
more than 115 types of
occupations and
specialists
are working in the software
engineering domain. In
most
knowledge-based
occupations such as medicine
and law, specialists
have
extra
training and sometimes extra
skills that allow them to
outperform
generalists
in selected areas such as in
neurosurgery or maritime
law.
For
software engineering, the
literature is sparse and
somewhat
ambiguous
about the roles of
specialists. Much of the
literature on
software
specialization is vaporous and
merely expresses some
kind
of
bias. Many authors prefer a
generalist model where
individuals are
interchangeable
and can handle requirements,
design, development,
and
testing as needed. Other
authors prefer a specialist
model where
key
skills such as testing,
quality assurance, and
maintenance are per-
formed
by trained specialists.
In
this chapter, we will focus
primarily on two basic
questions:
1.
Do specialized skills lower
defect potentials and
benefit defect
prevention?
2.
Do specialized skills raise
defect removal efficiency
levels?
Not all of the 115 or so specialists will be discussed here; the focus is on those whose roles have a potential impact on quality levels, considered in terms of defect prevention and defect removal.
The
20 specialist categories discussed in
this chapter include,
in
alphabetical
order:
1.
Architects
2.
Business analysts
3.
Database analysts
4.
Data quality analysts
5.
Enterprise architects
6.
Estimating specialists
7.
Function point
specialists
8.
Inspection moderators
9.
Maintenance specialists
10.
Requirements analysts
11.
Performance specialists
12.
Risk analysis
specialists
13.
Security specialists
14.
Six Sigma specialists
15.
Systems analysts
16.
Software quality assurance
(SQA)
17.
Technical writers
18.
Testers
19.
Usability specialists
20.
Web designers
For
each of these 20 specialist
groups, we will consider the
volume of
potential
defects they face, and
whether they have a tangible
impact on
defect
prevention and defect
removal activities.
Table
9-23 ranks the specialists
in terms of assignment
scope. This
metric
represents the number of
function points normally
assigned to
one
practitioner. Table 9-23
also shows the volume of
defects that the
various
occupations face as part of
their jobs. Table 9-23
then shows the
approximate
impacts of these specialized
occupations on both
defect
prevention
and defect removal.
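Assignment scope also translates into rough staffing arithmetic: dividing the amount of software supported by the assignment scope of an occupation approximates the headcount needed. The sketch below uses scopes from Table 9-23 together with a hypothetical 1,000,000-function-point corporate portfolio.

```python
# Rough staffing sketch based on assignment scope (function points per
# practitioner). The portfolio size is hypothetical; the scopes are the
# values listed in Table 9-23 for four of the twenty occupations.

ASSIGNMENT_SCOPE_FP = {
    "Risk analysis specialists": 300_000,
    "Quality assurance (QA)": 10_000,
    "Testers": 10_000,
    "Maintenance specialists": 1_500,
}

def staff_needed(portfolio_fp, scope_fp):
    """Approximate headcount: portfolio size divided by assignment scope, rounded up."""
    return -(-portfolio_fp // scope_fp)  # ceiling division

portfolio = 1_000_000  # hypothetical corporate portfolio in function points
for occupation, scope in ASSIGNMENT_SCOPE_FP.items():
    print(f"{occupation}: ~{staff_needed(portfolio, scope)} practitioners")
# -> about 4 risk analysts versus roughly 100 QA staff, 100 testers, and
#    667 maintenance specialists for the same portfolio.
```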
The
top-ranked specialists face
large numbers of potential
defects
that
are also capable of causing
great damage to entire
corporations
as
well as to the software
applications owned by those
corporations.
Following
are short discussions of
each of the 20 kinds of
specialists.
Risk
Analysis Specialists
Assignment
scope = 300,000 function
points
Defect
potentials = 7.00
Defect
prevention impact = 75
percent
Defect
removal impact = 25
percent
The
large assignment scope of
300,000 function points
indicates that
companies
do not need many risk
analysts, but the ones
they employ need
to
be very competent and
understand both technical
and financial risks.
TABLE 9-23  Software Specialization Impact on Software Quality

                                        Assignment   Defect      Defect       Defect
     Specialized Occupations            Scope        Potential   Prevention   Removal
 1.  Risk analysis specialists          300,000      7.00        75.00%       25.00%
 2.  Enterprise architects              250,000      6.00        25.00%       20.00%
 3.  Six Sigma specialists              250,000      5.00        25.00%       30.00%
 4.  Database analysts                  100,000      3.00        15.00%       10.00%
 5.  Architects                         100,000      3.00        17.00%       12.00%
 6.  Usability specialists              100,000      1.00        10.00%       15.00%
 7.  Security specialists                50,000      7.00        70.00%       20.00%
 8.  Data quality analysts               50,000      5.00        12.00%       15.00%
 9.  Business analysts                   50,000      3.50        25.00%       10.00%
10.  Estimating specialists              25,000      3.00        20.00%       25.00%
11.  Systems analysts                    20,000      6.00        20.00%       20.00%
12.  Performance specialists             20,000      1.00        10.00%       12.00%
13.  Quality assurance (QA)              10,000      5.50        15.00%       40.00%
14.  Web designers                       10,000      4.00        15.00%       12.00%
15.  Requirements analysts               10,000      4.00        20.00%       15.00%
16.  Testers                             10,000      3.00        15.00%       50.00%
17.  Function point specialists           5,000      4.00        10.00%       10.00%
18.  Technical writers                    2,000      1.00        10.00%       10.00%
19.  Maintenance specialists              1,500      3.50        30.00%       20.00%
20.  Inspection moderators                1,000      4.50        27.00%       35.00%
     Average                             68,225      4.00        23.30%       20.30%
Given the enormous number of business failures during the recession, it is obvious that risk analysis is not yet as sophisticated as it should be, especially for dealing with financial risks.
Risk
analysts face more than
100 percent of the potential
defects
associated
with any given software
application. Not only do they
have
to
deal with technical risks
and quality risks, but
they also need to
address
financial risks and legal
risks that are outside
the normal realm
of
software quality and defect
measurement.
A
formal and careful risk
analysis prior to committing
funds to a
major
software application can
stop investments in excessively
haz-
ardous
projects before any serious
money is spent. For
questionable
projects,
a formal and careful risk
analysis prior to starting
the project
can
introduce better technologies
prior to committing
funds.
The
keys to successful early
risk analysis include the
ability to do
early
sizing, early cost
estimating, early quality
estimating, and
knowl-
edge
of dozens of potential risks
derived from analysis of
project failures
and
successes.
The
main role of risk analysts
in terms of quality is to
stop bad proj-
ects
before they start, and to
ensure that projects that do
start utilize
state-of-the-art
quality methods. Risk
analysts also need to
understand
the
main reasons for software
failures, and they should be
familiar
with
software litigation results
for cases dealing with
cancelled proj-
ects,
breach of contract, theft of
intellectual property, patent
violations,
embezzlement
via software, fraud, tax
issues, Sarbanes-Oxley
issues,
and
other forms of litigation as
well.
Enterprise
Architects
Assignment
scope = 250,000 function
points
Defect
potentials = 6.00
Defect
prevention impact = 25
percent
Defect
removal impact = 20
percent
Enterprise
architects are key players
whose job is to understand
every
aspect
of corporate business and to
match business needs against
entire
portfolios,
which may contain more
than 3000 separate
applications and
total
more than 10 million
function points. Not only
internal software,
but
also open-source applications
and commercial software
packages
such
as Vista and SAP need to be
part of the enterprise
architect's
domain
of knowledge.
The
main role of enterprise
architects in terms of quality is to
under-
stand
the business value of
quality to corporate operations,
and to
ensure
that top executives have
similar understandings. Both
enter-
prise
architects and corporate
executives need to push for
excellence in
order
to achieve speed of delivery.
Enterprise
architects also play a role
in corporate governance, by
ensuring
that critical mistakes such
as the Y2K problem are
prevented
from
occurring in the
future.
Six
Sigma Specialists
Assignment
scope = 250,000 function
points
Defect
potentials = 5.00
Defect
prevention impact = 25
percent
Defect
removal impact = 30
percent
The
large assignment scope for
Six Sigma specialists indicates
that
their
work is corporate in nature
rather than being limited to
specific
applications.
The main role of Six Sigma
specialists in terms of
quality
is
to provide expert analysis of
defect origins and defect
causes, and
to
suggest effective methods of
continuous improvement to reduce
the
major
sources of software
error.
Database
Analysts
Assignment
scope = 100,000 function
points
Defect potentials = 3.00
Defect prevention impact = 15 percent
Defect removal impact = 10 percent
In
today's world of 2009, major
corporations and government
agencies
own
even more data than
they own software. Customer data, employee data, and manufacturing data total millions of records scattered across dozens of databases and repositories.
This collection of enterprise
data
is
a valuable asset that needs
to be accessed for key
business decisions,
and
also protected against
hacking, theft, and
unauthorized access.
There
is a major quality weakness in
2009 in the area of data
qual-
ity.
There are no "data point"
metrics that express the
size of databases
and
repositories. As a result, it is very
hard to quantify data
quality. In
fact,
for all practical purposes,
no literature at all on data
quality uses
actual
counts of errors.
As
a result, database analysts
and data quality analysts
are severely
handicapped.
They both play key
roles in quality, but lack
all of the tools
they
need to do a good
job.
The
major role played by
database analysts in terms of
quality is to
ensure
that databases and
repositories are designed
and organized in
optimal
fashions, and that processes
are in place to validate the
accu-
racy
of all data elements that
are added to enterprise data
storage.
Architects
Assignment
scope = 100,000 function
points
Defect
potentials = 3.00
Defect
prevention impact = 17
percent
Defect
removal impact = 12
percent
Architects
also have a large assignment
scope, and need to be able
to
envision
and deal with the largest
known applications of the
modern
world,
such as Vista, ERP packages
like SAP and Oracle,
air-traffic
control,
defense applications, and
major business
applications.
Over
the past 50 years, software
applications have evolved
from run-
ning
by themselves to running under an
operating system to
running
as
part of a multitier network
and indeed to running in
fragments scat-
tered
over a cloud of hardware and
software platforms that may
be
thousands
of miles apart.
As
a result, the role of
architects has become much
more complex
in
2009 than it was even
ten years ago. Architects
need to understand
modern
application practices such as
service-oriented architecture
(SOA),
cloud
computing, and multitier
hierarchies. In addition, architects
need
to
know the sources and
certification methods of various
kinds of reus-
able
material that constitutes
more than 50 percent of many
large appli-
cations
circa 2009.
The
main role that architects
play in terms of quality is to
under-
stand
the implications of software
defects in complex, multitier,
highly
distributed
environments where software
components may come
from
dozens
of sources.
Usability
Specialists
Assignment
scope = 100,000 function
points
Defect
potentials = 1.00
Defect
prevention impact = 10
percent
Defect
removal impact = 15
percent
The
word "usability" defines
what customers need to do to
operate
software
successfully. It also includes
what software customers need
to
do
when the software
misbehaves.
Usability
specialists often have a
background in cognitive
psychol-
ogy
and are well versed in
various kinds of software
interfaces: keyboard commands, buttons, touch screens, voice recognition, and more.
The
main role of usability
specialists in terms of quality is to
ensure
that
software applications have
interfaces and control sequences
that
are
as natural and intuitive as
possible. Usability studies
are normally
carried
out with volunteer clients
who use the software
while it is under
development.
Large
computer and software
companies such as IBM and
Microsoft
have
usability laboratories where
customers can be observed
while they
are
using prerelease versions of
software and hardware
products. These
labs
monitor keystrokes, screen
touches, voice commands, and
other
interface
methods. Usability specialists
also debrief customers
after
every
session to find out what
customers like and dislike
about inter-
faces
and command sequences.
Security
Specialists
Assignment
scope = 50,000 function
points
Defect
potentials = 7.00
Defect
prevention impact = 70
percent
Defect
removal impact = 20
percent
There
is an increasing need for
more software security
specialists,
and
also for better training of
software security specialists
both at the
university
level and after employment,
as security threats evolve
and
change.
As
of 2009, due in part to the
recession, attacks and data
theft are
increasing
rapidly in numbers and
sophistication. Hacking is
rapidly
moving
from the domain of
individual amateurs to organized
crime and
even
to hostile foreign
governments.
Software
applications are not
entirely safe behind
firewalls, even with
active
antivirus and antispyware
applications installed. There is
an
urgent
need to raise the immunity
levels of software applications
by
using
techniques such as Google's Caja,
the E programming
language,
and
changing permission
schemas.
Security
and quality are not
identical, but they are
very close together,
and
both prevention and removal
methods are congruent and
synergistic.
The
closeness of quality and security is
indicated by the fact that
major
avenues
of attack on software applications
are error-handling
routines.
The
main role of security
specialists in terms of quality is to
stay cur-
rent
on the latest kinds of
threats, and to ensure that
both new applica-
tions
and legacy applications have
state-of-the-art security
defenses.
Data
Quality Analysts
Assignment
scope = 50,000 function
points
Defect
potentials = 5.00
Defect
prevention impact = 12
percent
Defect
removal impact = 15
percent
As
of 2009, data quality
analysts are few in number
and woefully
under-equipped
in terms of tools and
technology. There is no
effective
size
metric for data volumes
(i.e., a data point metric
similar to func-
tion
points). As a result, no empirical
data is available on topics
such as
defect
potentials for databases and
repositories, effective defect
removal
methods,
defect estimation, or defect
measurement.
The
theoretical role of data
quality analysts is to prevent
data errors
from
occurring, and to recommend
effective removal methods.
However,
given
the very large number of
apparent data errors in
public records,
credit
scores, accounting, taxes, and so
on, it is obvious that data
quality
lags
behind even software
quality. In fact, data and
software appear to
lag
behind every other
engineering and technical
domain in terms of
quality
control.
Business
Analysts
Assignment
scope = 50,000 function
points.
Defect
potentials = 3.5
Defect
prevention impact = 25
percent
Defect
removal impact = 10
percent
In
many information technology
organizations, business
analysts
are
the primary connection
between the software
community and the
community
of software users. Business
analysts are required to be
well
versed
in both business needs and in
software engineering
technologies.
The
main role that business
analysts should play in
terms of qual-
ity
is to convince both the
business and technical
communities that
high
levels of software quality will
shorten development schedules
and
lower
development costs. Too
often, the business clients
set arbitrary
schedules
and then attempt to force
the software community to
try
and
meet those schedules by
skimping on inspections and
truncating
testing.
Good
business analysts should
have data available from
sources
such
as the International Software
Benchmarking Standards
Group
(ISBSG)
that shows the relationships
between quality, schedules,
and
costs.
Business analysts should
also understand the value of
methods
such
as joint application design
(JAD), quality function
deployment
(QFD),
and requirements
inspections.
Estimating
Specialists
Assignment
scope = 25,000 function
points
Defect
potentials = 3.00
Defect
prevention impact = 20 percent
Defect
removal impact = 25
percent
It
is a sign of sophistication when a
company employs software
esti-
mating
specialists. Usually these
specialists work in project
offices or
special
staff groups that support
line managers, who often
are not well
trained
in estimation.
Estimation
specialists should have
access to and be familiar with
the
major
software estimating tools
that can predict quality,
schedules, and
costs.
Examples of such tools
include COCOMO, KnowledgePlan,
Price-
S,
SoftCost, SEER, Slim, and a
number of others. In fact, a
number of
companies
utilize several of these
tools for the same
applications and
look
for convergence.
The
main role of an estimating
specialist in terms of quality is to
pre-
dict
quality early. Ideally,
quality will be predicted before
substantial
funds
are spent. Not only that,
but multiple estimates may
be needed
to
show the effects of
variations in development practices
such as Agile
development,
Team Software Process (TSP),
Rational Unified
Process
(RUP),
formal inspections, static
analysis, and various kinds
of testing.
Systems
Analysts
Assignment
scope = 20,000 function
points
Defect
potentials = 6.00
Defect
prevention impact = 25
percent
Defect
removal impact = 20
percent
Software
systems analysts are one of
the interface points
between
the
software engineering or programming
community and end
users
of
software. Systems analysts
and business analysts
perform similar
roles,
but the title "systems
analyst" occurs more often
for embedded
and
systems software, which are
developed for technical
purposes rather
than
to satisfy local business
needs.
The
main role of systems
analysts in terms of quality is to
understand
that
all forms of representation
for software (user stories,
use-cases,
formal
specification languages, flowcharts,
Nassi-Schneiderman charts,
etc.)
may contain errors. These
errors may not be amenable
to discovery
via
testing, which would be too
late in any case. Therefore,
a key role
of
systems analysts is to participate in
formal inspections of
require-
ments,
internal design documents,
and external design
documents. If
the
application is being constructed
using test-driven
development,
systems
analysts will participate in test
case design and
construction.
Systems
analysts will also participate in
activities such as joint
applica-
tion
design (JAD) and quality
function deployment
(QFD).
Performance
Specialists
Assignment
scope = 20,000 function
points
Defect
potentials = 1.00
Defect
prevention impact = 10
percent
Defect
removal impact = 12
percent
The
occupation of "performance specialist" is
usually found only in
very
large companies that build
very large and complex
software appli-
cations;
that is, IBM, Raytheon,
Lockheed, Boeing, SAP,
Oracle, Unisys,
Google,
Motorola, and the
like.
The
general role of performance
specialists is to understand
every
potential
bottleneck in hardware and
software platforms that
might
slow
down performance.
Sluggish
or poor performance is viewed as a
quality issue, so the
role
of
performance specialists is to assist
software engineers and
software
designers
in building software that will
achieve good performance
levels.
In
today's world of 2009, with
multitier architectures as the
dominant
model
and with multiple programming
languages as the dominant
form
of
development, the work of
performance specialists has
become much
more
difficult than it was only
ten years ago. Looking
ahead, the work
of
performance specialists will probably
become even more difficult
ten
years
from now.
Software
Quality Assurance
Assignment
scope = 10,000 function
points
Defect
potentials = 5.50
Defect
prevention impact = 15
percent
Defect
removal impact = 40
percent
The
general title of "quality
assurance" is much older
than software
and
has been used by engineering
companies for about 100
years.
Within
the software world, the
title of "software quality
assurance" has
existed
for more than 50 years.
Today in 2009, software
quality special-
ists
average between 2 percent
and 6 percent of total
software employ-
ment
in most large companies. The
hi-tech companies such as IBM
and
Lockheed
employ more software quality
assurance personnel than
do
lo-tech
companies such as insurance
and general
manufacturing.
A
small percentage of software
quality assurance personnel
have been
certified
by one or more of the software
quality assurance
associations.
The
roles of software quality
assurance vary from company
to com-
pany,
but they usually include
these core activities:
ensuring that
relevant
international and corporate
quality standards are used
and
adhered
to, measuring defect removal
efficiency, measuring
cyclomatic
and
essential complexity, teaching
classes in quality, and
estimating or
predicting
quality levels.
A
few very sophisticated
companies such as IBM have
quality assurance
research
positions, where the
personnel can develop new
and improved
quality
control methods. Some of the
results of these QA research
groups
include
formal inspections, function
point metrics, automated
con-
figuration
control tools, clean-room
development, and joint
application
design
(JAD).
Given
the fact that quality
assurance positions have
existed for more
than
50 years and that SQA
personnel number in the
thousands, why is
software
quality in 2009 not much
better than it was in
1979?
One
reason is that in many
companies, quality assurance
plays an advi-
sory
role, but their advice
does not have to be
followed. In some
companies
such
as IBM, formal QA approval is necessary
prior to delivering a
prod-
uct
to customers. If the QA team
feels that quality methods
were deficient,
then
delivery will not occur.
This is a very serious
business issue.
In
fact, very few projects
are stopped from being
delivered. But the
theoretical
power to stop delivery if
quality is inadequate is a
strong
incentive
to pursue state-of-the-art quality
control methods.
Therefore,
a major role of software
quality assurance is to ensure
that
state-of-the-art
measures, methods, and tools
are used for quality
control,
with
the knowledge that poor
quality can lead to delays
in delivery.
Web
Designers
Assignment
scope = 10,000 function
points
Defect
potentials = 4.00
Defect
prevention impact = 15
percent
Defect
removal impact = 12
percent
Software
web design is a fairly new
occupation, but one that is
grow-
ing
faster than almost any
other. The fast growth in
web design is due
to
software companies and other
businesses migrating to the
Web as
their
main channel for marketing
and information.
The
role of web design in terms
of software quality is still
evolving
and
will continue to do so as web sites
move toward virtual reality
and
3-D
representation. As of 2009, some of
the roles are to ensure
that all
interfaces
are fairly intuitive, and
that all links and
connections actu-
ally
work.
Unfortunately,
due to the exponential
increase in hacking, data
theft,
and
denial of service attacks,
web quality and web
security are now
overlapping.
Effective quality for web
sites must include effective
secu-
rity,
and many web design
specialists do not yet know
enough about
security
to be fully effective.
Requirements
Analysts
Assignment
scope = 10,000 function
points
Defect
potentials = 4.00
Defect
prevention impact = 20
percent
Defect
removal impact = 15
percent
The
work of requirements analysts
overlaps the work of systems
ana-
lysts
and business analysts.
However, those who
specialize in require-
ments
analysis also know topics
such as quality function
deployment
(QFD),
joint application design
(JAD), requirements inspections,
and at
least
half a dozen requirements
representation methods such as
use-
cases,
user stories, and several
others.
Because
the majority of "new"
applications being developed
circa
2009
are really nothing more
than replacements for legacy
applications,
requirements
analysts should also be
conversant with data mining.
In
fact,
the best place to start
the requirements analysis
for a replacement
application
is to mine the older legacy
application for business
rules
and
algorithms that are hidden
in the code. Data mining is
necessary
because
usually the original
specifications are either
missing completely
or
long out of date.
The
role of requirements analysis in
terms of quality is to ensure
that
toxic
requirements defects are
removed before they enter
the design or
find
their way into source
code. The frequently cited
Y2K problem is an
example
of a toxic requirement.
Because
the measured rate at which
requirements grow after
the
requirements
phase is between 1 percent
and 3 percent per
calendar
month,
another quality role is to
ensure that prototypes,
embedded
users,
JAD, or other methods are
used that minimize unplanned
changes
in
requirements.
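The compounding effect of monthly requirements growth is easy to underestimate. The sketch below applies the 1 percent and 3 percent monthly rates cited above to a hypothetical 1,000-function-point baseline over an assumed 18-month schedule.

```python
# Requirements creep sketch: compound monthly growth applied to an initial
# baseline. The 1 percent and 3 percent monthly rates come from the text;
# the 1,000-function-point baseline and 18-month schedule are hypothetical.

def grown_size(initial_fp, monthly_growth_rate, months):
    """Function points after compound requirements growth."""
    return initial_fp * (1.0 + monthly_growth_rate) ** months

baseline_fp = 1_000
for rate in (0.01, 0.03):
    final = grown_size(baseline_fp, rate, months=18)
    print(f"{rate:.0%} per month: {final:,.0f} function points "
          f"(+{final - baseline_fp:,.0f})")
# -> about 1,196 FP at 1% per month and about 1,702 FP at 3% per month,
#    which is why change control, prototyping, and embedded users matter.
```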
Requirements
analysts should also be
members of or support
change
control
boards that review and
approve requirements
changes.
Testers
Assignment
scope = 10,000 function
points
Defect
potentials = 3.00
Defect
prevention impact = 15
percent
Defect
removal impact = 50
percent
Software
testing is one of the specialized
occupations where there
is
some
empirical evidence that
specialists can outperform
generalists.
Not
every kind of testing is
performed by test specialists.
For example,
unit
testing is almost always
carried out by the
developers. However,
the
forms
of testing that integrate
the work of entire teams of
developers need
testing
specialists for large
applications. Such forms of
testing include new
function
testing, regression testing,
and system testing among
others.
The
role of test specialists in
terms of quality is to ensure
that test
coverage
approaches 99 percent, that
test cases themselves do not
con-
tain
errors, and that test
libraries are effectively
maintained and purged
of
duplicate test cases that
add cost but not
value.
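As one small illustration of test library maintenance, exact duplicates can be flagged mechanically. The sketch below is a simplified first pass that assumes test cases are stored as text and treats whitespace-insensitive equality as "duplicate"; commercial test management tools use far more sophisticated matching.

```python
# Simplified duplicate-detection sketch for a test library. Assumes each test
# case is available as text; two cases are treated as duplicates when their
# whitespace-normalized content is identical. Near-duplicates need fuzzier
# matching than this first pass provides.

import hashlib

def find_duplicates(test_cases):
    """Return {content_hash: [test_ids]} for hashes shared by two or more cases."""
    buckets = {}
    for test_id, text in test_cases.items():
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        buckets.setdefault(digest, []).append(test_id)
    return {h: ids for h, ids in buckets.items() if len(ids) > 1}

# Hypothetical library fragment.
library = {
    "TC-101": "Open the invoice screen.  Enter amount 100.00. Verify total.",
    "TC-205": "open the invoice screen. enter amount 100.00. verify total.",
    "TC-310": "Open the payment screen. Enter amount 100.00. Verify total.",
}
for ids in find_duplicates(library).values():
    print("Possible duplicates:", ", ".join(ids))   # -> TC-101, TC-205
```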
Although
not a current requirement
for test case personnel, it
would
be
useful if test specialists
also measured defect removal
efficiency
levels
and attempted to raise
average testing efficiency
from today's
average
of around 35 percent up to at least 75
percent.
Test
specialists should also be
pioneers in new testing
technologies
such
as automated testing. Running static analysis tools prior to testing would also add value.
Function
Point Specialists
Assignment
scope = 5000 function
points
Defect
potentials = 4.00
Defect
prevention impact = 10
percent
Defect
removal impact = 10
percent
Because
function point metrics are
the best choice for
normalizing
quality
data and creating effective
benchmarks of quality
information,
function
point specialists are
rapidly becoming part of
successful quality
improvement
programs.
However,
traditional manual counts of
function points are too
slow and
too
costly to be used as standard
quality control methods. The
average
counting
speed by a certified function
point specialist is only
about 400
function
points per day. This
explains why function point
analysis is almost
never
used for applications larger
than about 10,000 function
points.
However,
new methods have been
developed that allow
function points
to
be calculated at least six
months earlier than
previously possible.
These
same methods operate at speeds in
excess of 10,000
function
points
per minute. This makes it
possible to use function
points for early
quality
estimation, as well as for
measuring quality and
producing qual-
ity
benchmarks.
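The difference in counting speed is easy to quantify. The sketch below combines the figures cited above (about 400 function points per day manually, more than 10,000 function points per minute with the newer methods) with a hypothetical 10,000-function-point application and an assumed 8-hour working day.

```python
# Sizing-effort sketch comparing manual function point counting with the
# high-speed methods described in the text. The application size is an
# assumption used only for illustration.

MANUAL_FP_PER_DAY = 400          # certified counter, figure cited in the text
FAST_FP_PER_MINUTE = 10_000      # newer high-speed methods, figure cited in the text

def manual_counting_days(app_size_fp):
    return app_size_fp / MANUAL_FP_PER_DAY

def fast_counting_seconds(app_size_fp):
    return (app_size_fp / FAST_FP_PER_MINUTE) * 60

app_size = 10_000  # hypothetical application, in function points
print(f"Manual counting: about {manual_counting_days(app_size):.0f} working days")
print(f"High-speed sizing: about {fast_counting_seconds(app_size):.0f} seconds")
# -> roughly 25 working days versus about a minute, which is why early sizing
#    only became practical once the faster methods appeared.
```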
The
role of function point
specialists in terms of quality is to
create
useful
size information fast enough
and early enough that it
can serve
for
risk analysis, quality
prediction, and quality
measures.
Technical
Writers
Assignment
scope = 2000 function
points
Defect
potentials = 1.00
Defect
prevention impact = 10
percent
Defect
removal impact = 10
percent
Good
writing is a fairly rare
skill in the human species.
As a result,
good
software technical manuals
are also fairly rare.
Many kinds of
quality
problems are common in
software manuals, including
ambigu-
ity,
missing information, poor
organization structures, and
incorrect data.
There
are automated tools
available that can analyze
the readabil-
ity
of text, such as the FOG
index and the Flesch
index. But these
are
seldom used for software
manuals. Editing is useful, as
are formal
inspections
of user documentation.
Another
approach, which was actually
used by IBM, was to
select
examples
of user documents with the
highest user evaluation
scores
and
use them as samples.
The
role of technical writers in
terms of software quality is
to make sure
that
factual data is complete and
correct, and that manuals
are easy to
read
and understand.
Maintenance
Specialists
Assignment scope = 1,500 function points
Defect potentials = 3.5
Defect prevention impact = 30 percent
Defect removal impact = 20 percent
Maintenance
programming in terms of both
enhancing legacy
soft-
ware
and repairing bugs has been
the dominant activity for
the software
industry
for more than 20 years.
This should not be a
surprise, because
for
every industry older than 50
years, more people are
working on
repairs
of existing products than
are working on new
development.
As
the recession deepens and
lengthens, the U.S.
automobile industry
is
providing a very painful
example of this fact:
automotive manufac-
turing
is shrinking faster than the
polar ice fields, while
automotive
repairs
are increasing.
Aging
legacy applications have a
number of quality problems,
includ-
ing
poor structure, dead code,
error-prone modules, and
poor or missing
comments.
As
the recession continues,
many companies are
considering ways of
stretching
out the useful lives of
legacy applications. In fact,
renovation
and
data mining of legacy
software are both growing,
even in the face
of
the recession.
The
main role of maintenance
programmers in terms of quality is
to
strengthen
the quality of legacy
software. The methods
available to do this
include
full renovation using
automated tools; complexity
measurement
and
reduction; dead code
removal; improving comments;
identification
and
surgical removal of error-prone
modules; converting code
from orphan
languages
such as MUMPS or Coral into
modern languages such as
Java
or
Ruby; and repairing the
security flaws of legacy
applications.
Inspection
Moderators
Assignment scope = 1,000 function points
Defect potentials = 4.5
Defect prevention impact = 25 percent
Defect removal impact = 35 percent
Software
inspections have a number of
standard roles, including
the
moderator,
the recorder, the
inspectors, and the person
whose work is
being
inspected. The moderator is
the key to a successful
inspection.
The
tasks of the moderator
include keeping the
discussions on track,
minimizing
disruptive events, and
ensuring that the inspection
session
starts
and ends on time.
The
main role of inspection
moderators in terms of quality
includes
ensuring
the materials to be inspected
are delivered in time for
pre-inspec-
tion
review, making sure that
the inspectors and other
personnel show up
on
time, keeping the inspection
team focused on defect identification
(as
opposed
to repairs), and intervening in
potential arguments or
disputes.
The
inspection recorder plays a
key role too, because
the recorder
keeps
notes and fills out
the defect reports of all
bugs or defects that
the
inspection identified. This is
not as easy as it sounds, because
there
may
be some debate as to whether a
particular issue is a defect or
a
possible
enhancement.
Summary
and Conclusions on
Software
Specialization
The
overall topic of software
specialization is not well
covered in the
software
engineering literature. Considering
that there are more
than
115
specialists associated with software,
this fact is mildly
surprising.
When
it comes to software quality,
some forms of specialization do
add
value,
and this can be shown by
analysis of both defect
prevention and
defect
removal. The key specialists
who add the most
value to software
quality
include risk analysts, Six
Sigma specialists, quality
assurance
personnel,
inspection moderators, maintenance
specialists, and
profes-
sional
test personnel.
However,
many other specialists such
as business analysts,
enterprise
architects,
architects, estimating specialists,
and function point
special-
ists
also add value.
The
Economic Value of
Software
Quality
The
economic value of software
quality is not well covered
in the soft-
ware
engineering literature. There
are several reasons for
this prob-
lem.
One major reason is the
rather poor measurement
practices of
the
software engineering domain.
Many cost factors such as
unpaid
overtime
are routinely ignored. In
addition, there are frequent
gaps and
omissions
in software cost data, such
as omission of project
manage-
ment
costs and the omission of
part-time specialists such as
technical
writers.
In fact, only the effort
and costs of coding have
fairly good data
available.
Everything else, such as
requirements, design,
inspections,
testing,
quality assurance, project
offices, and documentation
tends to be
underreported
or ignored.
As
pointed out in other
sections, the software
engineering literature
depends
too much on vague and
unpredictable definitions of
quality
such
as "conformance to requirements" or
adhering to a collection of
ambiguous
terms ending in "-ility."
These
unscientific definitions
slow
down
research on software quality
economics.
Two
other measurement problems
also affect quality economic
stud-
ies.
These problems are the
usage of two invalid
economic measures:
cost
per defect and lines of
code. As discussed earlier in
this chapter,
cost
per defect penalizes quality
and achieves its lowest
costs for the
buggiest
applications. Lines of code
penalizes high-level
programming
languages
and disguises the value of
high-level languages for
studying
either
quality or productivity.
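A minimal illustration of the cost-per-defect distortion follows; the dollar figures are hypothetical rather than taken from the case studies, but the structure (a fixed cost for writing and running tests plus a variable cost per repair) is the point:

    # Hypothetical illustration: fixed testing costs get spread across fewer
    # defects as quality improves, so "cost per defect" rises even though
    # total defect removal cost falls.
    FIXED_TEST_COST = 10_000      # assumed cost of writing and running tests
    REPAIR_COST_PER_DEFECT = 200  # assumed variable repair cost per bug

    for defects_found in (100, 10, 1):
        total = FIXED_TEST_COST + defects_found * REPAIR_COST_PER_DEFECT
        print(f"{defects_found:>3} defects: total ${total:,}, "
              f"cost per defect ${total / defects_found:,.0f}")
    # 100 defects -> $300 per defect; 1 defect -> $10,200 per defect.
    # The buggiest application looks cheapest per defect, which is why the
    # metric penalizes quality.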
In
this section, the economic
value of quality will be shown by
means
of
eight case studies. Because
the value of software
quality correlates
to
application size, four
discrete size ranges will be
used: 100 function
points,
1000 function points, 10,000
function points, and 100,000
func-
tion
points.
Applications
in the 100-function point
range are usually small
fea-
tures
for larger systems rather
than stand-alone applications.
However,
this
is a very common size range
for prototypes of larger
applications.
There
may be small stand-alone
applications in this range
such as cur-
rency
converters or applets for
devices such as
iPhones.
Applications
in the 1000-function point
range are normally
stand-
alone
software applications such as
fuel-injection controls, atomic
watch
controls,
compilers for languages such
as Java, and software
estimating
tools
in the class of
COCOMO.
Applications
in the 10,000-function point
range are normally
impor-
tant
systems that control aspects
of business, such as insurance
claims
processing,
motor vehicle registration, or
child-support applications.
Applications
in the 100,000-function point
range are normally
major
systems
in the class of large
international telephone-switching
systems,
operating
systems in the class of
Vista and IBM's MVS, or
suites of
linked
applications such as Microsoft
Office. Some enterprise
resource
planning
(ERP) applications are in
this size range, and
may even top
300,000
function points. Also, large
defense applications such as
the
World
Wide Military Command and
Control System (WWMCCS)
also
top
100,000 function
points.
To
reduce the number of
variables, all eight of the
examples are
assumed
to be coded in the C programming language
and have a ratio
of
about 125 code statements
per function point.
Because
all eight of the
applications are assumed to be
written in the
same
programming language, productivity
and quality can be
expressed
using
the lines of code metric
without distortion. The
lines of code metric
is
invalid for comparisons
between unlike programming
languages.
For
each size plateau, two
cases will be illustrated: average
quality
and
excellent quality. The
average quality case assumes
waterfall devel-
opment,
CMMI level 1, normal testing,
and nothing special in terms
of
defect
prevention.
The
excellent quality case
assumes at least CMMI level 3,
formal
inspections,
static analysis, rigorous
development such as the
Team
Software
Process (TSP), and the
use of prototypes and joint
application
design
(JAD) for requirements
gathering.
(Some
readers may wonder why Agile
development is not used for
the
case
studies. The main reason is
that there are no Agile
applications
in
the 10,000 and
100,000-function point ranges.
The Agile method
is
used primarily for smaller
applications in the 1000-function
point
range.)
Although
all of the case studies
are derived from actual
applications,
to
make the calculations
consistent, a number of simplifying
assump-
tions
are used. These assumptions
include the following key
points:
■ All cost data is based on a fully burdened cost of $10,000 per staff month. A staff month is considered to have 132 working hours. This is equivalent to $75.75 per hour.
■ Work months are assumed to consist of 22 days, and each day consists of 8 hours. Unpaid overtime is not shown nor is paid overtime.
■ Defect potentials are the total numbers of defects found in five categories: requirements defects, design defects, code defects, documentation defects, and bad fixes, or secondary defects accidentally included in defect repairs.
■ Creeping requirements are not shown. The sizes of the eight case studies reflect application size as delivered to clients.
■ Software reuse is not shown. All eight cases can be assumed to reuse about 15 percent of legacy code. But to simplify assumptions, the defect potentials in the reused code and other materials are assumed to equal defect potentials of new material. Larger volumes of certified reusable material would significantly improve both the quality and productivity of all eight case studies, and especially so for the larger systems above 10,000 function points.
■ Bad-fix injections are not shown. About 7 percent of attempts to repair bugs accidentally introduce a new bug, but the mathematics of bad-fix injection is complicated since the bugs are not found in the activity where they originate.
■ The first year of maintenance is assumed to find 100 percent of latent bugs delivered with the software. In reality, many bugs fester for years, but the examples only show the first year of maintenance.
■ The maintenance data only shows defect repairs. Enhancements and adding new features are excluded in order to highlight quality value.
■ Maintenance defect repair rates are based on average values of 12 bugs fixed per staff month. In real life, ranges can run from fewer than 4 to more than 20 bugs repaired each month.
■ Application staff size is based on U.S. average assignment scopes for all classes of software personnel, which is approximately 150 function points. That is, if you divide application size in function points by the total staffing complement of technical workers plus project managers, the result will be close to 150 function points. This value includes software engineers and also specialists such as quality assurance, technical writers, and test personnel.
■ Schedules for the "average" cases are based on raising function point size to the 0.4 power. This rule of thumb provides a fairly good approximation of schedules from start of requirements to delivery in terms of calendar months (see the sketch following this list).
■ Schedules for the "excellent" cases are based on raising function point size to the 0.36 power. This exponent works well with object-oriented software and rigorous development practices. It is also a good fit for Agile projects, except that the lack of data above 10,000 function points for Agile makes the upper level uncertain.
■ Data in this section is expressed using the function point metric defined by the International Function Point Users' Group (IFPUG) version 4.2 of the counting rules. Other functional metrics such as COSMIC function points or engineering function points or Mark II function points would yield different results from the values shown here.
■ Data on source code in this section is expressed using counts of logical statements rather than counts of physical lines. There can be as much as 500 percent difference in apparent code size based on whether counts are physical or logical lines. The counting rules are those of the author's book Applied Software Measurement.
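The sketch below pulls these assumptions together so that the development and maintenance arithmetic behind Tables 9-24 through 9-27 can be reproduced. The rounding in the tables differs slightly, the defect potentials and removal efficiencies are the values listed in each table rather than outputs of this model, and the one-person staffing floor is an assumption added here so that the 100-function point case works out:

    # Sketch of the case-study arithmetic from the assumptions above:
    # schedule = FP^0.4 ("average") or FP^0.36 ("excellent") calendar months,
    # staffing = FP / 150, development effort = staffing * schedule,
    # costs = effort * $10,000, maintenance effort = delivered defects / 12.
    COST_PER_STAFF_MONTH = 10_000
    ASSIGNMENT_SCOPE = 150            # function points per staff member
    BUGS_FIXED_PER_STAFF_MONTH = 12

    def case_study(function_points, defects_per_fp, removal_efficiency,
                   schedule_exponent):
        schedule_months = function_points ** schedule_exponent
        staffing = max(1.0, function_points / ASSIGNMENT_SCOPE)  # at least one person
        dev_effort = schedule_months * staffing                  # staff months
        dev_cost = dev_effort * COST_PER_STAFF_MONTH
        delivered = function_points * defects_per_fp * (1 - removal_efficiency)
        maint_effort = delivered / BUGS_FIXED_PER_STAFF_MONTH
        maint_cost = maint_effort * COST_PER_STAFF_MONTH
        return dev_effort, dev_cost, delivered, maint_cost

    # "Average" case at 10,000 function points (compare with Table 9-26):
    effort, cost, delivered, maint = case_study(10_000, 6.00, 0.84, 0.4)
    print(round(effort), round(cost), round(delivered), round(maint))
    # -> about 2,654 staff months, $26.5 million of development cost,
    #    9,600 delivered defects, and $8.0 million of first-year repairs.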
The
reason for these simplifying
assumptions is to minimize
extra-
neous
variations among the eight
case studies, so that the
data is pre-
sented
in a consistent fashion for
each. Because all of these
assumptions
vary
in real life, readers are
urged to try out alternative
values based on
their
own local data or on
benchmarks from organizations
such as the
International
Software Benchmarking Standards
Group (ISBSG).
The
simplifying assumptions serve to
make the results
consistent,
but
each of the assumptions can
change in either direction by
fairly
large
amounts.
The
Value of Quality for Very
Small
Applications
of 100 Function
Points
Small
applications in this range
usually have low defect
potentials and
fairly
high defect removal
efficiency levels. This is because
such small
applications
can be developed by a single
person, so there are no
inter-
face
problems between features
developed by different individuals
or
different
teams. Table 9-24 shows
quality value for very
small applica-
tions
of 100 function
points.
Note
that cost per defect
goes up
as
quality improves, not down.
This
phenomenon
distorts economic analysis. As will be
shown in the later
examples,
cost per defect tends to
decline as applications grow
larger. This
is
because large applications have
many more defects than
small ones.
Prototypes
or applications in this size
range are very sensitive
to
individual
skill levels, primarily because one
person does almost all
of
the
work. The measured
variations for this size
range are about 5 to 1
in
how
much code gets written
for a given specification
and about 6 to 1 in
terms
of productivity and quality
levels. Therefore, average
values need
to
be used with caution. Averages
are particularly unreliable
for applica-
tions
where one person performs
the bulk of the entire
application.
TABLE 9-24   Quality Value for 100-Function Point Applications
(Note: 100 function points = 12,500 C statements)

                                          Average      Excellent
                                          Quality      Quality      Difference
Defects per function point                   3.50         1.50            2.00
Defect potential                              350          150             200
Defect removal efficiency                  94.00%       99.00%           5.00%
Defects removed                               329          149             181
Defects delivered                              21            2              20
Cost per defect prerelease                   $379         $455             $76
Cost per defect postrelease                $1,061       $1,288            $227
Development schedule (calendar months)          6            5               1
Development staffing                            1            1               0
Development effort (staff months)               6            5               1
Development costs                         $63,096      $52,481         $10,615
Function points per staff month             15.85        19.05            3.21
LOC per staff month                         1,981        2,382             401
Maintenance staff                               1            1               0
Maintenance effort (staff months)               2            0            1.63
Maintenance costs (year 1)                $17,500       $1,250         $16,250
TOTAL EFFORT                                    8            5               3
TOTAL COST                                $80,596      $53,731         $26,865
TOTAL COST PER STAFF MEMBER               $40,298      $26,865         $13,432
TOTAL COST PER FUNCTION POINT             $805.96      $537.31            $269
TOTAL COST PER LOC                          $6.45        $4.30           $2.15
AVERAGE COST PER DEFECT                      $720         $871            $152
The
Value of Quality for Small
Applications
of
1000 Function
Points
For
small applications of 1000
function points, quality
starts to become
very
important, but it is also
somewhat easier to achieve
than it is for
large
systems. At this size range,
teams are small and
methods such
as
Agile development tend to be
dominant, other than for
systems and
embedded
software where more rigorous
methods such as the
Team
Software
Process (TSP) and the
Rational Unified Process
(RUP) are
more
common. Table 9-25 shows
the value of quality for
small applica-
tions
in the 1000function point
range.
The
bulk of the savings for
the Excellent Quality column
shown in
Table
9-25 would come from
shorter testing schedules
due to the use of
requirements,
design, and code
inspections. Other changes
that added
value
include the use of Team
Software Process (TSP),
static analysis
prior
to testing, and the
achievement of higher CMMI
levels.
TABLE 9-25   Quality Value for 1000-Function Point Applications
(Note: 1000 function points = 125,000 C statements)

                                          Average      Excellent
                                          Quality      Quality      Difference
Defects per function point                   4.50         2.50            2.00
Defect potential                            4,500        2,500           2,000
Defect removal efficiency                  93.00%       97.00%           4.00%
Defects removed                             4,185        2,425           1,760
Defects delivered                             315           75             240
Cost per defect prerelease                   $341         $417             $76
Cost per defect postrelease                  $909       $1,136            $227
Development schedule (calendar months)         16           12               4
Development staffing                            7            7               0
Development effort (staff months)             106           80              26
Development costs                      $1,056,595     $801,510        $255,086
Function points per staff month              9.46        12.48            3.01
LOC per staff month                         1,183        1,560          376.51
Maintenance staff                               2            2               0
Maintenance effort (staff months)              26            6              20
Maintenance costs (year 1)               $262,500      $62,500        $200,000
TOTAL EFFORT                                  132           86              46
TOTAL COST                             $1,319,095     $864,010        $455,086
TOTAL COST PER STAFF MEMBER              $158,291     $103,681         $54,610
TOTAL COST PER FUNCTION POINT           $1,319.10      $864.01            $455
TOTAL COST PER LOC                         $10.55        $6.91           $3.64
AVERAGE COST PER DEFECT                      $625         $776            $152
In
the size range of 1000
function points, numerous
methods are fairly
effective.
For example, both Agile
development and extreme
program-
ming
report good results in this
size range as do the
Rational Unified
Process
(RUP) and the Team
Software Process
(TSP).
The
Value of Quality for Large
Applications
of
10,000 Function
Points
When
software applications reach
10,000 function points, they
are
very
significant systems that
require close attention to quality
control,
change
control, and corporate
governance. In fact, without
careful qual-
ity
and change control, the
odds of failure or cancellation
top 35 percent
for
this size range.
Note
that as application size
increases, defect potentials
increase rap-
idly
and defect removal
efficiency levels decline,
even with sophisticated
quality
control steps in place. This
is due to the exponential
increase in
the
volume of paperwork for
requirements and design,
which often leads
to
partial inspections rather
than 100 percent
inspections. For
large
systems,
test coverage declines and
the number of test cases
mounts rap-
idly,
but cannot usually keep pace
with complexity. Table 9-26
shows the
increasing
value of quality as size
goes up to 10,000 function
points.
Cost
savings from better quality
increase as application sizes
increase.
The
general rule is that the
larger the software
application, the more
valu-
able
quality becomes. The same
principle is true for change
control, because
the
volume of creeping requirements
goes up with application
size.
For
large systems, the available
methods that demonstrate
improve-
ment
begin to decline. For
example, Agile methods are
difficult to apply,
and
when they are applied, the
results are not always good.
For large systems,
rigorous
methods such as the Rational
Unified Process (RUP) or
Team
Software
Process (TSP) yield the
best results and have
the greatest
amount
of empirical data.
TABLE 9-26   Quality Value for 10,000-Function Point Applications
(Note: 10,000 function points = 1,250,000 C statements)

                                          Average      Excellent
                                          Quality      Quality      Difference
Defects per function point                   6.00         3.50            2.50
Defect potential                           60,000       35,000          25,000
Defect removal efficiency                  84.00%       96.00%          12.00%
Defects removed                            50,400       33,600          16,800
Defects delivered                           9,600        1,400           8,200
Cost per defect prerelease                   $341         $417             $76
Cost per defect postrelease                  $833       $1,061            $227
Development schedule (calendar months)         40           28              12
Development staffing                           67           67               0
Development effort (staff months)           2,654        1,836             818
Development costs                     $26,540,478  $18,361,525      $8,178,953
Function points per staff month              3.77         5.45            1.68
LOC per staff month                           471          681          209.79
Maintenance staff                              17           17               0
Maintenance effort (staff months)             800          117          683.33
Maintenance costs (year 1)             $8,000,000   $1,166,667      $6,833,333
TOTAL EFFORT (STAFF MONTHS)                 3,454        1,953           1,501
TOTAL COST                            $34,540,478  $19,528,191     $15,012,287
TOTAL COST PER STAFF MEMBER              $414,486     $234,338        $180,147
TOTAL COST PER FUNCTION POINT           $3,454.05    $1,952.82       $1,501.23
TOTAL COST PER LOC                         $27.63       $15.62          $12.01
AVERAGE COST PER DEFECT                      $587         $739            $152
The
Value of Quality for Very
Large
Applications
of 100,000 Function
Points
Software
applications in the 100,000-function
point range are
among
the
most costly endeavors of
modern business. These large
systems
are
also hazardous, because many of
them fail, and almost
all of them
exceed
their budgets and planned
schedules.
Without
excellence in software quality
control, the odds of
complet-
ing
a software application of 100,000
function points are only
about
20
percent. The odds of
finishing it on time and
within budget hover
close
to 0 percent.
Even
with excellent quality control
and excellent change
control, mas-
sive
applications in the 100,000-function
point range are
expensive
and
troublesome. Table 9-27
illustrates the two cases
for such massive
applications.
TABLE 9-27   Quality Value for 100,000-Function Point Applications
(Note: 100,000 function points = 12,500,000 C statements)

                                          Average      Excellent
                                          Quality      Quality      Difference
Defects per function point                   7.00         4.00            3.00
Defect potential                          700,000      400,000         300,000
Defect removal efficiency                  81.00%       94.00%          13.00%
Defects removed                           567,000      376,000         191,000
Defects delivered                         133,000       24,000         109,000
Cost per defect prerelease                   $303         $379             $76
Cost per defect postrelease                  $758         $985            $227
Development schedule (calendar months)        100           63              37
Development staffing                          667          667               0
Development effort (staff months)          66,667       42,064          24,603
Development costs                    $666,666,667 $420,638,230    $246,028,437
Function points per staff month              1.50         2.38            0.88
LOC per staff month                           188          297          109.67
Maintenance staff                             167          167               0
Maintenance effort (staff months)          11,083        2,000           9,083
Maintenance costs (year 1)           $110,833,333  $20,000,000     $90,833,333
TOTAL EFFORT                               77,750       44,064          33,686
TOTAL COST                           $777,500,000 $440,638,230    $336,861,770
TOTAL COST PER STAFF MEMBER              $933,000     $528,766        $404,234
TOTAL COST PER FUNCTION POINT           $7,775.00    $4,406.38       $3,368.62
TOTAL COST PER LOC                         $62.20       $35.25          $26.95
AVERAGE COST PER DEFECT                      $530         $682            $152
There
are several reasons why
defect potentials are so
high for mas-
sive
applications and why defect
removal efficiency levels
are reduced.
The
first reason is that for
such massive applications,
requirements
changes
will be so numerous that they
exceed most companies'
ability
to
control them well.
The
second reason is that
paperwork volumes tend to
rise with applica-
tion
size, and this slows
down activities such as
inspections of requirements
and
design. As a result, massive
applications tend to use partial
inspec-
tions
rather than 100 percent
inspections of major deliverable
items.
A
third reason, which was
worked out mathematically at IBM in
the
1970s,
is that the number of test
cases needed to achieve 90
percent
coverage
of code rises exponentially with
size. In fact, the number of
test
cases
required to fully test a
massive system of 100,000
function points
approaches
infinity. As a result, testing
efficiency declines with
size,
even
though static analysis and
inspections stay about the
same.
A
useful rule of thumb for
predicting overall number of
test cases is to
raise
application size in function
points to the 1.2 power. As
can be seen,
test
case volumes rise very
rapidly, and most companies
cannot keep
pace,
so test coverage declines. Automated
static analysis is still
effec-
tive.
Inspections are also
effective, but for 100,000
function points,
partial
inspections
of key deliverables are the
norm rather than 100
percent
inspections.
This is because paperwork volumes
also rise
exponentially
with
size.
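A small sketch of the test-case rule of thumb mentioned above (the 1.2 exponent is the one given in the text; the resulting counts are rough approximations, not measured values):

    # Rule of thumb from the text: test cases ~= function points ** 1.2
    for size in (100, 1_000, 10_000, 100_000):
        print(f"{size:>7,} function points -> about {round(size ** 1.2):,} test cases")
    # 100 FP -> ~251; 1,000 FP -> ~3,981; 10,000 FP -> ~63,096;
    # 100,000 FP -> ~1,000,000. Each 10-fold increase in size multiplies
    # the test case volume by roughly 16, which is why test coverage and
    # testing efficiency decline for massive systems.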
Return
on Investment in Software
Quality
As
already mentioned, the value
of software quality goes up as
appli-
cation
size goes up. Table
9-28 calculates the
approximate return on
investment
for the "excellent" case
studies of 100 function
points, 1000
function
points, 10,000 function
points, and 100,000 function
points.
Here
too the assumptions are
simplified to make calculations
easy
and
understandable. The basic
assumption is that every
software team
member
needs five days of training to
get up to speed in software
inspec-
tions
and the Team Software
Process (TSP). These
training days are
then
multiplied by average hourly
costs of $75.75 per
employee.
These
training expenses are then
divided into the total
savings figure
that
includes both development
and maintenance savings due
to high
quality.
The final result is the
approximate ROI based on dividing
value
by
training expenses. Table
9-28 illustrates the ROI
calculations.
TABLE 9-28   Return on Investment in Software Quality

Function point size                100        1,000        10,000        100,000
Education hours                     80          560         5,360         53,360
Education costs                 $6,060      $42,420      $406,020     $4,042,020
Savings from high quality      $26,865     $455,086   $15,012,287   $336,861,770
Return on investment (ROI)       $4.43       $10.73        $36.97         $83.34

The ROI figure reflects the total savings divided by the total training expenses needed to bring team members up to speed in quality technologies.

In real life, these simple assumptions would vary widely, and other factors might also be considered. Even so, high levels of software quality
have
a very solid return on
investment due to the
reduction in develop-
ment
schedules, development costs,
and maintenance
costs.
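A minimal sketch of the arithmetic behind Table 9-28 (the education hours and savings are the values from the table; the $75.75 hourly rate and the five 8-hour training days per person are the assumptions stated above):

    # ROI from Table 9-28: savings from high quality divided by the cost of
    # training the team (education hours times the $75.75 burdened rate).
    HOURLY_RATE = 75.75

    cases = {           # function points: (education hours, savings)
        100:     (80,     26_865),
        1_000:   (560,    455_086),
        10_000:  (5_360,  15_012_287),
        100_000: (53_360, 336_861_770),
    }

    for size, (hours, savings) in cases.items():
        training_cost = hours * HOURLY_RATE
        print(f"{size:>7,} FP: training ${training_cost:,.0f}, "
              f"ROI ${savings / training_cost:.2f} per dollar invested")
    # -> $4.43, $10.73, $36.97, and $83.34 per dollar of training,
    #    matching the ROI row of Table 9-28.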
There
may be many other topics
where software engineers and
man-
agers
need training, and there
may be other cost elements
such as the
costs
of ascending to the higher
levels of the capability
maturity model.
While
the savings from high
quality are frequently
observed, the exact
ROI
will vary based on the way
training and process improvement
work
is
handled under local
accounting rules.
If
the reduced risks of
cancelled projects or major
overruns were
included
in the ROI calculations, the
value would be even
higher.
Other
technologies such as high
volumes of certified reusable
mate-
rial
would also have a beneficial
impact on both quality and
productiv-
ity.
However, as this book is
written in 2009, only
limited sources are
available
for certified reusable
materials. Uncertified reuse is
hazardous
and
may even be harmful rather
than beneficial.
Summary
and Conclusions
In
spite of the fact that
the software industry spends
more money on
finding
and fixing bugs than
any other activity, software
quality remains
ambiguous
and poorly covered in the
software engineering
literature.
There
are dozens of books on
software quality and
testing, but hardly
any
of them contain quantitative
data on defect volumes,
numbers of
test
cases, test coverage, or the
costs associated with defect
removal
activities.
Even
worse, much of the
literature on quality merely
cites urban
legends
of how "cost per defect
rises throughout development
and into
the
field," without realizing
that such a trend is caused
by ignoring
fixed
costs.
Software
quality does have value,
and the value increases as
applica-
tion
sizes get bigger. In fact,
without excellence in quality
control, even
completing
a large software application is
highly unlikely.
Completing
it
on time and within budget in
the absence of excellent quality
control
is
essentially impossible.
Readings
and References
Beck, Kent. Test-Driven Development. Boston, MA: Addison Wesley, 2002.
Chelf, Ben and Raoul Jetley. Diagnosing Medical Device Software Defects Using Static Analysis. San Francisco, CA: Coverity Technical Report, 2008.
Chess, Brian and Jacob West. Secure Programming with Static Analysis. Boston, MA: Addison Wesley, 2007.
Cohen, Lou. Quality Function Deployment--How to Make QFD Work for You. Upper Saddle River, NJ: Prentice Hall, 1995.
Crosby, Philip B. Quality is Free. New York, NY: New American Library, Mentor Books, 1979.
Everett, Gerald D. and Raymond McLeod. Software Testing. Hoboken, NJ: John Wiley & Sons, 2007.
Gack, Gary. Applying Six Sigma to Software Implementation Projects. http://software.isixsigma.com/library/content/c040915b.asp.
Gilb, Tom and Dorothy Graham. Software Inspections. Reading, MA: Addison Wesley, 1993.
Hallowell, David L. Six Sigma Software Metrics, Part 1. http://software.isixsigma.com/library/content/c03910a.asp.
International Organization for Standards. ISO 9000 / ISO 14000. http://www.iso.org/iso/en/iso9000-14000/index.html.
Jones, Capers. Software Quality--Analysis and Guidelines for Success. Boston, MA: International Thomson Computer Press, 1997.
Kan, Stephen H. Metrics and Models in Software Quality Engineering, Second Edition. Boston, MA: Addison Wesley Longman, 2003.
Land, Susan K., Douglas B. Smith, and John Z. Walz. Practical Support for Lean Six Sigma Software Process Definition: Using IEEE Software Engineering Standards. Los Alamitos, CA: Wiley-IEEE Computer Society Press, 2008.
Mosley, Daniel J. The Handbook of MIS Application Software Testing. Englewood Cliffs, NJ: Yourdon Press, Prentice Hall, 1993.
Myers, Glenford. The Art of Software Testing. New York, NY: John Wiley & Sons, 1979.
Nandyal, Raghav. Making Sense of Software Quality Assurance. New Delhi: Tata McGraw-Hill Publishing, 2007.
Radice, Ronald A. High Quality Low Cost Software Inspections. Andover, MA: Paradoxicon Publishing, 2002.
Wiegers, Karl E. Peer Reviews in Software--A Practical Guide. Boston, MA: Addison Wesley Longman, 2002.