Chapter 9
Software Quality: The Key to Successful Software Engineering
Introduction
The overall software quality averages for the United States have
scarcely changed since 1979. Although national data is flat for quality,
a few companies have made major improvements. These happen to be
companies that measure quality because they define quality in such a
way that both prediction and measurement are possible.
The same companies also use full sets of defect removal activities that include inspections and static analysis as well as testing. Defect prevention methods such as joint application design (JAD) and development methods that focus on quality such as Team Software Process (TSP) are also used, once the importance of quality to successful software engineering is realized.
Historically, large software projects spend more time and effort on
finding and fixing bugs than on any other activity. Because software
defect removal efficiency only averages about 85 percent, the major
costs of software maintenance are finding and fixing bugs accidentally
released to customers.
When development defect removal is added to maintenance defect
removal, the major cost driver for total cost of ownership (TCO) is that
of defect removal. Between 30 percent and 50 percent of every dollar
ever spent on software has gone to finding and fixing bugs.
When software projects run late and exceed their budgets, a main
reason is excessive defect levels, which slow down testing and force
applications into delays and costly overruns.
When software projects are cancelled and end up in court for breach
of contract, excessive defect levels, inadequate defect removal, and poor
quality measures are associated with every case.
Given the fact that software defect removal costs have been the primary cost driver for all major software projects for the past 50 years, it is surprising that so little is known about software quality.
There are dozens of books about software quality and testing, but very
few of these books actually contain solid and reliable quantified data
about basic topics such as:
1. How many bugs are going to be present in specific new software
applications?
2. How many bugs are likely to be present in legacy software applications?
3. How can software quality be predicted and measured?
4. How effective are ISO standards in improving quality?
5. How effective are software quality assurance organizations in
improving quality?
6. How effective is software quality assurance certification for improving quality?
7. How effective is Six Sigma for improving quality?
8. How effective is quality function deployment (QFD) for improving
quality?
9. How effective are the higher levels of the CMMI in improving
quality?
10. How effective are the forms of Agile development in improving
quality?
11. How effective is the Rational Unified Process (RUP) in improving
quality?
12. How effective is the Team Software Process (TSP) in improving
quality?
13. How effective are the ITIL methods in improving quality?
14. How effective is service-oriented architecture (SOA) for improving
quality?
15. How effective are certified reusable components for improving
quality?
16. How many bugs can be eliminated by inspections?
17. How many bugs can be eliminated by static analysis?
18. How many bugs can be eliminated by testing?
19. How many different kinds of testing are needed?
20. How many test personnel are needed?
21. How effective are test specialists compared with developers?
22. How effective is automated testing?
23. How many test cases are needed for applications of various sizes?
24. How effective is test certification in improving performance?
25. How many bug repairs will themselves include new bugs?
26. How many bugs will get delivered to users?
27. How much does it cost to improve software quality?
28. How long does it take to improve software quality?
29. How much will we save from improving software quality?
30. How much is the return on investment (ROI) for better software
quality?
The purpose of this chapter is to show the quantified results of every major form of quality assurance activity, inspection stage, static analysis, and testing stage on the delivered defect levels of software applications.
Defect removal comes in "private" and "public" forms. The private
forms of defect removal include desk checking, static analysis, and unit
testing. They are also covered in Chapter 8, because they concentrate
on code defects, and that chapter deals with programming and code
development.
The public forms of defect removal include formal inspections, static
analysis if run by someone other than the software engineer who wrote
the code, and many kinds of testing carried out by test specialists rather
than the developers.
Both private and public forms of defect removal are important, but
it is harder to get data on the private forms because they usually occur
with no one else being present other than the person who is doing
the desk checking or unit testing. As pointed out in Chapter 8, IBM
used volunteers to record defects found via private removal activities.
Some development methods such as Watts Humphrey's Team Software
Process (TSP) and Personal Software Process (PSP) also record private
defect removal.
This chapter will also explain how to predict the number of bugs or defects that might occur, and how to predict defect removal efficiency levels. Not only code bugs, but also bugs or defects in requirements, design, and documents need to be predicted. In addition, new bugs accidentally included in bug repairs need to be predicted. These are called "bad fixes." Finally, there are also bugs or errors in test cases themselves, and these need to be predicted, too.
This chapter will discuss the best ways of measuring quality and will
caution against hazardous metrics such as "cost per defect" and "lines
of code," which distort results and conceal the real facts of software
quality. In this chapter, several critical software quality topics will be
discussed:
Defining Software Quality
Predicting Software Quality
Measuring Software Quality
Software Defect Prevention
Software Defect Removal
Specialists in Software Quality
The Economic Value of Software Quality
Software quality is the key to successful software engineering.
Software has long been troubled by excessive numbers of software defects both during development and after release. Technologies are available that can reduce software defects and improve quality by significant amounts.
Carefully planning and selecting an effective combination of defect prevention and defect removal activities can shorten software development schedules, lower software development costs, significantly reduce maintenance and customer support costs, and improve both customer satisfaction and employee morale at the same time. Improving software quality has the highest return on investment of any current form of software process improvement.
As the recession continues, every company is anxious to lower both software development and software maintenance costs. Improving software quality will assist in improving software economics more than any other available technology.
Defining Software Quality
A good definition for software quality is fairly difficult to achieve. There
are many different definitions published in the software literature.
Unfortunately, some of the published definitions for quality are either
abstract or off the mark. A workable definition of software quality needs
to have six fundamental features:
1. Quality should be predictable before a software application starts.
2. Quality needs to encompass all deliverables and not just the code.
3. Quality should be measurable during development.
4. Quality should be measurable after release to customers.
5. Quality should be apparent to customers and recognized by them.
6. Quality should continue after release, during maintenance.
Here are some of the published definitions for quality, and explanations of why some of them don't seem to conform to the six criteria just listed.
Quality Definition 1: "Quality means conformance to requirements."
There are several problems with this definition, but the major problem
is that requirements errors or bugs are numerous and severe. Errors in
requirements constitute about 20 percent of total software defects and
are responsible for more than 35 percent of high-severity defects.
Defining quality as conformance to a major source of error is circular
reasoning, and therefore this must be considered to be a flawed and
unworkable definition. Obviously, a workable definition for quality has
to include errors in requirements themselves.
Don't forget that the famous Y2K problem originated as a specific user
requirement and not as a coding bug. Many software engineers warned
clients and managers that limiting date fields to two digits would cause
problems, but their warnings were ignored or rejected outright.
The author once worked (briefly) as an expert witness in a lawsuit where a company attempted to sue an outsource vendor for using two-digit date fields in a software application developed under contract.
During the discovery phase, it was revealed that the vendor cautioned
the client that two-digit date fields were hazardous, but the client
rejected the advice and insisted that the Y2K problem be included in
the application. In fact, the client's own internal standards mandated
two-digit date fields. Needless to say, the client dropped the suit when it
became evident that they themselves were the cause of the problem. The
case illustrates that "user requirements" are often wrong and sometimes
even dangerous or "toxic."
It also illustrates another point. Neither the corporate executives nor
the legal department of the plaintiff knew that the Y2K problem had
been caused by their own policies and practices. Obviously, there is a
need for better governance of software from the top when problems such
as this are not understood by corporate executives.
Using modern terminology from the recession, it is necessary to remove "toxic requirements" before conformance can be safe. The definition of quality as "conformance to requirements" does not lead to any significant quality improvements over time. No more requirements are being met in 2009 than in 1979.
If software engineering is to become a true profession rather than an art form, software engineers have a responsibility to help customers define requirements in a thorough and effective manner. It is the job of a professional software engineer to insist on effective requirements methods such as joint application design (JAD), quality function deployment (QFD), and requirements inspections.
Far too often the literature on software quality is passive and makes the incorrect assumption that users will be 100 percent effective in identifying requirements. This is a dangerous assumption. User requirements are never complete, and they are often wrong. For a software project to succeed, requirements need to be gathered and analyzed in a professional manner, and software engineering is the profession that should know how to do this well.
It should be the responsibility of the software engineers to insist that proper requirements methods be used. Other methods that benefit requirements, such as embedded users or use cases, might also be recommended. The users themselves are not software engineers and cannot be expected to know optimal ways of expressing and analyzing requirements. Ensuring that requirements collection and analysis are at state-of-the-art levels devolves to the software engineering team.
Once user requirements have been collected and analyzed, then conformance to them should of course occur. However, before conformance can be safe and effective, dangerous or toxic requirements have to be weeded out, excess and superfluous requirements should be pointed out to the users, and potential gaps that will cause creeping requirements should be identified and also quantified. The users themselves will need professional assistance from the software engineering team, who should not be passive bystanders for requirements gathering and analysis.
Unfortunately, requirements bugs cannot be removed by ordinary
testing. If requirements bugs are not prevented from occurring, or not
removed via formal inspections, test cases that are constructed from the
requirements will confirm the errors and not find them. (This is why
years of software testing never found and removed the Y2K problem.)
A second problem with this definition is that it is not predictable
during development. Conformance to requirements can be measured
after the fact, but that is too late for cost-effective recovery.
A third problem with this definition is that for brand-new kinds of
innovative applications, there may not be any users other than the
original inventor. Consider the history of successful software innovation
such as the APL programming language, the first spreadsheet, and the
early web search engine that later became Google.
These innovative applications were all created by inventors to solve
problems that they themselves wanted to solve. They were not created
based on the normal concept of "user requirements." Until prototypes
were developed, other people seldom even realized how valuable the
inventions would be. Therefore, "user requirements" are not completely
relevant to brand-new inventions until after they have been revealed
to the public.
Given the fact that software requirements grow and change at measured rates of 1 percent to more than 2 percent every calendar month during the subsequent design and coding phases, it is apparent that achieving a full understanding of requirements is a difficult task.
Software requirements are important, but the combination of toxic requirements, missing requirements, and excess requirements makes simplistic definitions such as "quality means conformance to requirements" hazardous to the software industry.
Quality Definition 2: "Quality means reliability, portability, and many other -ilities."
The problem with defining quality as a set of words ending with ility is
that many of these factors are neither predictable before they occur nor
easily measurable when they do occur.
While most of the -ility words are useful properties for software
applications, some don't seem to have much to do with quality as we
would consider the term for a physical device such as an automobile or
a toaster. For example, "portability" may be useful for a software vendor,
but it does not seem to have much relevance to quality in the eyes of a
majority of users.
The use of -ility words to define quality does not lead to quality improvements over time. In 2009, the software industry is no better in terms of many of these -ilities than it was in 1979. Using modern language from the recession, many of the -ilities are "subprime" definitions that don't prevent serious quality failures. In fact, using -ilities rather than focusing on defect prevention and removal slows down progress on software quality control.
Among the many words that are cited when using this definition can
be found (in alphabetical order):
1. Augmentability
2. Compatibility
3. Expandability
4. Flexibility
5. Interoperability
6. Maintainability
7. Manageability
8. Modifiability
9. Operability
10. Portability
11. Reliability
12. Scalability
13. Survivability
14. Understandability
15. Usability
16. Testability
17. Traceability
18. Verifiability
Of the words on this list, only a few such as "reliability" and "testability" seem to be relevant to quality as viewed by users. The other terms range from being obscure (such as "survivability") to useful but irrelevant (such as "portability"). Other terms may be of interest to the vendor or development team, but not to customers (such as "maintainability").
The -ility words seem to have an academic origin because they don't really address some of the real-world quality issues that bother customers. For example, none of these terms addresses ease or difficulty of reaching customer support to get help when a bug is noted or the software misbehaves. None of the terms deals with the speed of fixing bugs and providing the fix to users in a timely manner.
The new Information Technology Infrastructure Library (ITIL) does a much better job of dealing with issues of quality in the eyes of users, such as customer support, incident management, and defect repair intervals, than does the standard literature dealing with software quality.
More seriously, the list of -ility words ignores two of the main topics
that have a major impact on software quality when the software is
finally released to customers: (1) defect potentials and (2) defect removal
efficiency levels.
The term defect potential refers to the total quantity of defects that
will likely occur when designing and building a software application.
Defect potentials include bugs or defects in requirements, design, code,
user documents, and bad fixes or secondary defects. The term defect
removal efficiency refers to the percentage of defects found by any
sequence of inspection, static analysis, and test stages.
To reach acceptable levels of quality in the view of customers, a combination of low defect potentials and high defect removal efficiency rates (greater than 95 percent) is needed. The current U.S. average for software quality is a defect potential of about 5.0 bugs per function point coupled with 85 percent defect removal efficiency. This combination yields a total of delivered defects of about 0.75 per function point, which the author regards as unprofessional and unacceptable.
Defect potentials need to drop below 2.5 per function point and defect removal efficiency needs to average greater than 95 percent for software engineering to be taken seriously as a true engineering discipline. This combination would result in a delivered defect total of only 0.125 defect per function point, or about one-sixth of today's averages. Achieving or exceeding this level of quality is possible today in 2009, but seldom achieved.
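The arithmetic behind these figures is simple: delivered defects per function point equal the defect potential multiplied by the fraction of defects that escape removal. Here is a minimal sketch of that calculation using the averages just cited; the function and variable names are illustrative, not from the source:

    def delivered_defects_per_fp(defect_potential: float, removal_efficiency: float) -> float:
        """Delivered defects per function point: the potential times the escape fraction."""
        return defect_potential * (1.0 - removal_efficiency)

    # Current U.S. average: 5.0 defects per function point, 85% removal efficiency
    print(round(delivered_defects_per_fp(5.0, 0.85), 3))   # 0.75 per function point

    # Suggested target: 2.5 defects per function point, 95% removal efficiency
    print(round(delivered_defects_per_fp(2.5, 0.95), 3))   # 0.125 per function point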
One of the reasons that good quality is not achieved as widely as it
might be is that concentrating on the -ility topics rather than measuring
defects and defect removal efficiency leads to gaps and failures in defect
removal activities. In other words, the -ilities definitions of quality are
a distraction from serious study of software defect causes and the best
methods of preventing and removing software defects.
Specific levels of defect potentials and defect removal efficiency levels
could be included in outsource agreements. These would probably be
more effective than current contracting practices for quality, which are
often nonexistent or merely insist on a certain CMMI level.
If software is released with excessive quantities of defects so that it
stops, behaves erratically, or runs slowly, it will soon be discovered that
most of the -ility words fall by the wayside.
Defect quantities in released software tend to be the paramount quality issue with users of software applications, coupled with what kinds of corrective actions the software vendor will take once defects are reported. This brings up a third and more relevant definition of software quality.
Quality Definition 3: "Quality is the absence of defects that would cause an application to stop working or to produce incorrect results."
A software defect is a bug or error that causes software to either stop
operating or to produce invalid or unacceptable results. Using IBM's
severity scale, defects have four levels of severity:
Severity 1 means that the software application does not work at all.
Severity 2 means that major functions are disabled or produce incorrect results.
Severity 3 means that there are minor issues or minor functions are not working.
Severity 4 means a cosmetic problem that does not affect operation.
There is some subjectivity with these defect severity levels because they are assigned by human beings. Under the IBM model, the initial severity level is assigned when the bug is first reported, based on symptoms described by the customer or user who reported the defect. However, a final severity level is assigned by the change team when the defect is repaired.
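A small sketch of how this two-stage severity assignment might be represented in a defect-tracking record; the class and field names here are invented for illustration, not taken from IBM's tooling:

    from dataclasses import dataclass
    from enum import IntEnum
    from typing import Optional

    class Severity(IntEnum):
        SEV1 = 1   # software does not work at all
        SEV2 = 2   # major functions disabled or incorrect
        SEV3 = 3   # minor issues or minor functions not working
        SEV4 = 4   # cosmetic problem that does not affect operation

    @dataclass
    class DefectReport:
        summary: str
        initial_severity: Severity                  # assigned from the customer's symptoms
        final_severity: Optional[Severity] = None   # assigned by the change team at repair time

    report = DefectReport("application hangs when printing", Severity.SEV2)
    report.final_severity = Severity.SEV3           # change team revises after diagnosis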
This definition of quality is one favored by the author for several reasons. First, defects can be predicted before they occur and measured when they do occur. Second, customer satisfaction surveys for many software applications appear to correlate more closely to delivered defect levels than to any other factor. Third, many of the -ility factors also correlate to defects, or to the absence of defects. For example, reliability correlates exactly to the number of defects found in software. Usability, testability, traceability, and verifiability also have indirect correlations to software defect levels.
Measuring defect volumes and defect severity levels and then taking effective steps to reduce those volumes via a combination of defect prevention and defect removal activities is the key to successful software engineering.
This definition of software quality does lead to quality improvements over time. The companies that measure defect potentials, defect removal efficiency levels, and delivered defects have improved both factors by significant amounts. This definition of quality supports process improvements, predicting quality, measuring quality, and customer satisfaction as measured by surveys.
Therefore, companies that measure quality such as IBM, Dovél Technologies, and AT&T have made progress in quality control. Also, methods that integrate defect tracking and reporting such as Team Software Process (TSP) have made significant progress in reducing delivered defects. This is also true for some open-source applications that have added static analysis to their suite of defect removal tools.
Defect and defect removal efficiency measures have been used to validate the effectiveness of formal inspections, show the impact of static analysis, and fine-tune more than 15 kinds of testing. The subjective measures have no ability to deal with such issues.
Every software engineer and every software project manager should be trained in methods for predicting software defects, measuring software defects, preventing software defects, and removing software defects. Without knowledge of effective quality and defect control, software engineering is a hoax.
The full definition of quality suggested by the author includes these
nine factors:
1. Quality implies low levels of defects when software is deployed,
ideally approaching zero defects.
2. Quality implies high reliability, or being able to run without stoppage or strange and unexpected results or sluggish performance.
3. Quality implies high levels of user satisfaction when users are surveyed about software applications and their features.
4. Quality implies a feature set that meets the normal operational
needs of a majority of customers or users.
5. Quality implies a code structure and comment density that minimize
bad fixes or accidentally inserting new bugs when attempting to repair
old bugs. This same structure will facilitate adding new features.
6. Quality implies effective customer support when problems do occur,
with minimal difficulty for customers in contacting the support
team and getting assistance.
7. Quality implies rapid repairs of known defects, and especially so
for high-severity defects.
8. Quality should be supported by meaningful guarantees and warranties offered by software developers to software users.
9. Effective definitions of quality should lead to quality improvements. This means that quality needs to be defined rigorously enough so that improvements, degradations, and averages can all be identified. If a definition for quality cannot show changes or improvements, then it is of very limited value.
The 6th, 7th, 8th, and 9th of these quality issues tend to be sparsely
covered by the literature on software quality, other than the new ITIL
books. Unfortunately, the ITIL coverage is used only for internal software
applications and is essentially ignored by commercial software vendors.
The definition of quality as an absence of defects, combined with supplemental topics such as ease of customer support and maintenance speed, captures the essence of quality in the view of many software users and customers.
Consider how the three definitions of quality discussed in this chapter might relate to a well-known software product such as Microsoft Vista. Vista has been selected as an example because it is one of the best-known large software applications in the world, and therefore a good test bed for trying out various quality definitions.
Applying Definition 1 to Vista: "Quality means conformance to requirements."
The first definition would be hard to use for Vista, since no ordinary customers were asked what features they wanted in the operating system, although focus groups were probably used at some point.
If you compare Vista with XP, Leopard, or Linux, it seems to include
a superabundance of features and functions, many of which were
neither requested nor ever used by a majority of users. One topic
that the software engineering literature does not cover well, or at
all, is that of overstuffing applications with unnecessary and useless
features.
Most people know that ordinary requirements usually omit about
20 percent of functions that users want. However, not many people
know that for commercial software put out by companies such as
Microsoft, Symantec, Computer Associates, and the like, applications
may have more than 40 percent features that customers don't want and
never use.
Feature stuffing is essentially a competitive move to either imitate what competitors do, or to attempt to pull ahead of smaller competitors by providing hundreds of costly but marginal features that small competitors could not imitate. In either case, feature stuffing is not a satisfactory conformance to user requirements.
Further, certain basic features such as security and performance,
which users of operating systems do appreciate, are not particularly
well embodied in Vista.
The bottom line is that defining quality as conformance to requirements is almost useless for applications with greater than 1 million users such as Vista, because it is impossible to know what such a large group will want or not want.
Also, users seldom are able to articulate requirements in an effective
manner, so it is the job of professional software engineers to help users
in defining requirements with care and accuracy. Too often the software
literature assumes that software engineers are only passive observers of
user requirements, when in fact, software engineers should be playing
the role of physicians who are diagnosing medical conditions in order
to prescribe effective therapies.
Physicians don't just passively ask patients what the problem is and what kind of medicine they want to take. Our job as software engineers is to have professional knowledge about effective requirement gathering and analysis methods (i.e., like medical diagnostic tests) and to also know what kinds of applications might provide effective "therapies" for user needs.
Passively waiting for users to define requirements without assisting them in using joint application design (JAD) or quality function deployment (QFD) or data mining of legacy applications is unprofessional on the part of the software engineering community. Users are not trained in requirements definition, so we need to step up to the task of assisting them.
Applying Definition 2 to Vista: "Quality means adherence to -ility terms."
When Vista is judged by matching its features against the list of -ility terms shown earlier, it can be seen how abstract and difficult to apply such a list really is:
1. Augmentability: Ambiguous and difficult to apply to Vista
2. Compatibility: Poor for Vista; many old applications don't work
3. Expandability: Applicable to Vista and fairly good
4. Flexibility: Ambiguous and difficult to apply to Vista
5. Interoperability: Ambiguous and difficult to apply to Vista
6. Maintainability: Unknown to users but probably poor for Vista
7. Manageability: Ambiguous and difficult to apply to Vista
8. Modifiability: Unknown to users but probably poor for Vista
9. Operability: Ambiguous and difficult to apply to Vista
10. Portability: Poor for Vista
11. Reliability: Originally poor for Vista but improving
12. Scalability: Marginal for Vista
13. Survivability: Ambiguous and difficult to apply to Vista
14. Understandability: Poor for Vista
15. Usability: Asserted to be good for Vista, but questionable
16. Testability: Poor for Vista; complexity far too high
17. Traceability: Poor for Vista; complexity far too high
18. Verifiability: Ambiguous and difficult to apply to Vista
The bottom line is that more than half of the -ility words are difficult or ambiguous to apply to Vista or any other commercial software application. Of the ones that can be applied to Vista, the application does not seem to have satisfied any of them except expandability and usability.
Many of the -ility words cannot be predicted, nor can they be measured. Worse, even if they could be predicted and measured, they are of marginal interest in terms of serious quality control.
Applying Definition 3 to Vista: "Quality means an absence of defects, plus corollary factors."
Released defects can and should be counted for every software application. Other related topics such as ease of reporting defects and speed of repairing defects should also be measured.
Unfortunately, for commercial software, not all of these nine topics can be evaluated. Microsoft, together with many other software vendors, does not publish data on bad-fix injections or even on total numbers of bugs reported. However, most of the nine factors can be evaluated by means of journal articles and limited Microsoft data.
1. Vista was released with hundreds or thousands of defects, although
Microsoft will not provide the exact number of defects found and
reported by users.
2. At first Vista was not very reliable, but achieved acceptable reliability after about a year of usage. Microsoft does not report data on mean time to failure or other measures of reliability.
3. Vista never achieved high levels of user satisfaction compared with XP. The major sources of dissatisfaction include lack of printer drivers, poor compatibility with older applications, excessive resource usage, and sluggish performance on anything short of high-end computer chips and lots of memory.
4. The feature set of Vista has been noted as adequate in customer
surveys, other than excessive security vulnerabilities.
5. Microsoft does not release statistics on bad-fix injections or on numbers of defect reports, so this factor cannot be known by the general public.
6. Microsoft customer support is marginal and troublesome to access
and use. This is a common failing of many software vendors.
7. Some known bugs have remained in Microsoft Vista for several
years. Microsoft is marginally adequate in defect repair speed.
8. There is no effective warranty for Vista (or for other commercial applications). Microsoft's end-user license agreement (EULA) absolves Microsoft of any liabilities other than replacing a defective disk.
9. Microsoft's new operating system is not yet available as this book
is published, so it is not possible to know if Microsoft has used
methods that will yield better quality than Vista. However, since
Microsoft does have substantial internal defect tracking and quality
assurance methods, hopefully quality will be better. Microsoft has
shown some improvements in quality over time.
Based on this pattern of analysis for the nine factors, it cannot be said that Vista is a high-quality application under any of the definitions. Of the three major definitions, defining quality as conformance to requirements is almost impossible to use with Vista because with millions of users, nobody can define what everybody wants.
The second definition of quality as a string of -ility words is difficult
to apply, and many are irrelevant. These words might be marginally
useful for small internal applications, but are not particularly helpful
for commercial software. Also, many key quality issues such as customer support and maintenance repair times are not found in any of the -ility words.
The third definition that centers on defects, customer support, defect
repairs, and better warranties seems to be the most relevant. The third
also has the advantage of being both predictable and measurable, which
the first two lack.
Given the high costs of commercial software, the marginal or useless warranties of commercial software, and the poor customer support offered by commercial software vendors, the author would favor mandatory defect reporting that required commercial vendors such as Microsoft to produce data on defects reported by customers, sorted by severity levels.
Mandatory defect reporting is already a requirement for many products that affect human life or safety, such as medicines, aircraft engines, automobiles, and many other consumer products. Mandatory reporting of business and financial information is also required. Software affects human life and safety in critical ways, and it affects business operations in critical ways, but to date software has been exempt from serious study due to the lack of any mandate for measuring and reporting released defect levels.
Somewhat surprisingly, the open-source software community appears
to be pulling ahead of old-line commercial software vendors in terms
of measuring and reporting defects. Many open-source companies have
added defect tracking and static-analysis tools to their quality arsenal,
and are making data available to customers that is not available from
many commercial software vendors.
The author would also favor a "lemon law" for commercial software
similar to the lemon law for automobiles. If serious defects occur that
users cannot get repaired when making good-faith effort to resolve the
situation with vendors, vendors should be required to return the full
purchase or lease price of the offending software application.
A form of lemon law might also be applied to outsource contracts, except that litigation already provides relief for outsource failures that cannot be used against commercial software vendors due to their one-sided EULA agreements, which disclaim any responsibility for quality other than media replacement.
No doubt software vendors would object to both mandatory defect tracking and also to a lemon law. But shrewd and farsighted vendors would soon perceive that both topics offer significant competitive advantages to software companies that know how to control quality. Since high-quality software is also cheaper and faster to develop and has lower maintenance costs than buggy software, there are even more important economic advantages for shrewd vendors.
The author hypothesizes that a combination of mandatory defect
reporting by software vendors plus a lemon law would have the effect
of improving software quality by about 50 percent every five years for
perhaps a 20-year period.
Software quality needs to be taken much more seriously than it has been. Now that the recession is expanding, better software quality control is one of the most effective strategies for lowering software costs. But effective quality control depends on better measures of quality and on proven combinations of defect prevention and defect removal activities.
Quality prediction, quality measurement, better defect prevention,
and better defect removal are on the critical path for advancing software
engineering to the status of a true engineering discipline instead of
a craft or art form as it is today in 2009.
Defining and Predicting Software Defects
If delivered defects are the main quality problem for software, it is important to know what causes these defects, so that they can be prevented from occurring or removed before delivery.
The software quality literature includes a great deal of pedantic bickering about various terms such as "fault," "error," "bug," "defect," and many other terms. For this book, if software stops working, won't load, operates erratically, or produces incorrect results due to mistakes in its own code, then that is called a "defect." (This same definition has been used in 14 of the author's previous books and also in more than 30 journal articles. The author's first use of this definition started in 1978.)
However, in the modern world, the same set of problems can occur
without the developers or the code being the cause. Software infected
by a virus or spyware can also stop working, refuse to load, operate
erratically, and produce incorrect results. In today's world, some defect
reports may well be caused by outside attacks.
Attacks on software from hackers are not the same as self-inflicted
defects, although successful attacks do imply security vulnerabilities.
In this book and the author's previous books, software defects have
five main points of origin:
1. Requirements
2. Design
3. Code
4. User documents
5. Bad fixes (new defects due to repairs of older defects)
Because the author worked for IBM when starting research on quality,
the IBM severity scale for classifying defect severity levels is used in this
book and the author's previous books. There are four severity levels:
Severity 1: Software does not operate at all
Severity 2: Major features disabled or incorrect
Severity 3: Minor features disabled or incorrect
Severity 4: Cosmetic error that does not affect operation
There are other methods of classifying severity levels, but these four
are the most common due to IBM introducing them in the 1960s, so they
became a de facto standard.
Software defects have seven kinds of causes, with the major causes including:
Errors of omission: Something needed was accidentally left out
Errors of commission: Something needed is incorrect
Errors of ambiguity: Something is interpreted in several ways
Errors of performance: Some routines are too slow to be useful
Errors of security: Security vulnerabilities allow attacks from outside
Errors of excess: Irrelevant code and unneeded features are included
Errors of poor removal: Defects that should easily have been found
These seven causes occur with different frequencies for different
deliverables. For paper documents such as requirements and design,
errors of ambiguity are most common, followed by errors of omission.
For source code, errors of commission are most common, followed by
errors of performance and security.
The seventh category, "errors of poor removal," would require root-cause
analysis for identification. The implication is that the defect was neither
subtle nor hard to find, but was missed because test cases did not cover the
code segment or because of partial inspections that overlooked the defect.
In a sense, all delivered defects might be viewed as errors of poor
removal, but it is important to find out why various kinds of inspection,
static analysis, or testing missed obvious bugs. This category should not
be assigned for subtle defects, but rather for obvious defects that should
have been found but for some reason escaped to the outside world.
The main reason for including errors of poor removal is to encourage
more study and research on the effectiveness of various kinds of defect
removal operations. More solid data is needed on the removal efficiency
levels of inspections, static analysis, automatic testing, and all forms of
manual testing.
The combination of defect origins, defect severity, and defect causes
provides a useful taxonomy for classifying defects for statistical analysis
or root-cause analysis. For example, the Y2K problem was cited earlier
in this chapter. In its most common manifestation, the Y2K problem
might have this description using the taxonomy just discussed:
Y2K origin: Requirements
Y2K severity: Severity 2 (major features disabled)
Y2K primary cause: Error of commission
Y2K secondary cause: Error of poor removal
Note that this taxonomy allows the use of primary and secondary factors, since sometimes more than one problem is behind having a defect in software.
Note also that the Y2K problem did not have the same severity for
every application. An approximate distribution of Y2K severity levels
for several hundred applications noted that the software stopped in
about 15 percent of instances, which are severity 1 problems; it created
severity 2 problems in about 50 percent; it created severity 3 problems
in about 25 percent; and had no operational consequences in about 10
percent of the applications in the sample.
To know the origin of a defect, some research is required. Most defects
are initially found because the code stops working or produces erratic
results. But it is important to know if upstream problems such as
requirements or design issues are the true cause. Root-cause analysis
can find the true causes of software defects.
Several other factors should be included in a taxonomy for tracking
defects. These include whether a reported defect is valid or invalid.
(Invalid defects are common and fairly expensive, since they still require
analysis and a response.) Another factor is whether a defect report is
new and unique, or merely a duplicate of a prior defect report.
For testing and static analysis, the category of "false positives" needs
to be included. A false positive is the mistaken identification of a code
segment that initially seems to be incorrect, but which later research
reveals is actually correct.
A third factor deals with whether the repair team can make the same problem occur on their own systems, or whether the defect was caused by a unique configuration on the client's system. When defects cannot be duplicated, they are termed abeyant defects by IBM, since additional information needs to be collected to solve the problem.
Adding these additional topics to the Y2K example would result in
an expanded taxonomy:
Y2K origin: Requirements
Y2K validity: Valid defect report
Y2K uniqueness: Duplicate (this problem was reported millions of times)
Y2K severity: Severity 2 (major features disabled)
Y2K primary cause: Error of commission
Y2K secondary cause: Error of poor removal
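The expanded taxonomy maps naturally onto a simple record structure. The sketch below is a hypothetical representation (the class and enum names are invented, not from the source) showing how the Y2K example might be encoded for statistical or root-cause analysis:

    from dataclasses import dataclass
    from enum import Enum
    from typing import Optional

    class Origin(Enum):
        REQUIREMENTS = "requirements"
        DESIGN = "design"
        CODE = "code"
        DOCUMENTS = "documents"
        BAD_FIX = "bad fix"

    class Cause(Enum):
        OMISSION = "omission"
        COMMISSION = "commission"
        AMBIGUITY = "ambiguity"
        PERFORMANCE = "performance"
        SECURITY = "security"
        EXCESS = "excess"
        POOR_REMOVAL = "poor removal"

    @dataclass
    class ClassifiedDefect:
        origin: Origin
        valid: bool                       # invalid reports still cost analysis effort
        duplicate: bool                   # duplicate of a prior report?
        severity: int                     # IBM scale: 1 (total failure) to 4 (cosmetic)
        primary_cause: Cause
        secondary_cause: Optional[Cause] = None
        abeyant: bool = False             # cannot be reproduced on the repair team's systems

    # The Y2K example from the text, encoded with this structure:
    y2k = ClassifiedDefect(origin=Origin.REQUIREMENTS, valid=True, duplicate=True,
                           severity=2, primary_cause=Cause.COMMISSION,
                           secondary_cause=Cause.POOR_REMOVAL)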
When defects are being counted or predicted, it is useful to have a
standard metric for normalizing the results. As discussed in Chapter 5,
there are at least ten candidates for such a normalizing metric, including
function points, story points, use-case points, lines of code, and so on.
In this book and also in the author's previous books, the function
point metric defined by the International Function Point Users Group
(IFPUG) is used to quantify and normalize data for both defects and
productivity.
There are several reasons for using IFPUG function points. The most important reason in terms of measuring software defects is that noncode defects in requirements, design, and documents are major defect sources and cannot be measured using the older "lines of code" metric.
Another important reason is that all of the major benchmark data
collections for productivity and quality use function point metrics, and
data expressed via IFPUG function points composes about 85 percent
of all known benchmarks.
It is not impossible to use other metrics for normalization, but if
results are to be compared against industry benchmarks such as those
published by the International Software Benchmarking Standards
Group (ISBSG), the IFPUG function points are the most convenient.
Later in the discussion of defect prediction, examples will be given of
using other metrics in addition to IFPUG function points.
It is interesting to combine the origin, severity, and cause factors to
examine the approximate frequency of each.
Table 9-1 shows the combination of these factors for software applications during development. Therefore, Table 9-1 shows defect potentials, or the probable numbers of defects that will be encountered during development and after release. Only severity 1 and severity 2 defects are shown in Table 9-1.
Data on defect potentials is based on long-range studies of defects and
defect removal efficiency carried out by organizations such as the IBM
Software Quality Assurance groups, which have been studying software
quality for more than 35 years.
TABLE 9-1  Overview of Software Defect Potentials

Defect Origins   Defects per      Severity 1   Severity 2   Most Frequent
                 Function Point   Defects      Defects      Defect Cause
Requirements     1.00             11.00%       15.00%       Omission
Design           1.25             15.00%       20.00%       Omission
Code             1.75             70.00%       57.00%       Commission
Documents        0.60             1.00%        1.00%        Ambiguity
Bad fixes        0.40             3.00%        7.00%        Commission
TOTAL            5.00             100.00%      100.00%      Omission
Other corporations such as AT&T, Coverity, Computer Aid Inc. (CAI),
Dovél Technologies, Motorola, Software Productivity Research (SPR),
Galorath Associates, the David Consulting Group, the Quality and
Productivity Management Group (QPMG), Unisys, Microsoft, and the
like, also carry out long-range studies of defects and removal efficiency
levels.
Most such studies are carried out by corporations rather than universities because academia is not really set up to carry out longitudinal studies that may last more than ten years.
While coding bugs or coding defects are the most numerous during
development, they are also the easiest to find and to get rid of. A
combination of inspections, static analysis, and testing can wipe out
more than 95 percent of coding defects and sometimes top 99 percent.
Requirements defects and bad fixes are the toughest categories of defect
to eliminate.
Table 9-2 uses Table 9-1 as a starting point, but shows the latent defects that will still be present when the software application is delivered to users. Table 9-2 shows approximate U.S. averages circa 2009. Note the variations in defect removal efficiency by origin.
It is interesting that when the software is delivered to clients, requirements defects are the most numerous, primarily because they are the most difficult to prevent and also the most difficult to find. Only formal requirements-gathering methods combined with formal requirements inspections can improve the situation for finding and removing requirements defects.
If not prevented or removed, both requirements bugs and design bugs eventually find their way into the code. These are not coding bugs per se, such as branching to a wrong address, but more serious and deep-seated kinds of bugs or defects.
It was noted earlier in this chapter that requirements defects cannot
be found and removed by means of testing. If a requirements defect is
not prevented or removed via inspection, all test cases created using the
requirements will confirm the defect and not identify it.
TABLE 9-2  Overview of Delivered Software Defects

Defect Origins   Defects per      Removal      Delivered Defects    Most Frequent
                 Function Point   Efficiency   per Function Point   Defect Cause
Requirements     1.00             70.00%       0.30                 Commission
Design           1.25             85.00%       0.19                 Commission
Code             1.75             95.00%       0.09                 Commission
Documents        0.60             91.00%       0.05                 Omission
Bad fixes        0.40             70.00%       0.12                 Commission
TOTAL            5.00             85.02%       0.75                 Commission
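Each row of Table 9-2 follows from Table 9-1 by the same escape-fraction arithmetic shown earlier, and the cumulative removal efficiency is the defect-weighted average of the rows. A brief sketch of that derivation (variable names are illustrative):

    # (origin, defects per function point, removal efficiency) from Tables 9-1 and 9-2
    rows = [("Requirements", 1.00, 0.70), ("Design", 1.25, 0.85),
            ("Code", 1.75, 0.95), ("Documents", 0.60, 0.91), ("Bad fixes", 0.40, 0.70)]

    total_potential = sum(p for _, p, _ in rows)          # 5.00 defects per FP
    delivered = sum(p * (1 - e) for _, p, e in rows)      # about 0.75 per FP
    overall_efficiency = 1 - delivered / total_potential  # about 85%

    for name, p, e in rows:
        print(f"{name:13s} delivered per FP: {p * (1 - e):.2f}")
    print(f"Cumulative removal efficiency: {overall_efficiency:.2%}")  # 85.02%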
Since Table 9-2 reflects approximate U.S. averages, the methods assumed are those of fairly careless requirements gathering: waterfall development; CMMI level 1; no formal inspections of requirements, design, or code; no static analysis; and only five forms of testing: (1) unit test, (2) new function test, (3) regression test, (4) system test, and (5) acceptance test.
Note also that during development, requirements will continue to
grow and change at rates of 1 percent to 2 percent every calendar month.
These changing requirements have higher defect potentials than the
original requirements and lower levels of defect removal efficiency. This
is yet another reason why requirements defects cause more problems
than any other defect origin point.
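Requirements creep at 1 percent to 2 percent per month compounds over a long schedule, which is why late-arriving requirements contribute so many defects. A quick illustrative calculation; the compound-growth model and the 18-month example are assumptions for illustration, not figures from the source:

    def grown_size(initial_fp: float, monthly_growth: float, months: int) -> float:
        """Function points after compound requirements growth."""
        return initial_fp * (1 + monthly_growth) ** months

    # A 1,000-function-point application with 18 months of design and coding:
    print(round(grown_size(1000, 0.01, 18)))  # about 1,196 FP at 1% per month
    print(round(grown_size(1000, 0.02, 18)))  # about 1,428 FP at 2% per month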
Software requirements are the most intractable source of software defects. However, methods such as joint application design (JAD), quality function deployment (QFD), Six Sigma analysis, root-cause analysis, embedding users with the development team as practiced by Agile development, prototypes, and the use of formal requirements inspections can assist in bringing requirements defects under control.
Table 9-3 shows what quality might look like if an optimal combination of defect prevention and defect removal activities were utilized. Table 9-3 assumes formal requirements methods, rigorous development such as practiced using the Team Software Process (TSP) or the higher CMMI levels, prototypes and JAD, formal inspections of all deliverables, static analysis of code, and a full set of eight testing stages: (1) unit test, (2) new function test, (3) regression test, (4) performance test, (5) security test, (6) usability test, (7) system test, and (8) acceptance test.
Table 9-3 also assumes a software quality assurance (SQA) group and rigorous reporting of software defects starting with requirements, continuing through inspections, static analysis, and testing, and out into the field with multiple years of customer-reported defects, maintenance, and enhancements. Accumulating data such as that shown in Tables 9-1 through 9-3 requires longitudinal data collection that runs for many years.
TABLE 9-3  Optimal Defect Prevention and Defect Removal Activities

Defect Origins   Defects per      Removal      Delivered Defects    Most Frequent
                 Function Point   Efficiency   per Function Point   Defect Cause
Requirements     0.50             95.00%       0.03                 Omission
Design           0.75             97.00%       0.02                 Omission
Code             0.50             99.00%       0.01                 Commission
Documents        0.40             96.00%       0.02                 Omission
Bad fixes        0.20             92.00%       0.02                 Commission
TOTAL            2.35             96.40%       0.08                 Omission
This combination has the effect of cutting defect potentials by more
than 50 percent and of raising cumulative defect removal efficiency from
today's average of 85 percent up to more than 96 percent.
It might be possible to even exceed the results shown in Table 9-3,
but doing so would require additional methods such as the availability
of a full suite of certified reusable materials.
Tables 9-2 and 9-3 are oversimplifications of real-life results. Defect
potentials vary with the size of the application and with other factors.
Defect removal efficiency levels also vary with application size. Bad-fix
injections also vary by defect origins. Both defect potentials and defect
removal efficiency levels vary by methodology, by CMMI levels, and by
other factors as well. These will be discussed later in the section of this
chapter dealing with defect prediction.
Because of the many definitions of quality used by the industry, it is
best to start by showing what is predictable and measurable and what is
not. To sort out the relevance of the many quality definitions, the author
has developed a 10-point scoring method for software quality factors.
If a factor leads to improvement in quality, its maximum score is 3.
If a factor leads to improvement in customer satisfaction, its maximum score is 3.
If a factor leads to improvement in team morale, its maximum score is 2.
If a factor is predictable, its maximum score is 1.
If a factor is measurable, its maximum score is 1.
The total maximum score is 10.
The lowest possible score is 0.
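A compact sketch of this 10-point rubric as it might be applied in code. The weights come straight from the list above; the function name is invented, and fractional arguments are an assumption to account for the partial scores (such as 9.50) that appear in Table 9-4:

    def quality_factor_score(quality: float = 0.0, satisfaction: float = 0.0,
                             morale: float = 0.0, predictable: float = 0.0,
                             measurable: float = 0.0) -> float:
        """Each argument is the fraction (0.0 to 1.0) earned of that criterion's
        maximum score of 3, 3, 2, 1, and 1 respectively."""
        return 3 * quality + 3 * satisfaction + 2 * morale + predictable + measurable

    # Defect removal efficiency satisfies every criterion fully: 10.0
    print(quality_factor_score(1.0, 1.0, 1.0, 1.0, 1.0))
    # A factor that is measurable but nothing else scores 1.0
    print(quality_factor_score(measurable=1.0))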
Table 9-4 lists all of the quality factors discussed in this chapter in rank order, using the scoring method just outlined. Table 9-4 shows whether a specific quality factor is measurable and predictable, and also the relevance of the factor to quality based on surveys of software customers. It also includes a weighted judgment as to whether the factor has led to improvements in quality among the organizations that use it.
The quality definitions with a score of 10 have been the most effective in leading to quality improvements over time. As a rule, the quality definitions scoring higher than 7 are useful. However, the quality definitions that score below 5 have no empirical data available that shows any quality improvement at all.
While Table 9-4 is somewhat subjective, at least it provides a mathematical basis for scoring the relevance and importance of the rather vague and ambiguous collection of quality factors used by the software industry.
TABLE 9-4  Rank Order of Quality Factors by Importance to Quality

                            Measurable   Predictable   Relevance
                            Property?    Property?     to Quality   Score
Best Quality Definitions
Defect potentials           Yes          Yes           Very high    10.00
Defect removal efficiency   Yes          Yes           Very high    10.00
Defect severity levels      Yes          Yes           Very high    10.00
Defect origins              Yes          Yes           Very high    10.00
Reliability                 Yes          Yes           Very high    10.00
Good Quality Definitions
Toxic requirements          Yes          No            Very high    9.50
Missing requirements        Yes          No            Very high    9.50
Requirements conformance    Yes          No            Very high    9.00
Excess requirements         Yes          No            Medium       9.00
Usability                   Yes          Yes           Very high    8.00
Testability                 Yes          Yes           High         8.00
Defect causes               Yes          No            Very high    8.00
Fair Quality Definitions
Maintainability             Yes          Yes           High         7.00
Understandability           Yes          Yes           Medium       6.00
Traceability                Yes          No            Low          6.00
Modifiability               Yes          No            Medium       5.00
Verifiability               Yes          No            Medium       5.00
Poor Quality Definitions
Portability                 Yes          Yes           Low          4.00
Expandability               Yes          No            Low          3.00
Scalability                 Yes          No            Low          2.00
Interoperability            Yes          No            Low          1.00
Survivability               Yes          No            Low          1.00
Augmentability              No           No            Low          0.00
Flexibility                 No           No            Low          0.00
Manageability               No           No            Low          0.00
Operability                 No           No            Low          0.00
In essence, Table 9-4 makes these points:
1. Conformance to requirements is hazardous unless incorrect, toxic,
or dangerous requirements are weeded out. This definition has not
demonstrated any improvements in quality for more than 30 years.
2. Most of the -ility quality definitions are hard to measure, and many
are of marginal significance. Some are not measurable either. None
of the -ility words tend to lead to tangible quality gains.
3. Quantification of defect potentials and defect removal efficiency levels has had the greatest impact on improving quality and also the greatest impact on customer satisfaction levels.
If software engineering is to evolve from a craft or art form into a true engineering field, it is necessary to put quality on a firm quantitative basis and to move away from vague and subjective quality definitions. These will still have a place, of course, but they should not be the primary definitions for software quality.
Predicting Software Defect Potentials
To predict software quality, it is necessary to measure software quality. Since companies such as IBM have been doing this for more than 40 years, the best available data comes from companies that have full life-cycle quality measurement programs that start with requirements, continue through development, and then extend out to customer-reported defects for as long as the software is used, which may be 25 years or more. The next best source of data comes from benchmark and commercial software estimating tool companies, since they collect historical data on quality as well as on productivity.
Because software defects come from five different sources, the quickest way to get a useful approximation of software defect potentials is to use IFPUG function point metrics.
The basic sizing rule for predicting defect potentials with function points is: take the size of a software application in function points and raise it to the 1.25 power. The result will be a useful approximation of software defect potentials for applications between a low of about 10 function points and a high of about 5,000 function points.
The exponent for this rule of thumb would need to be adjusted down-
wards for the higher CMMI levels, Agile, RUP, and the Team Software
Process (TSP). But since the rule is intended to be applied early, before
any costs are expended, it still provides a useful starting point. Readers
might want to experiment with local data and find an exponent that
gives useful results against local quality and defect data.
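As a simple illustration, the rule of thumb can be expressed in a few
lines of code. The following minimal Python sketch is not taken from any
estimating tool; it merely encodes the power rule described above, with
the exponent exposed as a parameter for local calibration.

   def defect_potential(function_points: float, exponent: float = 1.25) -> int:
       """Approximate defect potential: function points raised to the
       1.25 power; useful roughly from 10 to 5,000 function points."""
       return round(function_points ** exponent)

   for size in (10, 100, 1000, 5000):
       print(f"{size:>5} FP -> ~{defect_potential(size):,} potential defects")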
Table 9-5 shows approximate U.S. averages for defect potentials.
Recall that defect potentials are the sum of five defect origins:
requirements defects, design defects, code defects, document defects,
and bad-fix injections.

TABLE 9-5  U.S. Averages for Software Defect Potentials

Size in Function Points   Defects per Function Point   Defect Potentials
          1                          1.50                          2
         10                          2.34                         23
        100                          3.04                        304
      1,000                          4.62                      4,621
     10,000                          6.16                     61,643
    100,000                          7.77                    777,143
  1,000,000                          8.56                  8,557,143
  Average                            4.86                  1,342,983

As can be seen from Table 9-5, defect potentials increase with
application size. Of course, other factors can reduce or increase the
potentials, as will be discussed later in the section on defect
prevention.
While the total defect potential is useful, it is also useful to know
the distribution of defects among the five origins or sources. Table 9-6
illustrates typical defect distribution percentages using approximate
average values.

TABLE 9-6  Percentages of Defects by Origin

Defect Origins    Defects per Function Point   Percent of Total Defects
Requirements                1.00                        20.00%
Design                      1.25                        25.00%
Source code                 1.75                        35.00%
User documents              0.60                        12.00%
Bad fixes                   0.40                         8.00%
TOTAL                       5.00                       100.00%

Applying the distribution shown in Table 9-6 to a sample application
of 1500 function points, Table 9-7 illustrates the approximate defect
potential, or the total number of defects that might be found during
development and by customers.

TABLE 9-7  Defect Potentials for a Sample Application
(Application size = 1500 function points)

Defect Origins    Defects per       Defect        Percent of
                  Function Point    Potentials    Total Defects
Requirements           1.00            1,500          20.00%
Design                 1.25            1,875          25.00%
Source code            1.75            2,625          35.00%
User documents         0.60              900          12.00%
Bad fixes              0.40              600           8.00%
TOTAL                  5.00            7,500         100.00%
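To make the arithmetic behind Table 9-7 explicit, the small Python
sketch below applies the approximate average rates from Table 9-6 to an
application of a given size; the rates are the table's illustrative
U.S. averages, not constants.

   DEFECTS_PER_FP = {            # approximate U.S. averages (Table 9-6)
       "Requirements":   1.00,
       "Design":         1.25,
       "Source code":    1.75,
       "User documents": 0.60,
       "Bad fixes":      0.40,
   }

   def defect_potentials_by_origin(function_points: int) -> dict:
       """Defect potential per origin = size in FP times defects per FP."""
       return {origin: round(rate * function_points)
               for origin, rate in DEFECTS_PER_FP.items()}

   potentials = defect_potentials_by_origin(1500)
   total = sum(potentials.values())           # 7,500 for 1,500 FP
   for origin, count in potentials.items():
       print(f"{origin:<15} {count:>6,}  {count / total:7.2%}")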
These simple overall examples are not intended as substitutes for
commercial quality estimation tools such as KnowledgePlan and SEER,
which can adjust their predictions based on CMMI levels; development
methods such as Agile, TSP, or RUP; use of inspections; use of static
analysis; and other factors which would cause defect potentials to vary
and also which cause defect removal efficiency levels to vary.
Rules of thumb are never very accurate, but their convenience and
ease of use provide value for rough estimates and early sizing. However,
such rules should not be used for contracts or serious estimates.
Predicting Code Defects
Using function point metrics as an overall tool for quality prediction is
useful because noncoding defects outnumber code defects. That being
said, there are more coding defects than any other single source.
Predicting code defects is fairly tricky for six reasons:
1. More than 2,500 programming languages are in existence, and they
are not equal as sources of defects.
2. A majority of modern software applications use more than one language,
and some use as many as 15 different programming languages.
3. The measured range of performance by a sample of programmers
using the same language for the same test application varies by
more than 10 to 1. Individual skills and programming styles create
significant variations in the amount of code written for the same
problem, in defect potentials, and also in productivity.
4. Lines of code can be counted using either physical lines or logical
statements. For some languages, the two counts are identical, but
for others, there may be as much as a 500 percent variance between
physical and logical counts.
5. For a number of languages, starting with Visual Basic, some program-
ming is done by means of buttons or pull-down menus rather than by
writing procedural source code. There are no effective rules for
counting source code with such languages.
6. Reuse of source code from older applications or from libraries of
reusable code is quite common. If the reused code is certified, it will
have very few defects compared with new custom code.
To predict coding defects, it is necessary to know the level of a pro-
gramming language. The concept of the level of a language is often used
informally in phrases such as "high-level" or "low-level" languages.
Within IBM in the 1970s, when research was first carried out on
predicting code defects, it was necessary to give a formal mathematical
definition to language levels. Within IBM the level was defined as the
number of statements in basic assembly language needed to equal the
functionality of 1 statement in a higher-level language.
Using this definition, COBOL was a level 3 language, because it took
3 basic assembly statements to equal 1 COBOL statement. Using the
same rule, SMALLTALK is a level 15 language.
(For several years before function points were invented, IBM used
"equivalent assembly statements" as the basis for estimating non-code
work such as user manuals. Thus, instead of basing a publication budget
on 10 percent of the effort for writing a program in PL/S, the budget
would be based on 10 percent of the effort if the code were basic assem-
bly language. This method was crude but reasonably effective.)
Dissatisfaction with the equivalent assembler method for estimation
was one of the reasons IBM assigned Allan Albrecht and his colleagues
to develop function point metrics.
Additional programming languages such as APL, Forth, Jovial, and
others were starting to appear, and IBM wanted both a metric and esti-
mating methods that could deal with both noncoding and coding work
in an accurate fashion. IBM also wanted to predict coding defects.
The use of macro-assembly language had introduced reuse, and this
caused measurement problems, too. It raised the issue of how to count
reused code in software applications or any other reused material. The
solution here was to separate productivity and quality into two topics:
(1) development and (2) delivery.
The former dealt with the code and materials that had to be constructed
from scratch. The latter dealt with the final application as delivered,
including reused material. For example, using macro-assembly language,
development productivity might be 300 lines of code per month, but due
to code reuse in the form of macro expansions, delivery productivity
might be as high as 750 lines of code per month.
The same distinction affects quality, too. Assume a program had 1000
lines of new code and 1000 lines of reused code. There might be 15 bugs
per KLOC in the new code but 0 bugs per KLOC in the reused code.
This is an important business distinction that is not well understood
even in 2009. The true goal of software engineering is to improve the
rate of delivery productivity and quality rather than development pro-
ductivity and quality.
After function point metrics were developed circa 1975, the defini-
tion of "language level" was expanded to include the number of logical
code statements equivalent to 1 function point. COBOL, for example,
requires about 105 statements per function point in the procedure and
data divisions. (This expansion is the mathematical basis for backfiring,
or direct conversion from source code to function points.)
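Although no code accompanied the original IBM work, the backfiring
relationship is easy to sketch. The Python fragment below assumes the
320 statements per function point basis for level-1 assembly used in
Table 9-8; the function names are illustrative, and since real language
levels vary widely, the results are rough approximations at best.

   ASSEMBLY_STATEMENTS_PER_FP = 320    # level-1 basis (see Table 9-8)

   def statements_per_function_point(level: float) -> float:
       """Logical statements per function point for a given language level."""
       return ASSEMBLY_STATEMENTS_PER_FP / level

   def backfire_function_points(logical_statements: int, level: float) -> float:
       """Approximate function points backfired from logical statements."""
       return logical_statements / statements_per_function_point(level)

   # COBOL is a level 3 language: about 107 statements per function point,
   # so a 106,667-statement COBOL system backfires to roughly 1,000 FP.
   print(statements_per_function_point(3))        # ~106.7
   print(backfire_function_points(106_667, 3))    # ~1,000.0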
Table 9-8 illustrates how code size and coding defects would vary if
15 different programming languages were used for the same application
of 1000 function points. Table 9-8 assumes a constant value of 15
potential coding defects per KLOC for all languages. However, the 15
languages have levels that vary from 1 to 15, so very different
quantities of code will be created for the same 1000 function points.

TABLE 9-8  Examples of Defects per KLOC and Function Point for 15 Languages
(Assumes a constant of 15 defects per KLOC for all languages)

Level   Sample          Source Code per   Source Code   Coding    Defects per
        Languages       Function Point    per 1000 FP   Defects   Function Point
 1.     Assembly              320             320,000     4,800       4.80
 2.     C                     160             160,000     2,400       2.40
 3.     COBOL                 107             106,667     1,600       1.60
 4.     PL/I                   80              80,000     1,200       1.20
 5.     Ada95                  64              64,000       960       0.96
 6.     Java                   53              53,333       800       0.80
 7.     Ruby                   46              45,714       686       0.69
 8.     E                      40              40,000       600       0.60
 9.     Perl                   36              35,556       533       0.53
10.     C++                    32              32,000       480       0.48
11.     C#                     29              29,091       436       0.44
12.     Visual Basic           27              26,667       400       0.40
13.     ASP NET                25              24,615       369       0.37
14.     Objective C            23              22,857       343       0.34
15.     Smalltalk              21              21,333       320       0.32

Note: Language levels are variable and change based on volumes of
reused code or calls to external functions. The levels shown in
Table 9-8 are only approximate and are not constants.
As can be seen from Table 9-8, in order to predict coding defects, it is
critical to know the programming language (or languages) that will be
used and also the size of the application using both function point and
lines of code metrics.
The situation is even trickier when combinations of two or more
languages are used within the same application. However, this prob-
lem is handled by commercial software cost-estimating tools such as
KnowledgePlan, which include multilanguage estimating capabilities.
Reused code also adds to the complexity of predicting coding defects.
To show the results of multiple languages in the same application, let
us consider two case studies.
In Case Study A, there are three different languages, and each language
has 1000 lines of code, counted using logical statements. In Case
Study B, we have the same three languages, but now each language
comprises exactly 25 function points.
For Case A, the total volume of source code is 3000 lines of code; total
function points are 73; and total code defect potentials are 45.
Case A: Three Languages with 1000 Lines of Code Each

             Lines of Code   LOC per          Function   Defect
Languages    Levels   (LOC)  Function Point   Points     Potential
C             2.00    1,000       160             6          15
Java          6.00    1,000        53            19          15
Smalltalk    15.00    1,000        21            48          15
TOTAL                 3,000                      73          45
AVERAGE       7.76                 41
When we change the assumptions to Case B and use a constant value
of 25 function points for each language, the total number of function
points only changes from 73 to 75. But the volume of source code almost
doubles, as do numbers of defects. This is because of the much greater
impact of the lowest-level language, the C programming language.
When considering either Case A or Case B, it is easily seen that
predicting either size or quality for a multilanguage application is a
great deal more complicated than for a single-language application.
Case B: Three Languages with 25 Function Points Each

             Lines of Code   LOC per          Function   Defect
Languages    Levels   (LOC)  Function Point   Points     Potential
C             2.00    4,000       160            25          60
Java          6.00    1,325        53            25          20
Smalltalk    15.00      525        21            25           8
TOTAL                 5,850                      75          88
AVERAGE       4.10                 78
It is interesting to look at Case A and Case B in a side-by-side format
to highlight the differences. Note that in Case B the influence of the
lowest-level language, the C programming language, increases both
code volumes and defect potentials:
Source Code (Logical Statements)      Case A      Case B
C                                      1,000       4,000
Java                                   1,000       1,325
Smalltalk                              1,000         525
Total lines of code                    3,000       5,850
Total KLOC                              3.00        5.85
Function points                           73          75
Code defects                              45          88
Defects per KLOC                          15          15
Defects per function point              0.62        1.17
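The arithmetic behind both cases can be condensed into a short sketch.
The Python fragment below is illustrative only; it assumes the nominal
statements-per-function-point values from Table 9-8 and the constant of
15 coding defects per KLOC used in these examples.

   LOC_PER_FP = {"C": 160, "Java": 53, "Smalltalk": 21}   # from Table 9-8
   DEFECTS_PER_KLOC = 15.0

   def summarize(loc_by_language: dict) -> None:
       """Totals for a multilanguage application, Case A/B style."""
       total_loc = sum(loc_by_language.values())
       total_fp = sum(loc / LOC_PER_FP[lang]
                      for lang, loc in loc_by_language.items())
       defects = total_loc / 1000 * DEFECTS_PER_KLOC
       print(f"LOC {total_loc:,}  FP {total_fp:,.0f}  "
             f"code defects {defects:,.0f}  "
             f"defects per FP {defects / total_fp:.2f}")

   summarize({"C": 1000, "Java": 1000, "Smalltalk": 1000})            # Case A
   summarize({lang: 25 * rate for lang, rate in LOC_PER_FP.items()})  # Case B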
Cases A and B oversimplify real-life problems because each case study
uses constants for data items that in real life are variable. For
example, the constant of 15 defects per KLOC for code defects is really
a variable that can range from less than 5 to more than 25 defects per KLOC.
The number of source code statements per function point is also a vari-
able, and each language can vary by perhaps a range of 2 to 1 around the
average values shown by the nominal language "level" default values.
These variables illustrate why predicting quality and defect levels
depends so heavily upon measuring quality and defect levels. The exam-
ples also illustrate why definitions of quality need to be both measurable
and predictable.
Other variables can affect the ability to predict size and defects as well.
Suppose, for example, that reused code composed 50 percent of the code
volume in Case A. Suppose also that the reused code is certified and has
zero defects. Now the calculations for defect predictions need to include
reuse, which in this example lowers defect potentials by 50 percent.
When the size of the application is used for productivity calculations,
it is necessary to decide whether development productivity or delivery
productivity, or both, are the figures of interest.
Predicting software defects can be accomplished with fairly good
accuracy, but the calculations are not trivial, and they need to
include a number of variables that can only be determined by careful
measurements.
The Quality Impacts of Creeping Requirements
Function point analysis at the end of the requirements phase and then
again at application delivery shows that requirements grow and change
at rates in excess of 1 percent per calendar month during the design
and coding phases. The total growth in creeping requirements ranges
from a low of less than 10 percent of total requirements to a high of
more than 50 percent. (One unique project had requirements growth in
excess of 200 percent.)
As an example, if an application is sized at 1000 function points when
the initial requirements phase is over, then every month at least 10 new
function points will be added in the form of new requirements. This
growth might continue for perhaps six months, and so increase the size
of the application from 1000 to 1060 function points. For small projects,
the growth of creeping requirements is more of an inconvenience than
a serious issue.
Larger applications have longer schedules and usually higher rates of
requirements change as well. For an application initially sized at 10,000
function points, new requirements might lead to monthly growth rates of
125 function points for perhaps 20 calendar months. The delivered applica-
tion might be 12,500 function points rather than 10,000 function points.
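The growth arithmetic in these two examples is simple enough to sketch
directly; the linear monthly growth below is the approximation used in
the text, not a general law of requirements creep.

   def delivered_size(initial_fp: float, monthly_growth_fp: float,
                      growth_months: int) -> float:
       """Initial size plus linear monthly requirements creep."""
       return initial_fp + monthly_growth_fp * growth_months

   print(delivered_size(1_000, 10, 6))       # 1,060 FP (small project)
   print(delivered_size(10_000, 125, 20))    # 12,500 FP (large project)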
As this example illustrates, creeping requirements growth of a full 25
percent can have a major impact on development schedules, costs, and
also on quality and delivered defects.
Because new and changing requirements are occurring later in devel-
opment than the original requirements, they are often rushed. As a
result, defect potentials for creeping requirements are about 10 percent
greater than for the original requirements. This is true for toxic require-
ments and design errors. Code bugs may or may not increase, based
upon the schedule pressure applied to the software engineering team.
Creeping requirements also tend to bypass formal inspections and
also have fewer test cases created for them. As a result, defect removal
efficiency is lower against both toxic requirements and also design
errors by at least 5 percent. This seems to be true for code errors as well,
with the exception that applications coded in C or Java that use static
analysis tools will still achieve high levels of defect removal efficiency
against code bugs.
The combined results of higher defect potentials and lower levels
of defect removal for creeping requirements result in a much greater
percentage of delivered defects stemming from changed requirements
than any other source of error. This has been a chronic problem for the
software industry.
The bottom line is that creeping requirements combined with below
optimum levels of defect prevention and defect removal are a primary
cause of cancelled projects, schedule delays, and cost overruns.
As will be discussed later in the sections on defect prevention and
defect removal, there are technologies available for minimizing the harm
from creeping requirements. However, these effective methods, such as
formal requirements and design inspections, are not widely used.
Measuring Software Quality
In spite of the fact that defect removal efficiency is a critical topic
for successful software projects, measuring defect removal efficiency,
or software quality in general, is seldom done. From visiting over 300
companies in the United States, Europe, and Asia, the author found the
following distribution of the frequency of various kinds of quality
measures:
No quality measures at all                                           44%
Measuring only customer-reported defects                             30%
Measuring test and customer-reported defects                         18%
Measuring inspection, static analysis, test, and
customer-reported defects                                             7%
Using volunteers for measuring personal defect removal                1%
Overall distribution                                                100%
The mathematics of measuring defect removal efficiency is not
complicated. Twelve steps in the sequence of data collection and
calculations are needed to quantify defect removal efficiency levels
(a small calculation sketch for steps 10 and 11 follows the list):
1. Accumulate data on every defect that occurs, starting with
requirements.
2. Assign severity levels to each reported defect as it is fixed.
3. Measure how many defects are removed by every defect removal
activity.
4. Use root-cause analysis to identify origins of high-severity defects.
5. Measure invalid defects, duplicates, and false positives, too.
6. After the software is released, measure customer-reported defects.
7. Record hours worked for defect prevention, removal, and repairs.
8. Select a fixed point such as 90 days after release for the calculations.
9. Use volunteers to record private defect removal such as desk
checking.
10. Calculate cumulative defect removal efficiency for the entire series.
11. Calculate the defect removal efficiency for each step in the series.
12. Use the data to improve both defect prevention and defect removal.
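As promised, here is a minimal Python sketch of steps 10 and 11, using
hypothetical defect counts. The key point is that each stage's
efficiency is measured against the defects present when that stage
runs, while cumulative efficiency is measured against the total,
including customer-reported defects from the fixed post-release window.

   def removal_efficiency(found_by_stage: list[int], field_defects: int) -> None:
       """Per-stage and cumulative defect removal efficiency."""
       total = sum(found_by_stage) + field_defects
       remaining = total
       for stage, found in enumerate(found_by_stage, start=1):
           print(f"stage {stage}: {found / remaining:6.2%} of defects then present")
           remaining -= found
       print(f"cumulative removal efficiency: {sum(found_by_stage) / total:.2%}")

   # Three removal stages, then 50 field defects in the first 90 days.
   removal_efficiency([600, 250, 100], field_defects=50)   # cumulative 95%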
The effort and costs required to measure defect removal efficiency
levels are trivial compared with the value of such information. The total
effort required to measure each defect and its associated repair work
amounts to only about an hour. Of this time, probably half is expended
on customer-reported defects, and the other half is expended on internal
defect reports.
However, step 4, root-cause analysis, can take several additional
hours based on how well requirements and design are handled by the
development team.
The value of measuring defect removal efficiency encompasses the
following benefits:
Finding and fixing bugs is the most expensive activity in all of software,
so reducing these costs yields a very large return on investment.
Excessive numbers of bugs constitute the main reason for schedule
slippage, so reducing defects in all deliverables will shorten develop-
ment schedules.
Delivered defects are the major cost driver of software maintenance
for the first two years after release, so improving removal efficiency
lowers maintenance costs.
Customer satisfaction correlates inversely to numbers of delivered
defects, so reducing delivered defects will result in happier customers.
Team morale correlates with both effective defect prevention and
effective defect removal.
Later in the section on the economics of quality, these benefits will
be quantified to show the overall value of defect prevention and defect
removal.
Many companies and government organizations track software
defects found during static analysis, testing, and also defects reported
by customers. In fact, a number of commercial software defect tracking
tools are available.
These tools normally track defect symptoms, applications containing
defects, hardware and software platforms, and other kinds of indicative
data such as release number, build number, and so on.
However, more sophisticated organizations also utilize formal inspec-
tions of requirements, design, and other materials. Such companies
often utilize static analysis in addition to testing and therefore measure
a wider range of defects than just those found in source code by ordinary
testing.
Some additional information is needed in order to use expanded defect
data for root-cause analysis and other forms of defect prevention. These
additional topics include
Defect discovery point  It is important to record information on the
point at which any specific defect is found. Since requirements defects
cannot normally be found via testing, it is important to try to
identify noncode defect discovery points.
Collectively, noncode defects in requirements and design are more
numerous than coding defects, and also may be high in severity levels.
Defect repair costs for noncode defects are often higher than for coding
defects. Note that there are more than 17 kinds of software testing, and
companies do not use the same names for various test stages.
Date of defect discovery: ________________

Defect discovery point:
   Customer defect report
   Quality assurance defect report
   Test stage _________________ defect report
   Static analysis defect report
   Code inspection defect report
   Document inspection defect report
   Design inspection defect report
   Architecture inspection defect report
   Requirements inspection defect report
   Other ____________________ defect report
Defect origin point It is also important to record information on where
software defects originate. This information requires careful analysis
on the part of the change team, so many companies limit defect origin
research to high-severity defects such as Severity 1 and Severity 2.
Date of defect origination: ____________________

Defect origin point:
   Application name
   Release number
   Build number
   Source code (internal)
   Source code (reused from legacy application)
   Source code (reused from commercial source)
   Source code (commercial software package)
   Source code (bad-fix or previous defect repair)
   User manual
   Design document
   Architecture document
   Requirement document
   Other _____________________ origination point
Ideally, the lag time between defect origins and defect discovery will
be less than a month, and hopefully less than a week. It is very
important that defects originating within a phase, such as requirements
or design, also be discovered and fixed during the same phase.
When there is a long gap between origins and discovery, such as not
finding a design problem until system test, it is a sign that software
development and quality control processes need to improve.
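A defect record that supports this kind of analysis needs, at a
minimum, both dates. The Python sketch below is a hypothetical record
layout, not a standard schema; the field names are invented for
illustration.

   from dataclasses import dataclass
   from datetime import date

   @dataclass
   class DefectRecord:
       severity: int           # 1 = critical ... 4 = cosmetic
       origin: str             # e.g., "Design document"
       discovery: str          # e.g., "System test"
       originated_on: date
       discovered_on: date

       @property
       def lag_days(self) -> int:
           """Gap between origin and discovery; ideally under 30 days."""
           return (self.discovered_on - self.originated_on).days

   d = DefectRecord(2, "Design document", "System test",
                    date(2009, 1, 15), date(2009, 7, 1))
   print(d.lag_days)   # 167 days: a sign the process needs improvement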
The best solution for shortening the gap between defect origination
and defect discovery is that of formal inspections of requirements,
design, and other deliverables. Both static analysis and code inspections
are also valuable for shortening the intervals between defect origination
and defect discovery.
TABLE 9-9  Best-Case Defect Discovery Points

Defect Origins     Optimal Defect Discovery
Requirements       Requirements inspection
Design             Design inspection
Code               Static analysis
Bad fixes          Static analysis
Documentation      Editing
Test cases         Test case inspection
Table 9-9 shows the best-case scenario for defect discovery methods
for various defect origins.
Inspections are best at finding subtle and complex bugs and problems
that are difficult to find via testing because sometimes no test cases
are created for them. The Y2K problem is illustrative: it could not be
found via testing so long as two-digit dates were mistakenly believed
to be acceptable, because no test cases would have challenged them.
Code inspections are useful for finding subtle problems such as
security vulnerabilities that may escape both testing and even static
analysis.
Static analysis is best at finding common coding errors such as
branches to incorrect locations, overflow conditions, poor error handling,
and the like. Static analysis prior to testing or as an adjunct to testing
will lower testing costs.
Testing is best at finding problems that only show up when the code is
operating, such as performance problems, usability problems, interface
problems, and other issues such as mathematical errors or format errors
for screens and reports.
Given the diverse nature of software bugs and defects, it is obvious
that all three defect removal methods are important for success: inspec-
tions, static analysis, and testing.
Table 9-10 illustrates the fact that long delays between defect origins
and defect discovery lead to very troubling situations. Long gaps also
raise bad-fix injections, accidentally including new defects in attempts
to repair older defects.
TABLE 9-10  Worst-Case Defect Discovery Points

Defect Origins     Latest Defect Discovery
Requirements       Deployment
Design             System testing
Code               New function testing
Bad fixes          Regression testing
Documentation      Deployment
Test cases         Not discovered
In the worst-case scenario, requirements defects are not found until
deployment, while design defects are not found until system test, when
it is difficult to fix them without extending the overall schedule for
the project. Note that in the worst-case scenario, bugs or errors in
test cases themselves are never discovered, so they fester on for many
releases.
Defect prevention and early defect removal are far more cost-effective
than depending on testing alone.
Other quality measures include some or all of the following:

Earned quality value (EQV)  Since it is possible to predict defect
potentials and also to predict defect removal efficiency levels, some
companies such as IBM have used a form of "earned value" in which
predictions of defects that would probably be found via inspections,
static analysis, and testing are compared with actual defect discovery
rates. Predicted and actual defect removal costs are also compared.
If fewer defects are found than predicted, then root-cause analysis can
be applied to discover if quality is really better than planned or if defect
removal is lax. (Usually quality is better when this happens.)
If more defects are found than predicted, then root-cause analysis
can be applied to discover if quality is worse than planned or if defect
removal is more effective than anticipated. (Usually, quality is worse
when this happens.)
Cost of quality (COQ) Collectively, the costs of finding and fixing bugs are
the most expensive known activity in the history of software. Therefore,
it is important to gather effort and cost data in such a fashion that cost
of quality (COQ) calculations can be performed.
However, for software, normal COQ calculations need to be tailored
to match the specifics of software engineering. Usually, data is recorded
in terms of hours and then converted into costs by applying salaries,
burden rates, and other cost items.
Defect discovery activity: ___________________
Defect prevention activities: ___________________

   Defect effort reported by users
   Defect damages reported by users
   Preparation hours for inspections
   Preparation hours for static analysis
   Preparation hours for testing
   Defect discovery hours
   Defect reporting hours
   Defect analysis hours
   Defect repair hours
   Defect inspection hours
   Defect static analysis hours
   Test stages used for defect
   Test cases created for defect
   Defect test hours
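Converting such recorded hours into cost of quality is mechanical once
a burdened rate is chosen. The categories and hour counts in the Python
sketch below are invented for illustration; only the $62.50 hourly rate
(roughly a fully burdened $10,000 per month) comes from the examples in
this chapter.

   BURDENED_RATE_PER_HOUR = 62.50    # fully burdened, ~$10,000 per month

   hours = {                         # hypothetical hour totals by category
       "inspection preparation": 120,
       "static analysis":         40,
       "test preparation":       200,
       "defect discovery":       150,
       "defect analysis":         90,
       "defect repair":          300,
   }

   total_hours = sum(hours.values())
   print(f"COQ: {total_hours} hours = ${total_hours * BURDENED_RATE_PER_HOUR:,.2f}")

   # Normalizing to hours per function point supports cross-project studies.
   function_points = 1_000
   print(f"{total_hours / function_points:.2f} hours per function point")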
The software industry has long used the "cost per defect" metric
without actually analyzing how this metric works. Indeed, hundreds of
articles and books parrot similar phrases such as "it costs 100 times
as much to fix a bug after release as during coding" or some minor
variation. The gist of these dogmatic statements is that cost per
defect rises steadily the later defects are found.
What few people realize is that cost per defect is always cheapest
where the most bugs are found and most expensive where the fewest bugs
are found. In fact, as normally calculated, this metric violates
standard economic assumptions because it ignores fixed costs. The cost
per defect metric actually penalizes quality: it yields the lowest cost
per defect for the buggiest applications!
Following is an analysis of why cost per defect penalizes quality and
looks best for the buggiest applications. The same mathematical
analysis also shows why defects seem to be cheaper if found early
rather than late.
Furthermore, when zero-defect applications are reached, there
are still substantial appraisal and testing activities that need to be
accounted for. Obviously, the cost per defect metric is useless for zero-
defect applications.
Because of the way cost per defect is normally measured, as quality
improves, cost per defect steadily increases until zero-defect software is
achieved, at which point the metric cannot be used at all.
As with the errors in KLOC metrics, the main source of error is that
of ignoring fixed costs. Three examples will illustrate how cost per defect
behaves as quality improves.
In all three cases, A, B, and C, we can assume that test personnel work
40 hours per week and are compensated at a rate of $2500 per week, or
$62.50 per hour. Assume that each of the three software features being
tested is 100 function points in size.
Case A: Poor Quality
Assume that a tester spent 15 hours writing test cases, 10 hours
running them, and 15 hours fixing 10 bugs. The total hours spent was
40, and the total cost was $2500. Since 10 bugs were found, the cost
per defect was $250. The cost per function point for the week of
testing would be $25.00.
Case B: Good Quality
In this second case, assume that a tester spent 15 hours writing test
cases, 10 hours running them, and 5 hours fixing one bug, which was
the only bug discovered. However, since no other assignments were
waiting and the tester worked a full week, 40 hours were charged to
the project.
The total cost for the week was still $2500, so the cost per defect has
jumped to $2500. If the 10 hours of slack time are backed out, leaving
30 hours for actual testing and bug repairs, the cost per defect would be
$1875. As quality improves, cost per defect rises sharply.
Let us now consider cost per function point. With the slack removed,
the cost per function point would be $18.75. As can easily be seen, cost
per defect goes up as quality improves, thus violating the assumptions
of standard economic measures.
However, as can also be seen, testing cost per function point declines
as quality improves. This matches the assumptions of standard econom-
ics. The 10 hours of slack time illustrate another issue: when quality
improves, defects can decline faster than personnel can be reassigned.
Case C: Zero Defects
In this third case, assume that a tester spent 15 hours writing test
cases and 10 hours running them. No bugs or defects were discovered.
Because no defects were found, the cost per defect metric cannot be
used at all.
But 25 hours of actual effort were expended writing and running test
cases. If the tester had no other assignments, he or she would still have
worked a 40-hour week, and the costs would have been $2500. If the 15
hours of slack time are backed out, leaving 25 hours for actual testing,
the costs would have been $1562.
With slack time removed, the cost per function point would be $15.63.
As can be seen again, testing cost per function point declines as quality
improves. Here, too, the decline in cost per function point matches the
assumptions of standard economics.
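The three cases condense into a few lines of Python. The sketch below
simply repeats the arithmetic above, with slack time excluded; it is
not a general costing model.

   RATE_PER_HOUR = 62.50      # $2,500 per 40-hour week
   FUNCTION_POINTS = 100      # size of each feature under test

   def testing_costs(write_hours, run_hours, fix_hours, bugs_found):
       """Cost per defect versus cost per function point for one week."""
       cost = (write_hours + run_hours + fix_hours) * RATE_PER_HOUR
       per_defect = (f"${cost / bugs_found:,.2f}" if bugs_found
                     else "undefined")
       print(f"bugs={bugs_found:2}  cost per defect={per_defect:>10}  "
             f"cost per FP=${cost / FUNCTION_POINTS:.2f}")

   testing_costs(15, 10, 15, 10)   # poor quality:  $250.00 per defect
   testing_costs(15, 10, 5, 1)     # good quality:  $1,875.00 per defect
   testing_costs(15, 10, 0, 0)     # zero defects:  metric cannot be used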
Time and motion studies of defect repairs do not support the aphorism
that it costs 100 times as much to fix a bug after release as before. Bugs
typically require between 15 minutes and 4 hours to repair.
Some bugs are expensive; these are called abeyant defects by IBM.
Abeyant defects are customer-reported defects that the repair center
cannot re-create, due to some special combination of hardware and
software at the client site. Abeyant defects constitute less than
5 percent of customer-reported defects.
Because of the fixed or inelastic costs associated with defect removal
operations, cost per defect always increases as numbers of defects
decline. Because more defects are found at the beginning of a testing
cycle than after release, this explains why cost per defect always goes
up later in the cycle. It is because the costs of writing test cases, running
them, and having maintenance personnel available act as fixed costs.
In any manufacturing cycle with a high percentage of fixed costs, the
cost per unit will go up as the number of units goes down. This basic fact
of manufacturing economics is why cost per defect metrics are hazard-
ous and invalid for economic analysis of software applications.
What would be more effective is to record the hours spent for all forms
of defect removal activity. Once the hours are recorded, the data could
be converted into cost data, and also normalized by converting hours
into standard units such as hours per function point.
Table 9-11 shows a sample of the kinds of data that are useful in
assessing cost of quality and also doing economic studies and
effectiveness studies.

TABLE 9-11  Software Defect Removal Effort Accumulation

                       Defect Removal Effort (Hours Worked)
Removal Stage          Preparation   Execution   Repair   TOTAL
                       Hours         Hours       Hours    HOURS
Inspections:
   Requirements
   Architecture
   Design
   Source code
   Documents
Static analysis
Test stages:
   Unit
   New function
   Regression
   Performance
   Usability
   Security
   System
   Independent
   Beta
   Acceptance
   Supply chain
Maintenance:
   Customers
   Internal SQA

Of course, knowing defect removal hours implies that data is also
collected on defect volumes and severity levels. Table 9-12 shows the
same set of activities as Table 9-11, but switches from hours to
defects.

TABLE 9-12  Software Defect Severity Level Accumulation

                       Defect Severity Levels
Removal Stage          Severity 1   Severity 2   Severity 3   Severity 4   TOTAL
                       (Critical)   (Serious)    (Minor)      (Cosmetic)   DEFECTS
Inspections:
   Requirements
   Architecture
   Design
   Source code
   Documents
Static analysis
Test stages:
   Unit
   New function
   Regression
   Performance
   Usability
   Security
   System
   Independent
   Beta
   Acceptance
   Supply chain
Maintenance:
   Customers
   Internal SQA

Both Tables 9-11 and 9-12 could also be combined into a single large
spreadsheet. However, defect counts and defect effort accumulation
tend to come from different sources and may not be simultaneously
available.
Defect effort and discovered defect counts are important data ele-
ments for long-range quality improvements. In fact, without such data,
quality improvement is likely to be minimal or not even occur at all.
Failure to record defect volumes and repair effort is a chronic weak-
ness of the software engineering domain. However, several software
development methods such as Team Software Process (TSP) and the
Rational Unified Process (RUP) do include careful defect measures.
The Agile method, on the other hand, is neither strong nor consistent
on software quality measures.
For software engineering to become a true engineering discipline and
not just a craft as it is in 2009, defect measurements, defect prediction,
defect prevention, and defect removal need to become a major focus of
software engineering.
Measuring Defect Removal Efficiency
One of the most effective metrics for demonstrating and improving soft-
ware quality is that of defect removal efficiency. This metric is simple in
concept but somewhat tricky to apply. The basic idea of this metric is to
calculate the percentage of software defects found by means of defect
removal operations such as inspections, static analysis, and testing.
What makes the calculations for defect removal efficiency tricky is
that they include noncode defects found in requirements, design, and
other paper deliverables, as well as coding defects.
Table 9-13 illustrates an example of defect removal efficiency levels
for a full suite of removal operations starting with requirements
inspections and ending with acceptance testing.
Table 9-13 makes a number of simplifying assumptions. One of these
is the assumption that all delivered defects will be found by customers
in the first 90 days of usage. In real life, of course, many latent
defects in delivered software will stay hidden for months or even
years. However, after 90 days, new releases will usually occur, and
they make it difficult to measure defects for prior releases.

TABLE 9-13  Software Defect Removal Efficiency Levels
(Assumes inspections, static analysis, and normal testing)

Application size (function points)       1,000
Language                                 C
Code size                                125,000
Noncode defects                          3,000
Code defects                             2,000
TOTAL DEFECTS                            5,000

Defect Removal Efficiency by Removal Stage

Removal Stage           Noncode    Code       Total      Removal
                        Defects    Defects    Defects    Efficiency
Inspections:
   Requirements             750          0        750
   Architecture             200          0        200
   Design                 1,250          0      1,250
   Source code              100        800        900
   Documents                250          0        250
   Subtotal               2,550        800      3,350      67.00%
Static analysis               0        800        800      66.67%
Test stages:
   Unit                       0         50         50
   New function              50        100        150
   Regression                 0         25         25
   Performance                0         10         10
   Usability                 50          0         50
   Security                   0         20         20
   System                    25         50         75
   Independent                0          5          5
   Beta                      25         15         40
   Acceptance                25         15         40
   Supply chain              25         10         35
   Subtotal                 200        300        500      58.82%
Prerelease defects        2,750      1,900      4,650      93.00%
Maintenance:
   Customers (90 days)      250        100        350     100.00%
TOTAL                     3,000      2,000      5,000
Removal efficiency       91.67%     95.00%     93.00%
It is interesting to see what kind of defect removal efficiency levels
occur with less sophisticated series of defect removal steps that do not
include either formal inspections or static analysis.
Since noncode defects that originate in requirements and design even-
tually find their way into the code, the overall removal efficiency levels
of testing by itself without any precursor inspections or static analysis
are seriously degraded, as shown in Table 9-14.
When comparing Tables 9-13 and 9-14, it can easily be seen that a
full suite of defect removal activities is more efficient and effective than
testing alone in finding and removing software defects that originate
outside of the source code. In fact, inspections and static analysis are
also very efficient in finding coding defects and have the additional
property of raising testing efficiency and lowering testing costs.
TABLE 9-14  Software Defect Removal Efficiency Levels
(Assumes normal testing without inspections or static analysis)

Application size (function points)       1,000
Language                                 C
Code size                                125,000
Noncode defects                          3,000
Code defects                             2,000
TOTAL DEFECTS                            5,000

Defect Removal Efficiency by Removal Stage

Removal Stage           Noncode    Code       Total      Removal
                        Defects    Defects    Defects    Efficiency
Inspections:
   Requirements               0          0          0
   Architecture               0          0          0
   Design                     0          0          0
   Source code                0          0          0
   Documents                  0          0          0
   Subtotal                   0          0          0       0.00%
Static analysis               0          0          0       0.00%
Test stages:
   Unit                     200        350        550
   New function             450        600      1,050
   Regression                 0        100        100
   Performance                0         50         50
   Usability                200         75        275
   Security                   0         50         50
   System                   300        200        500
   Independent               50         10         60
   Beta                     150         25        175
   Acceptance               175         20        195
   Supply chain              75         20         95
   Subtotal               1,600      1,500      3,100      62.00%
Prerelease defects        1,600      1,500      3,100      62.00%
Maintenance:
   Customers (90 days)    1,400        500      1,900     100.00%
TOTAL                     3,000      2,000      5,000
Removal efficiency       53.33%     75.00%     62.00%
Without pretest inspections and static analysis, testing will find hun-
dreds of bugs, but the overall defect removal efficiency of the full suite of
test activities will be lower than if inspections and static analysis were
part of the suite of removal activities.
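The compounding effect is easy to see in a small simulation. In the
Python sketch below, each stage's efficiency is expressed against all
defects still present when the stage runs (which is why the static
analysis figure differs from the code-only 66.67 percent in Table 9-13);
the percentages are derived from Tables 9-13 and 9-14 and are
illustrative, not constants.

   def cumulative_dre(potential: int, stage_efficiencies: list[float]) -> float:
       """Fraction of the original defect potential removed by a series."""
       remaining = potential
       for efficiency in stage_efficiencies:
           remaining -= remaining * efficiency
       return (potential - remaining) / potential

   full_suite = [0.67, 0.485, 0.588]   # inspections, static analysis, testing
   tests_only = [0.62]                 # all testing alone (Table 9-14)
   print(f"full suite: {cumulative_dre(5000, full_suite):.0%}")   # ~93%
   print(f"tests only: {cumulative_dre(5000, tests_only):.0%}")   # 62%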
In addition to elevating defect removal efficiency levels, adding formal
inspections and static analysis to the suite of defect removal opera-
tions also lowers development and maintenance costs. Development
schedules are also shortened, because traditional lengthy test cycles are
usually the dominant part of software development schedules. Indeed,
poor quality tends to stretch out test schedules by significant amounts
because the software does not work well enough to be released.
Table 9-15 shows a side-by-side comparison of cost structures for the
two examples discussed in this section. Case X is derived from Table
9-13 and uses a sophisticated combination of formal inspections, static
analysis, and normal testing.
Case Y is derived from Table 9-14 and uses only normal testing, with-
out any inspections or static analysis being performed.
The costs in Table 9-15 assume a fully burdened compensation struc-
ture of $10,000 per month. The defect-removal costs assume prepara-
tion, execution, and defect repairs for all defects found and identified.
In addition to the cost advantages, excellence in quality control also
correlates with customer satisfaction and with reliability. Reliability
and customer satisfaction both correlate inversely with levels of deliv-
ered defects.
The more defects there are at delivery, the more unhappy custom-
ers are. In addition, mean time to failure (MTTF) goes up as delivered
defects go down. The reliability correlation is based on high-severity
defects in the Severity 1 and Severity 2 classes.
Table 9-16 shows the approximate relationship between delivered
defects, reliability in terms of mean time to failure (MTTF) hours, and
customer satisfaction with software applications.
Table 9-16 uses integer values, so interpolation between these
discrete values would be necessary. Also, the reliability levels are
only approximate. Table 9-16 deals only with the C programming
language, so adjustments in defects per function point would be needed
for the 700 other languages that exist. Additional research is needed
on the topics of reliability and customer satisfaction and their
correlations with delivered defect levels.
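For readers who want intermediate values, the Python sketch below
interpolates linearly over the MTTF column of Table 9-16. Linear
interpolation is an assumption, not a property of the underlying data,
which, as noted, is only approximate.

   MTTF_HOURS = {1: 303, 2: 223, 3: 157, 4: 105, 5: 66,
                 6: 37, 7: 17, 8: 6, 9: 1, 10: 0}

   def mttf_for_density(defects_per_kloc: float) -> float:
       """Interpolated MTTF hours; valid from 1.0 to 10.0 defects per KLOC."""
       assert 1.0 <= defects_per_kloc <= 10.0
       lo = min(int(defects_per_kloc), 9)
       frac = defects_per_kloc - lo
       return MTTF_HOURS[lo] + frac * (MTTF_HOURS[lo + 1] - MTTF_HOURS[lo])

   print(mttf_for_density(2.5))   # ~190 hours, between "good" and "fair"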
However, not only do excessive levels of delivered defects generate
negative scores on customer satisfaction surveys, but they also show up
in many lawsuits against outsource contractors and commercial soft-
ware developers. In fact, one lawsuit was even filed by shareholders of
a major software corporation who claimed that excessive defect levels
were lowering the value of their stock.
TABLE 9-15  Comparison of Software Defect Removal Efficiency Costs
(Case X = inspections, static analysis, normal testing)
(Case Y = normal testing only)

Application size (function points)       1,000
Language                                 C
Code size                                125,000
Noncode defects                          3,000
Code defects                             2,000
TOTAL DEFECTS                            5,000

Defect Removal Costs by Activity

Removal Stage              Case X       Case Y
                           Removal $    Removal $     Difference
Inspections:
   Requirements
   Architecture
   Design
   Source code
   Documents
   Subtotal                $168,750            $0       $168,750
Static analysis             $81,250            $0        $81,250
Test stages:
   Unit
   New function
   Regression
   Performance
   Usability
   Security
   System
   Independent
   Beta
   Acceptance
   Supply chain
   Subtotal                $150,000      $775,000      -$625,000
Prerelease defects         $400,000      $775,000      -$375,000
Maintenance:
   Customers (90 days)     $175,000      $950,000      -$775,000
TOTAL COSTS                $575,000    $1,725,000    -$1,150,000
Cost per defect             $115.00       $345.00       -$230.00
Cost per function point     $575.00     $1,725.00     -$1,150.00
Cost per LOC                  $4.60        $13.80         -$9.20
ROI from inspections,
static analysis               $3.00
Development schedule
(calendar months)             12.00         16.00          -4.00
TABLE 9-16  Delivered Defects, Reliability, and Customer Satisfaction
(Note 1: Assumes the C programming language)
(Note 2: Assumes 125 LOC per function point)
(Note 3: Assumes severity 1 and 2 delivered defects)

Delivered Defects   Defects per      Mean Time to           Customer
per KLOC            Function Point   Failure (MTTF hours)   Satisfaction
  0.00                  0.00             Infinite           Excellent
  1.00                  0.13                  303           Very good
  2.00                  0.25                  223           Good
  3.00                  0.38                  157           Fair
  4.00                  0.50                  105           Poor
  5.00                  0.63                   66           Very poor
  6.00                  0.75                   37           Very poor
  7.00                  0.88                   17           Very poor
  8.00                  1.00                    6           Litigation
  9.00                  1.13                    1           Litigation
 10.00                  1.25                    0           Malpractice
Better quality control is the key to successful software engineering.
Software quality needs to be definable, predictable, measurable, and
improvable in order for software engineering to become a true engineer-
ing discipline.
Defect Prevention
The phrase "defect prevention" refers to methods and techniques that
lower the odds of certain kinds of defects occurring at all. The liter-
ature of defect prevention is very sparse, and academic research is
even sparser. The reason for this is that studying defect prevention is
extremely difficult and also somewhat ambiguous at best.
Defect prevention is analogous to vaccination against serious illness
such as pneumonia or flu. There is statistical evidence that vaccination
will lower the odds of patients contracting the diseases for which they
are vaccinated. However, there is no proof that any specific patient
would catch the disease whether receiving a vaccination or not. Also,
a few patients who are vaccinated might contract the disease anyway,
because vaccines are not 100 percent effective. In addition, some vac-
cines may have serious and unexpected side-effects.
All of these issues can occur with software defect prevention, too.
While there is statistical evidence that certain methods such as pro-
totypes, joint application design (JAD), quality function deployment
(QFD), and participation in inspections prevent certain kinds of defects
from occurring, it is hard to prove that those defects would definitely
occur in the absence of the preventive methodologies.
The way defect prevention is studied experimentally is to have two
versions of similar or identical applications developed, with one version
using a particular defect prevention method while the other version
did not. Obviously, experimental studies such as this must be small in
scale.
The easiest experiments in defect prevention are those dealing with
formal inspections of requirements, design, and code. Because inspec-
tions record all defects, companies that utilize formal inspections soon
accumulate enough data to analyze both defect prevention and defect
removal.
Formal inspections are so effective in terms of defect prevention that
they reduce defect potentials by more than 25 percent per year. In fact,
one issue with inspections is that after about three years of continuous
usage, so few defects occur that inspections become boring.
The more common method for studying defect prevention is to exam-
ine the results of large samples of applications and note differences in
the defect potentials among them. In other words, if 100 applications
that used prototypes are compared with 100 similar applications that
did not use prototypes, are requirements defects lower for the prototype
sample? Are creeping requirements lower for the prototype sample?
This kind of study can only be carried out internally by rather
sophisticated companies with mature defect and quality measurement
programs; that is, companies such as IBM, AT&T, Microsoft, Raytheon,
Lockheed, and the like. (Consultants who work for a number of
companies in the same industry can often observe the effects of defect
prevention by noting similar applications in different companies.)
However, the results of such large-scale statistical studies are some-
times published from benchmark collections by organizations such
as the International Software Benchmarking Standards Group (ISBSG),
the David Consulting Group, Software Productivity Research (SPR),
and the Quality and Productivity Management Group (QPMG).
In addition, consultants such as the author who work as expert wit-
nesses in software litigation may have access to data that is not oth-
erwise available. This data shows the negative effects of failing to use
defect prevention on projects that ended up in court.
Table 9-17 illustrates a large sample of 30 methods and techniques
that have been observed to prevent software defects from occurring.
Although the table shows specific percentages of defect prevention effi-
ciency, the actual data is too sparse to support the results. The percent-
ages are only approximate and merely serve to show the general order
of effectiveness.
TABLE 9-17  Methods and Techniques that Prevent Defects

     Activities Observed to Prevent
     Software Defects                        Defect Prevention Efficiency
 1.  Reuse (certified sources)                        -80.00%
 2.  Inspection participation                         -60.00%
 3.  Prototyping-functional                           -55.00%
 4.  PSP/TSP                                          -53.00%
 5.  Six Sigma for software                           -53.00%
 6.  Risk analysis (automated)                        -50.00%
 7.  Joint application design (JAD)                   -45.00%
 8.  Test-driven development (TDD)                    -45.00%
 9.  Defect origin measurements                       -44.00%
10.  Root cause analysis                              -43.00%
11.  Quality function deployment (QFD)                -40.00%
12.  CMM 5                                            -37.00%
13.  Agile embedded users                             -35.00%
14.  Risk analysis (manual)                           -32.00%
15.  CMM 4                                            -27.00%
16.  Poka-yoke                                        -23.00%
17.  CMM 3                                            -23.00%
18.  Scrum sessions (daily)                           -20.00%
19.  Code complexity analysis                         -19.00%
20.  Use-cases                                        -18.00%
21.  Reuse (uncertified sources)                      -17.00%
22.  Security plans                                   -15.00%
23.  Rational Unified Process (RUP)                   -15.00%
24.  Six Sigma (generic)                              -12.50%
25.  Clean-room development                           -12.50%
26.  Software Quality Assurance (SQA)                 -12.50%
27.  CMM 2                                            -12.00%
28.  Total Quality Management (TQM)                   -10.00%
29.  No use of CMM                                      0.00%
30.  CMM 1                                             +5.00%
     Average                                          -30.12%
Note that because defect prevention deals with reducing defect poten-
tials, percentages show negative values for methods that lower defects.
Positive values indicate methods that raise defect potentials.
The two top-ranked items deserve comment. The phrase "reuse from
certified sources" implies formal reusability where specifications, source
code, test cases, and the like have gone through rigorous inspection and
test stages, and have proven themselves to be reliable in field trials.
Certified reusable components may approach zero defects, and in any
case contain very few defects. Reuse of uncertified material is somewhat
hazardous by comparison.
The second method, or participation in formal inspections, has more
than 40 years of empirical data. Inspections of requirements, design,
and other deliverables are very effective and efficient in terms of defect
removal efficiency. But in addition, participants in formal inspections
become aware of defect patterns and categories, and spontaneously
avoid them in their own work.
One emerging form of risk analysis is so new that it lacks empirical
data. This new method consists of performing very early sizing and risk
analysis prior to starting a software application or spending any money
on development.
If the risks for the project are significantly higher than its value, not
doing it at all will obviously prevent 100 percent of potential defects. The
Victorian state government in Australia has started such a program,
and by eliminating hazardous software applications before they start,
they have saved millions of dollars.
New sizing methods based on pattern matching can shift the point
at which risk analysis can be performed about six months earlier than
previously possible. This new approach is promising and needs addi-
tional study.
Other factors also have some impact on defect prevention. One of these
is certification of personnel, either in testing or in software quality
assurance. Certification also has an effect on defect removal. In
Table 9-18, the defect prevention effects are shown as negative
percentages, while the defect removal effects are shown as positive
percentages.
Here too the data is only approximate, and the specific percentages
are derived from very sparse sources and should not be depended upon.
Table 9-18 is sorted in terms of defect prevention.
The data in Table 9-18 should not be viewed as accurate, but only
approximate. A great deal more research is needed on the effectiveness
of various kinds of certification. Also, the software industry circa 2009
has overlapping and redundant forms of certification. There are mul-
tiple testing and quality associations that offer certification, but these
separate groups certify using different methods and are not coordinated.
In the absence of a single association or certification body, these various
nonprofit and for-profit test and quality assurance associations offer
rival certificates that use very different criteria.
TABLE 9-18  Influence of Certification on Defect Prevention and Removal

                                                        Defect       Defect
                                                        Prevention   Removal
     Certificate                                        Benefit      Benefit
31.  Six Sigma black belt                                -12.50%     10.00%
32.  International Software Testing Quality
     Board (ISTQB)                                       -12.00%     10.00%
33.  Certified Software Quality Engineer (CSQE)-ASQ      -10.00%     10.00%
34.  Certified Software Quality Analyst (CSQA)           -10.00%     10.00%
35.  Certified Software Test Manager (CSTM)               -7.00%      7.00%
36.  Six Sigma green belt                                 -6.00%      5.00%
37.  Microsoft certification (testing)                    -6.00%      6.00%
38.  Certified Software Test Professional (CSTP)          -5.00%     12.00%
39.  Certified Software Tester (CSTE)                     -5.00%     12.00%
40.  Certified Software Project Manager (CSPM)            -3.00%      3.00%
     Average                                              -7.65%      8.50%

Yet another set of factors that has an effect on defect prevention is
the choice of metrics and measurements, as discussed earlier in this
book. For metrics and measurements to have an effect, they need to be
capable of demonstrating quality levels and measuring changes against
quality baselines. Therefore, many of the -ility measures and metrics
cannot even be included, because they are not measurable.
Table 9-19 shows the approximate impacts of various measurements
and metrics on defect prevention and defect removal. IFPUG function
points are top-ranked because they can be used to quantify defects in
requirements and design as well as code. IFPUG function points can
also be used to measure software defect removal costs and quality
economics.

TABLE 9-19  Software Metrics, Measures, and Defect Prevention and Removal

                                                        Defect       Defect
                                                        Prevention   Removal
     Metric                                             Benefit      Benefit
41.  IFPUG function points                               -30.00%     15.00%
42.  Six Sigma                                           -25.00%     20.00%
43.  Cost of quality (COQ)                               -22.00%     15.00%
44.  Root cause analysis                                 -20.00%     12.00%
45.  TSP/PSP                                             -20.00%     18.00%
46.  Monthly rate of requirements change                 -17.00%      5.00%
47.  Goal-question metrics                               -15.00%     10.00%
48.  Defect removal efficiency                           -12.00%     35.00%
49.  Use-case points                                     -12.00%      5.00%
50.  COSMIC function points                              -10.00%     10.00%
51.  Cyclomatic complexity                               -10.00%      7.00%
52.  Test coverage percent                               -10.00%     22.00%
53.  Percent of requirements missed                       -7.00%      3.00%
54.  Story points                                         +5.00%     -5.00%
55.  Cost per defect                                     +10.00%    -15.00%
56.  Lines of code (LOC)                                 +15.00%    -12.00%
     Average                                             -11.25%      9.06%
Note that the two bottom-ranked metrics in Table 9-19, cost per defect
and lines of code, have a negative impact; that is, they make quality
worse rather than better. As commonly used in the software literature,
both metrics are close to being professional malpractice, because they
violate the canons of standard economics and distort results.
The lines of code metric penalizes high-level languages and makes
both the quality and productivity of low-level languages look better
than it really is. In addition, this metric cannot even be used to measure
requirements and design defects or any other form of noncode defect.
The cost per defect metric penalizes quality and makes buggy applica-
tions look better than applications with few defects. This metric cannot
even be used for zero-defect applications. A nominal quality metric that
penalizes quality and can't even be used to show the highest level of
quality is a good candidate for being professional malpractice.
The final aspect of defect prevention discussed in this chapter is that
of the effectiveness of various international standards. Unfortunately,
the effectiveness of international standards has very little empirical
data available.
There are no known controlled studies that demonstrate if adherence
to standards improves quality. There is some anecdotal evidence that at
least some standards, such as ISO 9001-9004, degrade quality because
some companies that did not use these standards had higher quality on
similar applications than companies that had been certified. Table 9-20
shows approximate results, but the table has two flaws. It only shows a
small sample of standards, and the rankings are based on very sparse
and imperfect information.
In fields outside of software such as medical practice, standards are
normally validated by field trials, controlled studies, and extensive anal-
ysis. For software, standards are not validated and are based on the
subjective views of the standards committees. While some of these com-
mittees are staffed by noted experts and the standards may be useful,
the lack of validation and field trials prior to publication is a sign that
software engineering needs additional evolution before being classified
as a full engineering discipline.
Tables 9-17 through 9-20 illustrate a total of 65 defect preven-
tion methods and practices. These are not all used at the same time.
Table 9-21 shows the approximate usage patterns observed in several
hundred U.S. companies (and in about 50 overseas companies).
TABLE 9-20   International Standards, Defect Prevention and Removal

                                                             Defect       Defect
                                                             Prevention   Removal
    Standard or Government Mandate                           Benefit      Benefit
57. ISO/IEC 10181 Security Frameworks                        -25.00%      25.00%
58. ISO 17799 Security                                       -15.00%      15.00%
59. Sarbanes-Oxley                                           -12.00%       6.00%
60. ISO/IEC 25030 Software Product Quality Requirements      -10.00%       5.00%
61. ISO/IEC 9126-1 Software Engineering Product Quality      -10.00%       5.00%
62. IEEE 730-1998 Software Quality Assurance Plans            -8.00%       5.00%
63. IEEE 1061-1992 Software Metrics                           -7.00%       2.00%
64. ISO 9000-9003 Quality Management                          -6.00%       5.00%
65. ISO 9001:2000 Quality Management System                   -4.00%       7.00%
    Average                                                  -10.78%       8.33%
Table 9-21 is somewhat troubling because the three top-ranked meth-
ods have been demonstrated to be harmful and make quality worse
rather than better. In fact, of the really beneficial defect prevention
methods, only a handful such as prototyping, measuring test coverage,
and joint application design (JAD) have more than 50 percent usage in
the United States.
Many of the most powerful and effective methods, such as inspections
or measuring cost of quality (COQ), have less than 33 percent usage or
penetration. The data shown in Table 9-21 is not precise,
since much larger samples would be needed. However, it does illustrate
a severe disconnect between effective methods of defect prevention and
day-to-day usage in the United States.
Part of the reason for these dismaying usage patterns is the
difficulty of actually measuring and studying defect prevention
methods. Only a few large and sophisticated corporations are able to
carry out studies of defect prevention. Most universities cannot study
defect prevention because they lack sufficient contacts with corpora-
tions and therefore have little data available.
In conclusion, defect prevention is sparsely covered in the software
literature. There is very little empirical data available, and a great deal
more research is needed on this topic.
One way to improve defect prevention and defect removal would be to
create a nonprofit foundation or association that studied a wide range
of quality topics. Both defect prevention and defect removal would be
included. Following is the hypothetical structure and functions of a pro-
posed nonprofit International Software Quality Foundation (ISQF).
TABLE 9-21   Usage Patterns of Software Defect Prevention Methods

    Defect Prevention Method                                 Percent of U.S. Projects
 1. Reuse (uncertified sources)                              90.00%
 2. Cost per defect                                          75.00%
 3. Lines of code (LOC)                                      72.00%
 4. Prototyping-functional                                   70.00%
 5. Test coverage percent                                    67.00%
 6. No use of CMM                                            50.00%
 7. Joint application design (JAD)                           45.00%
 8. Percent of requirements missed                           38.00%
 9. Software quality assurance (SQA)                         36.00%
10. Use-cases                                                33.00%
11. IFPUG function points                                    33.00%
12. Test-driven development (TDD)                            30.00%
13. Cost of quality (COQ)                                    29.00%
14. Scrum sessions (daily)                                   28.00%
15. CMM 3                                                    28.00%
16. Agile embedded users                                     27.00%
17. Six Sigma                                                24.00%
18. Risk analysis (manual)                                   22.00%
19. Rational Unified Process (RUP)                           22.00%
20. Cyclomatic complexity                                    21.00%
21. CMM 1                                                    20.00%
22. Monthly rate of requirements change                      20.00%
23. Code complexity analysis                                 19.00%
24. ISO 9001:2000 Quality Management System                  19.00%
25. Microsoft certification (testing)                        18.00%
26. ISO 9000-9003 Quality Management                         18.00%
27. Root cause analysis                                      17.00%
28. ISO/IEC 9126-1 Software Engineering Product Quality      17.00%
29. TSP/PSP                                                  16.00%
30. ISO/IEC 25030 Software Product Quality Requirements      16.00%
31. IEEE 1061-1992 Software Metrics                          16.00%
32. Defect origin measurements                               15.00%
33. Root cause analysis                                      15.00%
34. IEEE 730-1998 Software Quality Assurance Plans           15.00%
35. PSP/TSP                                                  14.00%
36. Six Sigma for software                                   13.00%
37. Six Sigma (generic)                                      13.00%
38. Story points                                             13.00%
39. Inspection participation                                 12.00%
40. CMM 2                                                    12.00%
41. Sarbanes-Oxley                                           12.00%
42. Six Sigma green belt                                     11.00%
43. ISO/IEC 10181 Security Frameworks                        11.00%
44. Six Sigma black belt                                     10.00%
45. Defect removal efficiency                                10.00%
46. Use-case points                                          10.00%
47. ISO 17799 Security                                       10.00%
48. Goal-Question Metrics                                     9.00%
49. CMM 4                                                     8.00%
50. Certified Software Test Professional (CSTP)               8.00%
51. Security plans                                            7.00%
52. Quality function deployment (QFD)                         6.00%
53. Total quality management (TQM)                            6.00%
54. Certified Software Project Manager (CSPM)                 6.00%
55. International Software Testing Quality Board (ISTQB)      4.00%
56. Certified Software Quality Analyst (CSQA)                 4.00%
57. Certified Software Tester (CSTE)                          4.00%
58. COSMIC function points                                    4.00%
59. Certified Software Quality Engineer (CSQE)-ASQ            3.00%
60. Risk analysis (automated)                                 2.00%
61. Certified Software Test Manager (CSTM)                    2.00%
62. Reuse (certified sources)                                 1.00%
63. CMM 5                                                     1.00%
64. Poka-yoke                                                 0.10%
65. Clean-room development                                    0.10%
Proposal for a Nonprofit International Software Quality Foundation (ISQF)
The ISQF will be a nonprofit foundation that is dedicated to improv-
ing the quality and economic value of software applications. The form
of incorporation is to be decided by the initial board of directors. The
intent is to incorporate under section 501(c) of the Internal Revenue
Code and thereby be a tax-exempt organization that is authorized to
receive donations.
The fundamental principles of ISQF are the following:
1. Poor quality has been and is damaging the professional reputation
of the software community.
2. Poor quality has been and is causing significant litigation between
clients and software development corporations.
3. Significant software quality improvements are technically possible.
4. Improved software quality has substantial economic benefits in
reducing software costs and schedules.
5. Improved software quality depends upon accurate measurement
of quality in many forms, including, but not limited to, measuring
software defects, software defect origins, software defect severity
levels, methods of defect prevention, methods of defect removal,
customer satisfaction, and software team morale.
6. The major cost of software development and maintenance is that
of eliminating defects. ISQF will mount major studies on measur-
ing the economic value of defect prevention, defect removal, and
customer satisfaction.
7. Measurement and estimation are synergistic technologies. ISQF
will evaluate software quality and reliability estimation methods,
and will publish the results of their evaluations. No fees from esti-
mation tool vendors will be accepted. The evaluations will be inde-
pendent and based on standard benchmarks and test cases.
8. Software defects can originate in requirements, design, coding, user
documents, and also in test plans and test cases themselves. In
addition, there are secondary defects that are introduced while
attempting to repair earlier defects. ISQF will study all sources of
software problems and attempt to reduce all sources of software
defects and user dissatisfaction.
9. ISQF will sponsor research in technical topics that may include, but
are not limited to, inspections, static analysis, test case design,
test coverage analysis, test tools, defect reporting, defect tracking
tools, bad-fix injections, error-prone module removal, complexity
analysis, defect prevention, formal inspections, quality measure-
ments, and quality metrics.
10. The ISQF will also sponsor research to quantify the effects of all
social factors that influence software quality, including the effective-
ness of software quality assurance organizations (SQA), separate
test organizations, separate maintenance organizations, interna-
tional standards, and the value of certification. Methods of studying
software customer satisfaction will also be supported.
11. The service metrics defined in the Information Technology
Infrastructure Library (ITIL) are all dependent upon achieving
satisfactory levels of quality. ISQF will incorporate principles from
the ITIL library, and will also sponsor research studies to show the
correlations between reliability and availability and quality levels
in terms of delivered defects.
12. As new technologies appear in the software industry, it is impor-
tant to stay current with their quality impacts. ISQF will perform
or commission studies on the quality results of a variety of new
approaches including but not limited to Agile development, cloud
computing, crystal development, extreme programming, open-
source development, and service-oriented architecture (SOA).
13. ISQF will provide model curricula for university training in soft-
ware measurement, metrics, defect prevention, defect removal, cus-
tomer support, customer satisfaction, and the economic value of
software quality.
14. ISQF will provide model curricula for MBA programs that deal with
the economics of software and the principles of software manage-
ment. The economics of quality will be a major subtopic.
15. ISQF will provide model curricula for corporate and in-house train-
ing in software measurement, metrics, defect prevention, defect
removal, customer support, customer satisfaction, and the economic
value of software quality.
16. ISQF will provide recommended skill profiles for the occupations of
software quality assurance (SQA), software testing, software cus-
tomer support, and software quality measurement.
17. ISQF will offer examinations and licensing certificates for the
occupations of software quality assurance (SQA), software testing,
software customer support, and software quality measurement. Of
these, software quality measurement has no current certification.
18. ISQF will establish boards of competence to administer examina-
tions and define the state of the art for software quality assurance
(SQA), software testing, and software quality measurement. Other
boards and specialties may be added at future times.
19. ISQF will define the conditions of professional malpractice as they
apply to inadequate methods of software quality control. Examples
of such conditions may include failing to keep adequate records of
software defects, failing to utilize sufficient test stages and test cases,
and failing to perform adequate inspections of critical materials.
20. ISQF will cooperate with other nonprofit organizations that are
concerned with similar issues. These organizations include but are
not limited to the Global Association for Software Quality (GASQ)
in Belgium, the World Quality Conference, the IEEE, the ISO, ANSI,
IFPUG, SPIN, and the SEI. ISQF will also cooperate with other
organizations such as universities, the Information Technology
Metrics and Productivity Institute (ITMPI), the Project Management
Institute (PMI), the Quality Assurance Institute (QAI), software
testing societies, and relevant engineering, benchmarking, and pro-
fessional organizations such as the ISBSG benchmarking group.
ISQF will also cooperate with similar quality organizations abroad
such as those in China, Japan, India, and Russia. This cooperation
might include reciprocal memberships if other organizations are
willing to participate in that fashion.
21. ISQF will be governed by a board of five directors, to be elected by
the membership. The board of directors will appoint a president
or chief executive officer. The president will appoint a treasurer,
secretary, and such additional officers as may be required by the
terms and place of incorporation. Initially, the board, president,
and officers will serve as volunteers on a pro bono basis. To ensure
inclusion of fresh information, the term of president will be two
calendar years.
22. Funding for the ISQF will be a combination of dues, donations,
grants, and possible fund-raising activities such as conferences and
events.
23. The ISQF will also have a technical advisory board of five members
to be appointed by the president. The advisory board will assist
ISQF in staying at the leading edge of research into topics such as
testing, inspections, quality metrics, and also availability and reli-
ability and other ITIL metrics.
24. The ISQF will use modern communication methods to expand the
distribution of information on quality topics. These methods will
include an ISQF web site, webinars, a possible quality Wikipedia,
Twitter, blogs, and online newsletters.
25. The ISQF will have several subcommittees that deal with topics
such as membership, grants and donations, press liaison, university
liaison, and liaison with other nonprofit organizations such as the
Global Association of Software Quality in Belgium.
26. To raise awareness of the importance of quality, the ISQF will
produce a quarterly journal, with a tentative name of Software
Quality Progress. This will be a refereed journal, with the referees
all coming from the ISQF membership.
27. To raise awareness of the importance of quality, the ISQF will spon-
sor an annual conference and will solicit nominations for a series
of "outstanding quality awards." The initial set of awards will be
organized by type of software (information systems, commercial
applications, military software, outsourced applications, systems
and embedded software, web applications). The awards will be for
lowest numbers of delivered defects, highest levels of defect removal
efficiency, best customer service, and highest rankings of customer
satisfaction.
28. To raise awareness of the importance of software quality, ISQF
members will be encouraged to write and review articles and
books on software quality topics. Both technical journals such as
CrossTalk and mainstream business journals such as the Harvard
Business Review will be journals of choice.
29. To raise awareness of the importance of software quality, ISQF will
begin the collection of a major library of books, journal articles, and
monographs on topics and issues associated with software quality.
30. To raise awareness of the importance of software quality, ISQF will
sponsor benchmark studies of software defects, defect severity levels,
defect removal efficiency, test coverage, inspection efficiency, inspec-
tion and test costs, cost of quality (COQ), and software litigation where
poor quality was one of the principal complaints by the plaintiffs.
31. To raise awareness of the economic consequences of poor quality,
the ISQF will sponsor research on consequential damages, deaths,
and property losses associated with poor software quality.
32. To raise awareness of the economic consequences of poor quality,
the ISQF will collect public information on the results of software
litigation where poor quality was part of the plaintiff's claims. Such
litigation includes breach of contract cases, fraud cases, and cases
where poor quality damaged plaintiff business operations.
33. To raise awareness of the importance of software quality, ISQF
chapters will be encouraged at state and local levels, such as a
Rhode Island Software Quality Association or a Boston Software Quality
Association.
34. To ensure high standards of quality education, the ISQF will review
and certify specific courses on software quality matters offered by
universities and private corporations as well. Courses will be sub-
mitted for certification on a voluntary basis. Minimal fees will be
charged for certification in order to defray expenses. Fees will be
based on time and material charges and will be levied whether or
not a specific course passes certification or is denied certification.
35. To ensure that quality topics are included and are properly defined
in contracts and outsource agreements, the ISQF will cooperate
with the American Bar Association, state bar associations, the
American Arbitration Society, and various law schools on the legal
status of software quality and on contractual issues.
36. ISQF members will be asked to subscribe to a code of ethics that
will be fully defined by the ISQF technical advisory board. The
code of ethics will include topics such as providing full and honest
information about quality to all who ask, avoiding conflicts of inter-
est, and basing recommendations about quality on solid empirical
information.
37. Because security and quality are closely related, the ISQF will also
include security attack prevention and recovery from security attacks
as part of its overall mission. However, security is highly specialized
and requires additional skills outside the normal training of quality
assurance and test personnel.
38. Because of the serious global recession, the ISQF will attempt to
rapidly disseminate empirical data on the economic value of quality.
High quality for software has been proven to shorten development
schedules, lower development costs, improve customer support, and
reduce maintenance costs. But few managers and executives have
access to the data that supports such claims.
Software engineering and software quality need to be more closely
coupled than has been the norm in the past. Better prediction of quality,
better measurement of quality, and more widespread usage of effective
defect prevention and defect removal methods are all congruent with
advancing software engineering to the status of a true engineering
discipline.
Software Defect Removal
Although both defect prevention and defect removal are important, it
is easier to study and measure defect removal. This is because counts
of defects found by means of inspections, static analysis, and testing
provide a quantitative basis for calculating defect removal efficiency
levels.
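As a minimal sketch of that calculation (with invented defect counts),
defect removal efficiency is simply the fraction of total defects
removed before release, where the total is usually approximated by
adding the defects that users report during an initial period of field
use:

    # Defect removal efficiency (DRE) = defects removed before release
    # divided by total defects (removed before release plus reported
    # by users after release).
    def defect_removal_efficiency(removed_before_release, found_after_release):
        total = removed_before_release + found_after_release
        return removed_before_release / total if total else 1.0

    # 850 defects removed during development and 150 reported by users
    # yields 850 / 1000 = 85 percent, the approximate U.S. average.
    print(defect_removal_efficiency(850, 150))  # 0.85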
In spite of the fact that defect removal is theoretically easy to study,
the literature remains distressingly sparse. For example, testing has
an extensive literature with hundreds of books, thousands of journal
articles, many professional associations, and numerous conferences.
Yet hardly any of the testing literature contains empirical data on
the measured numbers of test cases created, actual counts of defects
found and removed, data on bad-fix injection rates, or other tangible
data points.
Several important topics have almost no citations at all in the test-
ing literature. For example, a study done at IBM found more errors in
test cases than in the software that was being tested. The same study
found about 35 percent duplicate or redundant test cases. Yet neither
test case errors nor redundant test cases are discussed in the software
testing literature.
Another gap in the literature of both testing and other forms of defect
removal concerns bad-fix injections. About 7 percent of attempts to
repair software defects contain new defects in the repairs themselves.
In fact, sometimes there are secondary and even tertiary bad fixes; that
is, three consecutive attempts to fix a bug may fail to fix the original
bug and introduce new bugs that were not there before!
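A small sketch shows how a 7 percent bad-fix injection rate compounds
over consecutive repair attempts; treating each repair as an independent
7 percent risk is a simplifying assumption made only for illustration.

    # Probability that at least one of n consecutive defect repairs
    # injects a brand-new defect, assuming independent 7 percent risks.
    bad_fix_rate = 0.07
    for attempts in (1, 2, 3):
        p_any_injection = 1 - (1 - bad_fix_rate) ** attempts
        print(f"{attempts} repair(s): {p_any_injection:.1%} chance of a new defect")
    # 1 repair: 7.0%; 2 repairs: 13.5%; 3 repairs: 19.6%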
Another problem with the software engineering literature and also
with software professional associations is a very narrow focus. Most
testing organizations tend to ignore static analysis and inspections.
As a result of this narrow focus, the synergies among various kinds of
defect removal operations are not well covered in the quality or software
engineering literature. For example, carrying out formal inspections
of requirements and design not only finds defects, but also raises the
defect removal efficiency levels of subsequent test stages by at least
5 percent by providing better and more complete source material for
constructing test cases.
Running automated static analysis prior to testing will find numerous
defects having to do with limits, boundary conditions, and structural
problems, and therefore speed up subsequent testing.
Formal inspections are best at finding very complicated and subtle
problems that require human intelligence and insight. Formal inspec-
tions are also best at finding errors of omission and errors of ambiguity.
Static analysis is best at finding structural and mechanical problems
such as boundary conditions, duplications, failures of error-handling,
and branches to incorrect routines. Static analysis can also find security
flaws.
Testing is best at finding problems that occur when software is execut-
ing, such as performance issues, usability issues, and security issues.
Individually, these three methods are useful but incomplete. When
used together, their synergies can elevate defect removal efficiency
levels and also reduce the effort and costs associated with defect removal
activities.
Table 9-22 provides an overview of 80 different forms of software defect
removal: static analysis, inspections, many kinds of testing, and some
special forms of defect removal associated with software litigation.
Although Table 9-22 shows overall values for defect removal efficiency,
the data really deals with removal efficiency against selected defect cat-
egories. For example, automated static analysis might find 87 percent
of structural code problems, but it can't find requirements omissions or
problems such as the Y2K problem that originate in requirements.
TABLE 9-22   Overview of 80 Varieties of Software Defect Removal Activities

                                               Number of    Defect       Bad-Fix
                                               Test Cases   Removal      Injection
    Defect Removal Activities                  per FP       Efficiency   Percent

STATIC ANALYSIS
 1. Automated static analysis                  0.00         87.00%       2.00%
 2. Requirements inspections                   0.00         85.00%       6.00%
 3. External design inspection                 0.00         85.00%       6.00%
 4. Use-case inspection                        0.00         85.00%       4.00%
 5. Internal design inspection                 0.00         85.00%       4.00%
 6. New code inspections                       0.00         85.00%       4.00%
 7. Reuse certification inspection             0.00         84.00%       2.00%
 8. Test case inspection                       0.00         83.00%       5.00%
 9. Automated document analysis                0.00         83.00%       6.00%
10. Legacy code inspections                    0.00         83.00%       6.00%
11. Quality function deployment                0.00         82.00%       3.00%
12. Document proof reading                     0.00         82.00%       1.00%
13. Nationalization inspection                 0.00         81.00%       3.00%
14. Architecture inspections                   0.00         80.00%       3.00%
15. Test plan inspection                       0.00         80.00%       5.00%
16. Test script inspection                     0.00         78.00%       4.00%
17. Test coverage analysis                     0.00         77.00%       3.00%
18. Document editing                           0.00         77.00%       2.50%
19. Pair programming review                    0.00         75.00%       5.00%
20. Six Sigma analysis                         0.00         75.00%       3.00%
21. Bug repair inspection                      0.00         70.00%       3.00%
22. Business plan inspections                  0.00         70.00%       8.00%
23. Root-cause analysis                        0.00         65.00%       4.00%
24. Governance reviews                         0.00         63.00%       5.00%
25. Refactoring of code                        0.00         62.00%       5.00%
26. Error-prone module analysis                0.00         60.00%      10.00%
27. Independent audits                         0.00         55.00%      10.00%
28. Internal audits                            0.00         52.00%      10.00%
29. Scrum sessions (daily)                     0.00         50.00%       2.00%
30. Quality assurance review                   0.00         45.00%       7.00%
31. Sarbanes-Oxley review                      0.00         45.00%      10.00%
32. User story reviews                         0.00         40.00%      10.00%
33. Informal peer reviews                      0.00         40.00%      10.00%
34. Independent verification and validation    0.00         35.00%      12.00%
35. Private desk checking                      0.00         35.00%       7.00%
36. Phase reviews                              0.00         30.00%      15.00%
37. Correctness proofs                         0.00         27.00%      20.00%
    Average                                    0.00         66.92%       6.09%

GENERAL TESTING
38. PSP/TSP unit testing                       3.50         52.00%       2.00%
39. Subroutine testing                         0.25         50.00%       2.00%
40. XP testing                                 2.00         40.00%       3.00%
41. Component testing                          1.75         40.00%       3.00%
42. System testing                             1.50         40.00%       7.00%
43. New function testing                       2.50         35.00%       5.00%
44. Regression testing                         2.00         30.00%       7.00%
45. Unit testing                               3.00         25.00%       4.00%
    Average                                    2.06         41.00%       4.13%
    Sum                                       16.50

AUTOMATIC TESTING
46. Virus/spyware test                         3.50         80.00%       4.00%
47. System test                                2.00         40.00%       8.00%
48. Regression test                            2.00         37.00%       7.00%
49. Unit test                                  0.05         35.00%       4.00%
50. New function test                          3.00         35.00%       5.00%
    Average                                    2.11         45.40%       5.60%
    Sum                                       10.55

SPECIALIZED TESTING
51. Virus testing                              0.70         98.00%       2.00%
52. Spyware testing                            1.00         98.00%       2.00%
53. Security testing                           0.40         90.00%       4.00%
54. Limits/capacity testing                    0.50         90.00%       5.00%
55. Penetration testing                        4.00         90.00%       4.00%
56. Reusability testing                        4.00         88.00%       0.25%
57. Firewall testing                           2.00         87.00%       3.00%
58. Performance testing                        0.50         80.00%       7.00%
59. Nationalization testing                    0.30         75.00%      10.00%
60. Scalability testing                        0.40         65.00%       6.00%
61. Platform testing                           0.20         55.00%       5.00%
62. Clean-room testing                         3.00         45.00%       7.00%
63. Supply chain testing                       0.30         35.00%      10.00%
64. SOA orchestration                          0.20         30.00%       5.00%
65. Independent testing                        0.20         25.00%      12.00%
    Average                                    1.18         70.07%       5.48%
    Sum                                       17.70

USER TESTING
66. Usability testing                          0.25         65.00%       4.00%
67. Local nationalization testing              0.40         60.00%       3.00%
68. Lab testing                                1.25         45.00%       5.00%
69. External beta testing                      1.00         40.00%       7.00%
70. Internal acceptance testing                0.30         30.00%       8.00%
71. Outsource acceptance testing               0.05         30.00%       6.00%
72. COTS acceptance testing                    0.10         25.00%       8.00%
    Average                                    0.48         42.14%       5.86%
    Sum                                        3.35

LITIGATION ANALYSIS, TESTING
73. Intellectual property testing              2.00         80.00%       1.00%
74. Intellectual property review               0.00         80.00%       3.00%
75. Breach of contract review                  0.00         80.00%       2.00%
76. Breach of contract testing                 2.00         70.00%       2.00%
77. Tax litigation review                      0.00         80.00%       4.00%
78. Tax litigation testing                     1.00         70.00%       4.00%
79. Fraud code review                          0.00         80.00%       2.00%
80. Embezzlement code review                   0.00         80.00%       2.00%
    Average                                    2.35         77.14%       2.71%
    Sum                                        5.00

    TOTAL TEST CASES PER FUNCTION POINT       53.10
Table 9-22 is sorted in descending order of defect removal efficiency.
However, the results shown are maximum values. In real life, the range
of measured defect removal efficiency can be less than half of the nomi-
nal maximum values shown in Table 9-22.
Although Table 9-22 lists 80 different kinds of software defect removal
activities, that does not imply that all of them are used at the same time.
In fact, the U.S. average for defect removal activities includes only six
kinds of testing:
U.S. Average Sequence of Defect Removal
1. Unit test
2. New function test
3. Performance test
4. Regression test
5. System test
6. Acceptance or beta test
These six forms of testing, collectively, range between about 70 per-
cent and 85 percent in cumulative defect removal efficiency levels: far
below what is needed to achieve high levels of reliability and customer
satisfaction. The bottom line is that testing, by itself, is insufficient to
achieve professional levels of quality.
An optimum sequence of defect removal activities would include sev-
eral kinds of pretest inspections, static analysis, and at least eight forms
of testing:
Optimal Sequence of Software Defect Removal
Pretest Defect Removal
1. Requirements inspection
2. Architecture inspection
3. Design inspection
4. Code inspection
5. Test case inspection
6. Automated static analysis
Testing Defect Removal
7. Subroutine test
8. Unit test
9. New function test
10. Security test
11. Performance test
12. Usability test
13. System test
14. Acceptance or beta test
This combination of synergistic forms of defect removal will achieve
cumulative defect removal efficiency levels in excess of 95 percent for
every software project and can achieve 99 percent for some projects.
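The compounding arithmetic behind such sequences is simple: each stage
removes some fraction of the defects still present, so cumulative
efficiency is 1 minus the product of the per-stage escape rates. The
per-stage values in the sketch below are illustrative assumptions, not
measured results.

    # Cumulative defect removal efficiency for a sequence of stages.
    def cumulative_dre(stage_efficiencies):
        escaped = 1.0
        for e in stage_efficiencies:
            escaped *= 1.0 - e   # fraction of defects escaping this stage
        return 1.0 - escaped

    testing_only = [0.25] * 6                 # six test stages near 25% each
    pretest = [0.65, 0.65, 0.60, 0.55, 0.55]  # inspections plus static analysis
    print(f"{cumulative_dre(testing_only):.1%}")            # 82.2%
    print(f"{cumulative_dre(pretest + testing_only):.2%}")  # 99.82%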
When the most effective forms of defect removal are combined with
the most effective forms of defect prevention, then software engineering
should be able to achieve consistent levels of excellent quality. If this
combination can occur widely enough to become the norm, then software
engineering can be considered a true engineering discipline.
Software Quality Specialists
As noted earlier in the book, more than 115 types of occupations and
specialists are working in the software engineering domain. In most
knowledge-based occupations such as medicine and law, specialists have
extra training and sometimes extra skills that allow them to outperform
generalists in selected areas such as in neurosurgery or maritime law.
For software engineering, the literature is sparse and somewhat
ambiguous about the roles of specialists. Much of the literature on
software specialization is vaporous and merely expresses some kind
of bias. Many authors prefer a generalist model where individuals are
interchangeable and can handle requirements, design, development,
and testing as needed. Other authors prefer a specialist model where
key skills such as testing, quality assurance, and maintenance are per-
formed by trained specialists.
In this chapter, we will focus primarily on two basic questions:
1. Do specialized skills lower defect potentials and benefit defect
prevention?
2. Do specialized skills raise defect removal efficiency levels?
Not all of the 115 or so specialists will be discussed, but those whose
roles have a potential impact on quality levels will be considered in
terms of defect prevention and defect removal.
The 20 specialist categories discussed in this chapter include, in
alphabetical order:
1. Architects
2. Business analysts
3. Database analysts
4. Data quality analysts
5. Enterprise architects
6. Estimating specialists
7. Function point specialists
8. Inspection moderators
9. Maintenance specialists
10. Requirements analysts
11. Performance specialists
12. Risk analysis specialists
13. Security specialists
14. Six Sigma specialists
15. Systems analysts
16. Software quality assurance (SQA)
17. Technical writers
18. Testers
19. Usability specialists
20. Web designers
For each of these 20 specialist groups, we will consider the volume of
potential defects they face, and whether they have a tangible impact on
defect prevention and defect removal activities.
Table 9-23 ranks the specialists in terms of assignment scope. This
metric represents the number of function points normally assigned to
one practitioner. Table 9-23 also shows the volume of defects that the
various occupations face as part of their jobs. Table 9-23 then shows the
approximate impacts of these specialized occupations on both defect
prevention and defect removal.
The top-ranked specialists face large numbers of potential defects
that are also capable of causing great damage to entire corporations
as well as to the software applications owned by those corporations.
Following are short discussions of each of the 20 kinds of specialists.
Risk Analysis Specialists
Assignment scope = 300,000 function points
Defect potentials = 7.00
Defect prevention impact = -75 percent
Defect removal impact = 25 percent
The large assignment scope of 300,000 function points indicates that
companies do not need many risk analysts, but the ones they employ need
to be very competent and understand both technical and financial risks.
TABLE 9-23   Software Specialization Impact on Software Quality

                                      Assignment   Defect      Defect       Defect
    Specialized Occupations           Scope (FP)   Potential   Prevention   Removal
 1. Risk analysis specialists         300,000      7.00        75.00%       25.00%
 2. Enterprise architects             250,000      6.00        25.00%       20.00%
 3. Six Sigma specialists             250,000      5.00        25.00%       30.00%
 4. Database analysts                 100,000      3.00        15.00%       10.00%
 5. Architects                        100,000      3.00        17.00%       12.00%
 6. Usability specialists             100,000      1.00        10.00%       15.00%
 7. Security specialists               50,000      7.00        70.00%       20.00%
 8. Data quality analysts              50,000      5.00        12.00%       15.00%
 9. Business analysts                  50,000      3.50        25.00%       10.00%
10. Estimating specialists             25,000      3.00        20.00%       25.00%
11. Systems analysts                   20,000      6.00        20.00%       20.00%
12. Performance specialists            20,000      1.00        10.00%       12.00%
13. Quality assurance (QA)             10,000      5.50        15.00%       40.00%
14. Web designers                      10,000      4.00        15.00%       12.00%
15. Requirements analysts              10,000      4.00        20.00%       15.00%
16. Testers                            10,000      3.00        15.00%       50.00%
17. Function point specialists          5,000      4.00        10.00%       10.00%
18. Technical writers                   2,000      1.00        10.00%       10.00%
19. Maintenance specialists             1,500      3.50        30.00%       20.00%
20. Inspection moderators               1,000      4.50        27.00%       35.00%
    Average                            68,225      4.00        23.30%       20.30%
Given the enormous number of business failures as part of the recession,
it is obvious that risk analysis is not yet as sophisticated as it should
be, especially for dealing with financial risks.
Risk analysts face more than 100 percent of the potential defects
associated with any given software application. Not only do they have
to deal with technical risks and quality risks, but they also need to
address financial risks and legal risks that are outside the normal realm
of software quality and defect measurement.
A formal and careful risk analysis prior to committing funds to a
major software application can stop investments in excessively haz-
ardous projects before any serious money is spent. For questionable
projects, a formal and careful risk analysis prior to starting the project
can introduce better technologies prior to committing funds.
The keys to successful early risk analysis include the ability to do
early sizing, early cost estimating, early quality estimating, and knowl-
edge of dozens of potential risks derived from analysis of project failures
and successes.
The main role of risk analysts in terms of quality is to stop bad proj-
ects before they start, and to ensure that projects that do start utilize
state-of-the-art quality methods. Risk analysts also need to understand
the main reasons for software failures, and they should be familiar
with software litigation results for cases dealing with cancelled proj-
ects, breach of contract, theft of intellectual property, patent violations,
embezzlement via software, fraud, tax issues, Sarbanes-Oxley issues,
and other forms of litigation as well.
Enterprise Architects
Assignment scope = 250,000 function points
Defect potentials = 6.00
Defect prevention impact = -25 percent
Defect removal impact = 20 percent
Enterprise architects are key players whose job is to understand every
aspect of corporate business and to match business needs against entire
portfolios, which may contain more than 3000 separate applications and
total more than 10 million function points. Not only internal software,
but also open-source applications and commercial software packages
such as Vista and SAP need to be part of the enterprise architect's
domain of knowledge.
The main role of enterprise architects in terms of quality is to under-
stand the business value of quality to corporate operations, and to
ensure that top executives have similar understandings. Both enter-
prise architects and corporate executives need to push for excellence in
order to achieve speed of delivery.
Enterprise architects also play a role in corporate governance, by
ensuring that critical mistakes such as the Y2K problem are prevented
from occurring in the future.
Six Sigma Specialists
Assignment scope = 250,000 function points
Defect potentials = 5.00
Defect prevention impact = -25 percent
Defect removal impact = 30 percent
The large assignment scope for Six Sigma specialists indicates that
their work is corporate in nature rather than being limited to specific
applications. The main role of Six Sigma specialists in terms of quality
is to provide expert analysis of defect origins and defect causes, and
to suggest effective methods of continuous improvement to reduce the
major sources of software error.
Database Analysts
Assignment scope = 100,000 function points
Defect potentials = 3.00
Defect prevention impact = -15 percent
Defect removal impact = 10 percent
In today's world of 2009, major corporations and government agencies
own even more data than they own software. Customer data, employee
data, and manufacturing data total millions of records scattered over
dozens of databases and repositories. This collection of enterprise data
is a valuable asset that needs to be accessed for key business decisions,
and also protected against hacking, theft, and unauthorized access.
There is a major quality weakness in 2009 in the area of data qual-
ity. There are no "data point" metrics that express the size of databases
and repositories. As a result, it is very hard to quantify data quality. In
fact, for all practical purposes, no literature at all on data quality uses
actual counts of errors.
As a result, database analysts and data quality analysts are severely
handicapped. They both play key roles in quality, but lack all of the tools
they need to do a good job.
The major role played by database analysts in terms of quality is to
ensure that databases and repositories are designed and organized in
optimal fashions, and that processes are in place to validate the accu-
racy of all data elements that are added to enterprise data storage.
Architects
Assignment scope = 100,000 function points
Defect potentials = 3.00
Defect prevention impact = -17 percent
Defect removal impact = 12 percent
Architects also have a large assignment scope, and need to be able to
envision and deal with the largest known applications of the modern
world, such as Vista, ERP packages like SAP and Oracle, air-traffic
control, defense applications, and major business applications.
Over the past 50 years, software applications have evolved from run-
ning by themselves to running under an operating system to running
as part of a multitier network and indeed to running in fragments scat-
tered over a cloud of hardware and software platforms that may be
thousands of miles apart.
As a result, the role of architects has become much more complex
in 2009 than it was even ten years ago. Architects need to understand
modern application practices such as service-oriented architecture (SOA),
cloud computing, and multitier hierarchies. In addition, architects need
to know the sources and certification methods of various kinds of reus-
able material that constitutes more than 50 percent of many large appli-
cations circa 2009.
The main role that architects play in terms of quality is to under-
stand the implications of software defects in complex, multitier, highly
distributed environments where software components may come from
dozens of sources.
Usability Specialists
Assignment scope = 100,000 function points
Defect potentials = 1.00
Defect prevention impact = -10 percent
Defect removal impact = 15 percent
The word "usability" defines what customers need to do to operate
software successfully. It also includes what software customers need to
do when the software misbehaves.
Usability specialists often have a background in cognitive psychol-
ogy and are well versed in various kinds of software interfaces: key-
board commands, buttons, touch screens, voice recognition, and even
more.
The main role of usability specialists in terms of quality is to ensure
that software applications have interfaces and control sequences that
are as natural and intuitive as possible. Usability studies are normally
carried out with volunteer clients who use the software while it is under
development.
Large computer and software companies such as IBM and Microsoft
have usability laboratories where customers can be observed while they
are using prerelease versions of software and hardware products. These
labs monitor keystrokes, screen touches, voice commands, and other
interface methods. Usability specialists also debrief customers after
every session to find out what customers like and dislike about inter-
faces and command sequences.
Security Specialists
Assignment scope = 50,000 function points
Defect potentials = 7.00
Defect prevention impact = -70 percent
Defect removal impact = 20 percent
There is an increasing need for more software security specialists,
and also for better training of software security specialists both at the
university level and after employment, as security threats evolve and
change.
As of 2009, due in part to the recession, attacks and data theft are
increasing rapidly in numbers and sophistication. Hacking is rapidly
moving from the domain of individual amateurs to organized crime and
even to hostile foreign governments.
Software applications are not entirely safe behind firewalls, even with
active antivirus and antispyware applications installed. There is an
urgent need to raise the immunity levels of software applications by
using techniques such as Google's Caja, the E programming language,
and changing permission schemas.
Security and quality are not identical, but they are very close together,
and both prevention and removal methods are congruent and synergistic.
The closeness of quality and security is indicated by the fact that major
avenues of attack on software applications are error-handling routines.
The main role of security specialists in terms of quality is to stay cur-
rent on the latest kinds of threats, and to ensure that both new applica-
tions and legacy applications have state-of-the-art security defenses.
Data Quality Analysts
Assignment scope = 50,000 function points
Defect potentials = 5.00
Defect prevention impact = -12 percent
Defect removal impact = 15 percent
As of 2009, data quality analysts are few in number and woefully
under-equipped in terms of tools and technology. There is no effective
size metric for data volumes (i.e., a data point metric similar to func-
tion points). As a result, no empirical data is available on topics such as
defect potentials for databases and repositories, effective defect removal
methods, defect estimation, or defect measurement.
The theoretical role of data quality analysts is to prevent data errors
from occurring, and to recommend effective removal methods. However,
given the very large number of apparent data errors in public records,
credit scores, accounting, taxes, and so on, it is obvious that data quality
lags behind even software quality. In fact, data and software appear to
lag behind every other engineering and technical domain in terms of
quality control.
Business Analysts
Assignment scope = 50,000 function points
Defect potentials = 3.5
Defect prevention impact = -25 percent
Defect removal impact = 10 percent
In many information technology organizations, business analysts
are the primary connection between the software community and the
community of software users. Business analysts are required to be well
versed in both business needs and in software engineering technologies.
The main role that business analysts should play in terms of qual-
ity is to convince both the business and technical communities that
high levels of software quality will shorten development schedules and
lower development costs. Too often, the business clients set arbitrary
schedules and then attempt to force the software community to try
and meet those schedules by skimping on inspections and truncating
testing.
Good business analysts should have data available from sources
such as the International Software Benchmarking Standards Group
(ISBSG) that shows the relationships between quality, schedules, and
costs. Business analysts should also understand the value of methods
such as joint application design (JAD), quality function deployment
(QFD), and requirements inspections.
Estimating Specialists
Assignment scope = 25,000 function points
Defect potentials = 3.00
Defect prevention impact = -20 percent
Defect removal impact = 25 percent
It is a sign of sophistication when a company employs software esti-
mating specialists. Usually these specialists work in project offices or
special staff groups that support line managers, who often are not well
trained in estimation.
Estimation specialists should have access to and be familiar with the
major software estimating tools that can predict quality, schedules, and
costs. Examples of such tools include COCOMO, KnowledgePlan, Price-
S, SoftCost, SEER, Slim, and a number of others. In fact, a number of
companies utilize several of these tools for the same applications and
look for convergence.
The main role of an estimating specialist in terms of quality is to pre-
dict quality early. Ideally, quality will be predicted before substantial
funds are spent. Not only that, but multiple estimates may be needed
to show the effects of variations in development practices such as Agile
development, Team Software Process (TSP), Rational Unified Process
(RUP), formal inspections, static analysis, and various kinds of testing.
Systems Analysts
Assignment scope = 20,000 function points
Defect potentials = 6.00
Defect prevention impact = -25 percent
Defect removal impact = 20 percent
Software systems analysts are one of the interface points between
the software engineering or programming community and end users
of software. Systems analysts and business analysts perform similar
roles, but the title "systems analyst" occurs more often for embedded
and systems software, which are developed for technical purposes rather
than to satisfy local business needs.
The main role of systems analysts in terms of quality is to understand
that all forms of representation for software (user stories, use-cases,
formal specification languages, flowcharts, Nassi-Shneiderman charts,
etc.) may contain errors. These errors may not be amenable to discovery
via testing, which would be too late in any case. Therefore, a key role
of systems analysts is to participate in formal inspections of require-
ments, internal design documents, and external design documents. If
the application is being constructed using test-driven development,
systems analysts will participate in test case design and construction.
Systems analysts will also participate in activities such as joint applica-
tion design (JAD) and quality function deployment (QFD).
Performance Specialists
Assignment scope = 20,000 function points
Defect potentials = 1.00
Defect prevention impact = -10 percent
Defect removal impact = 12 percent
The occupation of "performance specialist" is usually found only in
very large companies that build very large and complex software appli-
cations; that is, IBM, Raytheon, Lockheed, Boeing, SAP, Oracle, Unisys,
Google, Motorola, and the like.
The general role of performance specialists is to understand every
potential bottleneck in hardware and software platforms that might
slow down performance.
Sluggish or poor performance is viewed as a quality issue, so the role
of performance specialists is to assist software engineers and software
designers in building software that will achieve good performance levels.
In today's world of 2009, with multitier architectures as the dominant
model and with multiple programming languages as the dominant form
of development, the work of performance specialists has become much
more difficult than it was only ten years ago. Looking ahead, the work
of performance specialists will probably become even more difficult ten
years from now.
Software Quality Assurance
Assignment scope = 10,000 function points
Defect potentials = 5.50
Defect prevention impact = -15 percent
Defect removal impact = 40 percent
The general title of "quality assurance" is much older than software
and has been used by engineering companies for about 100 years.
Within the software world, the title of "software quality assurance" has
existed for more than 50 years. Today in 2009, software quality special-
ists average between 2 percent and 6 percent of total software employ-
ment in most large companies. High-tech companies such as IBM and
Lockheed employ more software quality assurance personnel than do
low-tech companies such as insurance and general manufacturing.
A small percentage of software quality assurance personnel have been
certified by one or more of the software quality assurance associations.
The roles of software quality assurance vary from company to com-
pany, but they usually include these core activities: ensuring that
relevant international and corporate quality standards are used and
adhered to, measuring defect removal efficiency, measuring cyclomatic
and essential complexity, teaching classes in quality, and estimating or
predicting quality levels.
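As a minimal sketch of one of those measurement activities, cyclomatic
complexity can be approximated as the number of binary decision points
in a routine plus one; the keyword-counting shortcut below is a crude
illustration rather than a production measurement tool.

    import re

    # Approximate cyclomatic complexity: one path for straight-line code,
    # plus one for each decision keyword found in the source text.
    DECISION_KEYWORDS = re.compile(r"\b(if|elif|for|while|except)\b")

    def cyclomatic_complexity(source: str) -> int:
        return 1 + len(DECISION_KEYWORDS.findall(source))

    sample = """
    def classify(n):
        if n < 0:
            return "negative"
        elif n == 0:
            return "zero"
        return "positive"
    """
    print(cyclomatic_complexity(sample))  # 3: two decisions plus one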
A few very sophisticated companies such as IBM have quality assurance
research positions, where the personnel can develop new and improved
quality control methods. Some of the results of these QA research groups
include formal inspections, function point metrics, automated con-
figuration control tools, clean-room development, and joint application
design (JAD).
Given the fact that quality assurance positions have existed for more
than 50 years and that SQA personnel number in the thousands, why is
software quality in 2009 not much better than it was in 1979?
One reason is that in many companies, quality assurance plays an advi-
sory role, but their advice does not have to be followed. In some companies
such as IBM, formal QA approval is necessary prior to delivering a prod-
uct to customers. If the QA team feels that quality methods were deficient,
then delivery will not occur. This is a very serious business issue.
In fact, very few projects are stopped from being delivered. But the
theoretical power to stop delivery if quality is inadequate is a strong
incentive to pursue state-of-the-art quality control methods.
Therefore, a major role of software quality assurance is to ensure that
state-of-the-art measures, methods, and tools are used for quality control,
with the knowledge that poor quality can lead to delays in delivery.
Web Designers
Assignment scope = 10,000 function points
Defect potentials = 4.00
Defect prevention impact = -15 percent
Defect removal impact = 12 percent
Software web design is a fairly new occupation, but one that is grow-
ing faster than almost any other. The fast growth in web design is due
to software companies and other businesses migrating to the Web as
their main channel for marketing and information.
The role of web design in terms of software quality is still evolving
and will continue to do so as web sites move toward virtual reality and
3-D representation. As of 2009, some of the roles are to ensure that all
interfaces are fairly intuitive, and that all links and connections actu-
ally work.
Unfortunately, due to the exponential increase in hacking, data theft,
and denial of service attacks, web quality and web security are now
overlapping. Effective quality for web sites must include effective secu-
rity, and many web design specialists do not yet know enough about
security to be fully effective.
Requirements Analysts
Assignment scope = 10,000 function points
Defect potentials = 4.00
Defect prevention impact = -20 percent
Defect removal impact = 15 percent
The work of requirements analysts overlaps the work of systems ana-
lysts and business analysts. However, those who specialize in require-
ments analysis also know topics such as quality function deployment
(QFD), joint application design (JAD), requirements inspections, and at
least half a dozen requirements representation methods such as use-
cases, user stories, and several others.
Because the majority of "new" applications being developed circa
2009 are really nothing more than replacements for legacy applications,
requirements analysts should also be conversant with data mining. In
fact, the best place to start the requirements analysis for a replacement
application is to mine the older legacy application for business rules
and algorithms that are hidden in the code. Data mining is necessary
because usually the original specifications are either missing completely
or long out of date.
The role of requirements analysis in terms of quality is to ensure that
toxic requirements defects are removed before they enter the design or
find their way into source code. The frequently cited Y2K problem is an
example of a toxic requirement.
Because the measured rate at which requirements grow after the
requirements phase is between 1 percent and 3 percent per calendar
month, another quality role is to ensure that prototypes, embedded
users, JAD, or other methods are used that minimize unplanned changes
in requirements.
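The compounding effect of that growth is easy to see; in the sketch
below, the 1,000-function point starting size and 18-month schedule are
invented, while the 1 percent to 3 percent monthly rates come from the
measurements just cited.

    # Requirements growth compounding monthly over an 18-month project.
    initial_size_fp = 1000   # function points at the end of the requirements phase
    for monthly_rate in (0.01, 0.02, 0.03):
        final_size = initial_size_fp * (1 + monthly_rate) ** 18
        print(f"{monthly_rate:.0%} per month -> {final_size:,.0f} function points")
    # 1% -> 1,196 FP; 2% -> 1,428 FP; 3% -> 1,702 FP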
Requirements analysts should also be members of or support change
control boards that review and approve requirements changes.
Testers
Assignment scope = 10,000 function points
Defect potentials = 3.00
Defect prevention impact = -15 percent
Defect removal impact = 50 percent
Software testing is one of the specialized occupations where there is
some empirical evidence that specialists can outperform generalists.
Not every kind of testing is performed by test specialists. For example,
unit testing is almost always carried out by the developers. However, the
forms of testing that integrate the work of entire teams of developers need
testing specialists for large applications. Such forms of testing include new
function testing, regression testing, and system testing among others.
The role of test specialists in terms of quality is to ensure that test
coverage approaches 99 percent, that test cases themselves do not con-
tain errors, and that test libraries are effectively maintained and purged
of duplicate test cases that add cost but not value.
Although not a current requirement for test case personnel, it would
be useful if test specialists also measured defect removal efficiency
levels and attempted to raise average testing efficiency from today's
average of around 35 percent up to at least 75 percent.
Test specialists should also be pioneers in new testing technologies
such as automated testing. Running static analysis tools prior to testing
would also add value.
Function Point Specialists
Assignment scope = 5000 function points
Defect potentials = 4.00
Defect prevention impact = -10 percent
Defect removal impact = 10 percent
Because function point metrics are the best choice for normalizing
quality data and creating effective benchmarks of quality information,
function point specialists are rapidly becoming part of successful quality
improvement programs.
However, traditional manual counts of function points are too slow and
too costly to be used as standard quality control methods. The average
counting speed by a certified function point specialist is only about 400
function points per day. This explains why function point analysis is almost
never used for applications larger than about 10,000 function points.
However, new methods have been developed that allow function points
to be calculated at least six months earlier than previously possible.
These same methods operate at speeds in excess of 10,000 function
points per minute. This makes it possible to use function points for early
quality estimation, as well as for measuring quality and producing qual-
ity benchmarks.
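Some back-of-the-envelope arithmetic makes the difference concrete; the
10,000-function point application is an invented example, while the two
counting rates are those cited above.

    # Manual counting at 400 function points per day versus high-speed
    # methods at 10,000 function points per minute.
    app_size_fp = 10000
    manual_days = app_size_fp / 400           # 25 working days of manual counting
    automated_minutes = app_size_fp / 10000   # about 1 minute
    print(f"{manual_days:.0f} days manually vs. about {automated_minutes:.0f} minute(s)")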
The role of function point specialists in terms of quality is to create
useful size information fast enough and early enough that it can serve
for risk analysis, quality prediction, and quality measures.
Technical Writers
Assignment scope = 2000 function points
Defect potentials = 1.00
Defect prevention impact = -10 percent
Defect removal impact = 10 percent
Good writing is a fairly rare skill in the human species. As a result,
good software technical manuals are also fairly rare. Many kinds of
quality problems are common in software manuals, including ambigu-
ity, missing information, poor organization structures, and incorrect
data.
There are automated tools available that can analyze the readability
of text, such as the Fog index and the Flesch index, but these are
seldom used for software manuals. Editing is useful, as are formal
inspections of user documentation.
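As a rough illustration of what such tools compute, the widely
published Flesch reading ease formula is 206.835 - 1.015 × (words per
sentence) - 84.6 × (syllables per word); higher scores indicate easier
text. The crude syllable counter in this sketch is a deliberate
simplification, not a production-quality one:

    import re

    def flesch_reading_ease(text):
        """Approximate Flesch reading ease score for a passage of text."""
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z']+", text)
        word_count = max(1, len(words))
        # Naive syllable estimate: groups of consecutive vowels per word.
        syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                        for w in words)
        return (206.835 - 1.015 * (word_count / sentences)
                - 84.6 * (syllables / word_count))

    sample = ("The system shall validate all user inputs before "
              "persisting them to the database.")
    print(round(flesch_reading_ease(sample), 1))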
Another approach, which was actually used by IBM, was to select
examples of user documents with the highest user evaluation scores
and use them as samples.
The role of technical writers in terms of software quality is to make
sure that factual data is complete and correct, and that manuals are
easy to read and understand.
Maintenance Specialists
Assignment scope = 1,500 function points
Defect potentials = 3.50
Defect prevention impact = -30 percent
Defect removal impact = 20 percent
Maintenance programming in terms of both enhancing legacy soft-
ware and repairing bugs has been the dominant activity for the software
industry for more than 20 years. This should not be a surprise, because
for every industry older than 50 years, more people are working on
repairs of existing products than are working on new development.
As the recession deepens and lengthens, the U.S. automobile industry
is providing a very painful example of this fact: automotive manufac-
turing is shrinking faster than the polar ice fields, while automotive
repairs are increasing.
Aging legacy applications have a number of quality problems, includ-
ing poor structure, dead code, error-prone modules, and poor or missing
comments.
As the recession continues, many companies are considering ways of
stretching out the useful lives of legacy applications. In fact, renovation
and data mining of legacy software are both growing, even in the face
of the recession.
The main role of maintenance programmers in terms of quality is to
strengthen the quality of legacy software. The methods available to do this
include full renovation using automated tools; complexity measurement
and reduction; dead code removal; improving comments; identification
and surgical removal of error-prone modules; converting code from orphan
languages such as MUMPS or Coral into modern languages such as Java
or Ruby; and repairing the security flaws of legacy applications.
Inspection Moderators
Assignment scope = 1000 function points
Defect potentials = 4.50
Defect prevention impact = -25 percent
Defect removal impact = 35 percent
Software inspections have a number of standard roles, including the
moderator, the recorder, the inspectors, and the person whose work is
being inspected. The moderator is the key to a successful inspection.
The tasks of the moderator include keeping the discussions on track,
minimizing disruptive events, and ensuring that the inspection session
starts and ends on time.
The main roles of inspection moderators in terms of quality include
ensuring that the materials to be inspected are delivered in time for
pre-inspection review, making sure that the inspectors and other
personnel show up on time, keeping the inspection team focused on
defect identification (as opposed to repairs), and intervening in
potential arguments or disputes.
The inspection recorder plays a key role too, because the recorder
keeps notes and fills out the defect reports of all bugs or defects that
the inspection identified. This is not as easy as it sounds, because there
may be some debate as to whether a particular issue is a defect or a
possible enhancement.
Summary and Conclusions on
Software Specialization
The overall topic of software specialization is not well covered in the
software engineering literature. Considering that there are more than
115 kinds of specialists associated with software, this fact is mildly
surprising.
When it comes to software quality, some forms of specialization do add
value, and this can be shown by analysis of both defect prevention and
defect removal. The key specialists who add the most value to software
quality include risk analysts, Six Sigma specialists, quality assurance
personnel, inspection moderators, maintenance specialists, and profes-
sional test personnel.
However, many other specialists such as business analysts, enterprise
architects, architects, estimating specialists, and function point special-
ists also add value.
The Economic Value of
Software Quality
The economic value of software quality is not well covered in the soft-
ware engineering literature. There are several reasons for this prob-
lem. One major reason is the rather poor measurement practices of
the software engineering domain. Many cost factors such as unpaid
overtime are routinely ignored. In addition, there are frequent gaps and
omissions in software cost data, such as omission of project manage-
ment costs and the omission of part-time specialists such as technical
writers. In fact, only the effort and costs of coding have fairly good
data available. Everything else, such as requirements, design,
inspections, testing, quality assurance, project offices, and
documentation, tends to be underreported or ignored.
As pointed out in other sections, the software engineering literature
depends too much on vague and unpredictable definitions of quality
such as "conformance to requirements" or adhering to a collection of
ambiguous terms ending in ility. These unscientific definitions slow
down research on software quality economics.
Two other measurement problems also affect quality economic stud-
ies. These problems are the usage of two invalid economic measures:
cost per defect and lines of code. As discussed earlier in this chapter,
cost per defect penalizes quality and achieves its lowest costs for the
buggiest applications. Lines of code penalizes high-level programming
languages and disguises their value for studying either quality or
productivity.
In this section, the economic value of quality will be shown by means
of eight case studies. Because the value of software quality correlates
to application size, four discrete size ranges will be used: 100 function
points, 1000 function points, 10,000 function points, and 100,000 func-
tion points.
Applications in the 100-function point range are usually small fea-
tures for larger systems rather than stand-alone applications. However,
this is a very common size range for prototypes of larger applications.
There may be small stand-alone applications in this range such as cur-
rency converters or applets for devices such as iPhones.
Applications in the 1000-function point range are normally stand-
alone software applications such as fuel-injection controls, atomic watch
controls, compilers for languages such as Java, and software estimating
tools in the class of COCOMO.
Applications in the 10,000-function point range are normally impor-
tant systems that control aspects of business, such as insurance claims
processing, motor vehicle registration, or child-support applications.
Applications in the 100,000-function point range are normally major
systems in the class of large international telephone-switching systems,
operating systems in the class of Vista and IBM's MVS, or suites of
linked applications such as Microsoft Office. Some enterprise resource
planning (ERP) applications are in this size range, and may even top
300,000 function points. Large defense applications such as the World
Wide Military Command and Control System (WWMCCS) also top 100,000
function points.
To reduce the number of variables, all eight of the examples are
assumed to be coded in the C programming language and have a ratio
of about 125 code statements per function point.
Because all eight of the applications are assumed to be written in the
same programming language, productivity and quality can be expressed
using the lines of code metric without distortion. The lines of code metric
is invalid for comparisons between unlike programming languages.
For each size plateau, two cases will be illustrated: average quality
and excellent quality. The average quality case assumes waterfall devel-
opment, CMMI level 1, normal testing, and nothing special in terms of
defect prevention.
The excellent quality case assumes at least CMMI level 3, formal
inspections, static analysis, rigorous development such as the Team
Software Process (TSP), and the use of prototypes and joint application
design (JAD) for requirements gathering.
(Some readers may wonder why Agile development is not used for the
case studies. The main reason is that there are no Agile applications
in the 10,000- and 100,000-function point ranges. The Agile method
is used primarily for smaller applications in the 1000-function point
range.)
Although all of the case studies are derived from actual applications,
to make the calculations consistent, a number of simplifying assump-
tions are used. These assumptions include the following key points:
All cost data is based on a fully burdened cost of $10,000 per staff
month. A staff month is considered to have 132 working hours. This
is equivalent to $75.75 per hour.
Work months are assumed to consist of 22 days, and each day consists
of 8 hours. Unpaid overtime is not shown nor is paid overtime.
Defect potentials are the total numbers of defects found in five categories:
requirements defects, design defects, code defects, documentation
defects, and bad fixes, or secondary defects accidentally included in
defect repairs.
Creeping requirements are not shown. The sizes of the eight case studies
reflect application size as delivered to clients.
Software reuse is not shown. All eight cases can be assumed to reuse
about 15 percent of legacy code. But to simplify assumptions, the
defect potentials in the reused code and other materials are assumed
to equal the defect potentials of new material. Larger volumes of
certified reusable material would significantly improve both the
quality and productivity of all eight case studies, especially for the
larger systems above 10,000 function points.
Bad-fix injections are not shown. About 7 percent of attempts to repair
bugs accidentally introduce a new bug, but the mathematics of bad-fix
injection is complicated since the bugs are not found in the activity
where they originate.
The first year of maintenance is assumed to find 100 percent of latent
bugs delivered with the software. In reality, many bugs fester for
years, but the examples only show the first year of maintenance.
The maintenance data only shows defect repairs. Enhancements
and adding new features are excluded in order to highlight quality
value.
Maintenance defect repair rates are based on average values of
12 bugs fixed per staff month. In real life, ranges can run from fewer
than 4 to more than 20 bugs repaired each month.
Application staff size is based on U.S. average assignment scopes for
all classes of software personnel, which is approximately 150 function
points. That is, if you divide application size in function points by the
total staffing complement of technical workers plus project manag-
ers, the result will be close to 150 function points. This value includes
software engineers and also specialists such as quality assurance,
technical writers, and test personnel.
Schedules for the "average" cases are based on raising function point
size to the 0.4 power. This rule of thumb provides a fairly good approx-
imation of schedules from start of requirements to delivery in terms
of calendar months.
Schedules for the "excellent" cases are based on raising function point
size to the 0.36 power. This exponent works well with object-oriented
software and rigorous development practices. It is also a good fit for
Agile projects, except that the lack of data above 10,000 function
points for Agile makes the upper level uncertain.
Data in this section is expressed using the function point metric defined
by the International Function Point Users' Group (IFPUG) version 4.2
of the counting rules. Other functional metrics such as COSMIC func-
tion points or engineering function points or Mark II function points
would yield different results from the values shown here.
Data on source code in this section is expressed using counts of logical
statements rather than counts of physical lines. There can be as much
as 500 percent difference in apparent code size based on whether
counts are physical or logical lines. The counting rules are those of
the author's book Applied Software Measurement.
The reason for these simplifying assumptions is to minimize extra-
neous variations among the eight case studies, so that the data is pre-
sented in a consistent fashion for each. Because all of these assumptions
vary in real life, readers are urged to try out alternative values based on
their own local data or on benchmarks from organizations such as the
International Software Benchmarking Standards Group (ISBSG).
The simplifying assumptions serve to make the results consistent,
but each of the assumptions can change in either direction by fairly
large amounts.
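Taken together, these assumptions make each case study reproducible
with a few lines of arithmetic. The sketch below is a reconstruction
under exactly those assumptions (the function name and rounding are
illustrative, not from any published tool); it lands within rounding
distance of the 1000-function point figures in Table 9-25:

    BURDENED_RATE = 10_000   # dollars per staff month, fully burdened
    ASSIGNMENT_SCOPE = 150   # function points per staff member (U.S. average)
    REPAIR_RATE = 12         # bugs repaired per staff month in maintenance

    def case_study(size_fp, defects_per_fp, removal_efficiency, exponent):
        """Derive effort and cost for one case-study column."""
        delivered = size_fp * defects_per_fp * (1 - removal_efficiency)
        schedule_months = size_fp ** exponent        # 0.4 average, 0.36 excellent
        staff = size_fp / ASSIGNMENT_SCOPE
        dev_effort = staff * schedule_months         # staff months
        maint_effort = delivered / REPAIR_RATE       # first year only
        total_cost = (dev_effort + maint_effort) * BURDENED_RATE
        return round(dev_effort), round(maint_effort), round(total_cost)

    # Average quality vs. excellent quality at 1000 function points
    print(case_study(1000, 4.50, 0.93, 0.40))   # approx. (106, 26, 1319095)
    print(case_study(1000, 2.50, 0.97, 0.36))   # approx. (80, 6, 864010)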
The Value of Quality for Very Small
Applications of 100 Function Points
Small applications in this range usually have low defect potentials and
fairly high defect removal efficiency levels. This is because such small
applications can be developed by a single person, so there are no inter-
face problems between features developed by different individuals or
different teams. Table 9-24 shows quality value for very small applica-
tions of 100 function points.
Note that cost per defect goes up as quality improves, not down. This
phenomenon distorts economic analysis. As will be shown in the later
examples, cost per defect tends to decline as applications grow larger,
because large applications have many more defects than small ones.
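A small hypothetical example makes the distortion visible. Suppose a
test stage costs $10,000 in fixed effort (writing and running the test
cases, which must be paid even if no bugs are found) plus $100 of
variable cost per defect repaired:

    FIXED_COST = 10_000   # test preparation and execution (hypothetical)
    REPAIR_COST = 100     # repair cost per defect found (hypothetical)

    for defects_found in (100, 10, 1):
        per_defect = (FIXED_COST + REPAIR_COST * defects_found) / defects_found
        print(f"{defects_found:>3} found -> ${per_defect:,.0f} per defect")

    # 100 found -> $200 per defect
    #  10 found -> $1,100 per defect
    #   1 found -> $10,100 per defect

The buggiest version shows the lowest cost per defect even though its
total cost is the highest; this is the fixed-cost distortion that the
chapter refers to.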
Prototypes or applications in this size range are very sensitive to
individual skill levels, primarily because one person does almost all of
the work. The measured variations for this size range are about 5 to 1 in
how much code gets written for a given specification and about 6 to 1 in
terms of productivity and quality levels. Therefore, average values need
to be used with caution. Averages are particularly unreliable for
applications where one person performs the bulk of the work.
TABLE 9-24  Quality Value for 100-Function Point Applications
(Note: 100 function points = 12,500 C statements)

                                           Average      Excellent
                                           Quality      Quality      Difference
Defects per function point                 3.50         1.50         -2.00
Defect potential                           350          150          -200
Defect removal efficiency                  94.00%       99.00%       5.00%
Defects removed                            329          149          -181
Defects delivered                          21           2            -20
Cost per defect prerelease                 $379         $455         $76
Cost per defect postrelease                $1,061       $1,288       $227
Development schedule (calendar months)     6            5            -1
Development staffing                       1            1            0
Development effort (staff months)          6            5            -1
Development costs                          $63,096      $52,481      -$10,615
Function points per staff month            15.85        19.05        3.21
LOC per staff month                        1,981        2,382        401
Maintenance staff                          1            1            0
Maintenance effort (staff months)          2            0            -1.63
Maintenance costs (year 1)                 $17,500      $1,250       -$16,250
TOTAL EFFORT                               8            5            -3
TOTAL COST                                 $80,596      $53,731      -$26,865
TOTAL COST PER STAFF MEMBER                $40,298      $26,865      -$13,432
TOTAL COST PER FUNCTION POINT              $805.96      $537.31      -$269
TOTAL COST PER LOC                         $6.45        $4.30        -$2.15
AVERAGE COST PER DEFECT                    $720         $871         $152
The Value of Quality for Small Applications
of 1000 Function Points
For small applications of 1000 function points, quality starts to become
very important, but it is also somewhat easier to achieve than it is for
large systems. At this size range, teams are small and methods such
as Agile development tend to be dominant, other than for systems and
embedded software where more rigorous methods such as the Team
Software Process (TSP) and the Rational Unified Process (RUP) are
more common. Table 9-25 shows the value of quality for small applica-
tions in the 1000-function point range.
The bulk of the savings for the Excellent Quality column shown in
Table 9-25 would come from shorter testing schedules due to the use of
requirements, design, and code inspections. Other changes that added
value include the use of Team Software Process (TSP), static analysis
prior to testing, and the achievement of higher CMMI levels.
TABLE 9-25  Quality Value for 1000-Function Point Applications
(Note: 1000 function points = 125,000 C statements)

                                           Average       Excellent
                                           Quality       Quality       Difference
Defects per function point                 4.50          2.50          -2.00
Defect potential                           4,500         2,500         -2,000
Defect removal efficiency                  93.00%        97.00%        4.00%
Defects removed                            4,185         2,425         -1,760
Defects delivered                          315           75            -240
Cost per defect prerelease                 $341          $417          $76
Cost per defect postrelease                $909          $1,136        $227
Development schedule (calendar months)     16            12            -4
Development staffing                       7             7             0
Development effort (staff months)          106           80            -26
Development costs                          $1,056,595    $801,510      -$255,086
Function points per staff month            9.46          12.48         3.01
LOC per staff month                        1,183         1,560         376.51
Maintenance staff                          2             2             0
Maintenance effort (staff months)          26            6             -20.00
Maintenance costs (year 1)                 $262,500      $62,500       -$200,000
TOTAL EFFORT                               132           86            -46
TOTAL COST                                 $1,319,095    $864,010      -$455,086
TOTAL COST PER STAFF MEMBER                $158,291      $103,681      -$54,610
TOTAL COST PER FUNCTION POINT              $1,319.10     $864.01       -$455
TOTAL COST PER LOC                         $10.55        $6.91         -$3.64
AVERAGE COST PER DEFECT                    $625          $776          $152
In the size range of 1000 function points, numerous methods are fairly
effective. For example, both Agile development and extreme program-
ming report good results in this size range as do the Rational Unified
Process (RUP) and the Team Software Process (TSP).
The Value of Quality for Large Applications
of 10,000 Function Points
When software applications reach 10,000 function points, they are
very significant systems that require close attention to quality control,
change control, and corporate governance. In fact, without careful qual-
ity and change control, the odds of failure or cancellation top 35 percent
for this size range.
Note that as application size increases, defect potentials increase rap-
idly and defect removal efficiency levels decline, even with sophisticated
quality control steps in place. This is due to the exponential increase in
the volume of paperwork for requirements and design, which often leads
to partial inspections rather than 100 percent inspections. For large
systems, test coverage declines and the number of test cases mounts rap-
idly, but cannot usually keep pace with complexity. Table 9-26 shows the
increasing value of quality as size goes up to 10,000 function points.
Cost savings from better quality increase as application sizes increase.
The general rule is that the larger the software application, the more valu-
able quality becomes. The same principle is true for change control, because
the volume of creeping requirements goes up with application size.
For large systems, the number of available methods that demonstrate
improvement begins to decline. For example, Agile methods are difficult
to apply, and when they are applied, the results are not always good.
rigorous methods such as the Rational Unified Process (RUP) or Team
Software Process (TSP) yield the best results and have the greatest
amount of empirical data.
TABLE 9-26  Quality Value for 10,000-Function Point Applications
(Note: 10,000 function points = 1,250,000 C statements)

                                           Average        Excellent
                                           Quality        Quality        Difference
Defects per function point                 6.00           3.50           -2.50
Defect potential                           60,000         35,000         -25,000
Defect removal efficiency                  84.00%         96.00%         12.00%
Defects removed                            50,400         33,600         -16,800
Defects delivered                          9,600          1,400          -8,200
Cost per defect prerelease                 $341           $417           $76
Cost per defect postrelease                $833           $1,061         $227
Development schedule (calendar months)     40             28             -12
Development staffing                       67             67             0
Development effort (staff months)          2,654          1,836          -818
Development costs                          $26,540,478    $18,361,525    -$8,178,953
Function points per staff month            3.77           5.45           1.68
LOC per staff month                        471            681            209.79
Maintenance staff                          17             17             0
Maintenance effort (staff months)          800            117            -683.33
Maintenance costs (year 1)                 $8,000,000     $1,166,667     -$6,833,333
TOTAL EFFORT (STAFF MONTHS)                3,454          1,953          -1,501
TOTAL COST                                 $34,540,478    $19,528,191    -$15,012,287
TOTAL COST PER STAFF MEMBER                $414,486       $234,338       -$180,147
TOTAL COST PER FUNCTION POINT              $3,454.05      $1,952.82      -$1,501.23
TOTAL COST PER LOC                         $27.63         $15.62         -$12.01
AVERAGE COST PER DEFECT                    $587           $739           $152
The Value of Quality for Very Large
Applications of 100,000 Function Points
Software applications in the 100,000-function point range are among
the most costly endeavors of modern business. These large systems
are also hazardous, because many of them fail, and almost all of them
exceed their budgets and planned schedules.
Without excellence in software quality control, the odds of complet-
ing a software application of 100,000 function points are only about
20 percent. The odds of finishing it on time and within budget hover
close to 0 percent.
Even with excellent quality control and excellent change control, mas-
sive applications in the 100,000-function point range are expensive
and troublesome. Table 9-27 illustrates the two cases for such massive
applications.
TABLE 9-27  Quality Value for 100,000-Function Point Applications
(Note: 100,000 function points = 12,500,000 C statements)

                                           Average         Excellent
                                           Quality         Quality         Difference
Defects per function point                 7.00            4.00            -3.00
Defect potential                           700,000         400,000         -300,000
Defect removal efficiency                  81.00%          94.00%          13.00%
Defects removed                            567,000         376,000         -191,000
Defects delivered                          133,000         24,000          -109,000
Cost per defect prerelease                 $303            $379            $76
Cost per defect postrelease                $758            $985            $227
Development schedule (calendar months)     100             63              -37
Development staffing                       667             667             0
Development effort (staff months)          66,667          42,064          -24,603
Development costs                          $666,666,667    $420,638,230    -$246,028,437
Function points per staff month            1.50            2.38            0.88
LOC per staff month                        188             297             109.67
Maintenance staff                          167             167             0
Maintenance effort (staff months)          11,083          2,000           -9,083
Maintenance costs (year 1)                 $110,833,333    $20,000,000     -$90,833,333
TOTAL EFFORT                               77,750          44,064          -33,686
TOTAL COST                                 $777,500,000    $440,638,230    -$336,861,770
TOTAL COST PER STAFF MEMBER                $933,000        $528,766        -$404,234
TOTAL COST PER FUNCTION POINT              $7,775.00       $4,406.38       -$3,368.62
TOTAL COST PER LOC                         $62.20          $35.25          -$26.95
AVERAGE COST PER DEFECT                    $530            $682            $152
There are several reasons why defect potentials are so high for mas-
sive applications and why defect removal efficiency levels are reduced.
The first reason is that for such massive applications, requirements
changes will be so numerous that they exceed most companies' ability
to control them well.
The second reason is that paperwork volumes tend to rise with applica-
tion size, and this slows down activities such as inspections of requirements
and design. As a result, massive applications tend to use partial inspec-
tions rather than 100 percent inspections of major deliverable items.
A third reason, which was worked out mathematically at IBM in the
1970s, is that the number of test cases needed to achieve 90 percent
coverage of code rises exponentially with size. In fact, the number of
test cases required to fully test a massive system of 100,000 function
points approaches infinity. As a result, testing efficiency declines
with size, even though static analysis and inspections stay about the
same.
A useful rule of thumb for predicting the overall number of test cases
is to raise application size in function points to the 1.2 power. As
can be seen,
test case volumes rise very rapidly, and most companies cannot keep
pace, so test coverage declines. Automated static analysis is still effec-
tive. Inspections are also effective, but for 100,000 function points, partial
inspections of key deliverables are the norm rather than 100 percent
inspections. This is because paperwork volumes also rise exponentially
with size.
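The rule of thumb is easy to tabulate; the loop below simply applies
the stated 1.2 power to the four case-study sizes:

    for size_fp in (100, 1_000, 10_000, 100_000):
        test_cases = size_fp ** 1.2
        print(f"{size_fp:>7,} FP -> about {test_cases:>9,.0f} test cases")

    #     100 FP -> about       251 test cases
    #   1,000 FP -> about     3,981 test cases
    #  10,000 FP -> about    63,096 test cases
    # 100,000 FP -> about 1,000,000 test cases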
Return on Investment in Software Quality
As already mentioned, the value of software quality goes up as appli-
cation size goes up. Table 9-28 calculates the approximate return on
investment for the "excellent" case studies of 100 function points, 1000
function points, 10,000 function points, and 100,000 function points.
Here too the assumptions are simplified to make calculations easy
and understandable. The basic assumption is that every software team
member needs five days of training to get up to speed in software inspec-
tions and the Team Software Process (TSP). These training days are
then multiplied by average hourly costs of $75.75 per employee.
These training expenses are then divided into the total savings figure
that includes both development and maintenance savings due to high
quality. The final result is the approximate ROI based on dividing value
by training expenses. Table 9-28 illustrates the ROI calculations.
The ROI figure reflects the total savings divided by the total training
expenses needed to bring team members up to speed in quality
technologies.

TABLE 9-28  Return on Investment in Software Quality

Function point size             100          1,000        10,000         100,000
Education hours                 80           560          5,360          53,360
Education costs                 $6,060       $42,420      $406,020       $4,042,020
Savings from high quality       $26,865      $455,086     $15,012,287    $336,861,770
Return on investment (ROI)      $4.43        $10.73       $36.97         $83.34

In real life, these simple assumptions would vary widely, and other
factors might also be considered. Even so, high levels of software
quality have a very solid return on investment due to the reduction in
development schedules, development costs, and maintenance costs.
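A minimal sketch of the Table 9-28 arithmetic, using only figures
stated in the text (the hourly rate, education hours, and savings); the
variable names are illustrative:

    HOURLY_RATE = 75.75   # fully burdened cost per staff hour

    cases = [  # (function points, education hours, savings from high quality)
        (100, 80, 26_865),
        (1_000, 560, 455_086),
        (10_000, 5_360, 15_012_287),
        (100_000, 53_360, 336_861_770),
    ]
    for fp, hours, savings in cases:
        roi = savings / (hours * HOURLY_RATE)
        print(f"{fp:>7,} FP: ${roi:.2f} returned per $1 of training")

    # $4.43, $10.73, $36.97, and $83.34, matching Table 9-28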
There may be many other topics where software engineers and man-
agers need training, and there may be other cost elements such as the
costs of ascending to the higher levels of the capability maturity model.
While the savings from high quality are frequently observed, the exact
ROI will vary based on the way training and process improvement work
is handled under local accounting rules.
If the reduced risks of cancelled projects or major overruns were
included in the ROI calculations, the value would be even higher.
Other technologies such as high volumes of certified reusable mate-
rial would also have a beneficial impact on both quality and productiv-
ity. However, as this book is written in 2009, only limited sources are
available for certified reusable materials. Uncertified reuse is hazardous
and may even be harmful rather than beneficial.
Summary and Conclusions
In spite of the fact that the software industry spends more money on
finding and fixing bugs than any other activity, software quality remains
ambiguous and poorly covered in the software engineering literature.
There are dozens of books on software quality and testing, but hardly
any of them contain quantitative data on defect volumes, numbers of
test cases, test coverage, or the costs associated with defect removal
activities.
Even worse, much of the literature on quality merely cites urban
legends of how "cost per defect rises throughout development and into
the field," without realizing that such a trend is caused by ignoring
fixed costs.
Software quality does have value, and the value increases as applica-
tion sizes get bigger. In fact, without excellence in quality control, even
completing a large software application is highly unlikely. Completing
it on time and within budget in the absence of excellent quality control
is essentially impossible.
Readings and References
Beck, Kent. Test-Driven Development. Boston, MA: Addison Wesley, 2002.
Chelf, Ben and Raoul Jetley. Diagnosing Medical Device Software Defects Using Static
Analysis. San Francisco, CA: Coverity Technical Report, 2008.
Chess, Brian and Jacob West. Secure Programming with Static Analysis. Boston, MA:
Addison Wesley, 2007.
Cohen, Lou. Quality Function Deployment--How to Make QFD Work for You. Upper
Saddle River, NJ: Prentice Hall, 1995.
Crosby, Philip B. Quality is Free. New York, NY: New American Library, Mentor Books,
1979.
Everett, Gerald D. and Raymond McLeod. Software Testing. Hoboken, NJ: John Wiley &
Sons, 2007.
Gack, Gary. Applying Six Sigma to Software Implementation Projects.
http://software.isixsigma.com/library/content/c040915b.asp.
Gilb, Tom and Dorothy Graham. Software Inspections. Reading, MA: Addison Wesley,
1993.
Hallowell, David L. Six Sigma Software Metrics, Part 1.
http://software.isixsigma.com/library/content/c03910a.asp.
International Organization for Standardization. ISO 9000 / ISO 14000.
http://www.iso.org/iso/en/iso9000-14000/index.html.
Jones, Capers. Software Quality--Analysis and Guidelines for Success. Boston, MA:
International Thomson Computer Press, 1997.
Kan, Stephen H. Metrics and Models in Software Quality Engineering, Second Edition.
Boston, MA: Addison Wesley Longman, 2003.
Land, Susan K., Douglas B. Smith, and John Z. Walz. Practical Support for Lean Six Sigma
Software Process Definition: Using IEEE Software Engineering Standards. Los
Alamitos, CA: Wiley-IEEE Computer Society Press, 2008.
Mosley, Daniel J. The Handbook of MIS Application Software Testing. Englewood Cliffs,
NJ: Yourdon Press, Prentice Hall, 1993.
Myers, Glenford. The Art of Software Testing. New York, NY: John Wiley & Sons, 1979.
Nandyal, Raghav. Making Sense of Software Quality Assurance. New Delhi: Tata
McGraw-Hill Publishing, 2007.
Radice, Ronald A. High Quality Low Cost Software Inspections. Andover, MA:
Paradoxicon Publishing, 2002.
Wiegers, Karl E. Peer Reviews in Software--A Practical Guide. Boston, MA: Addison
Wesley Longman, 2002.