Chapter 9

Software Quality: The Key to Successful Software Engineering
Introduction
The overall software quality averages for the United States have scarcely changed since 1979. Although national data is flat for quality, a few companies have made major improvements. These happen to be companies that measure quality because they define quality in such a way that both prediction and measurement are possible.

The same companies also use full sets of defect removal activities that include inspections and static analysis as well as testing. Defect prevention methods such as joint application design (JAD) and development methods that focus on quality such as Team Software Process (TSP) are also used, once the importance of quality to successful software engineering is realized.

Historically, large software projects spend more time and effort on finding and fixing bugs than on any other activity. Because software defect removal efficiency only averages about 85 percent, the major costs of software maintenance are finding and fixing bugs accidentally released to customers.

When development defect removal is added to maintenance defect removal, the major cost driver for total cost of ownership (TCO) is that of defect removal. Between 30 percent and 50 percent of every dollar ever spent on software has gone to finding and fixing bugs.

When software projects run late and exceed their budgets, a main reason is excessive defect levels, which slow down testing and force applications into delays and costly overruns.
When software projects are cancelled and end up in court for breach of contract, excessive defect levels, inadequate defect removal, and poor quality measures are associated with every case.

Given the fact that software defect removal costs have been the primary cost driver for all major software projects for the past 50 years, it is surprising that so little is known about software quality.

There are dozens of books about software quality and testing, but very few of these books actually contain solid and reliable quantified data about basic topics such as:
1. How many bugs are going to be present in specific new software applications?
2. How many bugs are likely to be present in legacy software applications?
3. How can software quality be predicted and measured?
4. How effective are ISO standards in improving quality?
5. How effective are software quality assurance organizations in improving quality?
6. How effective is software quality assurance certification for improving quality?
7. How effective is Six Sigma for improving quality?
8. How effective is quality function deployment (QFD) for improving quality?
9. How effective are the higher levels of the CMMI in improving quality?
10. How effective are the forms of Agile development in improving quality?
11. How effective is the Rational Unified Process (RUP) in improving quality?
12. How effective is the Team Software Process (TSP) in improving quality?
13. How effective are the ITIL methods in improving quality?
14. How effective is service-oriented architecture (SOA) for improving quality?
15. How effective are certified reusable components for improving quality?
16. How many bugs can be eliminated by inspections?
17. How many bugs can be eliminated by static analysis?
18. How many bugs can be eliminated by testing?
19. How many different kinds of testing are needed?
20. How many test personnel are needed?
21. How effective are test specialists compared with developers?
22. How effective is automated testing?
23. How many test cases are needed for applications of various sizes?
24. How effective is test certification in improving performance?
25. How many bug repairs will themselves include new bugs?
26. How many bugs will get delivered to users?
27. How much does it cost to improve software quality?
28. How long does it take to improve software quality?
29. How much will we save from improving software quality?
30. How much is the return on investment (ROI) for better software quality?
The purpose of this chapter is to show the quantified results of every major form of quality assurance activity, inspection stage, static analysis, and testing stage on the delivered defect levels of software applications.

Defect removal comes in "private" and "public" forms. The private forms of defect removal include desk checking, static analysis, and unit testing. They are also covered in Chapter 8, because they concentrate on code defects, and that chapter deals with programming and code development.

The public forms of defect removal include formal inspections, static analysis if run by someone other than the software engineer who wrote the code, and many kinds of testing carried out by test specialists rather than the developers.

Both private and public forms of defect removal are important, but it is harder to get data on the private forms because they usually occur with no one else being present other than the person who is doing the desk checking or unit testing. As pointed out in Chapter 8, IBM used volunteers to record defects found via private removal activities. Some development methods such as Watts Humphrey's Team Software Process (TSP) and Personal Software Process (PSP) also record private defect removal.

This chapter will also explain how to predict the number of bugs or defects that might occur, and how to predict defect removal efficiency levels. Not only code bugs, but also bugs or defects in requirements, design, and documents need to be predicted. In addition, new bugs accidentally included in bug repairs need to be predicted. These are called "bad fixes." Finally, there are also bugs or errors in test cases themselves, and these need to be predicted, too.
This chapter will discuss the best ways of measuring quality and will caution against hazardous metrics such as "cost per defect" and "lines of code," which distort results and conceal the real facts of software quality. In this chapter, several critical software quality topics will be discussed:
■ Defining Software Quality
■ Predicting Software Quality
■ Measuring Software Quality
■ Software Defect Prevention
■ Software Defect Removal
■ Specialists in Software Quality
■ The Economic Value of Software Quality
Software quality is the key to successful software engineering. Software has long been troubled by excessive numbers of software defects both during development and after release. Technologies are available that can reduce software defects and improve quality by significant amounts.

Carefully planning and selecting an effective combination of defect prevention and defect removal activities can shorten software development schedules, lower software development costs, significantly reduce maintenance and customer support costs, and improve both customer satisfaction and employee morale at the same time. Improving software quality has the highest return on investment of any current form of software process improvement.

As the recession continues, every company is anxious to lower both software development and software maintenance costs. Improving software quality will assist in improving software economics more than any other available technology.
Defining Software Quality
A good definition for software quality is fairly difficult to achieve. There are many different definitions published in the software literature. Unfortunately, some of the published definitions for quality are either abstract or off the mark. A workable definition of software quality needs to have six fundamental features:
1. Quality should be predictable before a software application starts.
2. Quality needs to encompass all deliverables and not just the code.
3. Quality should be measurable during development.
4. Quality should be measurable after release to customers.
5. Quality should be apparent to customers and recognized by them.
6. Quality should continue after release, during maintenance.

Here are some of the published definitions for quality, and explanations of why some of them don't seem to conform to the six criteria just listed.
Quality Definition 1: "Quality means conformance to requirements."
There are several problems with this definition, but the major problem is that requirements errors or bugs are numerous and severe. Errors in requirements constitute about 20 percent of total software defects and are responsible for more than 35 percent of high-severity defects.

Defining quality as conformance to a major source of error is circular reasoning, and therefore this must be considered to be a flawed and unworkable definition. Obviously, a workable definition for quality has to include errors in requirements themselves.

Don't forget that the famous Y2K problem originated as a specific user requirement and not as a coding bug. Many software engineers warned clients and managers that limiting date fields to two digits would cause problems, but their warnings were ignored or rejected outright.

The author once worked (briefly) as an expert witness in a lawsuit where a company attempted to sue an outsource vendor for using two-digit date fields in a software application developed under contract. During the discovery phase, it was revealed that the vendor cautioned the client that two-digit date fields were hazardous, but the client rejected the advice and insisted that the Y2K problem be included in the application. In fact, the client's own internal standards mandated two-digit date fields. Needless to say, the client dropped the suit when it became evident that they themselves were the cause of the problem. The case illustrates that "user requirements" are often wrong and sometimes even dangerous or "toxic."

It also illustrates another point. Neither the corporate executives nor the legal department of the plaintiff knew that the Y2K problem had been caused by their own policies and practices. Obviously, there is a need for better governance of software from the top when problems such as this are not understood by corporate executives.

Using modern terminology from the recession, it is necessary to remove "toxic requirements" before conformance can be safe. The definition of quality as "conformance to requirements" does not lead to any significant quality improvements over time. No more requirements are being met in 2009 than in 1979.
If software engineering is to become a true profession rather than an art form, software engineers have a responsibility to help customers define requirements in a thorough and effective manner. It is the job of a professional software engineer to insist on effective requirements methods such as joint application design (JAD), quality function deployment (QFD), and requirements inspections.

Far too often the literature on software quality is passive and makes the incorrect assumption that users will be 100 percent effective in identifying requirements. This is a dangerous assumption. User requirements are never complete, and they are often wrong. For a software project to succeed, requirements need to be gathered and analyzed in a professional manner, and software engineering is the profession that should know how to do this well.

It should be the responsibility of the software engineers to insist that proper requirements methods be used. These include joint application design (JAD), quality function deployment (QFD), and requirements inspections. Other methods that benefit requirements, such as embedded users or use-cases, might also be recommended. The users themselves are not software engineers and cannot be expected to know optimal ways of expressing and analyzing requirements. Ensuring that requirements collection and analysis are at state-of-the-art levels devolves to the software engineering team.

Once user requirements have been collected and analyzed, then conformance to them should of course occur. However, before conformance can be safe and effective, dangerous or toxic requirements have to be weeded out, excess and superfluous requirements should be pointed out to the users, and potential gaps that will cause creeping requirements should be identified and also quantified. The users themselves will need professional assistance from the software engineering team, who should not be passive bystanders for requirements gathering and analysis.

Unfortunately, requirements bugs cannot be removed by ordinary testing. If requirements bugs are not prevented from occurring, or not removed via formal inspections, test cases that are constructed from the requirements will confirm the errors and not find them. (This is why years of software testing never found and removed the Y2K problem.)

A second problem with this definition is that it is not predictable during development. Conformance to requirements can be measured after the fact, but that is too late for cost-effective recovery.

A third problem with this definition is that for brand-new kinds of innovative applications, there may not be any users other than the original inventor. Consider the history of successful software innovation such as the APL programming language, the first spreadsheet, and the early web search engine that later became Google.
These innovative applications were all created by inventors to solve problems that they themselves wanted to solve. They were not created based on the normal concept of "user requirements." Until prototypes were developed, other people seldom even realized how valuable the inventions would be. Therefore, "user requirements" are not completely relevant to brand-new inventions until after they have been revealed to the public.

Given the fact that software requirements grow and change at measured rates of 1 percent to more than 2 percent every calendar month during the subsequent design and coding phases, it is apparent that achieving a full understanding of requirements is a difficult task.

Software requirements are important, but the combination of toxic requirements, missing requirements, and excess requirements makes simplistic definitions such as "quality means conformance to requirements" hazardous to the software industry.

Quality Definition 2: "Quality means reliability, portability, and many other -ilities."
The problem with defining quality as a set of words ending with -ility is that many of these factors are neither predictable before they occur nor easily measurable when they do occur.

While most of the -ility words are useful properties for software applications, some don't seem to have much to do with quality as we would consider the term for a physical device such as an automobile or a toaster. For example, "portability" may be useful for a software vendor, but it does not seem to have much relevance to quality in the eyes of a majority of users.

The use of -ility words to define quality does not lead to quality improvements over time. In 2009, the software industry is no better in terms of many of these -ilities than it was in 1979. Using modern language from the recession, many of the -ilities are "subprime" definitions that don't prevent serious quality failures. In fact, using -ilities rather than focusing on defect prevention and removal slows down progress on software quality control.

Among the many words that are cited when using this definition can be found (in alphabetical order):
1. Augmentability
2. Compatibility
3. Expandability
4. Flexibility
5. Interoperability
6. Maintainability
7. Manageability
8. Modifiability
9. Operability
10. Portability
11. Reliability
12. Scalability
13. Survivability
14. Understandability
15. Usability
16. Testability
17. Traceability
18. Verifiability
Of the words on this list, only a few such as "reliability" and "testability" seem to be relevant to quality as viewed by users. The other terms range from being obscure (such as "survivability") to useful but irrelevant (such as "portability"). Other terms may be of interest to the vendor or development team, but not to customers (such as "maintainability").

The -ility words seem to have an academic origin because they don't really address some of the real-world quality issues that bother customers. For example, none of these terms addresses ease or difficulty of reaching customer support to get help when a bug is noted or the software misbehaves. None of the terms deals with the speed of fixing bugs and providing the fix to users in a timely manner.
The new Information Technology Infrastructure Library (ITIL) does a much better job of dealing with issues of quality in the eyes of users, such as customer support, incident management, and defect repair intervals, than does the standard literature dealing with software quality.
More seriously, the list of -ility words ignores two of the main topics that have a major impact on software quality when the software is finally released to customers: (1) defect potentials and (2) defect removal efficiency levels.

The term defect potential refers to the total quantity of defects that will likely occur when designing and building a software application. Defect potentials include bugs or defects in requirements, design, code, user documents, and bad fixes or secondary defects. The term defect removal efficiency refers to the percentage of defects found by any sequence of inspection, static analysis, and test stages.
To reach acceptable levels of quality in the view of customers, a combination of low defect potentials and high defect removal efficiency rates (greater than 95 percent) is needed. The current U.S. average for software quality is a defect potential of about 5.0 bugs per function point coupled with 85 percent defect removal efficiency. This combination yields a total of delivered defects of about 0.75 per function point, which the author regards as unprofessional and unacceptable.

Defect potentials need to drop below 2.5 per function point and defect removal efficiency needs to average greater than 95 percent for software engineering to be taken seriously as a true engineering discipline. This combination would result in a delivered defect total of only 0.125 defect per function point, or about one-sixth of today's averages. Achieving or exceeding this level of quality is possible today in 2009, but seldom achieved.
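To make the arithmetic explicit: delivered defects per function point are simply the defect potential multiplied by the fraction of defects that escape removal. The following minimal sketch (the function name is invented for this illustration; the numbers are the averages and targets just cited) shows both the current U.S. average and the suggested target:

def delivered_defects_per_fp(defect_potential, removal_efficiency):
    """Delivered defects per function point.

    defect_potential   -- total defects created per function point
    removal_efficiency -- fraction of those defects removed before release
    """
    return defect_potential * (1.0 - removal_efficiency)

# Approximate 2009 U.S. average cited above: 5.0 defects per function
# point with 85 percent cumulative defect removal efficiency.
print(round(delivered_defects_per_fp(5.0, 0.85), 3))   # 0.75 delivered per function point

# Suggested professional target: below 2.5 per function point with
# greater than 95 percent removal efficiency.
print(round(delivered_defects_per_fp(2.5, 0.95), 3))   # 0.125 delivered per function point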
One of the reasons that good quality is not achieved as widely as it might be is that concentrating on the -ility topics rather than measuring defects and defect removal efficiency leads to gaps and failures in defect removal activities. In other words, the -ilities definitions of quality are a distraction from serious study of software defect causes and the best methods of preventing and removing software defects.

Specific levels of defect potentials and defect removal efficiency levels could be included in outsource agreements. These would probably be more effective than current contracting practices for quality, which are often nonexistent or merely insist on a certain CMMI level.

If software is released with excessive quantities of defects so that it stops, behaves erratically, or runs slowly, it will soon be discovered that most of the -ility words fall by the wayside.

Defect quantities in released software tend to be the paramount quality issue with users of software applications, coupled with what kinds of corrective actions the software vendor will take once defects are reported. This brings up a third and more relevant definition of software quality.

Quality Definition 3: "Quality is the absence of defects that would cause an application to stop working or to produce incorrect results."
A software defect is a bug or error that causes software to either stop operating or to produce invalid or unacceptable results. Using IBM's severity scale, defects have four levels of severity:

■ Severity 1 means that the software application does not work at all.
■ Severity 2 means that major functions are disabled or produce incorrect results.
■ Severity 3 means that there are minor issues or minor functions are not working.
■ Severity 4 means a cosmetic problem that does not affect operation.
There is some subjectivity with these defect severity levels because they are assigned by human beings. Under the IBM model, the initial severity level is assigned when the bug is first reported, based on symptoms described by the customer or user who reported the defect. However, a final severity level is assigned by the change team when the defect is repaired.
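As a purely illustrative sketch of this two-step assignment (the class and field names below are assumptions made for this example, not part of the IBM model itself), a defect-tracking record might carry both an initial severity and a final severity:

from dataclasses import dataclass
from enum import IntEnum
from typing import Optional

class Severity(IntEnum):
    SEV1 = 1   # software application does not work at all
    SEV2 = 2   # major functions disabled or producing incorrect results
    SEV3 = 3   # minor issues, or minor functions not working
    SEV4 = 4   # cosmetic problem that does not affect operation

@dataclass
class DefectReport:
    description: str
    initial_severity: Severity                  # assigned when the bug is first reported
    final_severity: Optional[Severity] = None   # assigned by the change team at repair time

report = DefectReport("application will not start", Severity.SEV1)
report.final_severity = Severity.SEV2           # revised after diagnosis and repair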
This definition of quality is one favored by the author for several reasons. First, defects can be predicted before they occur and measured when they do occur. Second, customer satisfaction surveys for many software applications appear to correlate more closely to delivered defect levels than to any other factor. Third, many of the -ility factors also correlate to defects, or to the absence of defects. For example, reliability correlates exactly to the number of defects found in software. Usability, testability, traceability, and verifiability also have indirect correlations to software defect levels.

Measuring defect volumes and defect severity levels and then taking effective steps to reduce those volumes via a combination of defect prevention and defect removal activities is the key to successful software engineering.
This definition of software quality does lead to quality improvements over time. The companies that measure defect potentials, defect removal efficiency levels, and delivered defects have improved these factors by significant amounts. This definition of quality supports process improvements, predicting quality, measuring quality, and customer satisfaction as measured by surveys.
Therefore, companies that measure quality, such as IBM, Dovél Technologies, and AT&T, have made progress in quality control. Also, methods that integrate defect tracking and reporting, such as Team Software Process (TSP), have made significant progress in reducing delivered defects. This is also true for some open-source applications that have added static analysis to their suite of defect removal tools.

Defect and removal efficiency measures have been used to validate the effectiveness of formal inspections, show the impact of static analysis, and fine-tune more than 15 kinds of testing. The subjective measures have no ability to deal with such issues.

Every software engineer and every software project manager should be trained in methods for predicting software defects, measuring software defects, preventing software defects, and removing software defects. Without knowledge of effective quality and defect control, software engineering is a hoax.

The full definition of quality suggested by the author includes these nine factors:
1. Quality implies low levels of defects when software is deployed, ideally approaching zero defects.
2. Quality implies high reliability, or being able to run without stoppage or strange and unexpected results or sluggish performance.
3. Quality implies high levels of user satisfaction when users are surveyed about software applications and their features.
4. Quality implies a feature set that meets the normal operational needs of a majority of customers or users.
5. Quality implies a code structure and comment density that minimize bad fixes, or accidentally inserting new bugs when attempting to repair old bugs. This same structure will facilitate adding new features.
6. Quality implies effective customer support when problems do occur, with minimal difficulty for customers in contacting the support team and getting assistance.
7. Quality implies rapid repairs of known defects, and especially so for high-severity defects.
8. Quality should be supported by meaningful guarantees and warranties offered by software developers to software users.
9. Effective definitions of quality should lead to quality improvements. This means that quality needs to be defined rigorously enough so that both improvements and degradations can be identified, and also averages. If a definition for quality cannot show changes or improvements, then it is of very limited value.
The 6th, 7th, 8th, and 9th of these quality issues tend to be sparsely covered by the literature on software quality, other than the new ITIL books. Unfortunately, the ITIL coverage is used only for internal software applications and is essentially ignored by commercial software vendors.

The definition of quality as an absence of defects, combined with supplemental topics such as ease of customer support and maintenance speed, captures the essence of quality in the view of many software users and customers.

Consider how the three definitions of quality discussed in this chapter might relate to a well-known software product such as Microsoft Vista. Vista has been selected as an example because it is one of the best-known large software applications in the world, and therefore a good test bed for trying out various quality definitions.

Applying Definition 1 to Vista: "Quality means conformance to requirements."

The first definition would be hard to use for Vista, since no ordinary customers were asked what features they wanted in the operating system, although focus groups were probably used at some point.
If you compare Vista with XP, Leopard, or Linux, it seems to include a superabundance of features and functions, many of which were neither requested nor ever used by a majority of users. One topic that the software engineering literature does not cover well, or at all, is that of overstuffing applications with unnecessary and useless features.

Most people know that ordinary requirements usually omit about 20 percent of functions that users want. However, not many people know that for commercial software put out by companies such as Microsoft, Symantec, Computer Associates, and the like, applications may have more than 40 percent features that customers don't want and never use.

Feature stuffing is essentially a competitive move to either imitate what competitors do, or to attempt to pull ahead of smaller competitors by providing hundreds of costly but marginal features that small competitors could not imitate. In either case, feature stuffing is not a satisfactory conformance to user requirements.

Further, certain basic features such as security and performance, which users of operating systems do appreciate, are not particularly well embodied in Vista.

The bottom line is that defining quality as conformance to requirements is almost useless for applications with greater than 1 million users such as Vista, because it is impossible to know what such a large group will want or not want.

Also, users seldom are able to articulate requirements in an effective manner, so it is the job of professional software engineers to help users in defining requirements with care and accuracy. Too often the software literature assumes that software engineers are only passive observers of user requirements, when in fact, software engineers should be playing the role of physicians who are diagnosing medical conditions in order to prescribe effective therapies.

Physicians don't just passively ask patients what the problem is and what kind of medicine they want to take. Our job as software engineers is to have professional knowledge about effective requirement gathering and analysis methods (i.e., like medical diagnostic tests) and to also know what kinds of applications might provide effective "therapies" for user needs.

Passively waiting for users to define requirements without assisting them in using joint application design (JAD) or quality function deployment (QFD) or data mining of legacy applications is unprofessional on the part of the software engineering community. Users are not trained in requirements definition, so we need to step up to the task of assisting them.
Applying Definition 2 to Vista: "Quality means adherence to -ility terms."

When Vista is judged by matching its features against the list of -ility terms shown earlier, it can be seen how abstract and difficult to apply such a list really is:
1. Augmentability: Ambiguous and difficult to apply to Vista
2. Compatibility: Poor for Vista; many old applications don't work
3. Expandability: Applicable to Vista and fairly good
4. Flexibility: Ambiguous and difficult to apply to Vista
5. Interoperability: Ambiguous and difficult to apply to Vista
6. Maintainability: Unknown to users but probably poor for Vista
7. Manageability: Ambiguous and difficult to apply to Vista
8. Modifiability: Unknown to users but probably poor for Vista
9. Operability: Ambiguous and difficult to apply to Vista
10. Portability: Poor for Vista
11. Reliability: Originally poor for Vista but improving
12. Scalability: Marginal for Vista
13. Survivability: Ambiguous and difficult to apply to Vista
14. Understandability: Poor for Vista
15. Usability: Asserted to be good for Vista, but questionable
16. Testability: Poor for Vista; complexity far too high
17. Traceability: Poor for Vista; complexity far too high
18. Verifiability: Ambiguous and difficult to apply to Vista
The bottom line is that more than half of the -ility words are difficult or ambiguous to apply to Vista or any other commercial software application. Of the ones that can be applied to Vista, the application does not seem to have satisfied any of them but expandability and usability.

Many of the -ility words cannot be predicted, nor can they be measured. Worse, even if they could be predicted and measured, they are of marginal interest in terms of serious quality control.
Applying Definition 3 to Vista: "Quality means an absence of defects, plus corollary factors."

Released defects can and should be counted for every software application. Other related topics such as ease of reporting defects and speed of repairing defects should also be measured.
Unfortunately, for commercial software, not all of these nine topics can be evaluated. Microsoft, together with many other software vendors, does not publish data on bad-fix injections or even on total numbers of bugs reported. However, six of the eight factors can be evaluated by means of journal articles and limited Microsoft data.
1. Vista was released with hundreds or thousands of defects, although Microsoft will not provide the exact number of defects found and reported by users.
2. At first Vista was not very reliable, but achieved acceptable reliability after about a year of usage. Microsoft does not report data on mean time to failure or other measures of reliability.
3. Vista never achieved high levels of user satisfaction compared with XP. The major sources of dissatisfaction include lack of printer drivers, poor compatibility with older applications, excessive resource usage, and sluggish performance on anything short of high-end computer chips and lots of memory.
4. The feature set of Vista has been noted as adequate in customer surveys, other than excessive security vulnerabilities.
5. Microsoft does not release statistics on bad-fix injections or on numbers of defect reports, so this factor cannot be known by the general public.
6. Microsoft customer support is marginal and troublesome to access and use. This is a common failing of many software vendors.
7. Some known bugs have remained in Microsoft Vista for several years. Microsoft is marginally adequate in defect repair speed.
8. There is no effective warranty for Vista (or for other commercial applications). Microsoft's end-user license agreement (EULA) absolves Microsoft of any liabilities other than replacing a defective disk.
9. Microsoft's new operating system is not yet available as this book is published, so it is not possible to know if Microsoft has used methods that will yield better quality than Vista. However, since Microsoft does have substantial internal defect tracking and quality assurance methods, hopefully quality will be better. Microsoft has shown some improvements in quality over time.
Based on this pattern of analysis for the nine factors, it cannot be said that Vista is a high-quality application under any of the definitions. Of the three major definitions, defining quality as conformance to requirements is almost impossible to use with Vista because with millions of users, nobody can define what everybody wants.

The second definition of quality as a string of -ility words is difficult to apply, and many are irrelevant. These words might be marginally useful for small internal applications, but are not particularly helpful for commercial software. Also, many key quality issues such as customer support and maintenance repair times are not found in any of the -ility words.

The third definition, which centers on defects, customer support, defect repairs, and better warranties, seems to be the most relevant. The third also has the advantage of being both predictable and measurable, which the first two lack.
Given the high costs of commercial software, the marginal or useless warranties of commercial software, and the poor customer support offered by commercial software vendors, the author would favor mandatory defect reporting that required commercial vendors such as Microsoft to produce data on defects reported by customers, sorted by severity levels.

Mandatory defect reporting is already a requirement for many products that affect human life or safety, such as medicines, aircraft engines, automobiles, and many other consumer products. Mandatory reporting of business and financial information is also required. Software affects human life and safety in critical ways, and it affects business operations in critical ways, but to date software has been exempt from serious study due to the lack of any mandate for measuring and reporting released defect levels.

Somewhat surprisingly, the open-source software community appears to be pulling ahead of old-line commercial software vendors in terms of measuring and reporting defects. Many open-source companies have added defect tracking and static-analysis tools to their quality arsenal, and are making data available to customers that is not available from many commercial software vendors.

The author would also favor a "lemon law" for commercial software similar to the lemon law for automobiles. If serious defects occur that users cannot get repaired when making a good-faith effort to resolve the situation with vendors, vendors should be required to return the full purchase or lease price of the offending software application.

A form of lemon law might also be applied to outsource contracts, except that litigation already provides relief for outsource failures, relief that cannot be used against commercial software vendors due to their one-sided EULA agreements, which disclaim any responsibility for quality other than media replacement.

No doubt software vendors would object to both mandatory defect tracking and also to a lemon law. But shrewd and farsighted vendors would soon perceive that both topics offer significant competitive advantages to software companies that know how to control quality. Since high-quality software is also cheaper and faster to develop and has lower maintenance costs than buggy software, there are even more important economic advantages for shrewd vendors.
The author hypothesizes that a combination of mandatory defect reporting by software vendors plus a lemon law would have the effect of improving software quality by about 50 percent every five years for perhaps a 20-year period.

Software quality needs to be taken much more seriously than it has been. Now that the recession is expanding, better software quality control is one of the most effective strategies for lowering software costs. But effective quality control depends on better measures of quality and on proven combinations of defect prevention and defect removal activities.

Quality prediction, quality measurement, better defect prevention, and better defect removal are on the critical path for advancing software engineering to the status of a true engineering discipline instead of a craft or art form, as it is today in 2009.
Defining and Predicting Software Defects
If delivered defects are the main quality problem for software, it is important to know what causes these defects, so that they can be prevented from occurring or removed before delivery.

The software quality literature includes a great deal of pedantic bickering about various terms such as "fault," "error," "bug," "defect," and many other terms. For this book, if software stops working, won't load, operates erratically, or produces incorrect results due to mistakes in its own code, then that is called a "defect." (This same definition has been used in 14 of the author's previous books and also in more than 30 journal articles. The author's first use of this definition started in 1978.)

However, in the modern world, the same set of problems can occur without the developers or the code being the cause. Software infected by a virus or spyware can also stop working, refuse to load, operate erratically, and produce incorrect results. In today's world, some defect reports may well be caused by outside attacks.

Attacks on software from hackers are not the same as self-inflicted defects, although successful attacks do imply security vulnerabilities.

In this book and the author's previous books, software defects have five main points of origin:

1. Requirements
2. Design
3. Code
4. User documents
5. Bad fixes (new defects due to repairs of older defects)
Because the author worked for IBM when starting research on quality, the IBM severity scale for classifying defect severity levels is used in this book and the author's previous books. There are four severity levels:

■ Severity 1: Software does not operate at all
■ Severity 2: Major features disabled or incorrect
■ Severity 3: Minor features disabled or incorrect
■ Severity 4: Cosmetic error that does not affect operation
There are other methods of classifying severity levels, but these four are the most common due to IBM introducing them in the 1960s, so they became a de facto standard.

Software defects have seven kinds of causes, with the major causes including:

Errors of omission: Something needed was accidentally left out
Errors of commission: Something needed is incorrect
Errors of ambiguity: Something is interpreted in several ways
Errors of performance: Some routines are too slow to be useful
Errors of security: Security vulnerabilities allow attacks from outside
Errors of excess: Irrelevant code and unneeded features are included
Errors of poor removal: Defects that should easily have been found
These seven causes occur with different frequencies for different deliverables. For paper documents such as requirements and design, errors of ambiguity are most common, followed by errors of omission. For source code, errors of commission are most common, followed by errors of performance and security.

The seventh category, "errors of poor removal," would require root-cause analysis for identification. The implication is that the defect was neither subtle nor hard to find, but was missed because test cases did not cover the code segment or because of partial inspections that overlooked the defect.

In a sense, all delivered defects might be viewed as errors of poor removal, but it is important to find out why various kinds of inspection, static analysis, or testing missed obvious bugs. This category should not be assigned for subtle defects, but rather for obvious defects that should have been found but for some reason escaped to the outside world.

The main reason for including errors of poor removal is to encourage more study and research on the effectiveness of various kinds of defect removal operations. More solid data is needed on the removal efficiency levels of inspections, static analysis, automatic testing, and all forms of manual testing.
The combination of defect origins, defect severity, and defect causes provides a useful taxonomy for classifying defects for statistical analysis or root-cause analysis. For example, the Y2K problem was cited earlier in this chapter. In its most common manifestation, the Y2K problem might have this description using the taxonomy just discussed:

Y2K origin: Requirements
Y2K severity: Severity 2 (major features disabled)
Y2K primary cause: Error of commission
Y2K secondary cause: Error of poor removal
Note that this taxonomy allows the use of primary and secondary factors, since sometimes more than one problem is behind having a defect in software.

Note also that the Y2K problem did not have the same severity for every application. An approximate distribution of Y2K severity levels for several hundred applications noted that the software stopped in about 15 percent of instances, which are severity 1 problems; it created severity 2 problems in about 50 percent; it created severity 3 problems in about 25 percent; and had no operational consequences in about 10 percent of the applications in the sample.

To know the origin of a defect, some research is required. Most defects are initially found because the code stops working or produces erratic results. But it is important to know if upstream problems such as requirements or design issues are the true cause. Root-cause analysis can find the true causes of software defects.

Several other factors should be included in a taxonomy for tracking defects. These include whether a reported defect is valid or invalid. (Invalid defects are common and fairly expensive, since they still require analysis and a response.) Another factor is whether a defect report is new and unique, or merely a duplicate of a prior defect report.

For testing and static analysis, the category of "false positives" needs to be included. A false positive is the mistaken identification of a code segment that initially seems to be incorrect, but which later research reveals is actually correct.

A third factor deals with whether the repair team can make the same problem occur on their own systems, or whether the defect was caused by a unique configuration on the client's system. When defects cannot be duplicated, they were termed abeyant defects by IBM, since additional information needed to be collected to solve the problem.

Adding these additional topics to the Y2K example would result in an expanded taxonomy:
Y2K origin: Requirements
Y2K validity: Valid defect report
Y2K uniqueness: Duplicate (this problem was reported millions of times)
Y2K severity: Severity 2 (major features disabled)
Y2K primary cause: Error of commission
Y2K secondary cause: Error of poor removal
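As a rough sketch of how such a taxonomy might be recorded in a defect-tracking tool (the record layout and field names are assumptions for this illustration, not a prescribed format), the expanded Y2K example could be expressed as:

from dataclasses import dataclass

@dataclass
class DefectRecord:
    origin: str           # requirements, design, code, user documents, or bad fix
    validity: str         # valid or invalid defect report
    uniqueness: str       # unique report or duplicate of a prior report
    severity: int         # IBM scale: 1 (total failure) through 4 (cosmetic)
    primary_cause: str    # omission, commission, ambiguity, performance, etc.
    secondary_cause: str  # optional second contributing cause

# The expanded Y2K example from the text, expressed with this record:
y2k = DefectRecord(
    origin="Requirements",
    validity="Valid defect report",
    uniqueness="Duplicate (reported millions of times)",
    severity=2,
    primary_cause="Error of commission",
    secondary_cause="Error of poor removal",
)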
When defects are being counted or predicted, it is useful to have a standard metric for normalizing the results. As discussed in Chapter 5, there are at least ten candidates for such a normalizing metric, including function points, story points, use-case points, lines of code, and so on.

In this book and also in the author's previous books, the function point metric defined by the International Function Point Users Group (IFPUG) is used to quantify and normalize data for both defects and productivity.

There are several reasons for using IFPUG function points. The most important reason in terms of measuring software defects is that noncode defects in requirements, design, and documents are major defect sources and cannot be measured using the older "lines of code" metric. Another important reason is that all of the major benchmark data collections for productivity and quality use function point metrics, and data expressed via IFPUG function points composes about 85 percent of all known benchmarks.

It is not impossible to use other metrics for normalization, but if results are to be compared against industry benchmarks such as those published by the International Software Benchmarking Standards Group (ISBSG), then IFPUG function points are the most convenient.

Later in the discussion of defect prediction, examples will be given of using other metrics in addition to IFPUG function points.
It is interesting to combine the origin, severity, and cause factors to examine the approximate frequency of each.

Table 9-1 shows the combination of these factors for software applications during development. Therefore, Table 9-1 shows defect potentials, or the probable numbers of defects that will be encountered during development and after release. Only severity 1 and severity 2 defects are shown in Table 9-1.

Data on defect potentials is based on long-range studies of defects and defect removal efficiency carried out by organizations such as the IBM Software Quality Assurance groups, which have been studying software quality for more than 35 years.
TABLE 9-1   Overview of Software Defect Potentials

Defect Origins    Defects per Function Point    Severity 1 Defects    Severity 2 Defects    Most Frequent Defect Cause
Requirements      1.00                           11.00%                15.00%               Omission
Design            1.25                           15.00%                20.00%               Omission
Code              1.75                           70.00%                57.00%               Commission
Documents         0.60                            1.00%                 1.00%               Ambiguity
Bad fixes         0.40                            3.00%                 7.00%               Commission
TOTAL             5.00                          100.00%               100.00%               Omission
Other corporations such as AT&T, Coverity, Computer Aid Inc. (CAI), Dovél Technologies, Motorola, Software Productivity Research (SPR), Galorath Associates, the David Consulting Group, the Quality and Productivity Management Group (QPMG), Unisys, Microsoft, and the like also carry out long-range studies of defects and removal efficiency levels.

Most such studies are carried out by corporations rather than universities because academia is not really set up to carry out longitudinal studies that may last more than ten years.

While coding bugs or coding defects are the most numerous during development, they are also the easiest to find and to get rid of. A combination of inspections, static analysis, and testing can wipe out more than 95 percent of coding defects and sometimes top 99 percent. Requirements defects and bad fixes are the toughest categories of defect to eliminate.

Table 9-2 uses Table 9-1 as a starting point, but shows the latent defects that will still be present when the software application is delivered to users. Table 9-2 shows approximate U.S. averages circa 2009. Note the variations in defect removal efficiency by origin.

It is interesting that when the software is delivered to clients, requirements defects are the most numerous, primarily because they are the most difficult to prevent and also the most difficult to find. Only formal requirements-gathering methods combined with formal requirements inspections can improve the situation for finding and removing requirements defects.

If not prevented or removed, both requirements bugs and design bugs eventually find their way into the code. These are not coding bugs per se, such as branching to a wrong address, but more serious and deep-seated kinds of bugs or defects.

It was noted earlier in this chapter that requirements defects cannot be found and removed by means of testing. If a requirements defect is not prevented or removed via inspection, all test cases created using the requirements will confirm the defect and not identify it.
TABLE 9-2   Overview of Delivered Software Defects

Defect Origins    Defects per Function Point    Removal Efficiency    Delivered Defects per Function Point    Most Frequent Defect Cause
Requirements      1.00                           70.00%                0.30                                    Commission
Design            1.25                           85.00%                0.19                                    Commission
Code              1.75                           95.00%                0.09                                    Commission
Documents         0.60                           91.00%                0.05                                    Omission
Bad fixes         0.40                           70.00%                0.12                                    Commission
TOTAL             5.00                           85.02%                0.75                                    Commission
Since Table 9-2 reflects approximate U.S. averages, the methods assumed are those of fairly careless requirements gathering: waterfall development; CMMI level 1; no formal inspections of requirements, design, or code; no static analysis; and only five forms of testing: (1) unit test, (2) new function test, (3) regression test, (4) system test, and (5) acceptance test.
Note also that during development, requirements will continue to grow and change at rates of 1 percent to 2 percent every calendar month. These changing requirements have higher defect potentials than the original requirements and lower levels of defect removal efficiency. This is yet another reason why requirements defects cause more problems than any other defect origin point.

Software requirements are the most intractable source of software defects. However, methods such as joint application design (JAD), quality function deployment (QFD), Six Sigma analysis, root-cause analysis, embedding users with the development team as practiced by Agile development, prototypes, and the use of formal requirements inspections can assist in bringing requirements defects under control.

Table 9-3 shows what quality might look like if an optimal combination of defect prevention and defect removal activities were utilized. Table 9-3 assumes formal requirements methods, rigorous development such as practiced using the Team Software Process (TSP) or the higher CMMI levels, prototypes and JAD, formal inspections of all deliverables, static analysis of code, and a full set of eight testing stages: (1) unit test, (2) new function test, (3) regression test, (4) performance test, (5) security test, (6) usability test, (7) system test, and (8) acceptance test.

Table 9-3 also assumes a software quality assurance (SQA) group and rigorous reporting of software defects starting with requirements, continuing through inspections, static analysis, and testing, and out into the field with multiple years of customer-reported defects, maintenance, and enhancements. Accumulating data such as that shown in Tables 9-1 through 9-3 requires longitudinal data collection that runs for many years.
TABLE 9-3   Optimal Defect Prevention and Defect Removal Activities

Defect Origins    Defects per Function Point    Removal Efficiency    Delivered Defects per Function Point    Most Frequent Defect Cause
Requirements      0.50                           95.00%                0.03                                    Omission
Design            0.75                           97.00%                0.02                                    Omission
Code              0.50                           99.00%                0.01                                    Commission
Documents         0.40                           96.00%                0.02                                    Omission
Bad fixes         0.20                           92.00%                0.02                                    Commission
TOTAL             2.35                           96.40%                0.08                                    Omission
This combination has the effect of cutting defect potentials by more than 50 percent and of raising cumulative defect removal efficiency from today's average of 85 percent up to more than 96 percent.
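The totals in Tables 9-2 and 9-3 follow directly from multiplying each origin's defect potential by the fraction of defects that escape removal. A minimal sketch of that arithmetic, using only the numbers in the two tables (the variable and function names are invented for this illustration):

# Defect potential (defects per function point) and removal efficiency by
# origin, taken from Table 9-2 (U.S. average) and Table 9-3 (optimal).
average = {
    "Requirements": (1.00, 0.70),
    "Design":       (1.25, 0.85),
    "Code":         (1.75, 0.95),
    "Documents":    (0.60, 0.91),
    "Bad fixes":    (0.40, 0.70),
}
optimal = {
    "Requirements": (0.50, 0.95),
    "Design":       (0.75, 0.97),
    "Code":         (0.50, 0.99),
    "Documents":    (0.40, 0.96),
    "Bad fixes":    (0.20, 0.92),
}

def totals(table):
    potential = sum(p for p, _ in table.values())
    delivered = sum(p * (1.0 - e) for p, e in table.values())
    cumulative_efficiency = 1.0 - delivered / potential
    return potential, round(delivered, 2), round(cumulative_efficiency, 3)

print(totals(average))   # approximately (5.00, 0.75, 0.85)
print(totals(optimal))   # approximately (2.35, 0.08, 0.964)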
It might be possible to even exceed the results shown in Table 9-3, but doing so would require additional methods such as the availability of a full suite of certified reusable materials.

Tables 9-2 and 9-3 are oversimplifications of real-life results. Defect potentials vary with the size of the application and with other factors. Defect removal efficiency levels also vary with application size. Bad-fix injections also vary by defect origins. Both defect potentials and defect removal efficiency levels vary by methodology, by CMMI levels, and by other factors as well. These will be discussed later in the section of this chapter dealing with defect prediction.

Because of the many definitions of quality used by the industry, it is best to start by showing what is predictable and measurable and what is not. To sort out the relevance of the many quality definitions, the author has developed a 10-point scoring method for software quality factors.
■ If a factor leads to improvement in quality, its maximum score is 3.
■ If a factor leads to improvement in customer satisfaction, its maximum score is 3.
■ If a factor leads to improvement in team morale, its maximum score is 2.
■ If a factor is predictable, its maximum score is 1.
■ If a factor is measurable, its maximum score is 1.
■ The total maximum score is 10.
■ The lowest possible score is 0.
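A minimal sketch of this scoring scheme is shown below (the function and parameter names are invented for the illustration; it awards only full or zero credit for each rule, whereas Table 9-4 also assigns fractional scores):

def quality_factor_score(improves_quality, improves_satisfaction,
                         improves_morale, predictable, measurable):
    """Return a 0-10 score for a quality factor, granting maximum credit per rule."""
    score = 0
    score += 3 if improves_quality else 0        # leads to improvement in quality
    score += 3 if improves_satisfaction else 0   # leads to improvement in customer satisfaction
    score += 2 if improves_morale else 0         # leads to improvement in team morale
    score += 1 if predictable else 0             # the factor is predictable
    score += 1 if measurable else 0              # the factor is measurable
    return score

# Defect removal efficiency satisfies every rule and earns the maximum score:
print(quality_factor_score(True, True, True, True, True))   # 10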
Table 9-4 lists all of the quality factors discussed in this chapter in rank order by using the scoring factor just outlined. Table 9-4 shows whether a specific quality factor is measurable and predictable, and also the relevance of the factor to quality as based on surveys of software customers. It also includes a weighted judgment as to whether the factor has led to improvements in quality among the organizations that use it.

The quality definitions with a score of 10 have been the most effective in leading to quality improvements over time. As a rule, the quality definitions scoring higher than 7 are useful. However, the quality definitions that score below 5 have no empirical data available that shows any quality improvement at all.
While Table 9-4 is somewhat subjective, at least it provides a mathematical basis for scoring the relevance and importance of the rather vague and ambiguous collection of quality factors used by the software industry.
TABLE 9-4   Rank Order of Quality Factors by Importance to Quality

                              Measurable    Predictable    Relevance
                              Property?     Property?      to Quality    Score
Best Quality Definitions
Defect potentials             Yes           Yes            Very high     10.00
Defect removal efficiency     Yes           Yes            Very high     10.00
Defect severity levels        Yes           Yes            Very high     10.00
Defect origins                Yes           Yes            Very high     10.00
Reliability                   Yes           Yes            Very high     10.00
Good Quality Definitions
Toxic requirements            Yes           No             Very high      9.50
Missing requirements          Yes           No             Very high      9.50
Requirements conformance      Yes           No             Very high      9.00
Excess requirements           Yes           No             Medium         9.00
Usability                     Yes           Yes            Very high      8.00
Testability                   Yes           Yes            High           8.00
Defect causes                 Yes           No             Very high      8.00
Fair Quality Definitions
Maintainability               Yes           Yes            High           7.00
Understandability             Yes           Yes            Medium         6.00
Traceability                  Yes           No             Low            6.00
Modifiability                 Yes           No             Medium         5.00
Verifiability                 Yes           No             Medium         5.00
Poor Quality Definitions
Portability                   Yes           Yes            Low            4.00
Expandability                 Yes           No             Low            3.00
Scalability                   Yes           No             Low            2.00
Interoperability              Yes           No             Low            1.00
Survivability                 Yes           No             Low            1.00
Augmentability                No            No             Low            0.00
Flexibility                   No            No             Low            0.00
Manageability                 No            No             Low            0.00
Operability                   No            No             Low            0.00
In essence, Table 9-4 makes these points:
1. Conformance to requirements is hazardous unless incorrect, toxic, or dangerous requirements are weeded out. This definition has not demonstrated any improvements in quality for more than 30 years.
2. Most of the -ility quality definitions are hard to measure, and many are of marginal significance. Some are not measurable either. None of the -ility words tend to lead to tangible quality gains.
3. Quantification of defect potentials and defect removal efficiency levels has had the greatest impact on improving quality and also the greatest impact on customer satisfaction levels.
If software engineering is to evolve from a craft or art form into a true engineering field, it is necessary to put quality on a firm quantitative basis and to move away from vague and subjective quality definitions. These will still have a place, of course, but they should not be the primary definitions for software quality.
Predicting Software Defect Potentials
To predict software quality, it is necessary to measure software quality. Since companies such as IBM have been doing this for more than 40 years, the best available data comes from companies that have full life-cycle quality measurement programs that start with requirements, continue through development, and then extend out to customer-reported defects for as long as the software is used, which may be 25 years or more. The next best source of data comes from benchmark and commercial software estimating tool companies, since they collect historical data on quality as well as on productivity.

Because software defects come from five different sources, the quickest way to get a useful approximation of software defect potentials is to use IFPUG function point metrics.
The basic sizing rule for predicting defect potentials with function points is: take the size of a software application in function points and raise it to the 1.25 power. The result will be a useful approximation of software defect potentials for applications between a low of about 10 function points and a high of about 5000 function points.
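A minimal sketch of this rule of thumb (the function name is invented for this illustration; the 1.25 exponent is the value stated above and, as noted next, should be tuned to local data and methods):

def approximate_defect_potential(function_points, exponent=1.25):
    """Rule-of-thumb defect potential: function points raised to the 1.25 power.

    Intended only as an early, rough approximation for applications of
    roughly 10 to 5000 function points; the exponent should be adjusted
    downward for higher CMMI levels, Agile, RUP, or TSP.
    """
    return function_points ** exponent

print(round(approximate_defect_potential(100)))    # about 316 total defects
print(round(approximate_defect_potential(1000)))   # about 5,623 total defects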
The
exponent for this rule of
thumb would need to be
adjusted down-
wards
for the higher CMMI levels,
Agile, RUP, and the
Team Software
Process
(TSP). But since the rule is
intended to be applied early,
before
any
costs are expended, it still
provides a useful starting
point. Readers
might
want to experiment with local
data and find an exponent
that
gives
useful results against local
quality and defect
data.
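As a purely illustrative sketch of this rule of thumb, the short Python fragment below raises application size in function points to an adjustable exponent that defaults to 1.25; the function name and sample sizes are not from the original text, and the rule is only an early approximation that will not match the averages in Table 9-5 exactly.

    # Rule-of-thumb defect potential: size in function points raised to the
    # 1.25 power. The exponent is left adjustable so that readers can
    # calibrate it against local quality and defect data, as suggested above.
    def approximate_defect_potential(function_points, exponent=1.25):
        return function_points ** exponent

    for size in (10, 100, 1000, 5000):
        total = approximate_defect_potential(size)
        print(f"{size:>5} function points -> roughly {total:,.0f} total defects")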
Table
9-5 shows approximate U.S.
averages for defect
potentials.
Recall
that defect potentials are
the sum of five defect
origins: require-
ments
defects, design defects,
code defects, document
defects, and bad-
fix
injections.
As
can be seen from Table 9-5,
defect potentials increase with
applica-
tion
size. Of course, other
factors can reduce or
increase the
potentials,
as
will be discussed later in the
section on defect
prevention.
While the total defect potential is useful, it is also useful to know the distribution of defects among the five origins or sources. Table 9-6 illustrates typical defect distribution percentages using approximate average values.
TABLE 9-5  U.S. Averages for Software Defect Potentials

Size in FP     Defects per Function Point    Defect Potentials
        1                1.50                            2
       10                2.34                           23
      100                3.04                          304
    1,000                4.62                        4,621
   10,000                6.16                       61,643
  100,000                7.77                      777,143
1,000,000                8.56                    8,557,143
Average                  4.86                    1,342,983
Applying
the distribution shown in
Table 9-6 to a sample
application
of
1500 function points, Table
9-7 illustrates the
approximate defect
potential,
or the total number of
defects that might be found
during
development
and by customers.
These
simple overall examples are
not intended as substitutes
for
commercial
quality estimation tools
such as KnowledgePlan and
SEER,
which
can adjust their predictions
based on CMMI levels;
development
methods
such as Agile, TSP, or RUP;
use of inspections; use of
static
analysis;
and other factors which
would cause defect
potentials to vary
and
also which cause defect
removal efficiency levels to
vary.
Rules
of thumb are never very
accurate, but their
convenience and
ease
of use provide value for
rough estimates and early
sizing. However,
such
rules should not be used
for contracts or serious
estimates.
Predicting
Code Defects
Using
function point metrics as an
overall tool for quality
prediction is
useful
because noncoding defects outnumber
code defects. That
being
said,
there are more coding
defects than any other
single source.
TABLE 9-6  Percentages of Defects by Origin

Defect Origins     Defects per Function Point    Percent of Total Defects
Requirements                 1.00                        20.00%
Design                       1.25                        25.00%
Source code                  1.75                        35.00%
User documents               0.60                        12.00%
Bad fixes                    0.40                         8.00%
TOTAL                        5.00                       100.00%
TABLE 9-7  Defect Potentials for a Sample Application
(Application size = 1500 function points)

Defect Origins     Defects per Function Point    Defect Potentials    Percent of Total Defects
Requirements                 1.00                      1,500                  20.00%
Design                       1.25                      1,875                  25.00%
Source code                  1.75                      2,625                  35.00%
User documents               0.60                        900                  12.00%
Bad fixes                    0.40                        600                   8.00%
TOTAL                        5.00                      7,500                 100.00%
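The arithmetic behind Table 9-7 is simply the application size multiplied by the per-origin averages of Table 9-6; the small Python sketch below reproduces it for the 1500-function-point sample (the dictionary and function names are illustrative only).

    # Defect potentials by origin, using the approximate U.S. averages
    # of defects per function point from Table 9-6.
    DEFECTS_PER_FUNCTION_POINT = {
        "Requirements": 1.00,
        "Design": 1.25,
        "Source code": 1.75,
        "User documents": 0.60,
        "Bad fixes": 0.40,
    }

    def defect_potentials_by_origin(size_in_function_points):
        return {origin: rate * size_in_function_points
                for origin, rate in DEFECTS_PER_FUNCTION_POINT.items()}

    potentials = defect_potentials_by_origin(1500)
    total = sum(potentials.values())
    for origin, count in potentials.items():
        print(f"{origin:<15} {count:>7,.0f}  ({count / total:.0%} of total)")
    print(f"{'TOTAL':<15} {total:>7,.0f}")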
Predicting
code defects is fairly
tricky for six
reasons:
1. More than 2,500 programming languages are in existence, and they are not equal as sources of defects.
2. A majority of modern software applications use more than one language, and some use as many as 15 different programming languages.
3. The measured range of performance by a sample of programmers using the same language for the same test application varies by more than 10 to 1. Individual skills and programming styles create significant variations in the amount of code written for the same problem, in defect potentials, and also in productivity.
4. Lines of code can be counted using either physical lines or logical statements. For some languages, the two counts are identical, but for others, there may be as much as a 500 percent variance between physical and logical counts.
5. For a number of languages starting with Visual Basic, some programming is done by means of buttons or pull-down menus. Therefore, programming is done without using procedural source code. There are no effective rules for counting source code with such languages.
6. Reuse of source code from older applications or from libraries of reusable code is quite common. If the reused code is certified, it will have very few defects compared with new custom code.
To
predict coding defects, it is
necessary to know the
level
of a
pro-
gramming
language. The concept of the
level of a language is often
used
informally
in phrases such as "high-level" or
"low-level" languages.
Within
IBM in the 1970s, when
research was first carried
out on
predicting
code defects, it was
necessary to give a formal
mathematical
definition
to language levels. Within IBM
the level
was
defined as the
number
of statements in basic assembly
language needed to equal
the
functionality
of 1 statement in a higher-level
language.
Using
this definition, COBOL was a
level 3 language, because it
took
3
basic assembly statements to
equal 1 COBOL statement.
Using the
same
rule, SMALLTALK is a level 15
language.
(For
several years before
function points were
invented, IBM used
"equivalent
assembly statements" as the
basis for estimating
non-code
work
such as user manuals. Thus,
instead of basing a publication
budget
on
10 percent of the effort for
writing a program in PL/S,
the budget
would
be based on 10 percent of the
effort if the code were
basic assem-
bly
language. This method was
crude but reasonably
effective.)
Dissatisfaction
with the equivalent assembler
method for estimation
was
one of the reasons IBM assigned
Allan Albrecht and his
colleagues
to
develop function point
metrics.
Additional
programming languages such as
APL, Forth, Jovial,
and
others
were starting to appear, and
IBM wanted both a metric and
esti-
mating
methods that could deal with
both noncoding and coding
work
in
an accurate fashion. IBM also
wanted to predict coding
defects.
The
use of macro-assembly language
had introduced reuse, and
this
caused
measurement problems, too. It
raised the issue of how to
count
reused
code in software applications or
any other reused material.
The
solution
here was to separate
productivity and quality
into two topics:
(1)
development and (2)
delivery.
The
former dealt with the code
and materials that had to be
constructed
from
scratch. The latter dealt
with the final application as
delivered,
including
reused material. For
example, using macro-assembly
language
a
productivity rate for
development
productivity might
be 300 lines of
code
per month. But due to
reusing code in the form of
macro expansions,
delivery
productivity might
be as high as 750 lines of
code per month.
The
same distinction affects
quality, too. Assume a
program had 1000
lines
of new code and 1000
lines of reused code. There
might be 15 bugs
per
KLOC in the new code
but 0 bugs per KLOC in
the reused code.
This
is an important business distinction
that is not well
understood
even
in 2009. The true goal of
software engineering is to improve
the
rate
of delivery productivity and
quality rather than
development pro-
ductivity
and quality.
After
function point metrics were
developed circa 1975, the
defini-
tion
of "language level" was
expanded to include the
number of logical
code
statements equivalent to 1 function
point. COBOL, for
example,
requires
about 105 statements per
function point in the
procedure and
data
divisions. (This expansion is
the mathematical basis for
backfiring,
or
direct conversion from
source code to function
points.)
Table
9-8 illustrates how code
size and coding defects
would vary if
15
different programming languages
were used for the
same applica-
tion,
which is 1000 function
points. Table 9-8 assumes a
constant value
of
15 potential coding defects
per KLOC for all languages.
However,
TABLE 9-8  Examples of Defects per KLOC and Function Point for 15 Languages
(Assumes a constant of 15 defects per KLOC for all languages)

Language Level   Sample Languages   Source Code per Function Point   Source Code per 1000 FP   Coding Defects   Defects per Function Point
 1.              Assembly                       320                         320,000                 4,800                  4.80
 2.              C                              160                         160,000                 2,400                  2.40
 3.              COBOL                          107                         106,667                 1,600                  1.60
 4.              PL/I                            80                          80,000                 1,200                  1.20
 5.              Ada95                           64                          64,000                   960                  0.96
 6.              Java                            53                          53,333                   800                  0.80
 7.              Ruby                            46                          45,714                   686                  0.69
 8.              E                               40                          40,000                   600                  0.60
 9.              Perl                            36                          35,556                   533                  0.53
10.              C++                             32                          32,000                   480                  0.48
11.              C#                              29                          29,091                   436                  0.44
12.              Visual Basic                    27                          26,667                   400                  0.40
13.              ASP.NET                         25                          24,615                   369                  0.37
14.              Objective C                     23                          22,857                   343                  0.34
15.              Smalltalk                       21                          21,333                   320                  0.32
the
15 languages have levels
that vary from 1 to 15, so
very different
quantities
of code will be created for the
same 1000 function
points.
Note:
Language levels are variable
and change based on volumes
of
reused
code or calls to external
functions. The levels shown
in Table 9-8
are
only approximate and are
not constants.
As
can be seen from Table 9-8,
in order to predict coding
defects, it is
critical
to know the programming
language (or languages) that
will be
used
and also the size of
the application using both
function point and
lines
of code metrics.
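As a minimal sketch of that calculation for a single-language application, the fragment below combines the approximate levels of Table 9-8 with the chapter's simplifying constant of 15 coding defects per KLOC; the function name and the three-language subset are illustrative, and the rounded levels mean the results differ slightly from the table.

    # Approximate code size and coding defect potential for one application,
    # given its size in function points and the language's logical source
    # statements per function point (values are approximate, not constants).
    LOC_PER_FUNCTION_POINT = {"C": 160, "Java": 53, "Smalltalk": 21}
    DEFECTS_PER_KLOC = 15   # simplifying constant; real values range from under 5 to over 25

    def code_size_and_defects(function_points, language):
        loc = function_points * LOC_PER_FUNCTION_POINT[language]
        defects = (loc / 1000) * DEFECTS_PER_KLOC
        return loc, defects

    for language in ("C", "Java", "Smalltalk"):
        loc, defects = code_size_and_defects(1000, language)
        print(f"{language:<10} {loc:>8,.0f} LOC  about {defects:>5,.0f} coding defects "
              f"({defects / 1000:.2f} per function point)")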
The
situation is even more
tricky when combinations of
two or more
languages
are used within the
same application. However,
this prob-
lem
is handled by commercial software
cost-estimating tools such
as
KnowledgePlan,
which include multilanguage
estimating capabilities.
Reused
code also adds to the
complexity of predicting coding
defects.
To
show the results of multiple
languages in the same
application, let
us
consider two case
studies.
In
Case Study A, there are
three different languages
and each lan-
guage
has 1000 lines of code,
counted using logical
statements. In Case
Study
B, we have the same three
languages, but now each
language
comprises
exactly 25 function points
each.
For
Case A, the total volume of
source code is 3000 lines of
code; total
function
points are 73; and
total code defect potentials
are 45.
Case A: Three Languages with 1000 Lines of Code Each

Languages    Levels    Lines of Code (LOC)    LOC per Function Point    Function Points    Defect Potential
C             2.00           1,000                     160                       6                 15
Java          6.00           1,000                      53                      19                 15
Smalltalk    15.00           1,000                      21                      48                 15
TOTAL                        3,000                                              73                 45
AVERAGE       7.76                                      41
When
we change the assumptions to
Case B and use a constant
value
of
25 function points for each
language, the total number
of function
points
only changes from 73 to 75.
But the volume of source
code almost
doubles,
as do numbers of defects. This is because
of the much greater
impact
of the lowest-level language,
the C programming
language.
When considering either Case A or Case B, it is easily seen that predicting either size or quality for a multilanguage application is a great deal more complicated than for a single-language application.
Case B: Three Languages with 25 Function Points Each

Languages    Levels    Lines of Code (LOC)    LOC per Function Point    Function Points    Defect Potential
C             2.00           4,000                     160                      25                 60
Java          6.00           1,325                      53                      25                 20
Smalltalk    15.00             525                      21                      25                  8
TOTAL                        5,850                                              75                 88
AVERAGE       4.10                                      78
It
is interesting to look at Case A
and Case B in a side-by-side
format
to
highlight the differences.
Note that in Case B the
influence of the
lowest-level
language, the C programming
language, increases
both
code
volumes and defect
potentials:
Source Code (Logical statements)      Case A      Case B
C                                      1,000       4,000
Java                                   1,000       1,325
Smalltalk                              1,000         525
Total lines of code                    3,000       5,850
Total KLOC                              3.00        5.85
Function Points                           73          75
Code Defects                              45          88
Defects per KLOC                          15          15
Defects per Function Point              0.62        1.17
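The side-by-side figures can be approximated with a few lines of code, as in the sketch below; because the published LOC-per-function-point levels are rounded, the computed function point totals come out near, rather than exactly at, 73 and 75 (the helper names are illustrative only).

    # Case A sizes each language at 1,000 logical LOC; Case B sizes each
    # language at 25 function points. Both assume 15 coding defects per KLOC.
    LOC_PER_FP = {"C": 160, "Java": 53, "Smalltalk": 21}
    DEFECTS_PER_KLOC = 15

    def summarize(loc_by_language):
        total_loc = sum(loc_by_language.values())
        total_fp = sum(loc / LOC_PER_FP[lang] for lang, loc in loc_by_language.items())
        defects = (total_loc / 1000) * DEFECTS_PER_KLOC
        return total_loc, total_fp, defects

    case_a = {lang: 1000 for lang in LOC_PER_FP}                   # 1,000 LOC each
    case_b = {lang: 25 * LOC_PER_FP[lang] for lang in LOC_PER_FP}  # 25 FP each

    for name, case in (("Case A", case_a), ("Case B", case_b)):
        loc, fp, defects = summarize(case)
        print(f"{name}: {loc:,} LOC, about {fp:.0f} function points, "
              f"about {defects:.0f} code defects, {defects / fp:.2f} defects per FP")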
Cases
A and B oversimplify real-life
problems because each case
study
uses
constants for data items
that in real-life are
variable. For
example,
the
constant of 15 defects per
KLOC for code defects is
really a variable
that
can range from less
than 5 to more than 25
defects per KLOC.
The
number of source code
statements per function
point is also a vari-
able,
and each language can vary
by perhaps a range of 2 to 1 around
the
average
values shown by the nominal
language "level" default
values.
These
variables illustrate why predicting
quality and defect
levels
depends
so heavily upon measuring
quality and defect levels.
The exam-
ples
also illustrate why definitions of
quality need to be both
measurable
and
predictable.
Other
variables can affect the
ability to predict size and
defects as well.
Suppose,
for example, that reused
code composed 50 percent of
the code
volume
in Case A. Suppose also that
the reused code is certified
and has
zero
defects. Now the calculations
for defect predictions need
to include
reuse,
which in this example lowers
defect potentials by 50
percent.
When
the size of the application
is used for productivity
calculations,
it
is necessary to decide whether
development productivity or
delivery
productivity,
or both, are the figures of
interest.
Predicting
software defects is possible to
accomplish with fairly
good
accuracy, but the
calculations are not
trivial, and they need
to
include
a number of variables that
can only be determined by
careful
measurements.
The
Quality Impacts of Creeping
Requirements
Function
point analysis at the end of
the requirements phase and
then
again
at application delivery shows
that requirements grow and
change
at
rates in excess of 1 percent
per calendar month during
the design
and
coding phases. The total
growth in creeping requirements
ranges
from
a low of less than 10
percent of total requirements to a
high of
more
than 50 percent. (One unique
project had requirements
growth in
excess
of 200 percent.)
As
an example, if an application is sized at
1000 function points
when
the
initial requirements phase is
over, then every month at
least 10 new
function
points will be added in the
form of new requirements.
This
growth
might continue for perhaps
six months, and so increase
the size
of
the application from 1000 to
1060 function points. For
small projects,
the
growth of creeping requirements is
more of an inconvenience
than
a
serious issue.
Larger
applications have longer
schedules and usually higher
rates of
requirements
change as well. For an
application initially sized at
10,000
function
points, new requirements
might lead to monthly growth
rates of
125
function points for perhaps
20 calendar months. The
delivered applica-
tion
might be 12,500 function
points rather than 10,000
function points.
As
this example illustrates,
creeping requirements growth of a
full 25
percent
can have a major impact on
development schedules, costs,
and
also
on quality and delivered
defects.
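Expressed as a simple linear model of monthly growth applied to the initially sized requirements, the two examples above work out as in the sketch below (the function name is illustrative; the growth rates are those quoted in the text).

    # Creeping requirements modeled as a fixed monthly percentage of the
    # initially sized requirements, added linearly until delivery.
    def grown_size(initial_function_points, monthly_creep_rate, months):
        return initial_function_points * (1 + monthly_creep_rate * months)

    print(grown_size(1000, 0.01, 6))      # small project: 1,060 function points
    print(grown_size(10000, 0.0125, 20))  # large project: 12,500 function points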
Because
new and changing
requirements are occurring
later in devel-
opment
than the original
requirements, they are often
rushed. As a
result,
defect potentials for
creeping requirements are
about 10 percent
greater
than for the original
requirements. This is true
for toxic require-
ments
and design errors. Code
bugs may or may not
increase, based
upon
the schedule pressure
applied to the software
engineering team.
Creeping
requirements also tend to
bypass formal inspections
and
also
have fewer test cases
created for them. As a
result, defect
removal
efficiency
is lower against both toxic
requirements and also
design
errors
by at least 5 percent. This
seems to be true for code
errors as well,
with
the exception that
applications coded in C or Java that
use static
analysis
tools will still achieve
high levels of defect
removal efficiency
against
code bugs.
The
combined results of higher
defect potentials and lower
levels
of
defect removal for creeping
requirements result in a much
greater
percentage
of delivered defects stemming
from changed
requirements
than
any other source of error.
This has been a chronic
problem for the
software
industry.
The
bottom line is that creeping
requirements combined with
below
optimum
levels of defect prevention
and defect removal are a
primary
cause
of cancelled projects, schedule
delays, and cost
overruns.
As
will be discussed later in the
sections on defect prevention
and
defect
removal, there are
technologies available for
minimizing the harm
from
creeping requirements. However,
these effective methods,
such as
formal
requirements and design
inspections, are not widely
used.
Measuring
Software Quality
In spite of the fact that defect removal efficiency is a critical topic for successful software projects, measuring defect removal efficiency or software quality in general is seldom done.
From visiting over 300
companies
in
the United States, Europe,
and Asia, the author
found the following
distribution
of the frequency of various
kinds of quality
measures:
No quality measures at all                                                      44%
Measuring only customer-reported defects                                        30%
Measuring test and customer-reported defects                                    18%
Measuring inspection, static analysis, test, and customer-reported defects       7%
Using volunteers for measuring personal defect removal                           1%
Overall Distribution                                                           100%
The
mathematics of measuring defect
removal efficiency is not
com-
plicated.
Twelve steps in the sequence of
data collection and
calculations
are
needed to quantify defect
removal efficiency
levels:
1. Accumulate data on every defect that occurs, starting with requirements.
2. Assign severity levels to each reported defect as it is fixed.
3. Measure how many defects are removed by every defect removal activity.
4. Use root-cause analysis to identify origins of high-severity defects.
5. Measure invalid defects, duplicates, and false positives, too.
6. After the software is released, measure customer-reported defects.
7. Record hours worked for defect prevention, removal, and repairs.
8. Select a fixed point such as 90 days after release for the calculations.
9. Use volunteers to record private defect removal such as desk checking.
10. Calculate cumulative defect removal efficiency for the entire series.
11. Calculate the defect removal efficiency for each step in the series.
12. Use the data to improve both defect prevention and defect removal.
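One lightweight way to support steps 1 through 9 is to capture each defect with enough indicative data to run the later calculations; the record layout below is only a hypothetical sketch, and none of the field names come from the original text or from any specific defect tracking tool.

    # Hypothetical defect record holding the data needed for the
    # twelve-step defect removal efficiency calculation above.
    from dataclasses import dataclass
    from datetime import date
    from typing import Optional

    @dataclass
    class DefectRecord:
        defect_id: str
        origin: str                  # e.g., "requirements", "design", "code", "bad fix"
        discovery_activity: str      # e.g., "design inspection", "static analysis", "system test"
        discovered_on: date
        severity: int                # 1 = critical ... 4 = cosmetic
        is_valid: bool = True        # False for invalid reports, duplicates, false positives
        customer_reported: bool = False
        repair_hours: float = 0.0
        origin_date: Optional[date] = None   # filled in by root-cause analysis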
The
effort and costs required to
measure defect removal
efficiency
levels
are trivial compared with
the value of such
information. The
total
effort
required to measure each
defect and its associated
repair work
amounts
to only about an hour. Of
this time, probably half is
expended
on
customer-reported defects, and
the other half is expended
on internal
defect
reports.
However,
step 4, root-cause analysis,
can take several
additional
hours
based on how well
requirements and design are
handled by the
development
team.
The
value of measuring defect
removal efficiency encompasses
the
following
benefits:
■ Finding and fixing bugs is the most expensive activity in all of software, so reducing these costs yields a very large return on investment.
■ Excessive numbers of bugs constitute the main reason for schedule slippage, so reducing defects in all deliverables will shorten development schedules.
■ Delivered defects are the major cost driver of software maintenance for the first two years after release, so improving removal efficiency lowers maintenance costs.
■ Customer satisfaction correlates inversely to numbers of delivered defects, so reducing delivered defects will result in happier customers.
■ Team morale correlates with both effective defect prevention and effective defect removal.
Later
in the section on the
economics of quality, these
benefits will
be
quantified to show the
overall value of defect
prevention and defect
removal.
Many
companies and government
organizations track
software
defects
found during static
analysis, testing, and also
defects reported
by
customers. In fact, a number of
commercial software defect
tracking
tools
are available.
These tools normally track defect symptoms, applications containing defects, hardware and software platforms, and other kinds of indicative data such as release number, build number, and so on.
However,
more sophisticated organizations
also utilize formal
inspec-
tions
of requirements, design, and
other materials. Such
companies
often
utilize static analysis in
addition to testing and
therefore measure
a
wider range of defects than
just those found in source
code by ordinary
testing.
Some
additional information is needed in order
to use expanded
defect
data
for root-cause analysis and
other forms of defect
prevention. These
additional
topics include
Defect discovery point   It is important to record information on the point at which any specific defect is found. Since requirements defects cannot normally be found via testing, it is important to try and identify noncode defect discovery points.
Collectively, noncode defects in requirements and design are more numerous than coding defects, and also may be high in severity levels. Defect repair costs for noncode defects are often higher than for coding defects. Note that there are more than 17 kinds of software testing, and companies do not use the same names for various test stages.
Date of defect discovery: ________________
Defect Discovery Point:
■ Customer defect report
■ Quality assurance defect report
■ Test stage _________________ defect report
■ Static analysis defect report
■ Code inspection defect report
■ Document inspection defect report
■ Design inspection defect report
■ Architecture inspection defect report
■ Requirements inspection defect report
■ Other ____________________ defect report
Defect origin point   It is also important to record information on where software defects originate. This information requires careful analysis on the part of the change team, so many companies limit defect origin research to high-severity defects such as Severity 1 and Severity 2.
Date of defect origination: ____________________
Defect Origin Point:
■ Application name
■ Release number
■ Build number
■ Source code (internal)
■ Source code (reused from legacy application)
■ Source code (reused from commercial source)
■ Source code (commercial software package)
■ Source code (bad-fix or previous defect repair)
■ User manual
■ Design document
■ Architecture document
■ Requirement document
■ Other _____________________ origination point
Ideally,
the lag time between
defect origins and defect
discovery will
be
less than a month and
hopefully less than a week.
It is very impor-
tant
that defects that originate
within a phase such as the
requirements
or
design phases should also be
discovered and fixed during
the same
phase.
When
there is a long gap between
origins and discovery, such
as not
finding
a design problem until
system test, it is a sign
that software
development
and quality control processes
need to improve.
The
best solution for shortening
the gap between defect
origination
and
defect discovery is that of
formal inspections of
requirements,
design,
and other deliverables. Both
static analysis and code
inspections
are
also valuable for shortening
the intervals between defect
origination
and
defect discovery.
TABLE 9-9  Best-Case Defect Discovery Points

Defect Origins    Optimal Defect Discovery
Requirements      Requirements inspection
Design            Design inspection
Code              Static analysis
Bad fixes         Static analysis
Documentation     Editing
Test cases        Test case inspection
Table
9-9 shows the best-case
scenario for defect
discovery methods
for
various defect
origins.
Inspections are best at finding subtle and complex bugs and problems that are difficult to find via testing because sometimes no test cases are created for them. The example of the Y2K problem is illustrative of a problem that could not be found via testing so long as two-digit dates were mistakenly believed to be acceptable. Code inspections are useful for finding subtle problems such as security vulnerabilities that may escape both testing and even static analysis.
Static
analysis is best at finding
common coding errors such
as
branches
to incorrect locations, overflow
conditions, poor error
handling,
and
the like. Static analysis
prior to testing or as an adjunct to
testing
will
lower testing costs.
Testing
is best at finding problems
that only show up when
the code is
operating,
such as performance problems,
usability problems,
interface
problems,
and other issues such as
mathematical errors or format
errors
for
screens and reports.
Given
the diverse nature of
software bugs and defects,
it is obvious
that
all three defect removal
methods are important for
success: inspec-
tions,
static analysis, and
testing.
Table
9-10 illustrates the fact
that long delays between
defect origins
and
defect discovery lead to
very troubling situations.
Long gaps also
raise
bad-fix injections, accidentally
including new defects in
attempts
to
repair older defects.
TABLE 9-10  Worst-Case Defect Discovery Points

Defect Origins    Latest Defect Discovery
Requirements      Deployment
Design            System testing
Code              New function testing
Bad fixes         Regression testing
Documentation     Deployment
Test cases        Not discovered
In
the worst-case scenario,
requirements defects are not
found until
deployment,
while design defects are
not found until system
test, when
it
is difficult to fix them without
extending the overall
schedule for
the
project. Note that in the
worst-case scenario, bugs or
errors in
test
cases themselves are never
discovered, so they fester on
for many
releases.
Defect
prevention and early defect
removal are far more
cost-effective
than
depending on testing
alone.
Other
quality measures include
some or all of the
following:
Earned quality value (EQV)   Since it is possible to predict defect potentials and also to predict defect removal efficiency levels, some companies such as IBM have used a form of "earned value" where predictions of defects that would probably be found via inspections, static analysis, and testing are compared with actual defect discovery rates. Predicted and actual defect removal costs are also compared.
If
fewer defects are found
than predicted, then
root-cause analysis
can
be
applied to discover if quality is
really better than planned
or if defect
removal
is lax. (Usually quality is
better when this
happens.)
If
more defects are found
than predicted, then
root-cause analysis
can
be applied to discover if quality is
worse than planned or if
defect
removal
is more effective than
anticipated. (Usually, quality is
worse
when
this happens.)
Cost of quality (COQ)   Collectively, the costs of finding and fixing bugs are the most expensive known activity in the history of software. Therefore, it is important to gather effort and cost data in such a fashion that cost of quality (COQ) calculations can be performed.
However,
for software, normal COQ
calculations need to be
tailored
to
match the specifics of
software engineering. Usually,
data is recorded
in
terms of hours and then
converted into costs by
applying salaries,
burden
rates, and other cost
items.
■ Defect discovery activity: ___________________
■ Defect prevention activities: ___________________
■ Defect effort reported by users
■ Defect damages reported by users
■ Preparation hours for inspections
■ Preparation hours for static analysis
■ Preparation hours for testing
■ Defect discovery hours
■ Defect reporting hours
■ Defect analysis hours
■ Defect repair hours
■ Defect inspection hours
■ Defect static analysis hours
■ Test stages used for defect
■ Test cases created for defect
■ Defect test hours
The
software industry has long
used the "cost per
defect" metric
without
actually analyzing how this
metric works. Indeed,
hundreds
of
articles and books parrot
similar phrases such as "it
costs 100 times
as
much to fix a bug after
release than during coding"
or some minor
variation
on this phrase. The gist of
these dogmatic statements is
that
the
cost per defect rises
steadily as the later
defects are found.
What
few people realize is that
cost per defect is always
cheapest
where
the most bugs are
found and is most expensive
where the fewest
bugs
are found. In fact, as
normally calculated, this
metric violates stan-
dard
economic assumptions because it ignores
fixed costs. The cost
per
defect
metric actually penalizes
quality and achieves the
lowest results
for
the buggiest
applications!
Following
is an analysis of why cost per
defect penalizes quality
and
achieves
its best results for
the buggiest applications.
The same math-
ematical
analysis also shows why
defects seem to be cheaper if
found
early
rather than found
later.
Furthermore,
when zero-defect applications
are reached, there
are
still substantial appraisal
and testing activities that
need to be
accounted
for. Obviously, the cost
per defect metric is useless
for zero-
defect
applications.
Because
of the way cost per
defect is normally measured, as
quality
improves,
cost per defect steadily
increases until zero-defect
software is
achieved,
at which point the metric
cannot be used at
all.
As
with the errors in KLOC
metrics, the main source of
error is that
of
ignoring fixed costs. Three
examples will illustrate how
cost per defect
behaves
as quality improves.
In
all three cases, A, B, and
C, we can assume that test
personnel work
40
hours per week and
are compensated at a rate of
$2500 per week or
$62.50
per hour. Assume that
all three software features
that are being
tested
are 100 function
points.
Case
A: Poor Quality
Assume
that a tester spent 15 hours
writing test cases, 10 hours
run-
ning
them, and 15 hours fixing 10
bugs. The total hours
spent was 40,
and
the total cost was
$2500. Since 10 bugs were
found, the cost
per
defect
was $250. The cost
per function point for
the week of testing
would
be $25.00.
Case
B: Good Quality
In
this second case, assume
that a tester spent 15 hours
writing test
cases,
10 hours running them, and 5
hours fixing one bug, which
was
the
only bug discovered.
However, since no other
assignments were
waiting
and the tester worked a
full week, 40 hours were
charged to
the
project.
The
total cost for the
week was still $2500, so
the cost per defect
has
jumped
to $2500. If the 10 hours of
slack time are backed
out, leaving
30
hours for actual testing
and bug repairs, the
cost per defect would
be
$1875.
As quality improves, cost
per defect rises
sharply.
Let
us now consider cost per
function point. With the
slack removed,
the
cost per function point
would be $18.75. As can
easily be seen, cost
per
defect goes up as quality
improves, thus violating the
assumptions
of
standard economic
measures.
However,
as can also be seen, testing
cost per function point
declines
as
quality improves. This
matches the assumptions of
standard econom-
ics.
The 10 hours of slack time
illustrate another issue:
when quality
improves,
defects can decline faster
than personnel can be
reassigned.
Case
C: Zero Defects
In
this third case, assume that
a tester spent 15 hours
writing test
cases
and 10 hours running them.
No bugs or defects were
discovered.
Because
no defects were found, the
cost per defect metric
cannot be
used
at all.
But
25 hours of actual effort
were expended writing and
running test
cases.
If the tester had no other
assignments, he or she would
still have
worked
a 40-hour week, and the
costs would have been
$2500. If the 15
hours
of slack time are backed
out, leaving 25 hours for
actual testing,
the
costs would have been
$1562.
With
slack time removed, the
cost per function point
would be $15.63.
As
can be seen again, testing
cost per function point
declines as quality
improves.
Here, too, the decline in
cost per function point
matches the
assumptions
of standard economics.
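The three cases reduce to a few lines of arithmetic, as in the sketch below, which uses the stated hourly rate, hours, and 100-function-point size, treats idle time as excluded (matching the slack-adjusted figures above), and leaves cost per defect undefined in the zero-defect case; the function name is illustrative.

    # Testing cost per defect versus cost per function point for Cases A, B, C.
    HOURLY_RATE = 62.50       # $2,500 per 40-hour week
    FUNCTION_POINTS = 100

    def testing_costs(writing_hours, running_hours, repair_hours, defects):
        cost = (writing_hours + running_hours + repair_hours) * HOURLY_RATE
        per_defect = cost / defects if defects else None   # undefined at zero defects
        return cost, per_defect, cost / FUNCTION_POINTS

    for label, inputs in (("Case A", (15, 10, 15, 10)),
                          ("Case B", (15, 10, 5, 1)),
                          ("Case C", (15, 10, 0, 0))):
        cost, per_defect, per_fp = testing_costs(*inputs)
        per_defect_text = f"${per_defect:,.2f}" if per_defect else "undefined"
        print(f"{label}: total ${cost:,.2f}, per defect {per_defect_text}, "
              f"per function point ${per_fp:.2f}")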
Time
and motion studies of defect
repairs do not support the
aphorism
that
it costs 100 times as much
to fix a bug after release as
before. Bugs
typically
require between 15 minutes
and 4 hours to
repair.
Some
bugs are expensive; these
are called abeyant
defects by
IBM.
Abeyant
defects are customer-reported
defects that the repair
center
cannot
re-create, due to some
special combination of hardware
and
software
at the client site. Abeyant
defects constitute less than
5 percent
of
customer-reported defects.
Because
of the fixed or inelastic
costs associated with defect
removal
operations,
cost per defect always
increases as numbers of
defects
decline.
Because more defects are
found at the beginning of a
testing
cycle
than after release, this
explains why cost per defect
always goes
up
later in the cycle. It is because
the costs of writing test
cases, running
them,
and having maintenance
personnel available act as
fixed costs.
In
any manufacturing cycle with a
high percentage of fixed
costs, the
cost
per unit will go up as the
number of units goes down.
This basic fact
of
manufacturing economics is why cost
per defect metrics are
hazard-
ous
and invalid for economic
analysis of software
applications.
What
would be more effective is to
record the hours spent
for all forms
of
defect removal activity.
Once the hours are
recorded, the data
could
be
converted into cost data,
and also normalized by
converting hours
into
standard units such as hours
per function point.
Table
9-11 shows a sample of the
kinds of data that are
useful in
assessing
cost of quality and also
doing economic studies and
effective-
ness
studies.
Of
course, knowing defect
removal hours implies that
data is also
collected
on defect volumes and
severity levels. Table 9-12
shows the
same
set of activities as Table
9-11, but switches from
hours to defects.
Both
Tables 9-11 and 9-12
could also be combined into
a single large
spreadsheet.
However, defect counts and
defect effort
accumulation
tend
to come from different
sources and may not be
simultaneously
available.
Defect
effort and discovered defect
counts are important data
ele-
ments
for long-range quality
improvements. In fact, without
such data,
quality
improvement is likely to be minimal or
not even occur at
all.
Failure
to record defect volumes and
repair effort is a chronic
weak-
ness
of the software engineering
domain. However, several
software
development
methods such as Team
Software Process (TSP) and
the
Rational
Unified Process (RUP) do
include careful defect
measures.
The
Agile method, on the other
hand, is neither strong nor
consistent
on
software quality
measures.
For
software engineering to become a
true engineering discipline
and
not
just a craft as it is in 2009,
defect measurements, defect
prediction,
defect
prevention, and defect
removal need to become a
major focus of
software
engineering.
Measuring
Defect Removal
Efficiency
One of the most effective metrics for demonstrating and improving software quality is that of defect removal efficiency. This metric is simple in concept but somewhat tricky to apply. The basic idea of this metric is to calculate the percentage of software defects found by means of defect removal operations such as inspections, static analysis, and testing.
TABLE 9-11  Software Defect Removal Effort Accumulation

                        Defect Removal Effort (Hours Worked)
Removal Stage           Preparation Hours   Execution Hours   Repair Hours   TOTAL HOURS
Inspections:
  Requirements
  Architecture
  Design
  Source code
  Documents
Static analysis
Test stages:
  Unit
  New function
  Regression
  Performance
  Usability
  Security
  System
  Independent
  Beta
  Acceptance
  Supply chain
Maintenance:
  Customers
  Internal
  SQA
What makes the calculations for defect removal efficiency tricky is that it includes noncode defects found in requirements, design, and other paper deliverables, as well as coding defects.
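In its simplest form the calculation divides defects removed before release by the total of prerelease defects plus defects reported by customers in the first 90 days of use; the sketch below makes that explicit (the function name is illustrative, and the two sample calls anticipate the totals used in Tables 9-13 and 9-14).

    # Cumulative defect removal efficiency: defects removed before release
    # divided by all defects, including customer-reported defects from the
    # first 90 days after release.
    def removal_efficiency(prerelease_defects, customer_defects_90_days):
        total = prerelease_defects + customer_defects_90_days
        return prerelease_defects / total if total else 1.0

    print(f"{removal_efficiency(4650, 350):.2%}")    # about 93.00%
    print(f"{removal_efficiency(3100, 1900):.2%}")   # about 62.00%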
Table
9-13 illustrates an example of
defect removal efficiency
levels
for
a full suite of removal
operations starting with requirements
inspec-
tions
and ending with Acceptance
testing.
Table
9-13 makes a number of
simplifying assumptions. One of
these
is
the assumption that all
delivered defects will be found by
customers
in
the first 90 days of usage.
In real life, of course,
many latent defects
in
delivered
software will stay hidden
for months or even years.
However,
after
90 days, new releases will
usually occur, and they
make it difficult
to
measure defects for prior
releases.
TABLE 9-12  Software Defect Severity Level Accumulation

                        Defect Severity Levels
Removal Stage           Severity 1 (Critical)   Severity 2 (Serious)   Severity 3 (Minor)   Severity 4 (Cosmetic)   TOTAL DEFECTS
Inspections:
  Requirements
  Architecture
  Design
  Source code
  Documents
Static analysis
Test stages:
  Unit
  New function
  Regression
  Performance
  Usability
  Security
  System
  Independent
  Beta
  Acceptance
  Supply chain
Maintenance:
  Customers
  Internal
  SQA
It
is interesting to see what
kind of defect removal
efficiency levels
occur
with less sophisticated series of
defect removal steps that do
not
include
either formal inspections or
static analysis.
Since
noncode defects that originate in
requirements and design
even-
tually
find their way into
the code, the overall
removal efficiency
levels
of
testing by itself without
any precursor inspections or
static analysis
are
seriously degraded, as shown in
Table 9-14.
When
comparing Tables 9-13 and
9-14, it can easily be seen
that a
full
suite of defect removal
activities is more efficient
and effective than
testing
alone in finding and
removing software defects
that originate
outside
of the source code. In fact,
inspections and static
analysis are
also
very efficient in finding
coding defects and have
the additional
property
of raising testing efficiency
and lowering testing
costs.
TABLE 9-13  Software Defect Removal Efficiency Levels
(Assumes inspections, static analysis, and normal testing)

Application size (function points)      1,000
Language                                    C
Code size                             125,000
Noncode defects                         3,000
Code defects                            2,000
TOTAL DEFECTS                           5,000

Defect Removal Efficiency by Removal Stage

Removal Stage            Noncode Defects   Code Defects   Total Defects   Removal Efficiency
Inspections:
  Requirements                 750               0              750
  Architecture                 200               0              200
  Design                     1,250               0            1,250
  Source code                  100             800              900
  Documents                    250               0              250
  Subtotal                   2,550             800            3,350            67.00%
Static analysis                  0             800              800            66.67%
Test stages:
  Unit                           0              50               50
  New function                  50             100              150
  Regression                     0              25               25
  Performance                    0              10               10
  Usability                     50               0               50
  Security                       0              20               20
  System                        25              50               75
  Independent                    0               5                5
  Beta                          25              15               40
  Acceptance                    25              15               40
  Supply chain                  25              10               35
  Subtotal                     200             300              500            58.82%
Prerelease Defects           2,750           1,900            4,650            93.00%
Maintenance:
  Customers (90 days)          250             100              350           100.00%
TOTAL                        3,000           2,000            5,000
Removal Efficiency          91.67%          95.00%           93.00%
TABLE 9-14  Software Defect Removal Efficiency Levels
(Assumes normal testing without inspections or static analysis)

Application size (function points)      1,000
Language                                    C
Code size                             125,000
Noncode defects                         3,000
Code defects                            2,000
TOTAL DEFECTS                           5,000

Defect Removal Efficiency by Removal Stage

Removal Stage            Noncode Defects   Code Defects   Total Defects   Removal Efficiency
Inspections:
  Requirements                   0               0                0
  Architecture                   0               0                0
  Design                         0               0                0
  Source code                    0               0                0
  Documents                      0               0                0
  Subtotal                       0               0                0             0.00%
Static analysis                  0               0                0             0.00%
Test stages:
  Unit                         200             350              550
  New function                 450             600            1,050
  Regression                     0             100              100
  Performance                    0              50               50
  Usability                    200              75              275
  Security                       0              50               50
  System                       300             200              500
  Independent                   50              10               60
  Beta                         150              25              175
  Acceptance                   175              20              195
  Supply chain                  75              20               95
  Subtotal                   1,600           1,500            3,100            62.00%
Prerelease Defects           1,600           1,500            3,100            62.00%
Maintenance:
  Customers (90 days)        1,400             500            1,900           100.00%
TOTAL                        3,000           2,000            5,000
Removal Efficiency          53.33%          75.00%           62.00%
Without
pretest inspections and
static analysis, testing will
find hun-
dreds
of bugs, but the overall
defect removal efficiency of
the full suite of
test
activities will be lower than if
inspections and static
analysis were
part
of the suite of removal
activities.
In
addition to elevating defect
removal efficiency levels,
adding formal
inspections
and static analysis to the
suite of defect removal
opera-
tions
also lowers development and
maintenance costs.
Development
schedules
are also shortened, because
traditional lengthy test
cycles are
usually
the dominant part of
software development schedules.
Indeed,
poor
quality tends to stretch out
test schedules by significant
amounts
because
the software does not
work well enough to be
released.
Table
9-15 shows a side-by-side
comparison of cost structures
for the
two
examples discussed in this
section. Case X is derived
from Table
9-13
and uses a sophisticated
combination of formal inspections,
static
analysis,
and normal testing.
Case
Y is derived from Table 9-14
and uses only normal
testing, with-
out
any inspections or static
analysis being
performed.
The
costs in Table 9-15 assume a
fully burdened compensation
struc-
ture
of $10,000 per month. The
defect-removal costs assume
prepara-
tion,
execution, and defect
repairs for all defects
found and identified.
In
addition to the cost
advantages, excellence in quality
control also
correlates
with customer satisfaction and with
reliability. Reliability
and
customer satisfaction both
correlate inversely with levels of
deliv-
ered
defects.
The
more defects there are at
delivery, the more unhappy
custom-
ers
are. In addition, mean time
to failure (MTTF) goes up as
delivered
defects
go down. The reliability
correlation is based on
high-severity
defects
in the Severity 1 and
Severity 2 classes.
Table
9-16 shows the approximate
relationship between
delivered
defects,
reliability in terms of mean
time to failure (MTTF)
hours, and
customer
satisfaction with software
applications.
Table
9-16 uses integer values, so
interpolation between these
dis-
crete
values would be necessary.
Also, the reliability levels
are only
approximate.
Table 9-13 deals only with
the C programming
language,
so
adjustments in defects per
function point would be
needed for the
700
other languages that exist.
Additional research is needed on
the
topics
of reliability and customer
satisfaction and their
correlations
with
delivered defect
levels.
However,
not only do excessive levels
of delivered defects
generate
negative
scores on customer satisfaction surveys,
but they also show
up
in
many lawsuits against
outsource contractors and
commercial soft-
ware
developers. In fact, one lawsuit
was even filed by
shareholders of
a
major software corporation
who claimed that excessive
defect levels
were
lowering the value of their
stock.
TABLE 9-15  Comparison of Software Defect Removal Efficiency Costs
(Case X = inspections, static analysis, normal testing)
(Case Y = normal testing only)

Application size (function points)      1,000
Language                                    C
Code size                             125,000
Noncode defects                         3,000
Code defects                            2,000
TOTAL DEFECTS                           5,000

Defect Removal Costs by Activity

Removal Stage                             Case X Removal $   Case Y Removal $   Difference
Inspections:
  Requirements
  Architecture
  Design
  Source code
  Documents
  Subtotal                                      $168,750                 $0     $168,750
Static analysis                                  $81,250                 $0      $81,250
Test stages:
  Unit
  New function
  Regression
  Performance
  Usability
  Security
  System
  Independent
  Beta
  Acceptance
  Supply chain
  Subtotal                                      $150,000           $775,000     $625,000
Prerelease Defects                              $400,000           $775,000     $375,000
Maintenance:
  Customers (90 days)                           $175,000           $950,000     $775,000
TOTAL COSTS                                     $575,000         $1,725,000   $1,150,000
Cost per Defect                                  $115.00            $345.00      $230.00
Cost per Function Pt.                            $575.00          $1,725.00    $1,150.00
Cost per LOC                                       $4.60             $13.80        $9.20
ROI from inspections, static analysis              $3.00
Development Schedule (calendar months)             12.00              16.00         4.00
TABLE 9-16  Delivered Defects, Reliability, Customer Satisfaction
(Note 1: Assumes the C programming language)
(Note 2: Assumes 125 LOC per function point)
(Note 3: Assumes severity 1 and 2 delivered defects)

Delivered Defects per KLOC   Defects per Function Point   Mean Time to Failure (MTTF hours)   Customer Satisfaction
          0.00                         0.00                          Infinite                    Excellent
          1.00                         0.13                               303                    Very good
          2.00                         0.25                               223                    Good
          3.00                         0.38                               157                    Fair
          4.00                         0.50                               105                    Poor
          5.00                         0.63                                66                    Very poor
          6.00                         0.75                                37                    Very poor
          7.00                         0.88                                17                    Very poor
          8.00                         1.00                                 6                    Litigation
          9.00                         1.13                                 1                    Litigation
         10.00                         1.25                                 0                    Malpractice
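Since Table 9-16 uses integer values of delivered defects per KLOC, intermediate values must be interpolated; a simple linear interpolation over the MTTF column, as sketched below, is one plausible way to do so (the approach and the function name are illustrative, not from the original text).

    # Linear interpolation of approximate MTTF hours between the integer
    # defects-per-KLOC rows of Table 9-16. Zero delivered defects is treated
    # by the table as effectively unlimited MTTF, so it is excluded here.
    MTTF_BY_DEFECTS_PER_KLOC = {1: 303, 2: 223, 3: 157, 4: 105, 5: 66,
                                6: 37, 7: 17, 8: 6, 9: 1, 10: 0}

    def interpolated_mttf(defects_per_kloc):
        clamped = min(max(defects_per_kloc, 1), 10)
        lower = int(clamped)
        upper = min(lower + 1, 10)
        fraction = clamped - lower
        low, high = MTTF_BY_DEFECTS_PER_KLOC[lower], MTTF_BY_DEFECTS_PER_KLOC[upper]
        return low + (high - low) * fraction

    print(f"{interpolated_mttf(2.5):.0f} hours")   # roughly 190 hours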
Better
quality control is the key
to successful software
engineering.
Software
quality needs to be definable,
predictable, measurable,
and
improvable
in order for software
engineering to become a true
engineer-
ing
discipline.
Defect
Prevention
The
phrase "defect prevention"
refers to methods and
techniques that
lower
the odds of certain kinds of
defects occurring at all.
The liter-
ature
of defect prevention is very
sparse, and academic
research is
even
sparser. The reason for
this is that studying defect
prevention is
extremely
difficult and also somewhat
ambiguous at best.
Defect
prevention is analogous to vaccination
against serious
illness
such
as pneumonia or flu. There is
statistical evidence that
vaccination
will
lower the odds of patients
contracting the diseases for
which they
are
vaccinated. However, there is no
proof that any specific
patient
would
catch the disease whether
receiving a vaccination or not.
Also,
a
few patients who are
vaccinated might contract
the disease anyway,
because
vaccines are not 100
percent effective. In addition,
some vac-
cines
may have serious and
unexpected side-effects.
All
of these issues can occur
with software defect prevention,
too.
While
there is statistical evidence
that certain methods such as
pro-
totypes,
joint application design
(JAD), quality function
deployment
(QFD),
and participation in inspections
prevent certain kinds of
defects
from
occurring, it is hard to prove
that those defects would
definitely
occur
in the absence of the preventive
methodologies.
The
way defect prevention is
studied experimentally is to have
two
versions
of similar or identical applications
developed, with one version
using
a particular defect prevention
method while the other
version
did
not. Obviously, experimental
studies such as this must be
small in
scale.
The
easiest experiments in defect
prevention are those dealing
with
formal
inspections of requirements, design,
and code. Because
inspec-
tions
record all defects,
companies that utilize
formal inspections soon
accumulate
enough data to analyze both
defect prevention and
defect
removal.
Formal
inspections are so effective in
terms of defect prevention
that
they
reduce defect potentials by
more than 25 percent per
year. In fact,
one
issue with inspections is that
after about three years of
continuous
usage,
so few defects occur that
inspections become
boring.
The
more common method for
studying defect prevention is to
exam-
ine
the results of large samples
of applications and note
differences in
the
defect potentials among
them. In other words, if 100
applications
that
used prototypes are compared
with 100 similar applications
that
did
not use prototypes, are
requirements defects lower
for the prototype
sample?
Are creeping requirements
lower for the prototype
sample?
This
kind of study can only be
carried out internally by
rather sophis-
ticated
companies that have very
sophisticated defect and
quality mea-
surement
programs; that is, companies
such as IBM, AT&T, Microsoft,
Raytheon,
Lockheed, and the like.
(Consultants who work for a
number
of
companies in the same
industry can often observe
the effects of defect
prevention
by noting similar applications in
different companies.)
However,
the results of such
large-scale statistical studies
are some-
times
published from benchmark
collections by organizations
such
as
the International Software
Benchmarking Standards Group
(ISBSG),
the
David Consulting Group,
Software Productivity Research
(SPR),
and
the Quality and Productivity
Management Group
(QPMG).
In
addition, consultants such as
the author who work as
expert wit-
nesses
in software litigation may
have access to data that is
not oth-
erwise
available. This data shows
the negative effects of
failing to use
defect
prevention on projects that
ended up in court.
Table
9-17 illustrates a large
sample of 30 methods and
techniques
that
have been observed to
prevent software defects
from occurring.
Although
the table shows specific
percentages of defect prevention
effi-
ciency,
the actual data is too
sparse to support the
results. The percent-
ages
are only approximate and
merely serve to show the
general order
of
effectiveness.
TABLE 9-17  Methods and Techniques that Prevent Defects

Activities Observed to Prevent Software Defects    Defect Prevention Efficiency
 1. Reuse (certified sources)                               80.00%
 2. Inspection participation                                60.00%
 3. Prototyping-functional                                  55.00%
 4. PSP/TSP                                                 53.00%
 5. Six Sigma for software                                  53.00%
 6. Risk analysis (automated)                               50.00%
 7. Joint application design (JAD)                          45.00%
 8. Test-driven development (TDD)                           45.00%
 9. Defect origin measurements                              44.00%
10. Root cause analysis                                     43.00%
11. Quality function deployment (QFD)                       40.00%
12. CMM 5                                                   37.00%
13. Agile embedded users                                    35.00%
14. Risk analysis (manual)                                  32.00%
15. CMM 4                                                   27.00%
16. Poka-yoke                                               23.00%
17. CMM 3                                                   23.00%
18. Scrum sessions (daily)                                  20.00%
19. Code complexity analysis                                19.00%
20. Use-cases                                               18.00%
21. Reuse (uncertified sources)                             17.00%
22. Security plans                                          15.00%
23. Rational Unified Process (RUP)                          15.00%
24. Six Sigma (generic)                                     12.50%
25. Clean-room development                                  12.50%
26. Software Quality Assurance (SQA)                        12.50%
27. CMM 2                                                   12.00%
28. Total Quality Management (TQM)                          10.00%
29. No use of CMM                                            0.00%
30. CMM 1                                                    5.00%
Average                                                     30.12%
Note that because defect prevention deals with reducing defect potentials, the percentages in Table 9-17 show the approximate reduction in defect potentials attributable to each method; higher percentages indicate stronger prevention.
The
two top-ranked items deserve
comment. The phrase "reuse
from
certified
sources" implies formal
reusability where specifications,
source
code,
test cases, and the
like have gone through
rigorous inspection
and
test
stages, and have proven
themselves to be reliable in field
trials.
Certified
reusable components may
approach zero defects, and
in any
case
contain very few defects.
Reuse of uncertified material is
somewhat
hazardous
by comparison.
The
second method, or participation in formal
inspections, has more
than
40 years of empirical data.
Inspections of requirements,
design,
and
other deliverables are very
effective and efficient in
terms of defect
removal
efficiency. But in addition, participants
in formal inspections
become
aware of defect patterns and
categories, and
spontaneously
avoid
them in their own
work.
One
emerging form of risk
analysis is so new that it
lacks empirical
data.
This new method consists of
performing very early sizing
and risk
analysis
prior to starting a software
application or spending any
money
on
development.
If
the risks for the
project are significantly
higher than its value,
not
doing
it at all will obviously prevent
100 percent of potential
defects. The
Victorian
state government in Australia
has started such a
program,
and
by eliminating hazardous software
applications before they
start,
they
have saved millions of
dollars.
New
sizing methods based on
pattern matching can shift
the point
at
which risk analysis can be
performed about six months
earlier than
previously
possible. This new approach
is promising and needs
addi-
tional
study.
There
are other things that
also have some impact in
terms of defect
prevention.
One of these is certification of
personnel either for
testing
or
for software quality
assurance knowledge. Certification
also has an
effect on defect removal. The defect prevention effects and the defect removal effects are both shown in Table 9-18 as approximate percentage benefits.
Here
too the data is only
approximate, and the
specific percentages
are
derived from very sparse
sources and should not be
depended upon.
Table
9-18 is sorted in terms of
defect prevention.
The
data in Table 9-18 should
not be viewed as accurate,
but only
approximate.
A great deal more research
is needed on the
effectiveness
of
various kinds of certification.
Also, the software industry
circa 2009
has
overlapping and redundant
forms of certification. There
are mul-
tiple
testing and quality
associations that offer
certification, but
these
separate
groups certify using
different methods and are
not coordinated.
In
the absence of a single association or
certification body, these
various
nonprofit
and for-profit test and
quality assurance associations
offer
rival
certificates that use very
different criteria.
Yet
another set of factors that
has an effect in terms of
defect pre-
vention
are various kinds of metrics
and measurements, as
discussed
earlier
in this book.
For metrics and measurements to have an effect, they need to be capable of demonstrating quality levels and measuring changes against quality baselines. Therefore, many of the -ility measures and metrics cannot even be included because they are not measurable.
TABLE 9-18  Influence of Certification on Defect Prevention and Removal

Certificate                                                   Defect Prevention Benefit   Defect Removal Benefit
31. Six Sigma black belt                                              12.50%                     10.00%
32. International Software Testing Quality Board (ISTQB)              12.00%                     10.00%
33. Certified Software Quality Engineer (CSQE)-ASQ                    10.00%                     10.00%
34. Certified Software Quality Analyst (CSQA)                         10.00%                     10.00%
35. Certified Software Test Manager (CSTM)                             7.00%                      7.00%
36. Six Sigma green belt                                               6.00%                      5.00%
37. Microsoft certification (testing)                                  6.00%                      6.00%
38. Certified Software Test Professional (CSTP)                        5.00%                     12.00%
39. Certified Software Tester (CSTE)                                   5.00%                     12.00%
40. Certified Software Project Manager (CSPM)                          3.00%                      3.00%
Average                                                                7.65%                      8.50%
Table 9-19 shows the approximate impacts of various measurements and metrics on defect prevention and defect removal. IFPUG function points are top-ranked because they can be used to quantify defects in requirements and design as well as code. IFPUG function points can also be used to measure software defect removal costs and quality economics.
TABLE 9-19  Software Metrics, Measures, and Defect Prevention and Removal

Metric                                      Defect Prevention Benefit   Defect Removal Benefit
41. IFPUG function points                           30.00%                     15.00%
42. Six Sigma                                       25.00%                     20.00%
43. Cost of quality (COQ)                           22.00%                     15.00%
44. Root cause analysis                             20.00%                     12.00%
45. TSP/PSP                                         20.00%                     18.00%
46. Monthly rate of requirements change             17.00%                      5.00%
47. Goal-question metrics                           15.00%                     10.00%
48. Defect removal efficiency                       12.00%                     35.00%
49. Use-case points                                 12.00%                      5.00%
50. COSMIC function points                          10.00%                     10.00%
51. Cyclomatic complexity                           10.00%                      7.00%
52. Test coverage percent                           10.00%                     22.00%
53. Percent of requirements missed                   7.00%                      3.00%
54. Story points                                     5.00%                      5.00%
55. Cost per defect                                -10.00%                    -15.00%
56. Lines of code (LOC)                            -15.00%                    -12.00%
Average                                             11.25%                      9.06%
Note
that the bottom two
metrics, cost per defect
and lines of code,
are
shown
as harmful metrics rather
than beneficial because they
violate
the
assumptions of standard
economics.
Note that the two bottom-ranked measurements from Table 9-19 have a negative impact; that is, they make quality worse rather than better.
As
commonly used in the
software literature, both
cost per defect
and
lines
of code are close to being
professional malpractice, because
they
violate
the canons of standard
economics and distort
results.
The lines of code metric penalizes high-level languages and makes both the quality and productivity of low-level languages look better than they really are. In addition, this metric cannot even be used to measure requirements and design defects or any other form of noncode defect.
The
cost per defect metric
penalizes quality and makes
buggy applica-
tions
look better than
applications with few defects.
This metric cannot
even
be used for zero-defect
applications. A nominal quality
metric that
penalizes
quality and can't even be
used to show the highest
level of
quality
is a good candidate for
being professional
malpractice.
The
final aspect of defect
prevention discussed in this
chapter is that
of
the effectiveness of various
international standards.
Unfortunately,
the
effectiveness of international standards
has very little
empirical
data
available.
There
are no known controlled
studies that demonstrate if
adherence
to
standards improves quality.
There is some anecdotal
evidence that at
least
some standards, such as ISO
9001-9004, degrade quality
because
some
companies that did not
use these standards had
higher quality on
similar
applications than companies
that had been certified.
Table 9-20
shows
approximate results, but the
table has two flaws. It
only shows a
small
sample of standards, and the
rankings are based on very
sparse
and
imperfect information.
In
fields outside of software
such as medical practice,
standards are
normally
validated by field trials,
controlled studies, and
extensive anal-
ysis.
For software, standards are
not validated and are
based on the
subjective
views of the standards
committees. While some of
these com-
mittees
are staffed by noted experts
and the standards may be
useful,
the
lack of validation and field
trials prior to publication is a
sign that
software
engineering needs additional
evolution before being
classified
as
a full engineering
discipline.
Tables 9-17 through 9-20 illustrate a total of 65 defect prevention methods and practices. These are not all used at the same time. Table 9-21 shows the approximate usage patterns observed in several hundred U.S. companies (and in about 50 overseas companies).
TABLE 9-20  International Standards, Defect Prevention and Removal

                                                           Defect Prevention   Defect Removal
     Standard or Government Mandate                        Benefit             Benefit
57.  ISO/IEC 10181 Security Frameworks                     25.00%              25.00%
58.  ISO 17799 Security                                    15.00%              15.00%
59.  Sarbanes-Oxley                                        12.00%               6.00%
60.  ISO/IEC 25030 Software Product Quality Requirements   10.00%               5.00%
61.  ISO/IEC 9126-1 Software Engineering Product Quality   10.00%               5.00%
62.  IEEE 730-1998 Software Quality Assurance Plans         8.00%               5.00%
63.  IEEE 1061-1992 Software Metrics                        7.00%               2.00%
64.  ISO 9000-9003 Quality Management                       6.00%               5.00%
65.  ISO 9001:2000 Quality Management System                4.00%               7.00%
     Average                                               10.78%               8.33%
Table
9-21 is somewhat troubling because
the three top-ranked
meth-
ods
have been demonstrated to be
harmful and make quality
worse
rather
than better. In fact, of the
really beneficial defect
prevention
methods,
only a handful such as
prototyping, measuring test
coverage,
and
joint application design
(JAD) have more than 50
percent usage in
the
United States.
Many of the most powerful and effective methods, such as inspections and measuring cost of quality (COQ), have less than 33 percent usage or penetration. The data shown in Table 9-21 is not precise, since much larger samples would be needed. However, it does illustrate a severe disconnect between effective methods of defect prevention and day-to-day usage in the United States.
Part of the reason for these dismaying usage patterns is the difficulty of actually measuring and studying defect prevention methods.
Only a few large and
sophisticated corporations are
able to
carry
out studies of defect
prevention. Most universities
cannot study
defect
prevention because they lack
sufficient contacts with
corpora-
tions
and therefore have little
data available.
In
conclusion, defect prevention is
sparsely covered in the
software
literature.
There is very little
empirical data available,
and a great deal
more
research is needed on this
topic.
One
way to improve defect
prevention and defect
removal would be to
create
a nonprofit foundation or association
that studied a wide
range
of
quality topics. Both defect
prevention and defect
removal would be
included.
Following is the hypothetical
structure and functions of a
pro-
posed
nonprofit International Software
Quality Foundation
(ISQF).
TABLE 9-21  Usage Patterns of Software Defect Prevention Methods

     Defect Prevention Method                               Percent of U.S. Projects
 1.  Reuse (uncertified sources)                            90.00%
 2.  Cost per defect                                        75.00%
 3.  Lines of code (LOC)                                    72.00%
 4.  Prototyping-functional                                 70.00%
 5.  Test coverage percent                                  67.00%
 6.  No use of CMM                                          50.00%
 7.  Joint application design (JAD)                         45.00%
 8.  Percent of requirements missed                         38.00%
 9.  Software quality assurance (SQA)                       36.00%
10.  Use-cases                                              33.00%
11.  IFPUG function points                                  33.00%
12.  Test-driven development (TDD)                          30.00%
13.  Cost of quality (COQ)                                  29.00%
14.  Scrum sessions (daily)                                 28.00%
15.  CMM 3                                                  28.00%
16.  Agile embedded users                                   27.00%
17.  Six Sigma                                              24.00%
18.  Risk analysis (manual)                                 22.00%
19.  Rational Unified Process (RUP)                         22.00%
20.  Cyclomatic complexity                                  21.00%
21.  CMM 1                                                  20.00%
22.  Monthly rate of requirements change                    20.00%
23.  Code complexity analysis                               19.00%
24.  ISO 9001:2000 Quality Management System                19.00%
25.  Microsoft certification (testing)                      18.00%
26.  ISO 9000-9003 Quality Management                       18.00%
27.  Root cause analysis                                    17.00%
28.  ISO/IEC 9126-1 Software Engineering Product Quality    17.00%
29.  TSP/PSP                                                16.00%
30.  ISO/IEC 25030 Software Product Quality Requirements    16.00%
31.  IEEE 1061-1992 Software Metrics                        16.00%
32.  Defect origin measurements                             15.00%
33.  Root cause analysis                                    15.00%
34.  IEEE 730-1998 Software Quality Assurance Plans         15.00%
35.  PSP/TSP                                                14.00%
36.  Six Sigma for software                                 13.00%
37.  Six Sigma (generic)                                    13.00%
38.  Story points                                           13.00%
39.  Inspection participation                               12.00%
40.  CMM 2                                                  12.00%
41.  Sarbanes-Oxley                                         12.00%
42.  Six Sigma green belt                                   11.00%
43.  ISO/IEC 10181 Security Frameworks                      11.00%
44.  Six Sigma black belt                                   10.00%
45.  Defect removal efficiency                              10.00%
46.  Use-case points                                        10.00%
47.  ISO 17799 Security                                     10.00%
48.  Goal-Question Metrics                                   9.00%
49.  CMM 4                                                   8.00%
50.  Certified Software Test Professional (CSTP)             8.00%
51.  Security plans                                          7.00%
52.  Quality function deployment (QFD)                       6.00%
53.  Total quality management (TQM)                          6.00%
54.  Certified Software Project Manager (CSPM)               6.00%
55.  International Software Testing Quality Board (ISTQB)    4.00%
56.  Certified Software Quality Analyst (CSQA)               4.00%
57.  Certified Software Tester (CSTE)                        4.00%
58.  COSMIC function points                                  4.00%
59.  Certified Software Quality Engineer (CSQE) ASQ          3.00%
60.  Risk analysis (automated)                               2.00%
61.  Certified Software Test Manager (CSTM)                  2.00%
62.  Reuse (certified sources)                               1.00%
63.  CMM 5                                                   1.00%
64.  Poka-yoke                                               0.10%
65.  Clean-room development                                  0.10%
Proposal
for a Nonprofit International
Software Quality Foundation
(ISQF)
The
ISQF will be a nonprofit foundation
that is dedicated to
improv-
ing
the quality and economic
value of software applications.
The form
of
incorporation is to be decided by the
initial board of directors.
The
intent
is to incorporate under section
501(c) of the Internal
Revenue
Code
and thereby be a tax-exempt
organization that is authorized
to
receive
donations.
The
fundamental principles of ISQF
are the following:
1.
Poor quality has been
and is damaging the
professional reputation
of
the software
community.
2.
Poor quality has been
and is causing significant
litigation between
clients
and software development
corporations.
3.
Significant software quality
improvements are technically
possible.
4.
Improved software quality
has substantial economic
benefits in
reducing
software costs and
schedules.
5.
Improved software quality
depends upon accurate
measurement
of
quality in many forms,
including, but not limited
to, measuring
software
defects, software defect
origins, software defect
severity
levels,
methods of defect prevention,
methods of defect
removal,
customer
satisfaction, and software
team morale.
6.
The major cost of software
development and maintenance is
that
of
eliminating defects. ISQF will
mount major studies on
measur-
ing
the economic value of defect
prevention, defect removal,
and
customer
satisfaction.
7.
Measurement and estimation
are synergistic technologies.
ISQF
will
evaluate software quality
and reliability estimation
methods,
and
will publish the results of
their evaluations. No fees from
esti-
mation
tool vendors will be accepted.
The evaluations will be
inde-
pendent
and based on standard
benchmarks and test
cases.
8.
Software defects can
originate in requirements, design,
coding, user
documents,
and also in test plans
and test cases themselves.
In
addition,
there are secondary defects
that are introduced
while
attempting
to repair earlier defects.
ISQF will study all sources
of
software
problems and attempt to
improve all sources of
software
defects
and user
dissatisfaction.
9.
ISQF will sponsor research in
technical topics that may
include, but
are
not limited to,
inspections, static analysis,
test case design,
test
coverage analysis, test
tools, defect reporting,
defect tracking
tools,
bad-fix injections, error-prone
module removal,
complexity
analysis,
defect prevention, formal
inspections, quality
measure-
ments,
and quality metrics.
10.
The ISQF will also sponsor
research to quantify the
effects of all
social
factors that influence
software quality, including
the effective-
ness
of software quality assurance
organizations (SQA),
separate
test
organizations, separate maintenance
organizations, interna-
tional
standards, and the value of
certification. Methods of
studying
software
customer satisfaction will also be
supported.
11.
The service metrics defined
in the Information
Technology
Infrastructure
Library (ITIL) are all
dependent upon
achieving
satisfactory
levels of quality. ISQF will
incorporate principles
from
the
ITIL library, and will also
sponsor research studies to
show the
correlations
between reliability and
availability and quality
levels
in
terms of delivered
defects.
12.
As new technologies appear in
the software industry, it is
impor-
tant
to stay current with their
quality impacts. ISQF will
perform
or
commission studies on the
quality results of a variety of
new
approaches
including but not limited to
Agile development,
cloud
computing,
crystal development, extreme
programming, open-
source
development, and service-oriented
architecture (SOA).
13.
ISQF will provide model
curricula for university
training in soft-
ware
measurement, metrics, defect
prevention, defect removal,
cus-
tomer
support, customer satisfaction,
and the economic value
of
software
quality.
14.
ISQF will provide model
curricula for MBA programs
that deal with
the
economics of software and
the principles of software
manage-
ment.
The economics of quality will be a
major subtopic.
15.
ISQF will provide model
curricula for corporate and
in-house train-
ing
in software measurement, metrics,
defect prevention,
defect
removal,
customer support, customer
satisfaction, and the
economic
value
of software quality.
16.
ISQF will provide recommended
skill profiles for the
occupations of
software
quality assurance (SQA),
software testing, software
cus-
tomer
support, and software
quality measurement.
17.
ISQF will offer examinations
and licensing certificates
for the
occupations
of software quality assurance
(SQA), software
testing,
software
customer support, and
software quality measurement.
Of
these,
software quality measurement
has no current
certification.
18.
ISQF will establish boards of
competence to administer
examina-
tions
and define the state of
the art for software
quality assurance
(SQA),
software testing, and
software quality measurement.
Other
boards
and specialties may be added
at future times.
19.
ISQF will define the
conditions of professional malpractice as
they
apply
to inadequate methods of software
quality control.
Examples
of
such conditions may include
failing to keep adequate
records of
software
defects, failing to utilize sufficient
test stages and test
cases,
and
failing to perform adequate
inspections of critical
materials.
20.
ISQF will cooperate with other
nonprofit organizations that
are
concerned
with similar issues. These
organizations include but
are
not
limited to the Global
Association for Software
Quality (GASQ)
in
Belgium, the World Quality
Conference, the IEEE, the
ISO, ANSI,
IFPUG,
SPIN, and the SEI.
ISQF will also cooperate with
other
organizations
such as universities, the
Information Technology
Metrics
and Productivity Institute
(ITMPI), the Project
Management
Institute
(PMI), the Quality Assurance
Institute (QAI),
software
testing
societies, and relevant
engineering, benchmarking, and
pro-
fessional
organizations such as the
ISBSG benchmarking
group.
ISQF
will also cooperate with similar
quality organizations
abroad
such
as those in China, Japan,
India, and Russia. This
cooperation
might
include reciprocal memberships if
other organizations
are
willing
to participate in that
fashion.
21.
ISQF will be governed by a board of
five directors, to be elected
by
the
membership. The board of
directors will appoint a
president
or
chief executive officer. The
president will appoint a
treasurer,
secretary,
and such additional officers as
may be required by
the
terms
and place of incorporation.
Initially, the board,
president,
and
officers will serve as volunteers on a
pro bono basis. To
ensure
inclusion
of fresh information, the
term of president will be
two
calendar
years.
22.
Funding for the ISQF will be
a combination of dues,
donations,
grants,
and possible fund-raising
activities such as conferences
and
events.
23.
The ISQF will also have a
technical advisory board of
five members
to
be appointed by the president.
The advisory board will
assist
ISQF
in staying at the leading
edge of research into topics
such as
testing,
inspections, quality metrics,
and also availability and
reli-
ability
and other ITIL
metrics.
24.
The ISQF will use modern
communication methods to expand
the
distribution
of information on quality topics.
These methods will
include
an ISQF web site, webinars,
a possible quality
Wikipedia,
Twitter,
blogs, and online
newsletters.
25.
The ISQF will have several
subcommittees that deal with
topics
such
as membership, grants and
donations, press liaison,
university
liaison,
and liaison with other
nonprofit organizations such as
the
Global
Association of Software Quality in
Belgium.
26.
To raise awareness of the
importance of quality, the
ISQF will
produce
a quarterly journal, with a tentative
name of Software
Quality
Progress. This
will be a refereed journal, with the
referees
all
coming from the ISQF
membership.
27.
To raise awareness of the
importance of quality, the
ISQF will spon-
sor
an annual conference and will
solicit nominations for a
series
of
"outstanding quality awards."
The initial set of awards
will be
organized
by type of software (information
systems, commercial
applications,
military software, outsourced
applications, systems
and
embedded software, web
applications). The awards will be
for
lowest
numbers of delivered defects,
highest levels of defect
removal
efficiency,
best customer service, and
highest rankings of
customer
satisfaction.
28.
To raise awareness of the
importance of software quality,
ISQF
members
will be encouraged to write and
review articles and
books
on software quality topics.
Both technical journals such
as
CrossTalk
and
mainstream business journals
such as the Harvard
Business
Review will be
journals of choice.
29.
To raise awareness of the
importance of software quality,
ISQF will
begin
the collection of a major
library of books, journal
articles, and
monographs
on topics and issues
associated with software
quality.
30.
To raise awareness of the
importance of software quality,
ISQF will
sponsor
benchmark studies of software defects,
defect severity
levels,
defect
removal efficiency, test coverage,
inspection efficiency,
inspec-
tion
and test costs, cost of
quality (COQ), and software
litigation where
poor
quality was one of the
principal complaints by the
plaintiffs.
31.
To raise awareness of the
economic consequences of poor
quality,
the
ISQF will sponsor research on
consequential damages,
deaths,
and
property losses associated with
poor software
quality.
32.
To raise awareness of the
economic consequences of poor
quality,
the
ISQF will collect public
information on the results of
software
litigation
where poor quality was
part of the plaintiff's
claims. Such
litigation
includes breach of contract
cases, fraud cases, and
cases
where
poor quality damaged
plaintiff business
operations.
33.
To raise awareness of the
importance of software quality,
ISQF
chapters
will be encouraged at state and
local levels, such as
Rhode
Island
Software Quality Association or a
Boston Software
Quality
Association.
34.
To ensure high standards of
quality education, the ISQF
will review
and
certify specific courses on
software quality matters
offered by
universities
and private corporations as
well. Courses will be
sub-
mitted
for certification on a voluntary
basis. Minimal fees will be
charged
for certification in order to
defray expenses. Fees will
be
based
on time and material charges
and will be levied whether
or
not
a specific course passes certification or
is denied certification.
35.
To ensure that quality
topics are included and
are properly defined
in
contracts and outsource
agreements, the ISQF will
cooperate
with
the American Bar
Association, state bar
associations, the
American
Arbitration Society, and
various law schools on the
legal
status
of software quality and on
contractual issues.
36.
ISQF members will be asked to
subscribe to a code of ethics
that
will
be fully defined by the ISQF
technical advisory board.
The
code
of ethics will include topics
such as providing full and
honest
information
about quality to all who
ask, avoiding conflicts of
inter-
est,
and basing recommendations
about quality on solid
empirical
information.
37.
Because security and quality
are closely related, the
ISQF will also
include security attack prevention and recovery from security attacks as part of the overall mission. However,
security is
highly
specialized and requires
additional skills outside
the normal
training
of quality assurance and
test personnel.
38.
Because of the serious
global recession, the ISQF
will attempt to
rapidly
disseminate empirical data on
the economic value of
quality.
High
quality for software has
been proven to shorten
development
schedules,
lower development costs, improve
customer support, and
reduce
maintenance costs. But few
managers and executives
have
access
to the data that supports
such claims.
Software
engineering and software
quality need to be more
closely
coupled
than has been the
norm in the past. Better
prediction of quality,
better
measurement of quality, more
widespread usage of effective
defect
prevention
methods and defect removal
methods are all congruent
with
advancing
software engineering to the
status of a true
engineering
discipline.
Software
Defect Removal
Although
both defect prevention and
defect removal are
important, it
is
easier to study and measure
defect removal. This is because
counts
of
defects found by means of
inspections, static analysis,
and testing
provide
a quantitative basis for
calculating defect removal
efficiency
levels.
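As a minimal illustration, defect removal efficiency is usually computed by comparing defects removed before release with defects reported by users during an initial period of production use. The sketch below assumes a 90-day post-release window and uses hypothetical defect counts.

```python
# Minimal sketch of a defect removal efficiency (DRE) calculation.
# Assumption: DRE is defects removed before release divided by the total of
# pre-release defects plus defects reported by users in an initial
# post-release window (a 90-day window is assumed here).

def defect_removal_efficiency(defects_removed_pre_release, defects_found_post_release):
    total = defects_removed_pre_release + defects_found_post_release
    if total == 0:
        return None  # no defects observed; DRE is undefined
    return defects_removed_pre_release / total

# Hypothetical project: inspections, static analysis, and testing removed
# 850 defects; customers reported 150 more in the first 90 days.
dre = defect_removal_efficiency(850, 150)
print(f"Defect removal efficiency: {dre:.0%}")   # -> 85%
```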
In
spite of the fact that
defect removal is theoretically
easy to study,
the
literature remains distressingly
sparse. For example, testing
has
an
extensive literature with hundreds of
books, thousands of
journal
articles,
many professional associations,
and numerous
conferences.
Yet
hardly any of the testing
literature contains empirical
data on
the
measured numbers of test cases
created, actual counts of
defects
found
and removed, data on bad-fix
injection rates, or other
tangible
data
points.
Several
important topics have almost
no citations at all in the
test-
ing
literature. For example, a
study done at IBM found more
errors in
test
cases than in the software
that was being tested.
The same study
found
about 35 percent duplicate or
redundant test cases. Yet
neither
test
case errors nor redundant
test cases are discussed in
the software
testing
literature.
Another
gap in the literature of
both testing and other
forms of defect
removal
concerns bad-fix injections.
About 7 percent of attempts
to
repair
software defects contain new
defects in the repairs
themselves.
In
fact, sometimes there are
secondary and even tertiary
bad fixes; that
is,
three consecutive attempts to fix a
bug may fail to fix the
original
bug
and introduce new bugs
that were not there
before!
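A rough sketch of the compounding effect, assuming a flat 7 percent bad-fix injection rate and that each repair round addresses all defects known at that point (both are simplifications):

```python
# Sketch of how bad-fix injection compounds. Each round of repairs is assumed
# to fix every defect known at that point, while injecting new defects at a
# fixed rate (7 percent is the approximate figure cited in the text).

def open_defects_after_repairs(initial_defects, bad_fix_rate=0.07, rounds=3):
    """Expected defects still open after a given number of repair rounds."""
    open_defects = float(initial_defects)
    for _ in range(rounds):
        # the defects left open are the ones injected by the current round of fixes
        open_defects = open_defects * bad_fix_rate
    return open_defects

# Hypothetical example: 1,000 defects entering repair.
for rounds in (1, 2, 3):
    remaining = open_defects_after_repairs(1000, rounds=rounds)
    print(f"after {rounds} round(s): ~{remaining:.0f} open defects")
# -> roughly 70, 5, and 0: secondary and tertiary bad fixes shrink quickly
#    but are rarely zero, which is why bad-fix injection deserves measurement.
```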
Another
problem with the software
engineering literature and
also
with
software professional associations is a
very narrow focus.
Most
testing
organizations tend to ignore
static analysis and
inspections.
As
a result of this narrow
focus, the synergies among
various kinds of
defect
removal operations are not
well covered in the quality
or software
engineering
literature. For example,
carrying out formal
inspections
of
requirements and design not
only finds defects, but
also raises the
defect
removal efficiency levels of
subsequent test stages by at
least
5
percent by providing better
and more complete source
material for
constructing
test cases.
Running
automated static analysis
prior to testing will find
numerous
defects
having to do with limits, boundary
conditions, and
structural
problems,
and therefore speed up subsequent
testing.
Formal
inspections are best at
finding very complicated and
subtle
problems
that require human
intelligence and insight.
Formal inspec-
tions
are also best at finding
errors of omission and
errors of ambiguity.
Static
analysis is best at finding
structural and mechanical
problems
such
as boundary conditions, duplications,
failures of error-handling,
and
branches to incorrect routines.
Static analysis can also
find security
flaws.
Testing
is best at finding problems
that occur when software is
execut-
ing,
such as performance issues,
usability issues, and
security issues.
Individually,
these three methods are
useful but incomplete.
When
used
together, their synergies
can elevate defect removal
efficiency
levels
and also reduce the
effort and costs associated with
defect removal
activities.
Table
9-22 provides an overview of 80
different forms of software
defect
removal:
static analysis, inspections,
many kinds of testing, and
some
special
forms of defect removal
associated with software
litigation.
Although
Table 9-22 shows overall
values for defect removal
efficiency,
the
data really deals with
removal efficiency against
selected defect cat-
egories.
For example, automated
static analysis might find
87 percent
of
structural code problems,
but it can't find
requirements omissions or
problems
such as the Y2K problem that
originate in requirements.
TABLE 9-22  Overview of 80 Varieties of Software Defect Removal Activities

                                              Number of      Defect        Bad-Fix
                                              Test Cases     Removal       Injection
     Defect Removal Activity                  per FP         Efficiency    Percent
STATIC ANALYSIS
 1.  Automated static analysis                0.00           87.00%         2.00%
 2.  Requirements inspections                 0.00           85.00%         6.00%
 3.  External design inspection               0.00           85.00%         6.00%
 4.  Use-case inspection                      0.00           85.00%         4.00%
 5.  Internal design inspection               0.00           85.00%         4.00%
 6.  New code inspections                     0.00           85.00%         4.00%
 7.  Reuse certification inspection           0.00           84.00%         2.00%
 8.  Test case inspection                     0.00           83.00%         5.00%
 9.  Automated document analysis              0.00           83.00%         6.00%
10.  Legacy code inspections                  0.00           83.00%         6.00%
11.  Quality function deployment              0.00           82.00%         3.00%
12.  Document proof reading                   0.00           82.00%         1.00%
13.  Nationalization inspection               0.00           81.00%         3.00%
14.  Architecture inspections                 0.00           80.00%         3.00%
15.  Test plan inspection                     0.00           80.00%         5.00%
16.  Test script inspection                   0.00           78.00%         4.00%
17.  Test coverage analysis                   0.00           77.00%         3.00%
18.  Document editing                         0.00           77.00%         2.50%
19.  Pair programming review                  0.00           75.00%         5.00%
20.  Six Sigma analysis                       0.00           75.00%         3.00%
21.  Bug repair inspection                    0.00           70.00%         3.00%
22.  Business plan inspections                0.00           70.00%         8.00%
23.  Root-cause analysis                      0.00           65.00%         4.00%
24.  Governance reviews                       0.00           63.00%         5.00%
25.  Refactoring of code                      0.00           62.00%         5.00%
26.  Error-prone module analysis              0.00           60.00%        10.00%
27.  Independent audits                       0.00           55.00%        10.00%
28.  Internal audits                          0.00           52.00%        10.00%
29.  Scrum sessions (daily)                   0.00           50.00%         2.00%
30.  Quality assurance review                 0.00           45.00%         7.00%
31.  Sarbanes-Oxley review                    0.00           45.00%        10.00%
32.  User story reviews                       0.00           40.00%        10.00%
33.  Informal peer reviews                    0.00           40.00%        10.00%
34.  Independent verification and validation  0.00           35.00%        12.00%
35.  Private desk checking                    0.00           35.00%         7.00%
36.  Phase reviews                            0.00           30.00%        15.00%
37.  Correctness proofs                       0.00           27.00%        20.00%
     Average                                  0.00           66.92%         6.09%
GENERAL TESTING
38.  PSP/TSP unit testing                     3.50           52.00%         2.00%
39.  Subroutine testing                       0.25           50.00%         2.00%
40.  XP testing                               2.00           40.00%         3.00%
41.  Component testing                        1.75           40.00%         3.00%
42.  System testing                           1.50           40.00%         7.00%
43.  New function testing                     2.50           35.00%         5.00%
44.  Regression testing                       2.00           30.00%         7.00%
45.  Unit testing                             3.00           25.00%         4.00%
     Average                                  2.06           41.00%         4.13%
     Sum                                     16.50
AUTOMATIC TESTING
46.  Virus/spyware test                       3.50           80.00%         4.00%
47.  System test                              2.00           40.00%         8.00%
48.  Regression test                          2.00           37.00%         7.00%
49.  Unit test                                0.05           35.00%         4.00%
50.  New function test                        3.00           35.00%         5.00%
     Average                                  2.11           45.40%         5.60%
     Sum                                     10.55
SPECIALIZED TESTING
51.  Virus testing                            0.70           98.00%         2.00%
52.  Spyware testing                          1.00           98.00%         2.00%
53.  Security testing                         0.40           90.00%         4.00%
54.  Limits/capacity testing                  0.50           90.00%         5.00%
55.  Penetration testing                      4.00           90.00%         4.00%
56.  Reusability testing                      4.00           88.00%         0.25%
57.  Firewall testing                         2.00           87.00%         3.00%
58.  Performance testing                      0.50           80.00%         7.00%
59.  Nationalization testing                  0.30           75.00%        10.00%
60.  Scalability testing                      0.40           65.00%         6.00%
61.  Platform testing                         0.20           55.00%         5.00%
62.  Clean-room testing                       3.00           45.00%         7.00%
63.  Supply chain testing                     0.30           35.00%        10.00%
64.  SOA orchestration                        0.20           30.00%         5.00%
65.  Independent testing                      0.20           25.00%        12.00%
     Average                                  1.18           70.07%         5.48%
     Sum                                     17.70
USER TESTING
66.  Usability testing                        0.25           65.00%         4.00%
67.  Local nationalization testing            0.40           60.00%         3.00%
68.  Lab testing                              1.25           45.00%         5.00%
69.  External beta testing                    1.00           40.00%         7.00%
70.  Internal acceptance testing              0.30           30.00%         8.00%
71.  Outsource acceptance testing             0.05           30.00%         6.00%
72.  COTS acceptance testing                  0.10           25.00%         8.00%
     Average                                  0.48           42.14%         5.86%
     Sum                                      3.35
LITIGATION ANALYSIS, TESTING
73.  Intellectual property testing            2.00           80.00%         1.00%
74.  Intellectual property review             0.00           80.00%         3.00%
75.  Breach of contract review                0.00           80.00%         2.00%
76.  Breach of contract testing               2.00           70.00%         2.00%
77.  Tax litigation review                    0.00           80.00%         4.00%
78.  Tax litigation testing                   1.00           70.00%         4.00%
79.  Fraud code review                        0.00           80.00%         2.00%
80.  Embezzlement code review                 0.00           80.00%         2.00%
     Average                                  2.35           77.14%         2.71%
     Sum                                      5.00
TOTAL TEST CASES PER FUNCTION POINT          53.10
Table 9-22 is sorted in descending order of defect removal efficiency. However, the results shown are maximum values. In real life, measured defect removal efficiency can be less than half of the nominal maximum values shown in Table 9-22.
Although
Table 9-22 lists 80
different kinds of software
defect removal
activities,
that does not imply
that all of them are
used at the same
time.
In
fact, the U.S. average
for defect removal
activities includes only
six
kinds
of testing:
U.S.
Average Sequence of Defect
Removal
1.
Unit test
2.
New function test
3.
Performance test
4.
Regression test
5.
System test
6.
Acceptance or beta
test
These
six forms of testing,
collectively, range between
about 70 per-
cent
and 85 percent in cumulative
defect removal efficiency
levels: far
below
what is needed to achieve
high levels of reliability
and customer
satisfaction.
The bottom line is that
testing, by itself, is insufficient
to
achieve
professional levels of
quality.
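The arithmetic behind that statement is straightforward. The sketch below assumes a hypothetical 1,000-function-point application with a defect potential of 5.0 defects per function point (illustrative values, not measurements from this chapter) and shows how many defects reach customers at several cumulative defect removal efficiency levels.

```python
# Sketch of why cumulative defect removal efficiency (DRE) matters so much.
# Assumptions (hypothetical): a 1,000-function-point application and a
# defect potential of 5.0 defects per function point.

def delivered_defects(function_points, defect_potential_per_fp, cumulative_dre):
    """Defects expected to reach customers after all removal activities."""
    total_potential = function_points * defect_potential_per_fp
    return total_potential * (1.0 - cumulative_dre)

size_fp = 1_000
potential = 5.0
for dre in (0.70, 0.85, 0.95, 0.99):
    print(f"DRE {dre:.0%}: ~{delivered_defects(size_fp, potential, dre):.0f} delivered defects")

# -> roughly 1,500 delivered defects at 70% DRE versus 50 at 99% DRE, which is
#    the gap between testing alone and a fuller removal sequence.
```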
An
optimum sequence of defect
removal activities would
include sev-
eral
kinds of pretest inspections,
static analysis, and at
least eight forms
of
testing:
Optimal
Sequence of Software
Defect
Removal
Pretest
Defect Removal
1.
Requirements inspection
2.
Architecture inspection
3.
Design inspection
4.
Code inspection
5.
Test case inspection
6.
Automated static
analysis
Testing
Defect Removal
7.
Subroutine test
8.
Unit test
9.
New function test
10.
Security test
11.
Performance test
12.
Usability test
13.
System test
14.
Acceptance or beta
test
This
combination of synergistic forms of
defect removal will
achieve
cumulative
defect removal efficiency
levels in excess of 95 percent
for
every
software project and can
achieve 99 percent for some
projects.
When
the most effective forms of
defect removal are combined
with
the
most effective forms of
defect prevention, then
software engineering
should
be able to achieve consistent
levels of excellent quality. If
this
combination
can occur widely enough to
become the norm, then
software
engineering
can be considered a true
engineering discipline.
Software
Quality Specialists
As
noted earlier in the book,
more than 115 types of
occupations and
specialists
are working in the software
engineering domain. In
most
knowledge-based
occupations such as medicine
and law, specialists
have
extra
training and sometimes extra
skills that allow them to
outperform
generalists
in selected areas such as in
neurosurgery or maritime
law.
For
software engineering, the
literature is sparse and
somewhat
ambiguous
about the roles of
specialists. Much of the
literature on
software
specialization is vaporous and
merely expresses some
kind
of
bias. Many authors prefer a
generalist model where
individuals are
interchangeable
and can handle requirements,
design, development,
and
testing as needed. Other
authors prefer a specialist
model where
key
skills such as testing,
quality assurance, and
maintenance are per-
formed
by trained specialists.
In
this chapter, we will focus
primarily on two basic
questions:
1.
Do specialized skills lower
defect potentials and
benefit defect
prevention?
2.
Do specialized skills raise
defect removal efficiency
levels?
Not all of the 115 or so specialists will be discussed here; the focus is on those whose roles have a potential impact on quality levels, considered in terms of defect prevention and defect removal.
The
20 specialist categories discussed in
this chapter include,
in
alphabetical
order:
1.
Architects
2.
Business analysts
3.
Database analysts
4.
Data quality analysts
5.
Enterprise architects
6.
Estimating specialists
7.
Function point
specialists
8.
Inspection moderators
9.
Maintenance specialists
10.
Requirements analysts
11.
Performance specialists
12.
Risk analysis
specialists
13.
Security specialists
14.
Six Sigma specialists
15.
Systems analysts
16.
Software quality assurance
(SQA)
17.
Technical writers
18.
Testers
19.
Usability specialists
20.
Web designers
For
each of these 20 specialist
groups, we will consider the
volume of
potential
defects they face, and
whether they have a tangible
impact on
defect
prevention and defect
removal activities.
Table
9-23 ranks the specialists
in terms of assignment
scope. This
metric
represents the number of
function points normally
assigned to
one
practitioner. Table 9-23
also shows the volume of
defects that the
various
occupations face as part of
their jobs. Table 9-23
then shows the
approximate
impacts of these specialized
occupations on both
defect
prevention
and defect removal.
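Assignment scope also translates into rough staffing arithmetic: dividing the amount of software supported by the assignment scope of an occupation approximates the headcount needed. The sketch below uses scopes from Table 9-23 together with a hypothetical 1,000,000-function-point corporate portfolio.

```python
# Rough staffing sketch based on assignment scope (function points per
# practitioner). The portfolio size is hypothetical; the scopes are the
# values listed in Table 9-23 for four of the twenty occupations.

ASSIGNMENT_SCOPE_FP = {
    "Risk analysis specialists": 300_000,
    "Quality assurance (QA)": 10_000,
    "Testers": 10_000,
    "Maintenance specialists": 1_500,
}

def staff_needed(portfolio_fp, scope_fp):
    """Approximate headcount: portfolio size divided by assignment scope, rounded up."""
    return -(-portfolio_fp // scope_fp)  # ceiling division

portfolio = 1_000_000  # hypothetical corporate portfolio in function points
for occupation, scope in ASSIGNMENT_SCOPE_FP.items():
    print(f"{occupation}: ~{staff_needed(portfolio, scope)} practitioners")
# -> about 4 risk analysts versus roughly 100 QA staff, 100 testers, and
#    667 maintenance specialists for the same portfolio.
```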
The
top-ranked specialists face
large numbers of potential
defects
that
are also capable of causing
great damage to entire
corporations
as
well as to the software
applications owned by those
corporations.
Following
are short discussions of
each of the 20 kinds of
specialists.
Risk
Analysis Specialists
Assignment
scope = 300,000 function
points
Defect
potentials = 7.00
Defect
prevention impact = 75
percent
Defect
removal impact = 25
percent
The
large assignment scope of
300,000 function points
indicates that
companies
do not need many risk
analysts, but the ones
they employ need
to
be very competent and
understand both technical
and financial risks.
TABLE 9-23  Software Specialization Impact on Software Quality

                                        Assignment   Defect      Defect       Defect
     Specialized Occupations            Scope        Potential   Prevention   Removal
 1.  Risk analysis specialists          300,000      7.00        75.00%       25.00%
 2.  Enterprise architects              250,000      6.00        25.00%       20.00%
 3.  Six Sigma specialists              250,000      5.00        25.00%       30.00%
 4.  Database analysts                  100,000      3.00        15.00%       10.00%
 5.  Architects                         100,000      3.00        17.00%       12.00%
 6.  Usability specialists              100,000      1.00        10.00%       15.00%
 7.  Security specialists                50,000      7.00        70.00%       20.00%
 8.  Data quality analysts               50,000      5.00        12.00%       15.00%
 9.  Business analysts                   50,000      3.50        25.00%       10.00%
10.  Estimating specialists              25,000      3.00        20.00%       25.00%
11.  Systems analysts                    20,000      6.00        20.00%       20.00%
12.  Performance specialists             20,000      1.00        10.00%       12.00%
13.  Quality assurance (QA)              10,000      5.50        15.00%       40.00%
14.  Web designers                       10,000      4.00        15.00%       12.00%
15.  Requirements analysts               10,000      4.00        20.00%       15.00%
16.  Testers                             10,000      3.00        15.00%       50.00%
17.  Function point specialists           5,000      4.00        10.00%       10.00%
18.  Technical writers                    2,000      1.00        10.00%       10.00%
19.  Maintenance specialists              1,500      3.50        30.00%       20.00%
20.  Inspection moderators                1,000      4.50        27.00%       35.00%
     Average                             68,225      4.00        23.30%       20.30%
Given the enormous number of business failures during the recession, it is obvious that risk analysis is not yet as sophisticated as it should be, especially for dealing with financial risks.
Risk
analysts face more than
100 percent of the potential
defects
associated
with any given software
application. Not only do they
have
to
deal with technical risks
and quality risks, but
they also need to
address
financial risks and legal
risks that are outside
the normal realm
of
software quality and defect
measurement.
A
formal and careful risk
analysis prior to committing
funds to a
major
software application can
stop investments in excessively
haz-
ardous
projects before any serious
money is spent. For
questionable
projects,
a formal and careful risk
analysis prior to starting
the project
can
introduce better technologies
prior to committing
funds.
The
keys to successful early
risk analysis include the
ability to do
early
sizing, early cost
estimating, early quality
estimating, and
knowl-
edge
of dozens of potential risks
derived from analysis of
project failures
and
successes.
The
main role of risk analysts
in terms of quality is to
stop bad proj-
ects
before they start, and to
ensure that projects that do
start utilize
state-of-the-art
quality methods. Risk
analysts also need to
understand
the
main reasons for software
failures, and they should be
familiar
with
software litigation results
for cases dealing with
cancelled proj-
ects,
breach of contract, theft of
intellectual property, patent
violations,
embezzlement
via software, fraud, tax
issues, Sarbanes-Oxley
issues,
and
other forms of litigation as
well.
Enterprise
Architects
Assignment
scope = 250,000 function
points
Defect
potentials = 6.00
Defect
prevention impact = 25
percent
Defect
removal impact = 20
percent
Enterprise
architects are key players
whose job is to understand
every
aspect
of corporate business and to
match business needs against
entire
portfolios,
which may contain more
than 3000 separate
applications and
total
more than 10 million
function points. Not only
internal software,
but
also open-source applications
and commercial software
packages
such
as Vista and SAP need to be
part of the enterprise
architect's
domain
of knowledge.
The
main role of enterprise
architects in terms of quality is to
under-
stand
the business value of
quality to corporate operations,
and to
ensure
that top executives have
similar understandings. Both
enter-
prise
architects and corporate
executives need to push for
excellence in
order
to achieve speed of delivery.
Enterprise
architects also play a role
in corporate governance, by
ensuring
that critical mistakes such
as the Y2K problem are
prevented
from
occurring in the
future.
Six
Sigma Specialists
Assignment
scope = 250,000 function
points
Defect
potentials = 5.00
Defect
prevention impact = 25
percent
Defect
removal impact = 30
percent
The
large assignment scope for
Six Sigma specialists indicates
that
their
work is corporate in nature
rather than being limited to
specific
applications.
The main role of Six Sigma
specialists in terms of
quality
is
to provide expert analysis of
defect origins and defect
causes, and
to
suggest effective methods of
continuous improvement to reduce
the
major
sources of software
error.
Database
Analysts
Assignment
scope = 100,000 function
points
Defect potentials = 3.00
Defect prevention impact = 15 percent
Defect removal impact = 10 percent
In
today's world of 2009, major
corporations and government
agencies
own
even more data than
they own software. Customer data, employee data, and manufacturing data total millions of records scattered across dozens of databases and repositories.
This collection of enterprise
data
is
a valuable asset that needs
to be accessed for key
business decisions,
and
also protected against
hacking, theft, and
unauthorized access.
There
is a major quality weakness in
2009 in the area of data
qual-
ity.
There are no "data point"
metrics that express the
size of databases
and
repositories. As a result, it is very
hard to quantify data
quality. In
fact,
for all practical purposes,
no literature at all on data
quality uses
actual
counts of errors.
As
a result, database analysts
and data quality analysts
are severely
handicapped.
They both play key
roles in quality, but lack
all of the tools
they
need to do a good
job.
The
major role played by
database analysts in terms of
quality is to
ensure
that databases and
repositories are designed
and organized in
optimal
fashions, and that processes
are in place to validate the
accu-
racy
of all data elements that
are added to enterprise data
storage.
Architects
Assignment
scope = 100,000 function
points
Defect
potentials = 3.00
Defect
prevention impact = 17
percent
Defect
removal impact = 12
percent
Architects
also have a large assignment
scope, and need to be able
to
envision
and deal with the largest
known applications of the
modern
world,
such as Vista, ERP packages
like SAP and Oracle,
air-traffic
control,
defense applications, and
major business
applications.
Over
the past 50 years, software
applications have evolved
from run-
ning
by themselves to running under an
operating system to
running
as
part of a multitier network
and indeed to running in
fragments scat-
tered
over a cloud of hardware and
software platforms that may
be
thousands
of miles apart.
As
a result, the role of
architects has become much
more complex
in
2009 than it was even
ten years ago. Architects
need to understand
modern
application practices such as
service-oriented architecture
(SOA),
cloud
computing, and multitier
hierarchies. In addition, architects
need
to
know the sources and
certification methods of various
kinds of reus-
able
material that constitutes
more than 50 percent of many
large appli-
cations
circa 2009.
The
main role that architects
play in terms of quality is to
under-
stand
the implications of software
defects in complex, multitier,
highly
distributed
environments where software
components may come
from
dozens
of sources.
Usability
Specialists
Assignment
scope = 100,000 function
points
Defect
potentials = 1.00
Defect
prevention impact = 10
percent
Defect
removal impact = 15
percent
The
word "usability" defines
what customers need to do to
operate
software
successfully. It also includes
what software customers need
to
do
when the software
misbehaves.
Usability
specialists often have a
background in cognitive
psychol-
ogy
and are well versed in
various kinds of software
interfaces: keyboard commands, buttons, touch screens, voice recognition, and more.
The
main role of usability
specialists in terms of quality is to
ensure
that
software applications have
interfaces and control sequences
that
are
as natural and intuitive as
possible. Usability studies
are normally
carried
out with volunteer clients
who use the software
while it is under
development.
Large
computer and software
companies such as IBM and
Microsoft
have
usability laboratories where
customers can be observed
while they
are
using prerelease versions of
software and hardware
products. These
labs
monitor keystrokes, screen
touches, voice commands, and
other
interface
methods. Usability specialists
also debrief customers
after
every
session to find out what
customers like and dislike
about inter-
faces
and command sequences.
Security
Specialists
Assignment
scope = 50,000 function
points
Defect
potentials = 7.00
Defect
prevention impact = 70
percent
Defect
removal impact = 20
percent
There
is an increasing need for
more software security
specialists,
and
also for better training of
software security specialists
both at the
university
level and after employment,
as security threats evolve
and
change.
As
of 2009, due in part to the
recession, attacks and data
theft are
increasing
rapidly in numbers and
sophistication. Hacking is
rapidly
moving
from the domain of
individual amateurs to organized
crime and
even
to hostile foreign
governments.
Software
applications are not
entirely safe behind
firewalls, even with
active
antivirus and antispyware
applications installed. There is
an
urgent
need to raise the immunity
levels of software applications
by
using
techniques such as Google's Caja,
the E programming
language,
and
changing permission
schemas.
Security
and quality are not
identical, but they are
very close together,
and
both prevention and removal
methods are congruent and
synergistic.
The
closeness of quality and security is
indicated by the fact that
major
avenues
of attack on software applications
are error-handling
routines.
The
main role of security
specialists in terms of quality is to
stay cur-
rent
on the latest kinds of
threats, and to ensure that
both new applica-
tions
and legacy applications have
state-of-the-art security
defenses.
Data
Quality Analysts
Assignment
scope = 50,000 function
points
Defect
potentials = 5.00
Defect
prevention impact = 12
percent
Defect
removal impact = 15
percent
As
of 2009, data quality
analysts are few in number
and woefully
under-equipped
in terms of tools and
technology. There is no
effective
size
metric for data volumes
(i.e., a data point metric
similar to func-
tion
points). As a result, no empirical
data is available on topics
such as
defect
potentials for databases and
repositories, effective defect
removal
methods,
defect estimation, or defect
measurement.
The
theoretical role of data
quality analysts is to prevent
data errors
from
occurring, and to recommend
effective removal methods.
However,
given
the very large number of
apparent data errors in
public records,
credit
scores, accounting, taxes, and so
on, it is obvious that data
quality
lags
behind even software
quality. In fact, data and
software appear to
lag
behind every other
engineering and technical
domain in terms of
quality
control.
Business
Analysts
Assignment
scope = 50,000 function
points.
Defect
potentials = 3.5
Defect
prevention impact = 25
percent
Defect
removal impact = 10
percent
In
many information technology
organizations, business
analysts
are
the primary connection
between the software
community and the
community
of software users. Business
analysts are required to be
well
versed
in both business needs and in
software engineering
technologies.
The
main role that business
analysts should play in
terms of qual-
ity
is to convince both the
business and technical
communities that
high
levels of software quality will
shorten development schedules
and
lower
development costs. Too
often, the business clients
set arbitrary
schedules
and then attempt to force
the software community to
try
and
meet those schedules by
skimping on inspections and
truncating
testing.
Good
business analysts should
have data available from
sources
such
as the International Software
Benchmarking Standards
Group
(ISBSG)
that shows the relationships
between quality, schedules,
and
costs.
Business analysts should
also understand the value of
methods
such
as joint application design
(JAD), quality function
deployment
(QFD),
and requirements
inspections.
Estimating
Specialists
Assignment
scope = 25,000 function
points
Defect
potentials = 3.00
Defect
prevention impact = 20 percent
Defect
removal impact = 25
percent
It
is a sign of sophistication when a
company employs software
esti-
mating
specialists. Usually these
specialists work in project
offices or
special
staff groups that support
line managers, who often
are not well
trained
in estimation.
Estimation
specialists should have
access to and be familiar with
the
major
software estimating tools
that can predict quality,
schedules, and
costs.
Examples of such tools
include COCOMO, KnowledgePlan,
Price-
S,
SoftCost, SEER, Slim, and a
number of others. In fact, a
number of
companies
utilize several of these
tools for the same
applications and
look
for convergence.
The
main role of an estimating
specialist in terms of quality is to
pre-
dict
quality early. Ideally,
quality will be predicted before
substantial
funds
are spent. Not only that,
but multiple estimates may
be needed
to
show the effects of
variations in development practices
such as Agile
development,
Team Software Process (TSP),
Rational Unified
Process
(RUP),
formal inspections, static
analysis, and various kinds
of testing.
Systems
Analysts
Assignment
scope = 20,000 function
points
Defect
potentials = 6.00
Defect
prevention impact = 25
percent
Defect
removal impact = 20
percent
Software
systems analysts are one of
the interface points
between
the
software engineering or programming
community and end
users
of
software. Systems analysts
and business analysts
perform similar
roles,
but the title "systems
analyst" occurs more often
for embedded
and
systems software, which are
developed for technical
purposes rather
than
to satisfy local business
needs.
The
main role of systems
analysts in terms of quality is to
understand
that
all forms of representation
for software (user stories,
use-cases,
formal
specification languages, flowcharts,
Nassi-Schneiderman charts,
etc.)
may contain errors. These
errors may not be amenable
to discovery
via
testing, which would be too
late in any case. Therefore,
a key role
of
systems analysts is to participate in
formal inspections of
require-
ments,
internal design documents,
and external design
documents. If
the
application is being constructed
using test-driven
development,
systems
analysts will participate in test
case design and
construction.
Systems
analysts will also participate in
activities such as joint
applica-
tion
design (JAD) and quality
function deployment
(QFD).
Performance
Specialists
Assignment
scope = 20,000 function
points
Defect
potentials = 1.00
Defect
prevention impact = 10
percent
Defect
removal impact = 12
percent
The
occupation of "performance specialist" is
usually found only in
very
large companies that build
very large and complex
software appli-
cations;
that is, IBM, Raytheon,
Lockheed, Boeing, SAP,
Oracle, Unisys,
Google,
Motorola, and the
like.
The
general role of performance
specialists is to understand
every
potential
bottleneck in hardware and
software platforms that
might
slow
down performance.
Sluggish
or poor performance is viewed as a
quality issue, so the
role
of
performance specialists is to assist
software engineers and
software
designers
in building software that will
achieve good performance
levels.
In
today's world of 2009, with
multitier architectures as the
dominant
model
and with multiple programming
languages as the dominant
form
of
development, the work of
performance specialists has
become much
more
difficult than it was only
ten years ago. Looking
ahead, the work
of
performance specialists will probably
become even more difficult
ten
years
from now.
Software
Quality Assurance
Assignment
scope = 10,000 function
points
Defect
potentials = 5.50
Defect
prevention impact = 15
percent
Defect
removal impact = 40
percent
The
general title of "quality
assurance" is much older
than software
and
has been used by engineering
companies for about 100
years.
Within
the software world, the
title of "software quality
assurance" has
existed
for more than 50 years.
Today in 2009, software
quality special-
ists
average between 2 percent
and 6 percent of total
software employ-
ment
in most large companies. The
hi-tech companies such as IBM
and
Lockheed
employ more software quality
assurance personnel than
do
lo-tech
companies such as insurance
and general
manufacturing.
A
small percentage of software
quality assurance personnel
have been
certified
by one or more of the software
quality assurance
associations.
The
roles of software quality
assurance vary from company
to com-
pany,
but they usually include
these core activities:
ensuring that
relevant
international and corporate
quality standards are used
and
adhered
to, measuring defect removal
efficiency, measuring
cyclomatic
and
essential complexity, teaching
classes in quality, and
estimating or
predicting
quality levels.
A
few very sophisticated
companies such as IBM have
quality assurance
research
positions, where the
personnel can develop new
and improved
quality
control methods. Some of the
results of these QA research
groups
include
formal inspections, function
point metrics, automated
con-
figuration
control tools, clean-room
development, and joint
application
design
(JAD).
Given
the fact that quality
assurance positions have
existed for more
than
50 years and that SQA
personnel number in the
thousands, why is
software
quality in 2009 not much
better than it was in
1979?
One
reason is that in many
companies, quality assurance
plays an advi-
sory
role, but their advice
does not have to be
followed. In some
companies
such
as IBM, formal QA approval is necessary
prior to delivering a
prod-
uct
to customers. If the QA team
feels that quality methods
were deficient,
then
delivery will not occur.
This is a very serious
business issue.
In
fact, very few projects
are stopped from being
delivered. But the
theoretical
power to stop delivery if
quality is inadequate is a
strong
incentive
to pursue state-of-the-art quality
control methods.
Therefore,
a major role of software
quality assurance is to ensure
that
state-of-the-art
measures, methods, and tools
are used for quality
control,
with
the knowledge that poor
quality can lead to delays
in delivery.
Web
Designers
Assignment
scope = 10,000 function
points
Defect
potentials = 4.00
Defect
prevention impact = 15
percent
Defect
removal impact = 12
percent
Software
web design is a fairly new
occupation, but one that is
grow-
ing
faster than almost any
other. The fast growth in
web design is due
to
software companies and other
businesses migrating to the
Web as
their
main channel for marketing
and information.
The
role of web design in terms
of software quality is still
evolving
and
will continue to do so as web sites
move toward virtual reality
and
3-D
representation. As of 2009, some of
the roles are to ensure
that all
interfaces
are fairly intuitive, and
that all links and
connections actu-
ally
work.
Unfortunately,
due to the exponential
increase in hacking, data
theft,
and
denial of service attacks,
web quality and web
security are now
overlapping.
Effective quality for web
sites must include effective
secu-
rity,
and many web design
specialists do not yet know
enough about
security
to be fully effective.
Requirements
Analysts
Assignment
scope = 10,000 function
points
Defect
potentials = 4.00
Defect
prevention impact = 20
percent
Defect
removal impact = 15
percent
The
work of requirements analysts
overlaps the work of systems
ana-
lysts
and business analysts.
However, those who
specialize in require-
ments
analysis also know topics
such as quality function
deployment
(QFD),
joint application design
(JAD), requirements inspections,
and at
least
half a dozen requirements
representation methods such as
use-
cases,
user stories, and several
others.
Because
the majority of "new"
applications being developed
circa
2009
are really nothing more
than replacements for legacy
applications,
requirements
analysts should also be
conversant with data mining.
In
fact,
the best place to start
the requirements analysis
for a replacement
application
is to mine the older legacy
application for business
rules
and
algorithms that are hidden
in the code. Data mining is
necessary
because
usually the original
specifications are either
missing completely
or
long out of date.
The
role of requirements analysis in
terms of quality is to ensure
that
toxic
requirements defects are
removed before they enter
the design or
find
their way into source
code. The frequently cited
Y2K problem is an
example
of a toxic requirement.
Because
the measured rate at which
requirements grow after
the
requirements
phase is between 1 percent
and 3 percent per
calendar
month,
another quality role is to
ensure that prototypes,
embedded
users,
JAD, or other methods are
used that minimize unplanned
changes
in
requirements.
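The compounding effect of monthly requirements growth is easy to underestimate. The sketch below applies the 1 percent and 3 percent monthly rates cited above to a hypothetical 1,000-function-point baseline over an assumed 18-month schedule.

```python
# Requirements creep sketch: compound monthly growth applied to an initial
# baseline. The 1 percent and 3 percent monthly rates come from the text;
# the 1,000-function-point baseline and 18-month schedule are hypothetical.

def grown_size(initial_fp, monthly_growth_rate, months):
    """Function points after compound requirements growth."""
    return initial_fp * (1.0 + monthly_growth_rate) ** months

baseline_fp = 1_000
for rate in (0.01, 0.03):
    final = grown_size(baseline_fp, rate, months=18)
    print(f"{rate:.0%} per month: {final:,.0f} function points "
          f"(+{final - baseline_fp:,.0f})")
# -> about 1,196 FP at 1% per month and about 1,702 FP at 3% per month,
#    which is why change control, prototyping, and embedded users matter.
```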
Requirements
analysts should also be
members of or support
change
control
boards that review and
approve requirements
changes.
Testers
Assignment
scope = 10,000 function
points
Defect
potentials = 3.00
Defect
prevention impact = 15
percent
Defect
removal impact = 50
percent
Software
testing is one of the specialized
occupations where there
is
some
empirical evidence that
specialists can outperform
generalists.
Not
every kind of testing is
performed by test specialists.
For example,
unit
testing is almost always
carried out by the
developers. However,
the
forms
of testing that integrate
the work of entire teams of
developers need
testing
specialists for large
applications. Such forms of
testing include new
function
testing, regression testing,
and system testing among
others.
The
role of test specialists in
terms of quality is to ensure
that test
coverage
approaches 99 percent, that
test cases themselves do not
con-
tain
errors, and that test
libraries are effectively
maintained and purged
of
duplicate test cases that
add cost but not
value.
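As one small illustration of test library maintenance, exact duplicates can be flagged mechanically. The sketch below is a simplified first pass that assumes test cases are stored as text and treats whitespace-insensitive equality as "duplicate"; commercial test management tools use far more sophisticated matching.

```python
# Simplified duplicate-detection sketch for a test library. Assumes each test
# case is available as text; two cases are treated as duplicates when their
# whitespace-normalized content is identical. Near-duplicates need fuzzier
# matching than this first pass provides.

import hashlib

def find_duplicates(test_cases):
    """Return {content_hash: [test_ids]} for hashes shared by two or more cases."""
    buckets = {}
    for test_id, text in test_cases.items():
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        buckets.setdefault(digest, []).append(test_id)
    return {h: ids for h, ids in buckets.items() if len(ids) > 1}

# Hypothetical library fragment.
library = {
    "TC-101": "Open the invoice screen.  Enter amount 100.00. Verify total.",
    "TC-205": "open the invoice screen. enter amount 100.00. verify total.",
    "TC-310": "Open the payment screen. Enter amount 100.00. Verify total.",
}
for ids in find_duplicates(library).values():
    print("Possible duplicates:", ", ".join(ids))   # -> TC-101, TC-205
```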
Although
not a current requirement
for test case personnel, it
would
be
useful if test specialists
also measured defect removal
efficiency
levels
and attempted to raise
average testing efficiency
from today's
average
of around 35 percent up to at least 75
percent.
Test
specialists should also be
pioneers in new testing
technologies
such
as automated testing. Running static analysis tools prior to testing would also add value.
Function
Point Specialists
Assignment
scope = 5000 function
points
Defect
potentials = 4.00
Defect
prevention impact = 10
percent
Defect
removal impact = 10
percent
Because
function point metrics are
the best choice for
normalizing
quality
data and creating effective
benchmarks of quality
information,
function
point specialists are
rapidly becoming part of
successful quality
improvement
programs.
However,
traditional manual counts of
function points are too
slow and
too
costly to be used as standard
quality control methods. The
average
counting
speed by a certified function
point specialist is only
about 400
function
points per day. This
explains why function point
analysis is almost
never
used for applications larger
than about 10,000 function
points.
However,
new methods have been
developed that allow
function points
to
be calculated at least six
months earlier than
previously possible.
These
same methods operate at speeds in
excess of 10,000
function
points
per minute. This makes it
possible to use function
points for early
quality
estimation, as well as for
measuring quality and
producing qual-
ity
benchmarks.
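The difference in counting speed is easy to quantify. The sketch below combines the figures cited above (about 400 function points per day manually, more than 10,000 function points per minute with the newer methods) with a hypothetical 10,000-function-point application and an assumed 8-hour working day.

```python
# Sizing-effort sketch comparing manual function point counting with the
# high-speed methods described in the text. The application size is an
# assumption used only for illustration.

MANUAL_FP_PER_DAY = 400          # certified counter, figure cited in the text
FAST_FP_PER_MINUTE = 10_000      # newer high-speed methods, figure cited in the text

def manual_counting_days(app_size_fp):
    return app_size_fp / MANUAL_FP_PER_DAY

def fast_counting_seconds(app_size_fp):
    return (app_size_fp / FAST_FP_PER_MINUTE) * 60

app_size = 10_000  # hypothetical application, in function points
print(f"Manual counting: about {manual_counting_days(app_size):.0f} working days")
print(f"High-speed sizing: about {fast_counting_seconds(app_size):.0f} seconds")
# -> roughly 25 working days versus about a minute, which is why early sizing
#    only became practical once the faster methods appeared.
```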
The
role of function point
specialists in terms of quality is to
create
useful
size information fast enough
and early enough that it
can serve
for
risk analysis, quality
prediction, and quality
measures.
Technical
Writers
Assignment
scope = 2000 function
points
Defect
potentials = 1.00
Defect
prevention impact = 10
percent
Defect
removal impact = 10
percent
Good
writing is a fairly rare
skill in the human species.
As a result,
good
software technical manuals
are also fairly rare.
Many kinds of
quality
problems are common in
software manuals, including
ambigu-
ity,
missing information, poor
organization structures, and
incorrect data.
There
are automated tools
available that can analyze
the readabil-
ity
of text, such as the FOG
index and the Flesch
index. But these
are
seldom used for software
manuals. Editing is useful, as
are formal
inspections
of user documentation.
Another
approach, which was actually
used by IBM, was to
select
examples
of user documents with the
highest user evaluation
scores
and
use them as samples.
The
role of technical writers in
terms of software quality is
to make sure
that
factual data is complete and
correct, and that manuals
are easy to
read
and understand.
Maintenance
Specialists
Assignment scope = 1,500 function points
Defect potentials = 3.5
Defect prevention impact = 30 percent
Defect removal impact = 20 percent
Maintenance
programming in terms of both
enhancing legacy
soft-
ware
and repairing bugs has been
the dominant activity for
the software
industry
for more than 20 years.
This should not be a
surprise, because
for
every industry older than 50
years, more people are
working on
repairs
of existing products than
are working on new
development.
As
the recession deepens and
lengthens, the U.S.
automobile industry
is
providing a very painful
example of this fact:
automotive manufac-
turing
is shrinking faster than the
polar ice fields, while
automotive
repairs
are increasing.
Aging
legacy applications have a
number of quality problems,
includ-
ing
poor structure, dead code,
error-prone modules, and
poor or missing
comments.
As
the recession continues,
many companies are
considering ways of
stretching
out the useful lives of
legacy applications. In fact,
renovation
and
data mining of legacy
software are both growing,
even in the face
of
the recession.
The
main role of maintenance
programmers in terms of quality is
to
strengthen
the quality of legacy
software. The methods
available to do this
include
full renovation using
automated tools; complexity
measurement
and
reduction; dead code
removal; improving comments;
identification
and
surgical removal of error-prone
modules; converting code
from orphan
languages
such as MUMPS or Coral into
modern languages such as
Java
or
Ruby; and repairing the
security flaws of legacy
applications.
Inspection
Moderators
Assignment scope = 1,000 function points
Defect potentials = 4.5
Defect prevention impact = 25 percent
Defect removal impact = 35 percent
Software
inspections have a number of
standard roles, including
the
moderator,
the recorder, the
inspectors, and the person
whose work is
being
inspected. The moderator is
the key to a successful
inspection.
The
tasks of the moderator
include keeping the
discussions on track,
minimizing
disruptive events, and
ensuring that the inspection
session
starts
and ends on time.
The
main role of inspection
moderators in terms of quality
includes
ensuring
the materials to be inspected
are delivered in time for
pre-inspec-
tion
review, making sure that
the inspectors and other
personnel show up
on
time, keeping the inspection
team focused on defect identification
(as
opposed
to repairs), and intervening in
potential arguments or
disputes.
The
inspection recorder plays a
key role too, because
the recorder
keeps
notes and fills out
the defect reports of all
bugs or defects that
the
inspection identified. This is
not as easy as it sounds, because
there
may
be some debate as to whether a
particular issue is a defect or
a
possible
enhancement.
Summary
and Conclusions on
Software
Specialization
The
overall topic of software
specialization is not well
covered in the
software
engineering literature. Considering
that there are more
than
115
specialists associated with software,
this fact is mildly
surprising.
When
it comes to software quality,
some forms of specialization do
add
value,
and this can be shown by
analysis of both defect
prevention and
defect
removal. The key specialists
who add the most
value to software
quality
include risk analysts, Six
Sigma specialists, quality
assurance
personnel,
inspection moderators, maintenance
specialists, and
profes-
sional
test personnel.
However,
many other specialists such
as business analysts,
enterprise
architects,
architects, estimating specialists,
and function point
special-
ists
also add value.
The
Economic Value of
Software
Quality
The
economic value of software
quality is not well covered
in the soft-
ware
engineering literature. There
are several reasons for
this prob-
lem.
One major reason is the
rather poor measurement
practices of
the
software engineering domain.
Many cost factors such as
unpaid
overtime
are routinely ignored. In
addition, there are frequent
gaps and
omissions
in software cost data, such
as omission of project
manage-
ment
costs and the omission of
part-time specialists such as
technical
writers.
In fact, only the effort
and costs of coding have
fairly good data
available.
Everything else, such as
requirements, design,
inspections,
testing,
quality assurance, project
offices, and documentation
tends to be
underreported
or ignored.
As
pointed out in other
sections, the software
engineering literature
depends
too much on vague and
unpredictable definitions of
quality
such
as "conformance to requirements" or
adhering to a collection of
ambiguous
terms ending in "-ility."
These
unscientific definitions
slow
down
research on software quality
economics.
Two
other measurement problems
also affect quality economic
stud-
ies.
These problems are the
usage of two invalid
economic measures:
cost
per defect and lines of
code. As discussed earlier in
this chapter,
cost
per defect penalizes quality
and achieves its lowest
costs for the
buggiest
applications. Lines of code
penalizes high-level
programming
languages
and disguises the value of
high-level languages for
studying
either
quality or productivity.
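A minimal illustration of the cost-per-defect distortion follows; the dollar figures are hypothetical rather than taken from the case studies, but the structure (a fixed cost for writing and running tests plus a variable cost per repair) is the point:

    # Hypothetical illustration: fixed testing costs get spread across fewer
    # defects as quality improves, so "cost per defect" rises even though
    # total defect removal cost falls.
    FIXED_TEST_COST = 10_000      # assumed cost of writing and running tests
    REPAIR_COST_PER_DEFECT = 200  # assumed variable repair cost per bug

    for defects_found in (100, 10, 1):
        total = FIXED_TEST_COST + defects_found * REPAIR_COST_PER_DEFECT
        print(f"{defects_found:>3} defects: total ${total:,}, "
              f"cost per defect ${total / defects_found:,.0f}")
    # 100 defects -> $300 per defect; 1 defect -> $10,200 per defect.
    # The buggiest application looks cheapest per defect, which is why the
    # metric penalizes quality.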
In
this section, the economic
value of quality will be shown by
means
of
eight case studies. Because
the value of software
quality correlates
to
application size, four
discrete size ranges will be
used: 100 function
points,
1000 function points, 10,000
function points, and 100,000
func-
tion
points.
Applications
in the 100-function point
range are usually small
fea-
tures
for larger systems rather
than stand-alone applications.
However,
this
is a very common size range
for prototypes of larger
applications.
There
may be small stand-alone
applications in this range
such as cur-
rency
converters or applets for
devices such as
iPhones.
Applications
in the 1000-function point
range are normally
stand-
alone
software applications such as
fuel-injection controls, atomic
watch
controls,
compilers for languages such
as Java, and software
estimating
tools
in the class of
COCOMO.
Applications
in the 10,000-function point
range are normally
impor-
tant
systems that control aspects
of business, such as insurance
claims
processing,
motor vehicle registration, or
child-support applications.
Applications
in the 100,000-function point
range are normally
major
systems
in the class of large
international telephone-switching
systems,
operating
systems in the class of
Vista and IBM's MVS, or
suites of
linked
applications such as Microsoft
Office. Some enterprise
resource
planning
(ERP) applications are in
this size range, and
may even top
300,000
function points. Also, large
defense applications such as
the
World
Wide Military Command and
Control System (WWMCCS)
also
top
100,000 function
points.
To
reduce the number of
variables, all eight of the
examples are
assumed
to be coded in the C programming language
and have a ratio
of
about 125 code statements
per function point.
Because
all eight of the
applications are assumed to be
written in the
same
programming language, productivity
and quality can be
expressed
using
the lines of code metric
without distortion. The
lines of code metric
is
invalid for comparisons
between unlike programming
languages.
For
each size plateau, two
cases will be illustrated: average
quality
and
excellent quality. The
average quality case assumes
waterfall devel-
opment,
CMMI level 1, normal testing,
and nothing special in terms
of
defect
prevention.
The
excellent quality case
assumes at least CMMI level 3,
formal
inspections,
static analysis, rigorous
development such as the
Team
Software
Process (TSP), and the
use of prototypes and joint
application
design
(JAD) for requirements
gathering.
(Some
readers may wonder why Agile
development is not used for
the
case
studies. The main reason is
that there are no Agile
applications
in
the 10,000 and
100,000-function point ranges.
The Agile method
is
used primarily for smaller
applications in the 1000-function
point
range.)
Although
all of the case studies
are derived from actual
applications,
to
make the calculations
consistent, a number of simplifying
assump-
tions
are used. These assumptions
include the following key
points:
■ All cost data is based on a fully burdened cost of $10,000 per staff month. A staff month is considered to have 132 working hours. This is equivalent to $75.75 per hour.
■ Work months are assumed to consist of 22 days, and each day consists of 8 hours. Unpaid overtime is not shown nor is paid overtime.
■ Defect potentials are the total numbers of defects found in five categories: requirements defects, design defects, code defects, documentation defects, and bad fixes, or secondary defects accidentally included in defect repairs.
■ Creeping requirements are not shown. The sizes of the eight case studies reflect application size as delivered to clients.
■ Software reuse is not shown. All eight cases can be assumed to reuse about 15 percent of legacy code. But to simplify assumptions, the defect potentials in the reused code and other materials are assumed to equal defect potentials of new material. Larger volumes of certified reusable material would significantly improve both the quality and productivity of all eight case studies, and especially so for the larger systems above 10,000 function points.
■ Bad-fix injections are not shown. About 7 percent of attempts to repair bugs accidentally introduce a new bug, but the mathematics of bad-fix injection is complicated since the bugs are not found in the activity where they originate.
■ The first year of maintenance is assumed to find 100 percent of latent bugs delivered with the software. In reality, many bugs fester for years, but the examples only show the first year of maintenance.
■ The maintenance data only shows defect repairs. Enhancements and adding new features are excluded in order to highlight quality value.
■ Maintenance defect repair rates are based on average values of 12 bugs fixed per staff month. In real life, ranges can run from fewer than 4 to more than 20 bugs repaired each month.
■ Application staff size is based on U.S. average assignment scopes for all classes of software personnel, which is approximately 150 function points. That is, if you divide application size in function points by the total staffing complement of technical workers plus project managers, the result will be close to 150 function points. This value includes software engineers and also specialists such as quality assurance, technical writers, and test personnel.
■ Schedules for the "average" cases are based on raising function point size to the 0.4 power. This rule of thumb provides a fairly good approximation of schedules from start of requirements to delivery in terms of calendar months (see the sketch following this list).
■ Schedules for the "excellent" cases are based on raising function point size to the 0.36 power. This exponent works well with object-oriented software and rigorous development practices. It is also a good fit for Agile projects, except that the lack of data above 10,000 function points for Agile makes the upper level uncertain.
■ Data in this section is expressed using the function point metric defined by the International Function Point Users' Group (IFPUG) version 4.2 of the counting rules. Other functional metrics such as COSMIC function points or engineering function points or Mark II function points would yield different results from the values shown here.
■ Data on source code in this section is expressed using counts of logical statements rather than counts of physical lines. There can be as much as 500 percent difference in apparent code size based on whether counts are physical or logical lines. The counting rules are those of the author's book Applied Software Measurement.
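The sketch below pulls these assumptions together so that the development and maintenance arithmetic behind Tables 9-24 through 9-27 can be reproduced. The rounding in the tables differs slightly, the defect potentials and removal efficiencies are the values listed in each table rather than outputs of this model, and the one-person staffing floor is an assumption added here so that the 100-function point case works out:

    # Sketch of the case-study arithmetic from the assumptions above:
    # schedule = FP^0.4 ("average") or FP^0.36 ("excellent") calendar months,
    # staffing = FP / 150, development effort = staffing * schedule,
    # costs = effort * $10,000, maintenance effort = delivered defects / 12.
    COST_PER_STAFF_MONTH = 10_000
    ASSIGNMENT_SCOPE = 150            # function points per staff member
    BUGS_FIXED_PER_STAFF_MONTH = 12

    def case_study(function_points, defects_per_fp, removal_efficiency,
                   schedule_exponent):
        schedule_months = function_points ** schedule_exponent
        staffing = max(1.0, function_points / ASSIGNMENT_SCOPE)  # at least one person
        dev_effort = schedule_months * staffing                  # staff months
        dev_cost = dev_effort * COST_PER_STAFF_MONTH
        delivered = function_points * defects_per_fp * (1 - removal_efficiency)
        maint_effort = delivered / BUGS_FIXED_PER_STAFF_MONTH
        maint_cost = maint_effort * COST_PER_STAFF_MONTH
        return dev_effort, dev_cost, delivered, maint_cost

    # "Average" case at 10,000 function points (compare with Table 9-26):
    effort, cost, delivered, maint = case_study(10_000, 6.00, 0.84, 0.4)
    print(round(effort), round(cost), round(delivered), round(maint))
    # -> about 2,654 staff months, $26.5 million of development cost,
    #    9,600 delivered defects, and $8.0 million of first-year repairs.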
The
reason for these simplifying
assumptions is to minimize
extra-
neous
variations among the eight
case studies, so that the
data is pre-
sented
in a consistent fashion for
each. Because all of these
assumptions
vary
in real life, readers are
urged to try out alternative
values based on
their
own local data or on
benchmarks from organizations
such as the
International
Software Benchmarking Standards
Group (ISBSG).
The
simplifying assumptions serve to
make the results
consistent,
but
each of the assumptions can
change in either direction by
fairly
large
amounts.
The
Value of Quality for Very
Small
Applications
of 100 Function
Points
Small
applications in this range
usually have low defect
potentials and
fairly
high defect removal
efficiency levels. This is because
such small
applications
can be developed by a single
person, so there are no
inter-
face
problems between features
developed by different individuals
or
different
teams. Table 9-24 shows
quality value for very
small applica-
tions
of 100 function
points.
Note
that cost per defect
goes up
as
quality improves, not down.
This
phenomenon
distorts economic analysis. As will be
shown in the later
examples,
cost per defect tends to
decline as applications grow
larger. This
is
because large applications have
many more defects than
small ones.
Prototypes
or applications in this size
range are very sensitive
to
individual
skill levels, primarily because one
person does almost all
of
the
work. The measured
variations for this size
range are about 5 to 1
in
how
much code gets written
for a given specification
and about 6 to 1 in
terms
of productivity and quality
levels. Therefore, average
values need
to
be used with caution. Averages
are particularly unreliable
for applica-
tions
where one person performs
the bulk of the entire
application.
TABLE 9-24   Quality Value for 100-Function Point Applications
(Note: 100 function points = 12,500 C statements)

                                          Average      Excellent
                                          Quality      Quality      Difference
Defects per function point                   3.50         1.50            2.00
Defect potential                              350          150             200
Defect removal efficiency                  94.00%       99.00%           5.00%
Defects removed                               329          149             181
Defects delivered                              21            2              20
Cost per defect prerelease                   $379         $455             $76
Cost per defect postrelease                $1,061       $1,288            $227
Development schedule (calendar months)          6            5               1
Development staffing                            1            1               0
Development effort (staff months)               6            5               1
Development costs                         $63,096      $52,481         $10,615
Function points per staff month             15.85        19.05            3.21
LOC per staff month                         1,981        2,382             401
Maintenance staff                               1            1               0
Maintenance effort (staff months)               2            0            1.63
Maintenance costs (year 1)                $17,500       $1,250         $16,250
TOTAL EFFORT                                    8            5               3
TOTAL COST                                $80,596      $53,731         $26,865
TOTAL COST PER STAFF MEMBER               $40,298      $26,865         $13,432
TOTAL COST PER FUNCTION POINT             $805.96      $537.31            $269
TOTAL COST PER LOC                          $6.45        $4.30           $2.15
AVERAGE COST PER DEFECT                      $720         $871            $152
The
Value of Quality for Small
Applications
of
1000 Function
Points
For
small applications of 1000
function points, quality
starts to become
very
important, but it is also
somewhat easier to achieve
than it is for
large
systems. At this size range,
teams are small and
methods such
as
Agile development tend to be
dominant, other than for
systems and
embedded
software where more rigorous
methods such as the
Team
Software
Process (TSP) and the
Rational Unified Process
(RUP) are
more
common. Table 9-25 shows
the value of quality for
small applica-
tions
in the 1000function point
range.
The
bulk of the savings for
the Excellent Quality column
shown in
Table
9-25 would come from
shorter testing schedules
due to the use of
requirements,
design, and code
inspections. Other changes
that added
value
include the use of Team
Software Process (TSP),
static analysis
prior
to testing, and the
achievement of higher CMMI
levels.
TABLE 9-25   Quality Value for 1000-Function Point Applications
(Note: 1000 function points = 125,000 C statements)

                                          Average      Excellent
                                          Quality      Quality      Difference
Defects per function point                   4.50         2.50            2.00
Defect potential                            4,500        2,500           2,000
Defect removal efficiency                  93.00%       97.00%           4.00%
Defects removed                             4,185        2,425           1,760
Defects delivered                             315           75             240
Cost per defect prerelease                   $341         $417             $76
Cost per defect postrelease                  $909       $1,136            $227
Development schedule (calendar months)         16           12               4
Development staffing                            7            7               0
Development effort (staff months)             106           80              26
Development costs                      $1,056,595     $801,510        $255,086
Function points per staff month              9.46        12.48            3.01
LOC per staff month                         1,183        1,560          376.51
Maintenance staff                               2            2               0
Maintenance effort (staff months)              26            6              20
Maintenance costs (year 1)               $262,500      $62,500        $200,000
TOTAL EFFORT                                  132           86              46
TOTAL COST                             $1,319,095     $864,010        $455,086
TOTAL COST PER STAFF MEMBER              $158,291     $103,681         $54,610
TOTAL COST PER FUNCTION POINT           $1,319.10      $864.01            $455
TOTAL COST PER LOC                         $10.55        $6.91           $3.64
AVERAGE COST PER DEFECT                      $625         $776            $152
In
the size range of 1000
function points, numerous
methods are fairly
effective.
For example, both Agile
development and extreme
program-
ming
report good results in this
size range as do the
Rational Unified
Process
(RUP) and the Team
Software Process
(TSP).
The
Value of Quality for Large
Applications
of
10,000 Function
Points
When
software applications reach
10,000 function points, they
are
very
significant systems that
require close attention to quality
control,
change
control, and corporate
governance. In fact, without
careful qual-
ity
and change control, the
odds of failure or cancellation
top 35 percent
for
this size range.
Note
that as application size
increases, defect potentials
increase rap-
idly
and defect removal
efficiency levels decline,
even with sophisticated
quality
control steps in place. This
is due to the exponential
increase in
the
volume of paperwork for
requirements and design,
which often leads
to
partial inspections rather
than 100 percent
inspections. For
large
systems,
test coverage declines and
the number of test cases
mounts rap-
idly,
but cannot usually keep pace
with complexity. Table 9-26
shows the
increasing
value of quality as size
goes up to 10,000 function
points.
Cost
savings from better quality
increase as application sizes
increase.
The
general rule is that the
larger the software
application, the more
valu-
able
quality becomes. The same
principle is true for change
control, because
the
volume of creeping requirements
goes up with application
size.
For
large systems, the available
methods that demonstrate
improve-
ment
begin to decline. For
example, Agile methods are
difficult to apply,
and
when they are applied, the
results are not always good.
For large systems,
rigorous
methods such as the Rational
Unified Process (RUP) or
Team
Software
Process (TSP) yield the
best results and have
the greatest
amount
of empirical data.
TABLE 9-26   Quality Value for 10,000-Function Point Applications
(Note: 10,000 function points = 1,250,000 C statements)

                                          Average      Excellent
                                          Quality      Quality      Difference
Defects per function point                   6.00         3.50            2.50
Defect potential                           60,000       35,000          25,000
Defect removal efficiency                  84.00%       96.00%          12.00%
Defects removed                            50,400       33,600          16,800
Defects delivered                           9,600        1,400           8,200
Cost per defect prerelease                   $341         $417             $76
Cost per defect postrelease                  $833       $1,061            $227
Development schedule (calendar months)         40           28              12
Development staffing                           67           67               0
Development effort (staff months)           2,654        1,836             818
Development costs                     $26,540,478  $18,361,525      $8,178,953
Function points per staff month              3.77         5.45            1.68
LOC per staff month                           471          681          209.79
Maintenance staff                              17           17               0
Maintenance effort (staff months)             800          117          683.33
Maintenance costs (year 1)             $8,000,000   $1,166,667      $6,833,333
TOTAL EFFORT (STAFF MONTHS)                 3,454        1,953           1,501
TOTAL COST                            $34,540,478  $19,528,191     $15,012,287
TOTAL COST PER STAFF MEMBER              $414,486     $234,338        $180,147
TOTAL COST PER FUNCTION POINT           $3,454.05    $1,952.82       $1,501.23
TOTAL COST PER LOC                         $27.63       $15.62          $12.01
AVERAGE COST PER DEFECT                      $587         $739            $152
The
Value of Quality for Very
Large
Applications
of 100,000 Function
Points
Software
applications in the 100,000-function
point range are
among
the
most costly endeavors of
modern business. These large
systems
are
also hazardous, because many of
them fail, and almost
all of them
exceed
their budgets and planned
schedules.
Without
excellence in software quality
control, the odds of
complet-
ing
a software application of 100,000
function points are only
about
20
percent. The odds of
finishing it on time and
within budget hover
close
to 0 percent.
Even
with excellent quality control
and excellent change
control, mas-
sive
applications in the 100,000-function
point range are
expensive
and
troublesome. Table 9-27
illustrates the two cases
for such massive
applications.
TABLE 9-27   Quality Value for 100,000-Function Point Applications
(Note: 100,000 function points = 12,500,000 C statements)

                                          Average      Excellent
                                          Quality      Quality      Difference
Defects per function point                   7.00         4.00            3.00
Defect potential                          700,000      400,000         300,000
Defect removal efficiency                  81.00%       94.00%          13.00%
Defects removed                           567,000      376,000         191,000
Defects delivered                         133,000       24,000         109,000
Cost per defect prerelease                   $303         $379             $76
Cost per defect postrelease                  $758         $985            $227
Development schedule (calendar months)        100           63              37
Development staffing                          667          667               0
Development effort (staff months)          66,667       42,064          24,603
Development costs                    $666,666,667 $420,638,230    $246,028,437
Function points per staff month              1.50         2.38            0.88
LOC per staff month                           188          297          109.67
Maintenance staff                             167          167               0
Maintenance effort (staff months)          11,083        2,000           9,083
Maintenance costs (year 1)           $110,833,333  $20,000,000     $90,833,333
TOTAL EFFORT                               77,750       44,064          33,686
TOTAL COST                           $777,500,000 $440,638,230    $336,861,770
TOTAL COST PER STAFF MEMBER              $933,000     $528,766        $404,234
TOTAL COST PER FUNCTION POINT           $7,775.00    $4,406.38       $3,368.62
TOTAL COST PER LOC                         $62.20       $35.25          $26.95
AVERAGE COST PER DEFECT                      $530         $682            $152
There
are several reasons why
defect potentials are so
high for mas-
sive
applications and why defect
removal efficiency levels
are reduced.
The
first reason is that for
such massive applications,
requirements
changes
will be so numerous that they
exceed most companies'
ability
to
control them well.
The
second reason is that
paperwork volumes tend to
rise with applica-
tion
size, and this slows
down activities such as
inspections of requirements
and
design. As a result, massive
applications tend to use partial
inspec-
tions
rather than 100 percent
inspections of major deliverable
items.
A
third reason, which was
worked out mathematically at IBM in
the
1970s,
is that the number of test
cases needed to achieve 90
percent
coverage
of code rises exponentially with
size. In fact, the number of
test
cases
required to fully test a
massive system of 100,000
function points
approaches
infinity. As a result, testing
efficiency declines with
size,
even
though static analysis and
inspections stay about the
same.
A
useful rule of thumb for
predicting overall number of
test cases is to
raise
application size in function
points to the 1.2 power. As
can be seen,
test
case volumes rise very
rapidly, and most companies
cannot keep
pace,
so test coverage declines. Automated
static analysis is still
effec-
tive.
Inspections are also
effective, but for 100,000
function points,
partial
inspections
of key deliverables are the
norm rather than 100
percent
inspections.
This is because paperwork volumes
also rise
exponentially
with
size.
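A small sketch of the test-case rule of thumb mentioned above (the 1.2 exponent is the one given in the text; the resulting counts are rough approximations, not measured values):

    # Rule of thumb from the text: test cases ~= function points ** 1.2
    for size in (100, 1_000, 10_000, 100_000):
        print(f"{size:>7,} function points -> about {round(size ** 1.2):,} test cases")
    # 100 FP -> ~251; 1,000 FP -> ~3,981; 10,000 FP -> ~63,096;
    # 100,000 FP -> ~1,000,000. Each 10-fold increase in size multiplies
    # the test case volume by roughly 16, which is why test coverage and
    # testing efficiency decline for massive systems.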
Return
on Investment in Software
Quality
As
already mentioned, the value
of software quality goes up as
appli-
cation
size goes up. Table
9-28 calculates the
approximate return on
investment
for the "excellent" case
studies of 100 function
points, 1000
function
points, 10,000 function
points, and 100,000 function
points.
Here
too the assumptions are
simplified to make calculations
easy
and
understandable. The basic
assumption is that every
software team
member
needs five days of training to
get up to speed in software
inspec-
tions
and the Team Software
Process (TSP). These
training days are
then
multiplied by average hourly
costs of $75.75 per
employee.
These
training expenses are then
divided into the total
savings figure
that
includes both development
and maintenance savings due
to high
quality.
The final result is the
approximate ROI based on dividing
value
by
training expenses. Table
9-28 illustrates the ROI
calculations.
TABLE 9-28   Return on Investment in Software Quality

Function point size                100        1,000        10,000        100,000
Education hours                     80          560         5,360         53,360
Education costs                 $6,060      $42,420      $406,020     $4,042,020
Savings from high quality      $26,865     $455,086   $15,012,287   $336,861,770
Return on investment (ROI)       $4.43       $10.73        $36.97         $83.34

The ROI figure reflects the total savings divided by the total training expenses needed to bring team members up to speed in quality technologies.

In real life, these simple assumptions would vary widely, and other factors might also be considered. Even so, high levels of software quality
have
a very solid return on
investment due to the
reduction in develop-
ment
schedules, development costs,
and maintenance
costs.
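A minimal sketch of the arithmetic behind Table 9-28 (the education hours and savings are the values from the table; the $75.75 hourly rate and the five 8-hour training days per person are the assumptions stated above):

    # ROI from Table 9-28: savings from high quality divided by the cost of
    # training the team (education hours times the $75.75 burdened rate).
    HOURLY_RATE = 75.75

    cases = {           # function points: (education hours, savings)
        100:     (80,     26_865),
        1_000:   (560,    455_086),
        10_000:  (5_360,  15_012_287),
        100_000: (53_360, 336_861_770),
    }

    for size, (hours, savings) in cases.items():
        training_cost = hours * HOURLY_RATE
        print(f"{size:>7,} FP: training ${training_cost:,.0f}, "
              f"ROI ${savings / training_cost:.2f} per dollar invested")
    # -> $4.43, $10.73, $36.97, and $83.34 per dollar of training,
    #    matching the ROI row of Table 9-28.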
There
may be many other topics
where software engineers and
man-
agers
need training, and there
may be other cost elements
such as the
costs
of ascending to the higher
levels of the capability
maturity model.
While
the savings from high
quality are frequently
observed, the exact
ROI
will vary based on the way
training and process improvement
work
is
handled under local
accounting rules.
If
the reduced risks of
cancelled projects or major
overruns were
included
in the ROI calculations, the
value would be even
higher.
Other
technologies such as high
volumes of certified reusable
mate-
rial
would also have a beneficial
impact on both quality and
productiv-
ity.
However, as this book is
written in 2009, only
limited sources are
available
for certified reusable
materials. Uncertified reuse is
hazardous
and
may even be harmful rather
than beneficial.
Summary
and Conclusions
In
spite of the fact that
the software industry spends
more money on
finding
and fixing bugs than
any other activity, software
quality remains
ambiguous
and poorly covered in the
software engineering
literature.
There
are dozens of books on
software quality and
testing, but hardly
any
of them contain quantitative
data on defect volumes,
numbers of
test
cases, test coverage, or the
costs associated with defect
removal
activities.
Even
worse, much of the
literature on quality merely
cites urban
legends
of how "cost per defect
rises throughout development
and into
the
field," without realizing
that such a trend is caused
by ignoring
fixed
costs.
Software
quality does have value,
and the value increases as
applica-
tion
sizes get bigger. In fact,
without excellence in quality
control, even
completing
a large software application is
highly unlikely.
Completing
it
on time and within budget in
the absence of excellent quality
control
is
essentially impossible.
Readings
and References
Beck, Kent. Test-Driven Development. Boston, MA: Addison Wesley, 2002.
Chelf, Ben and Raoul Jetley. Diagnosing Medical Device Software Defects Using Static Analysis. San Francisco, CA: Coverity Technical Report, 2008.
Chess, Brian and Jacob West. Secure Programming with Static Analysis. Boston, MA: Addison Wesley, 2007.
Cohen, Lou. Quality Function Deployment--How to Make QFD Work for You. Upper Saddle River, NJ: Prentice Hall, 1995.
Crosby, Philip B. Quality is Free. New York, NY: New American Library, Mentor Books, 1979.
Everett, Gerald D. and Raymond McLeod. Software Testing. Hoboken, NJ: John Wiley & Sons, 2007.
Gack, Gary. Applying Six Sigma to Software Implementation Projects. http://software.isixsigma.com/library/content/c040915b.asp.
Gilb, Tom and Dorothy Graham. Software Inspections. Reading, MA: Addison Wesley, 1993.
Hallowell, David L. Six Sigma Software Metrics, Part 1. http://software.isixsigma.com/library/content/c03910a.asp.
International Organization for Standards. ISO 9000 / ISO 14000. http://www.iso.org/iso/en/iso9000-14000/index.html.
Jones, Capers. Software Quality--Analysis and Guidelines for Success. Boston, MA: International Thomson Computer Press, 1997.
Kan, Stephen H. Metrics and Models in Software Quality Engineering, Second Edition. Boston, MA: Addison Wesley Longman, 2003.
Land, Susan K., Douglas B. Smith, and John Z. Walz. Practical Support for Lean Six Sigma Software Process Definition: Using IEEE Software Engineering Standards. Los Alamitos, CA: Wiley-IEEE Computer Society Press, 2008.
Mosley, Daniel J. The Handbook of MIS Application Software Testing. Englewood Cliffs, NJ: Yourdon Press, Prentice Hall, 1993.
Myers, Glenford. The Art of Software Testing. New York, NY: John Wiley & Sons, 1979.
Nandyal, Raghav. Making Sense of Software Quality Assurance. New Delhi: Tata McGraw-Hill Publishing, 2007.
Radice, Ronald A. High Quality Low Cost Software Inspections. Andover, MA: Paradoxicon Publishing, 2002.
Wiegers, Karl E. Peer Reviews in Software--A Practical Guide. Boston, MA: Addison Wesley Longman, 2002.