Chapter 8
Programming and Code Development
Introduction
This chapter has an unusual slant compared with other books on software engineering. Among other topics, it deals with 12 important questions that are not well covered in the software engineering literature:
1. Why do we have more than 2500 programming languages?
2. Why does a new programming language appear more than once a month?
3. How many programming languages are really needed by software engineering?
4. Why do most modern applications use between 2 and 15 different languages?
5. How many applications being maintained are written in "dead" programming languages with few programmers?
6. How many programmers use major languages; how many use minor languages?
7. Should there be a national translation center that maintains compilers and tools for dead programming languages and that can convert antique languages into modern languages?
8. What are the major kinds of bugs found in source code?
9. How effective are debuggers and static analysis tools compared with inspections?
10. How effective are various kinds of testing in terms of bug removal?
11. How effective is reusable code in terms of quality, security, and costs?
12. Why has the "lines of code" metric stopped being effective for software economic studies?
These 12 topics are not the only topics that are important about programming, but they are not discussed often in software engineering journals or books. Following are discussions of the 12 topics.
A Short History of Programming and Language Development
It is interesting to consider the history of programming and the development of programming languages. The early history of mechanical computers driven by gears, cogs, and later punched cards is interesting, but not relevant. However, these devices did embody the essence of computer programming, which is to control the behavior of a mechanical device by means of discrete instructions that could be varied in order to change the behavior of the machine.
The pioneers of computer design include Charles Babbage, Ada Lovelace, Herman Hollerith, Alan Turing, John von Neumann, Konrad Zuse, J. Presper Eckert, John Mauchly, and a number of others. John Backus, Konrad Zuse, and others contributed to the foundations of programming languages. David Parnas and Edsger Dijkstra contributed to the development of structured programming, which minimized the tendency of code branching to form "spaghetti bowls" of so many branches that the code became nearly unreadable.
Ada Lovelace was an associate of Charles Babbage. In 1842 and 1843, she described a method of calculating Bernoulli numbers for use on the Babbage analytical engine. Her work is often cited as the world's first computer program, although there is some debate about this.

In the years during and prior to World War II, a number of companies in various countries built electro-mechanical computing devices, primarily for special purposes such as calculating trajectories or handling mathematical tasks.

The earliest models were "programmed" in part by changing wire connections or using plug boards. But during World War II, computing devices were developed with memory that could store both data and instructions. The ability to have language instructions stored in memory opened the gates to modern computer programming as we know it today.
Konrad Zuse of Germany built the Z3 computer in 1941 and later designed what seems to be the first high-level language, Plankalkül, in 1948, although no compiler was created and the language was not used.

The earliest "languages" that were stored in computers were binary codes or machine languages, which obviously were extremely difficult to understand, code, or modify. The difficulty of working with machine codes directly led to languages that were more amenable to human understanding but capable of being translated into machine instructions.
The earliest of these languages were termed assembly languages and usually had a one-to-one correspondence between the human-readable instructions (called source code) and the executable instructions (called object code).

The idea of developing languages that humans could use to describe various algorithms or data manipulation steps proved to be so useful that very shortly a number of more specialized languages were developed.

In these languages the human portions were optimized for certain kinds of problems, and the work of translating the languages into machine code was left to the compilers. Incidentally, the main difference between an assembler and a compiler is that assemblers tend to have a one-to-one ratio between source code and object code, while compilers have a one-to-many ratio. In other words, one statement in a compiled language might generate a dozen or more machine instructions.
The ability to translate a single source instruction into many object instructions led to the concept of high-level programming languages. In general, the higher the level of a programming language, the more object code can be created from a single source code statement.
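The one-to-many expansion is easy to observe with modern tools. The short sketch below is an added illustration rather than part of the original discussion: it uses Python's standard dis module, which prints the several lower-level bytecode instructions generated for a single source statement, the same principle a compiler applies when it emits machine code.

# Added illustration: one high-level statement expands into several
# lower-level instructions, the one-to-many ratio described above.
# Python's bytecode stands in here for the object code of a compiler.
import dis

def average(a, b):
    return (a + b) / 2  # a single source statement

dis.dis(average)  # prints several bytecode instructions for this one line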
Both assembly and compilation were handled by special translation programs as batch activities. The source code could not be run immediately. Sometimes translation might be delayed for hours if the computer was being used for other work and other computers were not available.

These delays led to another form of code translation. Programming language translators called interpreters were soon developed, which allowed source code to be converted into object code immediately.
In the early days of computing and programming, software was used primarily for a narrow range of mathematical calculations. But the speed of digital computers soon gave rise to wider ranges of applications. When computers started to be aimed at business problems and to manipulate text and data, it became obvious that if the source code included some of the language and vocabulary of the problem domain, then programming languages would be easier to learn and use. The use of computers to control physical devices opened up yet another need for languages optimized for dealing with physical objects.

As a result, scores of domain-specific programming languages were developed that were aimed at tasks such as list processing, business applications, astronomy, embedded applications, and a host of others.
Why Do We Have More than 2500 Programming Languages?
The concept of having source code optimized for specific kinds of business or technical problems is one of the factors that led to the enormous proliferation of programming languages.

There are some technical advantages for having programming languages match the vocabulary of various problem domains. For one thing, such languages are easy to learn for programmers who are familiar with the various domains.
It is actually fairly easy to develop a new programming language. As computers began to be used for more and more kinds of problems, the result was more and more programming languages. Developing a new programming language that attracted other programmers also had social and prestige value.

As a result of these technical and social reasons, the software industry developed new programming languages with astonishing frequency. Today, as of 2009, no one really knows the actual number of programming languages and dialects, but the largest published lists of programming languages now contain 2500 languages (The Language List by Bill Kinnersley, http://people.ku.edu).
The author's former company, Software Productivity Research, has been keeping a list of common programming languages since 1984, and the current version contains more than 700 programming languages. New programming languages continue to come out at a rate of two or three per calendar month; some months, more than 10 languages have arrived. There is no end in sight.

One reason for the plethora of languages is that a new language can be developed by a single software engineer in only a month or two. In fact, with compiler-compilers, a new programming language can evolve from a vague idea to compiled code in 60 days or less.
In 1984, the author's first commercial software estimating tool was put on the market. The first release of the tool could perform cost and quality estimates for 30 different programming languages, but the tool itself could handle other languages using the same logic and algorithms. Therefore, we made a statement to customers that our tool could support cost estimates for "all known programming languages."

Having made the claim, it was necessary to back it up by assembling a list of all known programming languages and their levels. At the time the claim was made in 1984, the author hypothesized that the list might include 50 languages. However, when the data was collected, it was discovered that the set of "all known programming languages" included about 250 languages and dialects circa 1984.
It was also discovered while compiling the list that new languages were popping up about once a month; sometimes quite a few more.
It became obvious that keeping track of languages was not going to be quick and easy, but would require continuous effort.

Today, as of 2009, the current list of languages maintained by Software Productivity Research has grown to more than 700 programming languages, and the frequency with which new languages come out seems to be increasing from about one new language per month up to perhaps two or even four and occasionally ten new languages per month.
An approximate chronology of significant programming languages is shown in Table 8-1.

Table 8-1 is only a tiny subset of the total number of programming languages. It is included just to give readers who may not be practicing programmers an idea of the wide variety of languages in existence.

Those familiar with programming concepts can see from the list that programming language design took two divergent paths:

■ Specialized languages that were optimal for narrow sets of problems, such as FORTRAN, Lisp, ASP, and SQL
■ General-purpose languages that could be used for a wide range of problems, such as Ada, Objective C, PL/I, and Ruby
It is of sociological interest that the vast majority of special-purpose languages were developed by individuals or perhaps two individuals. For example, Basic was developed by John Kemeny and Thomas Kurtz; C was developed by Dennis Ritchie; FORTRAN was developed by John Backus; Java was developed by James Gosling; and Objective C was developed by Brad Cox and Tom Love.

The general-purpose languages were usually developed by committees. For example, COBOL was developed by a famous committee with major inputs from Grace Hopper of the U.S. Navy. Other languages developed by committees include Ada and PL/I. However, some general-purpose languages were also developed by individuals or colleagues, such as Ruby and Objective C.
For reasons that are perhaps more sociological than technological, the attempts at building general-purpose languages such as PL/I and Ada have not been as popular with programmers as many of the special-purpose languages.

This is a topic that needs both sociological and technical research, because PL/I and Ada appear to be well designed, robust, and capable of tackling a wide variety of applications with good results.
Another major divergence in programming languages occurred during the late 1970s, although research had started earlier. This is the split between object-oriented languages such as SMALLTALK, C++, and Objective C and languages that did not adopt OO methods and terminology, such as Basic, Visual Basic, and XML.
TABLE 8-1  Chronology of Programming Language Development

1951  Assembly languages
1954  FORTRAN (Formula Translator)
1958  Lisp (List Processing)
1959  COBOL (Common Business-Oriented Language)
1959  JOVIAL (Jules Own Version of the International Algorithmic Language)
1959  RPG (formerly Report Program Generator)
1960  ALGOL (Algorithmic Language)
1962  APL (A Programming Language)
1962  SIMULA
1964  Basic (Beginner's all-purpose symbolic instruction code)
1964  PL/I
1964  CORAL
1967  MUMPS
1970  PASCAL
1970  Prolog
1970  Forth
1972  C
1978  SQL (Structured query language)
1980  CHILL
1980  dBASE II
1982  SMALLTALK
1983  Ada83
1985  Quick Basic
1985  Objective C
1986  C++
1986  Eiffel
1986  JavaScript
1987  Visual Basic
1987  PERL
1989  HTML (Hypertext Markup Language)
1993  AppleScript
1995  Java
1995  Ruby
1999  XML (Extensible Markup Language)
2000  C#
2000  ASP (Active Server Pages)
2002  ASP.NET
Today in 2009, more than 50 percent of active programming languages tend to be in the object-oriented camp, while the other languages are procedural languages, functional languages, or use some other method of operation.
Yet another dichotomy among programming languages is whether they are typed or un-typed. The term typed means that operations in a language are restricted to only specific data types. For example, a typed language would not allow mathematical operations against character data. Examples of typed languages include Ruby, SMALLTALK, and Lisp.

The opposite case, or un-typed languages, means that operations can be performed against any type of data. Examples of un-typed languages include assembly language and Forth.

The terms typed and un-typed are somewhat ambiguous, as are the related terms strongly typed and weakly typed. Over and above ambiguity, there is some debate as to the virtues and limits of typed versus un-typed languages.
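The practical difference is easy to see in a few lines of code. The sketch below is an added example (Python is used only for convenience; it enforces type restrictions at run time rather than at compile time): an operation that mixes character data with numeric data is rejected until the programmer states an explicit conversion, which is exactly the kind of restriction an un-typed language does not impose.

# Added example of a type restriction in action.
def demo_type_restriction():
    try:
        "123" + 4  # a mathematical operation against character data
    except TypeError as err:
        # The type system rejects the mixed-type operation.
        print("Rejected:", err)
    # The intended conversion must be stated explicitly.
    print(int("123") + 4)  # prints 127

demo_type_restriction()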
Exploring the Popularity of Programming Languages
There are a number of ways of studying the usage and popularity of programming languages. These include

1. Statistical analysis of web searches for specific languages
2. Statistical analysis of books and articles published about specific languages
3. Statistical analysis of citations in the literature about specific languages
4. Statistical analysis of job ads for programmers that cite language skills
5. Surveys and statistical analysis of languages in legacy applications
6. Surveys and statistical analysis of languages used for new applications
A company called Tiobe publishes a monthly analysis of programming language popularity that ranks 100 different programming languages. Since this section is being written in May 2009, the 20 most popular languages for this month from the Tiobe rankings are listed in Table 8-2.

Older readers may wonder where COBOL, FORTRAN, PL/I, and Ada reside. They are further down the Tiobe list in languages 21 through 40.

Since new languages pop up at a rate of more than one per month, language popularity actually fluctuates rather widely on a monthly basis. As interesting new programming languages appear, their popularity goes up rapidly. But based on their utility or lack of utility over longer periods, they may drop down again just as fast.
TABLE 8-2  Popularity Ranking of Programming Languages as of May 2009

1. Java
2. C
3. C++
4. PHP
5. Visual Basic
6. Python
7. C#
8. JavaScript
9. Perl
10. Ruby
11. Delphi
12. PL/SQL
13. SAS
14. PASCAL
15. RPG (OS/400)
16. ABAP
17. D
18. MATLAB
19. Logo
20. Lua
The popularity of programming languages bears a certain resemblance to the popularity of prime-time television shows. Some new shows such as Two and a Half Men surface, attract millions of viewers, and may last for a number of seasons. A few shows such as Seinfeld become so popular that they go into syndication and continue to be aired long after production stops. But many shows are dropped after a single season.

It is interesting that the life expectancy of programming languages and the life expectancy of television shows are about the same. Many programming languages have active lives that span only a few "seasons" and then disappear. Other languages become standards and may last for many years. However, when all 2500 languages are considered, the average active life of a programming language when it is being used for new development is less than five years. Very few programming languages attract development programmers after more than ten years.
Some of the languages that are in the class of Seinfeld or I Love Lucy and may last more than 25 years under syndication include

■ Ada
■ C
■ C++
■ COBOL
■ Java
■ Objective C
■ PL/I
■ SQL
■ Visual Basic
■ XML
In a programming language context, the term syndication means that the language is no longer under the direct control of its originator, but rather control has passed to a user group or to a commercial company, or that the language has been put in the public domain and is available via open-source compilers.

It would be interesting and valuable if there were benchmarks and statistics kept of the numbers of applications written in these long-lived programming languages. No doubt C and COBOL have each been used for more than 1 million applications on a global basis.
In fact, continuing with the analogy of the entertainment business, it might be interesting to have awards for languages that have been used for large numbers of applications. Perhaps "silver" might go for 100,000 applications, "gold" for 1 million applications, and "platinum" for 10 million applications.

If such an award were created, a good name for it might be the "Hopper," after Admiral Grace Hopper, who did so much to advance programming languages and especially COBOL. In fact, COBOL is probably the first programming language in history to achieve the 1-million-application plateau.
Although the idea of awards for various numbers of applications is interesting, it would require that statistics be available for ascertaining how many applications were created in specific languages or combinations of languages. As of 2009, the software industry does not keep such data.

The choice of which language should be used for specific kinds of applications is surprisingly subjective. A colleague at IBM was asked in a meeting if he programmed in the APL language. His response was, "No, I'm not of that faith."
It would be technically possible to develop a standard method of describing and cataloging the features of programming languages. Indeed, with more than 2500 languages in existence, such a catalog is urgently needed. Even if the catalog only started with 100 of the most widely used languages, it would provide valuable information.
The full set of topics included to create an effective taxonomy of programming languages is outside the scope of this book, but might contain factors such as these (a brief illustrative sketch follows the list):
1. Language name: Name of language
2. Architecture: Object-oriented, functional, procedural, etc.
3. Origin: Year of creation, names of inventors
4. Sources: URLs of distributors of language compilers
5. Current version: Version number of current release; 1, 2, or whatever
6. Support: URLs or addresses of maintenance organizations
7. User associations: Names, URLs, and locations of user groups
8. Tutorial materials: Books and learning sources about the language
9. Reviews or critiques: Published reviews of language in refereed journals
10. Legal status: Public domain, licensed, patents, etc.
11. Language definition: Whether it is formal, informal
12. Language syntax: Description of syntax
13. Language typing: Strongly typed, weakly typed, un-typed, etc.
14. Problem domains: Mathematics, web, embedded, graphics, etc.
15. Hardware platforms: Hardware the language was intended to support
16. OS platforms: Operating systems the language compilers work with
17. Intended uses: Targeted application types
18. Known limitations: Performance, security, problem domains, etc.
19. Dialects: Variations of the basic language
20. Companion languages: .NET, XML, etc. (languages used jointly)
21. Extensibility: Commands added by language users
22. Level: Logical statements relative to assembly language
23. Backfire level: Logical statements per function point
24. Reuse sources: Certified modules, uncertified, etc.
25. Security features: Intrinsic security features, such as in the E language
26. Debuggers available: Names of debugging tools
27. Static analysis available: Names of static analysis tools
28. Development tools available: Names of development tools
29. Maintenance tools available: Names of maintenance tools
30. Applications to date: Approximately 100, 1000, 10,000, 100,000, etc.
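As a rough indication of how one entry in such a catalog might be represented in machine-readable form, the sketch below is an added illustration: the field names are hypothetical, cover only a handful of the 30 factors, and the sample values are placeholders rather than data from any real catalog.

# Hypothetical sketch of a single catalog entry for a language taxonomy.
# Field names are illustrative and cover only a few of the 30 factors above.
from dataclasses import dataclass, field
from typing import List

@dataclass
class LanguageEntry:
    name: str                    # factor 1: language name
    architecture: str            # factor 2: object-oriented, procedural, etc.
    origin_year: int             # factor 3: year of creation
    typing: str                  # factor 13: strongly typed, weakly typed, etc.
    problem_domains: List[str] = field(default_factory=list)   # factor 14
    backfire_level: float = 0.0  # factor 23: logical statements per function point

example = LanguageEntry(
    name="ExampleLang",          # placeholder entry, not a real language
    architecture="procedural",
    origin_year=1983,
    typing="strongly typed",
    problem_domains=["business applications", "text and string data"],
    backfire_level=100.0,        # illustrative value only
)
print(example.name, example.backfire_level)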
Given the huge number of programming languages, it is surprising that no standard taxonomy exists. Web searches reveal more than a dozen topics when using search arguments such as "taxonomies of programming languages" or "categories of programming languages." However, these vary widely, and some contain more than 50 different descriptive forms, but seem to lack any fundamental organizing principle.
Returning now to the main theme, somewhat alarmingly, the life expectancy of many software applications is longer than the active life of the languages in which they were written. An example of this is the patient-record system maintained by the Veterans Administration. It is written in the MUMPS programming language and has far outlived MUMPS itself.
It is obvious to students of software engineering economics that if programming languages have an average life expectancy of only 5 years, but large applications last an average of 25 years, then software maintenance costs are being driven higher than they should be due to the very large number of aging applications that were coded in programming languages that are now dead or dying.
How Many Programming Languages Are Really Needed?
The plethora of programming languages raises a basic question that needs to be addressed by the software engineering literature: How many programming languages does software engineering really need? Having thousands of programming languages raises a corollary question: Is the existence of more than 2500 programming languages a good thing or a bad thing?
The argument that asserts having thousands of languages is a good thing centers around the fact that languages tend to be optimized for unique classes of problems. As new problems are encountered, they demand new programming languages, or at least that is a hypothesis.

The argument that asserts having thousands of languages is a bad thing centers around economics. Maintenance of legacy applications written in dead languages is an expensive nightmare. The constant need to train development programmers in the latest cult language is expensive. Many useful tools such as static analysis tools and automated test tools support only a small subset of programming languages, and therefore may require expensive modifications for new languages. Accumulating large volumes of certified reusable code is more difficult and expensive if thousands of languages have to be dealt with.
The existence of thousands of programming languages has created a new subindustry within software engineering. This new subindustry is concerned with translating dead or dying languages into new living languages. For example, it is now possible to translate the MUMPS language circa 1967 into the C or Java languages and to do so automatically.

A corollary subindustry is that of renovation, or periodically performing special maintenance activities on legacy applications to clean out dead code, remove error-prone modules, and reduce the inevitable increase in cyclomatic and essential complexity that occurs over time due to repeated small changes.
Linguists and those familiar with natural human languages are aware that translation from one language to another is not perfect. For example, some Eskimo dialects include more than 30 different words for various kinds of snow. It is hard to get an exact translation into a language such as English that developed in a temperate climate and has only a few variations on "snow."

Since many programming languages have specialized constructs for certain classes of problem, the translation into other languages may lead to awkward constructs that might be difficult for human programmers to understand or deal with during maintenance and enhancement work. Even so, if the translation opens up a dead language to a variety of static analysis and maintenance tools, the effort is probably worthwhile.
To deal with the question of how many programming languages are needed, it is useful to start by considering the universe of problem areas that need to be put onto computers. There seem to be ten discrete problem areas, divided into two different major kinds of processing, as shown in Table 8-3.

These two general categories reflect the major forms of software that actually exist today: (1) software that processes information, and (2) software that controls physical devices or deals with physical properties such as sound or light or music.
TABLE 8-3  Problem Domains of Software Applications

Logical and Mathematical Problem Areas
1. Mathematical calculations
2. Logic and algorithmic expressions
3. Numerical data
4. Text and string data
5. Time and dates

Physical Problem Areas
1. Sensor-based electronic signals
2. Audible signals and music
3. Static images
4. Dynamic or moving images
5. Colors

These two broad categories might lead to the conclusion that perhaps two programming languages would be the minimum number that would be able to address all problem areas. One language would be optimized for information systems, and another would be optimized for dealing with physical devices and electronic signals. However, the track records of general-purpose languages such as PL/I and Ada have not indicated much success for languages that attempt to do too many things at once.
Few problems are "pure" and deal with only one narrow topic. In fact, most applications deal with hybrid problem domains. This leads to a possible conclusion that programming languages may reflect the permutations of problem areas rather than the problem areas individually.

If the permutations of all ten problem areas were considered, then we might eventually end up with 3,628,800 programming languages. This is even more unlikely to occur than having one "superlanguage" that could tackle all problem areas.
From examining samples of both information processing applications and embedded and systems software applications, a provisional hypothesis is that about four different problem areas occur in typical software applications. The permutation of four topics out of a total of ten topics leads to the hypothesis that the software engineering domain will eventually end up with about 5,040 different programming languages.
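Both counts follow from elementary combinatorics: 3,628,800 is 10! (the permutations of all ten problem areas), and 5,040 is the number of ordered selections of four areas out of ten, that is, 10 × 9 × 8 × 7. (If unordered combinations were counted instead, the figure would be only 210.) A minimal check of the arithmetic:

# Checking the permutation counts quoted above.
import math

all_orderings = math.factorial(10)   # 10! = 3,628,800
four_of_ten = math.perm(10, 4)       # 10 * 9 * 8 * 7 = 5,040

print(all_orderings, four_of_ten)    # 3628800 5040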
Since we already have about 2500 programming languages and dialects in 2009, there may yet be another 2500 languages still to be developed in the future. At the current rate of roughly 100 new languages per year, it can be projected that new languages will continue to appear at about the same rate for another 25 years. From an economic standpoint, this does not seem to be a very cost-effective engineering solution.
Assuming that the software engineering community does reach 5040 languages, the probable distribution of those languages would be

■ 4800 languages would be dead or dying, with few programmers
■ 200 languages would be in legacy applications and therefore need maintenance
■ 40 languages would be new and gathering increasing numbers of programmers

A technical alternative to churning out another 2500 specialized languages for every new kind of problem that surfaces would be to consider building polymorphic compilers that would support any combination of problem areas.
Creating a National Programming Language Translation Center
When considering alternatives to churning out another 2500 programming languages, it might be of value to create a formal programming language translation center stocked with the language definitions of all known programming languages.
This center could provide guidance in the translation of dead or dying languages into modern active languages. Some companies already perform translation, but out of today's total of 2500 languages, only a few are handled with technical and linguistic accuracy. Automated translation as of 2009 probably only handles 50 languages out of 2500 total languages.

Given the huge number of existing programming languages and the rapid rate of creation of new programming languages, such a translation center would probably require a full-time staff of at least 50 personnel. This would mean that only very large companies such as IBM or Microsoft or large government agencies such as Homeland Security or the Department of Defense would be likely to attempt such an activity.
Over and above translation, the national programming language translation center could also perform careful linguistic analyses of all current languages in order to identify the main strengths and weaknesses of current languages. One obvious weakness of most languages is that they are not very secure.

Another function of the translation center would be to record demographic information about the numbers and kinds of applications that use various languages. For example, the languages used for financial systems, for weapons systems, for medical applications, for taxation systems, and for patient records have economic and even national importance. It would be useful to keep records of the programming languages used for such vital applications. Obviously, maintenance and restoration of these vital applications has major business and national importance.
Table 8-4 is a summary of 40 kinds of software applications that have critical importance to the United States. Table 8-4 also shows the various programming languages used in these 40 kinds of applications.

A major function of a code translation center would be to accumulate more precise data on critical applications and the languages used in them.

Both columns of Table 8-4 need additional research. There are no doubt more kinds of critical applications than the 40 listed here. Also, in order to fit on a printed page, the second column of the table is limited to about six or seven programming languages. For many of these critical applications, there may be 50 or more languages in use at national levels.
The North American Industry Classification (NAIC) codes of the Department of Commerce identify at least 250 industries that the author knows create software in substantial volumes. However, the 40 industries shown in Table 8-4 probably contain almost 50 percent of applications critical to U.S. business and government operations.
TABLE 8-4  Programming Languages Used for Critical Software Applications

Critical Software: Programming Languages
1. Air traffic control: Ada, Assembly, C, Jovial, PL/I
2. Antivirus & security: ActiveX, C, C++, Oberon7
3. Automotive engines: C, C++, Forth, Giotto
4. Banking applications: C, COBOL, E, HTML, Java, PL/I, SQL, XML
5. Broadband: C, C++, CESOF, Java
6. Cell phones: C, C++, C#, Objective C
7. Credit cards: ASP.NET, C, COBOL, Java, Perl, PHP, PL/I
8. Credit checking: ABAP, COBOL, FORTRAN, PL/I, XML
9. Credit unions: C, COBOL, HTML, PL/I, SQL
10. Criminal records: ABAP, C, COBOL, FORTRAN, Hancock
11. Defense applications: Ada, Assembly, C, CMS2, FORTRAN, Java, Jovial, SPL
12. Electric power: Assembly, C, DCOPEJ, Java, Matpower
13. FBI, CIA, NSA, etc.: Ada, APL, Assembly, C, C++, FORTRAN, Hancock
14. Federal taxation: C, COBOL, Delphi, FORTRAN, Java, SQL
15. Flight controls: Ada, Assembly, C, C++, C#, LabView
16. Insurance: ABAP, COBOL, FORTRAN, Java, PL/I
17. Mail and shipping: COBOL, dBase2, PL/I, Python, SQL
18. Manufacturing: AML, APT, C, Forth, Lua, RLL
19. Medical equipment: Assembly, Basic, C, CO, CMS2, Java
20. Medical records: ABAP, COBOL, MUMPS, SQL
21. Medicare: Assembly, COBOL, Java, PL/I, dBase2, SQL
22. Municipal taxation: C, COBOL, Delphi, Java
23. Navigation: Assembly, C, C++, C#, Lua, Logo, MatLab
24. Oil and Energy: AMPL, C, G, GAMS/MPSGE, SLP
25. Open-source software: C, C++, JavaScript, Python, Suneido, XUL
26. Operating systems, large: Assembly, C, C#, Objective C, PL/S, VB
27. Operating systems, small: C, C++, Objective C, OSL, SR
28. Pharmaceuticals: C, C++, Java, PASCAL, SAS, Visual Basic
29. Police records: C, COBOL, dBase2, Hancock, SQL
30. Satellites: C, C++, C#, Java, Jovial, PHP, Pluto
31. Securities trading: ABAP, C#, COBOL, dBase2, Java, SQL
32. Social Security: Assembly, COBOL, PL/I, dBase2, SQL
33. State taxation: C, COBOL, Delphi, FORTRAN, Java, SQL
34. Surface transportation: C, C++, COBOL, FORTRAN, HTML, SQL
35. Telephone switching: C, CHILL, CORAL, Erlang, ESPL1, ESTEREL
36. Television broadcasts: C, C++, C#, Java, Forth
37. Voting equipment: Ada, C, C++, Java
38. Weapons systems: Ada, Assembly, C, C++, Jovial
39. Web applications: AppleScript, ASP, CMM, Dylan, E, Perl, PHP, .NET
40. Welfare (State): ASP.NET, C, COBOL, dBase2, PL/I, SQL
As a result of the importance of these 40 software application areas to United States business and to government operations, they probably receive almost 75 percent of cyberattacks in the form of viruses, spyware, search-bots, and denial of service attacks. These 40 industries need to focus on security. Even a cursory examination of the programming languages used by these industries reveals that few of them are particularly resistant to viruses or malware attacks.

For all 40, maintenance is expensive, and for many, it is growing progressively more expensive due to the difficulty of simultaneously maintaining applications written in so many different programming languages.

As a technical byproduct of translation from older languages to new languages, one value-added function of a national programming language translation center would be to eliminate security vulnerabilities at the same time the older languages are being translated.
If the language translation center operated as a profit-making business, it might well grow into a good-sized company. Assuming the company billed at the same rate as Y2K companies (about $1.00 per logical statement), a national translation center might clear $75 million per year, assuming accurate and competent translation technology.

What the author suggests is that rather than continuing to develop programming languages at random but rapid intervals, there is a need to address programming languages at a fundamental linguistic level.
A study team that included linguists, software engineers, and domain specialists might be able to address the problems of the most effective ways of expressing the ten problem areas and their permutations. The goal would be to understand the minimum set of programming languages capable of handling any combination of problem areas.

If economists were added to the study team, they would also be able to address the financial impact of attempting to maintain and occasionally renovate applications written in hundreds of dead and dying programming languages.
Why Do Most Applications Use Between 2 and 15 Programming Languages?
A striking phenomenon of software engineering is the presence of multiple programming languages in the same applications. This is not a new trend, and many older applications used combinations such as COBOL and SQL. More recent combinations might include Java and HTML or XML.

A similar phenomenon is the fact that many programming languages are themselves combinations of two or more other programming languages. For example, the Objective C language combines features from SMALLTALK and C. The Ruby language combines features from Ada, Eiffel, Perl, and Python, among others.
Recall that a majority of programming languages are somewhat specialized, and these seem to be more popular than general-purpose languages. A hypothesis that explains why applications use several different programming languages is that the "problem space" of the application is broader than the "solution space" of individual programming languages.

It was mentioned earlier that many applications include at least four of the ten problem areas cited in Table 8-3. However, many programming languages seem to be optimized only for one to three of the problem areas. This creates a situation where multiple programming languages are needed to implement all of the problem areas in the application.
Of course, using any of the more general-purpose languages such as Ada or PL/I would reduce the numbers of languages, but for sociological reasons, these general-purpose languages have not been as popular as the more specialized languages.

The implications of having many different languages in the same application are that development is more difficult, debugging is more difficult, static analysis is more difficult, and code inspection is more difficult. After release, maintenance and enhancement tasks are more difficult.
Table 8-5 illustrates how both development and maintenance costs go up as the number of languages in an application increases. The costs show the rate of increase compared with a single language.

Both development and maintenance costs increase as the number of programming languages in the same application increases, but maintenance is more severely impacted (a short sketch applying these multipliers follows the table).
TABLE 8-5  Impact of Multiple Languages on Costs

Languages in Application    Development Costs    Maintenance Costs
1                           $1.00                $1.00
2                           $1.07                $1.14
3                           $1.12                $1.17
4                           $1.13                $1.20
5                           $1.18                $1.24
6                           $1.22                $1.30
7                           $1.23                $1.35
8                           $1.27                $1.40
9                           $1.30                $1.47
10                          $1.34                $1.55
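As a small illustration of how the multipliers in Table 8-5 might be applied, the sketch below is an added example; the function name and the baseline costs are hypothetical, and the multipliers are copied directly from the table.

# Hypothetical sketch: scaling single-language cost estimates by the
# Table 8-5 multipliers for the number of languages in an application.
DEV_MULTIPLIER = {1: 1.00, 2: 1.07, 3: 1.12, 4: 1.13, 5: 1.18,
                  6: 1.22, 7: 1.23, 8: 1.27, 9: 1.30, 10: 1.34}
MAINT_MULTIPLIER = {1: 1.00, 2: 1.14, 3: 1.17, 4: 1.20, 5: 1.24,
                    6: 1.30, 7: 1.35, 8: 1.40, 9: 1.47, 10: 1.55}

def adjusted_costs(dev_cost, maint_cost, languages):
    """Scale baseline single-language costs by the number of languages used."""
    return (dev_cost * DEV_MULTIPLIER[languages],
            maint_cost * MAINT_MULTIPLIER[languages])

# Example: a $500,000 development and $800,000 maintenance baseline, 4 languages.
dev, maint = adjusted_costs(500_000, 800_000, 4)
print(round(dev), round(maint))   # 565000 960000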
How Many Programmers Use Various Programming Languages?
There is no real census of either languages used in applications or the number of programmers. While the Department of Commerce and the Bureau of Labor Statistics do issue reports on such topics in the United States, their statistics are known to be inaccurate.

A survey done by the author and his colleagues a few years ago found that the human resources organizations in most large corporations did not know how many programmers or software engineers were actually employed. Since government statistics are based on reports from HR organizations, if the HR organizations themselves don't know, they can't provide good data to the government.
Among the reasons that government statistics probably understate the numbers of programmers and software engineers are ambiguous job titles. For example, some large companies use titles such as "member of the technical staff" as an umbrella title that might include software engineers, hardware engineers, systems analysts, and perhaps another dozen occupations.
Another problem with knowing how many software engineers there are is the fact that many personnel working on embedded applications are not software engineers or computer scientists by training, but rather electrical engineers, aeronautical engineers, telecommunications engineers, or some other type of engineer.

Because the status of these older forms of engineering is higher than the status of software engineering, many people working on embedded software refuse to be called software engineers and insist on being identified by their true academic credentials.
The study carried out by the author and his colleagues was to derive information on the number of software specialists (i.e., quality assurance, database administration, etc.) employed by large software-intensive companies such as IBM, AT&T, Hartford Insurance, and so forth.

The study included on-site visits and discussions with both HR organizations and also local software managers and executives. It was during the discussions with local software managers and executives that it was discovered that not a single HR organization actually had good statistics on software engineering populations.
Based on on-site interviews with client companies and then extrapolation from their data to national levels, the author assumes that the U.S. total of software engineers circa 2009 is about 2.5 million. Government statistics as of 2009 indicate around 600,000 programmers, but these statistics are low for reasons already discussed. Additionally, the government statistics also tend to omit one-person companies and individual programmers who develop applets or single applications.
About 60 percent of these software engineers work in maintenance and enhancement tasks, and 40 percent work as developers on new applications. There are of course variations. For example, many more developers than maintenance personnel work on web applications, because all of these applications are fairly new. But for traditional mainframe business applications and ordinary embedded and systems software applications, maintenance workers outnumber development workers by a substantial margin.
Table 8-6 shows the approximate numbers of software engineers by language for the United States. However, the data in Table 8-6 is hypothetical and not exact. Among the reasons that the data is not exact is that many software engineers know more than one programming language and work with more than one programming language.

However, Table 8-6 does illustrate a key point: The most common languages for software development are not the same as the most common languages for software maintenance. This situation leads to a great deal of trouble for the software industry.

The most obvious problem illustrated by Table 8-6 is that it is difficult to get development personnel to work on maintenance tasks because of the perceived view that older languages are not as glamorous as modern languages.
TABLE 8-6  Estimated Number of Software Engineers by Language

Development Languages    Software Engineers    Maintenance Languages    Software Engineers
Java                     175,000               COBOL                    575,000
C                        150,000               PL/I                     125,000
C++                      130,000               Ada                      100,000
Visual Basic             100,000               Visual Basic             75,000
C#                       90,000                RPG                      75,000
Ruby                     65,000                Basic                    75,000
JavaScript               50,000                Assembler                75,000
Perl                     30,000                C                        75,000
Python                   20,000                FORTRAN                  65,000
COBOL                    15,000                Java                     60,000
PHP                      15,000                JavaScript               40,000
Objective C              10,000                Jovial                   10,000
Others                   150,000               Others                   150,000
Total                    1,000,000             Total                    1,500,000

A second problem is that due to the differences in programming languages between maintenance and new development, two different sets of tools are likely to be needed. The developers are interested in using modern tools including static analysis, automated testing, and other fairly new innovations.
However, many of these new tools do not support older languages, so the software maintenance community needs to be equipped with maintenance workbenches that include tools with different capabilities.
For example, tools that analyze cyclomatic and essential complexity are used more often in maintenance work than in new development. Tools that can trace execution flow are also used more widely in maintenance work than in development. Another new kind of tool that supports maintenance more than development can "mine" legacy code and extract hidden business rules. Yet another kind of tool that supports maintenance work can parse the code and automatically generate function point totals.
It is fairly easy for programmers to learn new languages, but nobody can possibly learn 2500 programming languages. An average programmer in the U.S. is probably fairly expert in one language and fairly knowledgeable in three others. Some may know as many as ten languages. The plethora of languages obviously introduces major problems in academic training and in ways of keeping programmers current in their skill sets.

The bottom line is that development and maintenance tool suites are often very different, and this is due in large part to the differences in programming languages used for development and for maintenance.
Since the great majority of languages widely used for development today in 2009 will fall out of service in less than ten years, the software industry faces some severe maintenance challenges.

Languages used for new development are surfacing at rates of more than two per month. Most of these languages will be short-lived. However, some of the applications created in these ephemeral languages will last for many years. As a result, the set of programming languages associated with legacy applications that need maintenance is growing larger at rates that sometimes might top 50 languages per year!
A major economic problem associated with having thousands of programming languages is that the plethora of languages is driving up maintenance costs. Ironically, one of the major claims of new programming languages is that "they improve programming productivity." Assuming that such claims are true at all, they are only true for new development. Every single new language is eventually going to add to the U.S. software maintenance burden. This is because programming languages have shorter life expectancies than the applications created with them. One by one, today's "new" languages will drop out of use and leave behind hundreds of aging legacy applications with declining numbers of trained programmers, few effective tools, and sometimes not even working compilers.
What Kinds of Bugs or Defects Occur in Source Code?
In 2008 and 2009, a major new study was performed that identified the 25 most common and serious software bugs or defects. The study was sponsored by the SANS Institute, with the cooperation of MITRE and about 30 other organizations.
This new study is deservedly attracting a great deal of attention. In the history of software quality and security, it will no doubt be ranked as a landmark report. Indeed, all software engineering groups should get copies of the report and make it required reading for software engineers, quality assurance personnel, and also for software managers and executives.

Access to the report can be had via either the SANS Institute or MITRE web sites. The relevant URLs are

■ www.SANS.org
■ www.CWE-MITRE.org
In spite of the fact that software engineering is now a major occupation and millions of applications have been coded, only recently has there been a serious and concentrated effort to understand the nature of bugs and defects that exist in source code. The SANS report is significant because the list of 25 serious problems was developed by a group of some 40 experts from major software organizations. As a result, it is obvious that the problems cited are universal programming problems and not issues for a single company.

Over the years, many large companies such as IBM, AT&T, Microsoft, and Unisys have had very sophisticated defect tracking and monitoring systems. These same companies have also used root-cause analysis. Some of the results of these internal defect tracking systems have been published, but they usually were not perceived as having general applicability.
A number of common problems have long been well understood: buffer overflows, branches to incorrect locations, and omission of error handling are well known and avoided by experienced software engineers. But that is not the same as attempting a rigorous analysis and quantification of coding defects.

The SANS report is a very encouraging example of the kind of progress that can be made when top experts from many companies work together in a cooperative manner to explore common problems. The SANS study group included experts from academia, government, and commercial companies. It is also encouraging that these three kinds of organizations were able to cooperate successfully. The normal relationship among the three is often adversarial rather than cooperative, so having all three work together and produce a useful report is a fairly rare occurrence.
Hopefully, the current work will serve as a model of future collaboration that will deal with other important software issues. Some of the additional topics that might do well in a collaborative mode include:

1. Defect removal methods
2. Economic analysis of software development
3. Economic analysis of software maintenance
4. Software metrics and measurement
5. Software reusability
Some of the organizations that participated in the SANS study include, in alphabetical order:

■ Apple
■ Aspect Security
■ Breach Security
■ CERT
■ Homeland Security
■ Microsoft
■ MITRE
■ National Security Agency
■ Oracle
■ Purdue University
■ Red Hat
■ Tata
■ University of California
This is only a partial list, but it shows that the study included academia, commercial software organizations, and government agencies.

The overall list of 25 security problems was subdivided into three larger topical areas. Readers are urged to review the full report, so only a bare list of topics is included here:
Interactions
1. Poor input validation
2. Poor encoding of output
3. SQL query structures
4. Web page structures
5. Operating system command structures
6. Open transmission of sensitive data
7. Forgery of cross-site requests
8. Race conditions
9. Leaks from error messages

Resource Management
10. Unconstrained memory buffers
11. Control loss of state data
12. Control loss of paths and file names
13. Hazardous paths
14. Uncontrolled code generation
15. Reusing code without validation
16. Careless resource shutdown
17. Careless initialization
18. Calculation errors

Defense Leakages
19. Inadequate authorization and access control
20. Inadequate cryptographic algorithms
21. Hard coding and storing passwords
22. Unsafe permission assignments
23. Inadequate randomization
24. Excessive issuance of privileges
25. Client/server security lapses
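To make the flavor of these categories concrete, the short sketch below is an added illustration (it is not part of the SANS material) of categories 1 and 3: a query whose structure is changed by unvalidated input, followed by the parameterized form that treats the same input as pure data.

# Added illustration of poor input validation and SQL query structure
# defects, using Python's built-in sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance REAL)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100.0)")

user_input = "alice' OR '1'='1"   # hostile input that a validator should reject

# Defective pattern: the query text is assembled from raw user input,
# so the hostile string changes the structure of the SQL statement.
unsafe_query = "SELECT balance FROM accounts WHERE name = '%s'" % user_input
print(conn.execute(unsafe_query).fetchall())   # returns data it should not

# Safer pattern: a parameterized query keeps the input as pure data.
safe_query = "SELECT balance FROM accounts WHERE name = ?"
print(conn.execute(safe_query, (user_input,)).fetchall())   # returns []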
The complete SANS list contains detailed information about each of the 25 defects and also supplemental information on how the defects are likely to occur, methods of prevention, and other important issues. This is why readers are urged to examine the full SANS list.

As of 2009, these 25 problems may occur in more than 85 percent of all operational software applications. One or more of these 25 problems can be cited in more than 95 percent of all successful malware attacks. Needless to say, the SANS list is a very important document that needs widespread distribution and extensive study.
The SANS report is a valuable resource for companies involved in testing, static analysis, inspections, and quality assurance. It provides a very solid checklist of topics that need to be validated before code can be safely released to the outside world.
Logistics of Software Code Defects
While the SANS report does an excellent job of identifying serious software and code defects, once the defects are present in the code and the code is in the hands of users, some additional issues need discussion. Following is a list of topics that discuss logistical issues associated with software defects:
1. Defect: A problem caused by human beings that causes a software application to either stop running or to produce incorrect results. Defects can be errors of commission, where developers did something wrong, or errors of omission, where developers failed to anticipate a specific condition.

2. Defect severity level (IBM definition): Severity 1, software stops working; Severity 2, major features disabled or incorrect; Severity 3, minor problem; Severity 4, cosmetic error with no operational impact.
3. Invalid defect: A problem reported as a defect but which upon analysis turns out to be caused by something other than the software itself. Hardware problems, user errors, and operating system errors mistakenly reported as application errors are the most common invalid defects. These total as many as 15 percent of valid defect reports.

4. Abeyant defect (IBM term): A defect reported by a specific customer that cannot be replicated on any other version of the software except the one being used by the customer. Usually, abeyant defects are caused by some unique combination of hardware devices and other applications that run at the same time as the software against which the defect was reported. These are rare but very difficult to track down and repair.
5. False positive: A code segment initially identified by a static analysis tool or a test case as a potential defect. Upon further analysis, the code turns out to be correct.

6. Secondhand defects: A defect in an application that was not caused by any overt mistakes on the part of the development team itself, but instead was caused by errors in a compiler or tool used by the development team. Errors in code generators and automatic test tools are examples of secondhand defects. The developers used the tools in good faith, but as a result, bugs were created. An example of a secondhand defect was a compiler error that incorrectly handled an instruction. The code was compiled and executed, but the instruction did not operate as defined in the language specification. It was necessary to review the machine language listings to find this secondhand defect since it was not visible in the source code itself.
7. Undetected defects: These are similar to secondhand defects, but turn out to be due to either incomplete test coverage or to gaps in static analysis tools. It is widely known that test suites almost never touch 100 percent of the code in any application, and sometimes less than 60 percent of the code in large applications. To minimize the impact of undetected defects and partial test coverage, it is necessary to use test coverage analysis tools. Major gaps in coverage may need special testing or formal inspections.

8. Data defects: Defects that are not in source code or applications, but which reside in the data that passes through the application. A very common example of a data defect would be an incorrect mailing address. Data errors are numerous and may be severe, and they are also difficult to eliminate. Data defects probably outnumber code defects, and their status in terms of liability is ambiguous. More serious examples of data defects are errors in credit reports, which can lower credit ratings without any legitimate reason and also without any overt defects in software. Data defects are notoriously difficult to repair, in part because there are no effective quality assurance organizations involved with data defects. In fact, there may not even be any reporting channels.
9. Externally caused defects: A defect that was not originally a defect, but became one due to external changes such as new tax laws, changes in pension plans, and other government mandates that trigger code changes in software applications. An example would be a change in state sales taxes from 6 percent to 7.5 percent, which necessitates changes in many software applications. Any application that does not make the change will end up with a defect even though it may have run successfully for years prior to the external change. Such changes are frequent but unpredictable because they are based on government actions.

10. Bad fixes: About 7 percent of attempts to repair a software code defect accidentally contain a new defect. Sometimes there are secondary and even tertiary bad fixes. In one lawsuit against a software vendor, four consecutive attempts to fix a bug in a financial application added new defects and did not fix the original defect. The fifth attempt finally got it right.
11. Legacy defects: These are defects that surface today, but which may have been hidden in software applications for ten years or more. An example of a legacy defect was a payroll application that stopped calculating overtime payments correctly. What happened was that overtime began to exceed $10.00 per hour, and the field had been defined with $9.99 as the maximum amount. The problem was more than ten years old when it first occurred and was identified. (The original developers of the application were no longer even employed by the company at the time the problem surfaced.)

12. Reused defects: Between 15 percent and 50 percent of software applications are based on reused code either acquired commercially or picked up from other applications. Due to the lack of certification of reusable materials, many bugs or errors are in reused code. Whether liability should be assigned to the developer or to the user of reused material is ambiguous as of 2009.
13. Error-prone modules (IBM term): Studies of IBM software discovered that bugs or defects were not randomly distributed but tended to clump in a small number of places. For example, in the IMS database product, about 35 modules out of 425 were found to contain almost 60 percent of total customer-reported bugs. Error-prone modules are fairly common in large software applications. As a rule of thumb, about 3 percent of the modules in large systems are candidates for being classified as error-prone modules.

14. Incident: An incident is an abrupt stoppage of a software application for unknown reasons. However, when the software is restarted, it operates successfully. Incidents are not uncommon, but their origins are difficult to pin down. Some may be caused by momentary power surges or power outages; some may be caused by hardware problems or even cosmic rays; and some may be caused by software bugs. Because incidents are usually random in occurrence and cannot be replicated, it is difficult to study them.
15.
Security
vulnerabilities These
are code segments that
are
frequently
used by viruses, worms, and
hackers to gain access
to
software
applications. Error handling
routines and buffer
overflows
are
common examples of vulnerabilities. As of
2009, these are
not
usually
classified as defects because
they are only channels
for
malicious
attacks. However, given the
alarming increase in
such
attacks,
there may be a need to
reevaluate how to classify
security
vulnerabilities.
16.
Malicious
software engineers From
time to time software
engineers
become disgruntled with their
colleagues, their
manag-
ers,
or the companies that they
work for. When this
situation occurs,
some
software engineers deliberately
insert malicious code into
the
applications
that they are developing.
This situation is most
likely
to
occur in the time interval
between a software engineer
receiv-
ing
a layoff notice and the
actual day of departure.
While only a
few
software engineers cause
deliberate harm, the
situation may
become
more prevalent as the
recession deepens and
lengthens. In
any
case, the fact that
software engineers can
deliberately perform
harmful
acts is one of the reasons why
software engineers who
work
for
the Internal Revenue Service
have their tax returns
examined
manually.
Of course, not only
malicious code can occur,
but also
other
harmful kinds of coding
might be used by software
engineer-
ing
employees, such as diverting
funds to personal
accounts.
17.
Defect
potentials This
term originated in IBM circa
1973 and is
included
in all of my major books. The
term defect
potential refers
to
the sum total of possible
defects that are likely to
be encoun-
tered
during software development.
The total includes five
sources of
defects:
(1) requirements defects, (2) design
defects, (3) code
defects,
(4)
document defects, and (5)
bad fixes or secondary
defects. Current
U.S.
averages for defect
potentials are about 5.0
per function point. A
rule
of thumb for predicting
defect potentials is to raise
the size of the
application
in function points to the
1.25 power. This gives a
useful
approximation
of total defects that are
likely to occur for
applications
between
about 100 function points
and 10,000 function
points.
18.
Defect
removal efficiency This
term also originated in
IBM
circa
1973. It refers to the ratio
of defects detected to defects
pres-
ent.
If a unit test finds 30 bugs
out of a total of 100 bugs,
it is 30
percent
efficient. Most forms of
testing are less than 50
percent
efficient.
Static analysis and formal
inspections top 80 percent
in
defect
removal efficiency.
19.
Cumulative
defect removal efficiency
This
term also origi-
nated
in IBM circa 1973. It refers to
the aggregate total of
defects
removed
by all forms of inspection,
static analysis, and
testing. If a
series
of removal operations that
includes requirement, design,
and
code
inspections; static analysis;
and unit, new function,
regression,
performance,
and system tests finds 950
defects out of a
possible
1000,
the cumulative efficiency is 95
percent. Current U.S.
averages
are
only about 85 percent.
Cumulative defect removal
efficiency is
calculated
at a fixed point in time,
usually 90 days after
software
is
released to customers.
20.
Performance
issues Some
applications have stringent
perfor-
mance
criteria. An example might be
the target-seeking
guidance
system
in a Patriot missile; another
example would be the
embed-
ded
software inside antilock
brakes. If the software
fails to achieve
its
performance targets, it may be
unusable or even
hazardous.
However,
performance issues are not
usually classified as
defects
because
no incorrect code is involved.
What is involved are
execu-
tion
paths that are too
long or that include too
many calls and
branches.
Even though there may be no
overt errors, there are
sub-
stantial
liabilities associated with performance
problems.
21.
Cyclomatic
and essential complexity These
are mathemati-
cal
expressions that provide a
quantitative basis for
judging the
complexity
of source code segments. The
metrics were invented
by
Dr.
Tom McCabe and are sometimes
called McCabe
complexity
metrics.
Calculations are based on graph theory, and the general formula is "edges - nodes + 2." Practically
speaking, cyclomatic
com-
plexity
levels less than ten
indicate low complexity when
the code
is
reviewed by software engineers.
Cyclomatic complexity
levels
greater
than 20 indicate very
complex code. The metrics
are signifi-
cant
because of correlations between defect
densities and
cyclomatic
complexity
levels. Essential complexity is
similar, but uses
mathe-
matical
techniques to simplify the
graphs by removing
redundancy.
22.
Toxic
requirement This
is a new term introduced in
2009 and
derived
from the financial phrase
toxic
assets. A
toxic requirement
is
defined as an explicit user requirement
that is harmful and
will
cause
serious damages if not
removed. Unfortunately, toxic
require-
ments
cannot be removed by means of
regular testing because once
toxic
requirements are embedded in
requirements and design
docu-
ments,
any test cases created
from those documents will
confirm the
error
rather than identify it.
Toxic requirements can be
removed
by
formal inspections of requirements,
however. An example of a
toxic
requirement is the famous Y2K
problem, which
originated
as
a specific user requirement. A
more recent example of a
toxic
requirement
is the file handling of the
Quicken financial
software
application.
If a backup file is "opened"
instead of being
"restored,"
then
Quicken files can lose
integrity.
Summary
and Conclusions
on
Software Defects
As
discussed earlier in this
book, the current U.S.
average for software
defect
volumes is about 5.0 per
function point. (This total
includes
requirements
defects, design defects,
coding defects,
documentation
defects,
and bad fixes or secondary
defects.)
Cumulative
defect removal is only about
85 percent. As a result,
soft-
ware
applications are routinely
delivered with about 0.75
defect per
function
point. Note that at the
point of delivery, all of
the early defects
in
requirements and design have
found their way into
the code. In other
words,
while the famous Y2K problem
originated as a requirements
defect,
it eventually found its way
into source code. No
programming
language
was immune, and therefore
the Y2K problem was
endemic
across
thousands of applications written in
all known programming
languages.
For
a typical application of 1000
function points, 0.75
released defect
per
function point implies about
750 delivered defects. Of
these, about
20
percent will be high-severity defects:
150 high-severity defects
will
probably
be in the code when users
get the first
releases.
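
The arithmetic behind these figures is simple enough to show directly. The short Java sketch below applies the rules of thumb cited above (5.0 defects per function point, 85 percent cumulative removal, 20 percent of delivered defects being high severity, and the 1.25 power rule for defect potentials); the numbers are illustrative averages rather than measurements from any specific project.

    public class DeliveredDefectEstimate {
        public static void main(String[] args) {
            int functionPoints = 1000;

            // Rule of thumb: defect potential is size in function points
            // raised to the 1.25 power.
            double potentialByPowerRule = Math.pow(functionPoints, 1.25); // about 5,623

            // U.S. average defect potential of 5.0 per function point.
            double potentialByAverage = 5.0 * functionPoints;             // 5,000

            // 85 percent cumulative removal leaves 0.75 defect per function point.
            double delivered = potentialByAverage * (1.0 - 0.85);         // 750

            // About 20 percent of delivered defects are high severity.
            double highSeverity = delivered * 0.20;                       // 150

            System.out.printf("Potential (power rule): %.0f%n", potentialByPowerRule);
            System.out.printf("Delivered defects:      %.0f%n", delivered);
            System.out.printf("High-severity defects:  %.0f%n", highSeverity);
        }
    }
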
Five
important kinds of remedial
actions can improve this
situation:
1.
Measurement of defect volumes by
100 percent of software
organi-
zations.
2.
Measurement of defect removal
efficiency for every kind of
inspec-
tion,
static analysis, and test
stage used.
3.
Reducing defect potentials by
means of effective defect
prevention
methods
such as joint application
design (JAD) and quality
function
deployment
(QFD), and others.
4.
Raising defect removal
efficiency levels by means of
formal inspec-
tions,
static analysis, and
improved testing.
5.
Examining the results of
quality on defect removal
costs and also on
total
development costs and
schedules, plus maintenance
costs.
The
combination of these five
key activities can lower
defect poten-
tials
down to less than 3.0
defects per function point
and raise defect
removal
efficiency levels higher
than 95 percent on average, with
mis-
sion-critical
applications hitting 99
percent.
An
achievable goal for the
software industry would be to
achieve aver-
ages
of less than 3.0 defects
per function point, defect
removal efficiency
levels
of more than 95 percent, and
delivered defect volumes of
less than
0.15
defect per function
point.
The
combined results from better
measurement, better defect
pre-
vention,
and better defect removal
would reduce delivered
defects for
a
1000-function point application
from 750 down to only
150. Of these
150,
only about 10 percent would
be high-severity defects. Thus,
instead
of
150 high-severity defects
that normally occur today,
only 15 high-
severity
defects might occur. This is
an improvement of a full order
of
magnitude.
Even
better, empirical data
indicates that applications at
the high
end
of the quality spectrum have
shorter development schedules,
lower
development
costs, and much lower
maintenance costs.
Indeed,
the main reason for
both schedule slippages and
cost over-
runs
is excessive defect
volumes at the start of
testing.
Most
projects are on schedule and
within budget until testing
starts,
at
which time excessive defects
stretch out testing by
several hundred
percent
compared with plans and cost
estimates.
The
technologies to achieve better
quality results actually
exist today
in
2009, but are not
widely deployed. That means
that better awareness
of
quality and the economic
value of quality are
critical weaknesses of
the
software industry circa
2009.
Preventing
and Removing Defects
from
Application
Source Code
During
development of software applications,
the number of defects encountered averages
about 1.75 per function
point or
17.5
per KLOC for languages
where the ratio of lines of
code to function
points
is about 100. As pointed out
earlier in this book, defect
volumes
vary
by the level of the
programming languages, and
they also vary by
the
experience and skill of the
programming team.
The
minimum quantity of defects in source
code will be about 0.5
per
function
point or 5 per KLOC, while
the maximum quantity will
top
3.5
defects per function point
or 35 defects per KLOC,
assuming the
same
level of programming
language.
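
Because the discussion moves back and forth between defects per function point and defects per KLOC, it may help to show the conversion explicitly. The small Java sketch below assumes roughly 100 logical source statements per function point, the ratio used in the paragraph above; the sample densities are the minimum, average, and maximum values just cited.

    public class DefectDensity {

        // Convert a per-KLOC density into a per-function point density,
        // given the language's ratio of source statements per function point.
        public static double perFunctionPoint(double perKloc, double locPerFunctionPoint) {
            return perKloc * (locPerFunctionPoint / 1000.0);
        }

        public static void main(String[] args) {
            System.out.println(perFunctionPoint(17.5, 100)); // about 1.75 per FP, the average
            System.out.println(perFunctionPoint(5.0, 100));  // about 0.5 per FP, the minimum
            System.out.println(perFunctionPoint(35.0, 100)); // about 3.5 per FP, the maximum
        }
    }
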
However,
in spite of wide ranges of
potential defects, there are
still
more
coding defects than any
other kind of defect. Defect
removal effi-
ciency
against coding defects is in
the range of 80 percent to 99
per-
cent.
Some coding defects will
slip through even in the
best of cases,
although
it is certainly better to approach 99
percent than it is to
lag
at
80 percent.
For
coding defects as with all
other defect sources, two
channels need
to
be included in order to improve
code quality:
1.
Defect
prevention, or
methods that can lower
defect potentials.
2.
Defect
removal, or
methods that can seek
out, find, and
eliminate
coding
defects.
The
available forms of defect
prevention for coding
defects include
certified
reusable code modules, use of
patterns or standard
coding
approaches
for common situations, use
of structured programming
methods,
use of higher-level programming
languages, constructing
prototypes
prior to formal development,
dividing large
applications
into
small segments (as does
Agile development), participation
in
code
inspections, test-based development,
and usage of static
analysis
tools.
Pair programming is also
reported to have some
efficacy in terms
of
defect prevention, but this
method has very low usage
and very
little
data.
The
available forms of defect
removal for coding defects
include desk
checking,
pair programming, debugging
tools, code inspections,
static
analysis
tools, and 17 kinds of
conventional testing plus
automated unit
testing
and regression
testing.
Defect
removal by individual software
engineers is difficult to
study.
Desk
checking, debugging, and
unit testing are usually
private activi-
ties
with no observers and no detailed
records kept. Most
corporate
defect-tracking
systems do not start to
collect data until public
defect
removal
begins with formal inspections,
function tests, and
regression
tests.
What happens before these
public events is usually
invisible.
There
are some exceptions,
however.
At
one point, IBM asked for
volunteers who were willing
to record the
numbers
of bugs they found in their
own code by themselves. The
pur-
pose
of the study was to find
out what was the
actual defect removal
effi-
ciency
from these normally
invisible forms of defect
removal. Obviously,
the
data was not used in
any punitive fashion and
was kept
confidential,
other
than to produce some
statistical reports.
More
recently the Personal
Software Process (PSP) and
Team
Software
Process (TSP) methods
developed by Watts Humphrey
have
also
included defect recording
throughout the code
development cycle.
Unfortunately,
the Agile development method
has moved in the
other
direction
and usually does not
record private defect
removal. Indeed,
many
Agile projects do not record
defect data at all, which is
a mistake
because
it reduces the ability of
the Agile method to prove
its value in
terms
of quality.
The
public forms of defect
removal are discussed in
this book in
Chapter
9, which deals with quality.
The emphasis in this chapter
is
more
on the private forms of
defect removal, which are
seldom covered
in
the software engineering
literature.
Private
defect removal lacks the
large volumes of data
associated with
some
of the public forms such as
formal inspections, static
analysis, and
the
test stages that involve
other players such as test
specialists and soft-
ware
quality assurance. But for
the sake of completeness,
the topics of pri-
vate
defect prevention and
private defect removal need
to be included.
Before
discussing the effectiveness of
either defect prevention
or
defect
removal, it should be noted
that individual software
engineers
or
programmers vary widely in
experience and
skills.
In
one controlled study at IBM where a
number of programmers
were
asked
to implement the same trial
example, the quantity of
code pro-
duced
varied by about 6 to 1 between
the bulkiest solution and
the most
concise
solution for the same
specification.
Similar
studies showed about a 10 to 1
variation in the amount
of
time
a sample of programmers needed to
code and debug a
standard
problem
statement.
These
wide variations in individual
performance mean that
individ-
ual
human variations in a population of
software engineers
probably
account
for more divergence in
results than do methods,
tools, or factors
that
can be studied
objectively.
Forms
of Programming Defect
Prevention
It
is much more difficult to
measure or quantify defect
prevention than
it
is to measure defect removal.
With defect removal, it is
possible to
accumulate
statistics on numbers of defects
found and their
severity
levels.
Once
the project is released to
customers, defect counts
continue.
After
90 days of usage, it is possible to
combine the internally
discov-
ered
defects with the customer-reported
defects and to calculate
defect
removal
efficiency. If development personnel
found 85 defects and
cus-
tomers
reported 15 defects, the
removal efficiency is 85 percent.
Such
data
is easy to collect, valuable,
and fairly accurate, except
for some
invisible
defects found via private
removal actions such as desk
check-
ing
and unit test.
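
The calculation itself is trivial, as the short sketch below shows; the sample values are the ones used in the paragraph above and in the cumulative example given earlier in the chapter.

    public class RemovalEfficiency {

        // Defect removal efficiency: internally found defects divided by the
        // total found internally plus those reported by customers in the
        // first 90 days, expressed as a percentage.
        public static double dre(int foundInternally, int reportedByCustomers) {
            return 100.0 * foundInternally / (foundInternally + reportedByCustomers);
        }

        public static void main(String[] args) {
            System.out.println(dre(85, 15));   // 85.0 percent, as in the example above
            System.out.println(dre(950, 50));  // 95.0 percent cumulative efficiency
        }
    }
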
For
defect prevention, there is no
easy way to measure the
absence
of
defects.
The methods available for
exploring defect prevention
require
collecting
data from a fairly large
number of projects, where
some of
them
utilized a specific defect
prevention method and others
did not.
For
example, assume you measure
a sample of 50 projects that
used
structured
coding methods and another
50 projects that did not
use
structured
programming methods. Assume
the 50 projects that
used
structured
programming averaged 10 coding
defects per KLOC or 1
per
function point. Assume the
50 projects that did not
use structured
programming
averaged 20 coding defects
per KLOC or 2 per
function
point.
This kind of analysis allows
you to make a hypothesis
that the
structured
coding prevents about 50
percent of coding defects,
but it is
still
only a hypothesis and not
proof.
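
Expressed as code, the hypothesis amounts to nothing more than comparing two sample averages, as the minimal sketch below shows; the values mirror the assumed samples above and carry all of the caveats discussed next.

    public class PreventionComparison {

        // Hypothesized prevention effect: the percentage reduction in average
        // defect density for the projects that used the method.
        public static double hypothesizedPrevention(double withMethodPerKloc,
                                                    double withoutMethodPerKloc) {
            return 100.0 * (withoutMethodPerKloc - withMethodPerKloc)
                         / withoutMethodPerKloc;
        }

        public static void main(String[] args) {
            // 50 structured projects averaging 10 defects per KLOC versus
            // 50 unstructured projects averaging 20 defects per KLOC.
            System.out.println(hypothesizedPrevention(10.0, 20.0) + " percent"); // 50.0
        }
    }
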
Further,
real-life situations are
seldom simple and easy to
deal with.
There
may be numerous other
factors at play, such as
usage of static
analysis,
usage of higher-level languages, usage of
inspections, variations
in
programming experience, complexity of
the problems, and so
forth.
The
many different factors that
can influence defect
prevention mean
that
exact knowledge of the
effectiveness of any specific
factor is some-
what
subjective at best, and will
probably stay that
way.
Academic
institutions can perform
controlled experiments with
stu-
dents
where they measure the
effectiveness of a single variable,
but
such
studies are fairly rare
concerning defect
prevention.
However,
from long-range observations
involving hundreds of
soft-
ware
personnel and hundreds of
software projects over a
multi-year
time
span, some objective factors
about defect prevention have
reason-
ably
strong support:
Code
reuse as defect prevention
If
reusable code is available
that has
been
certified to zero-defect levels, or at
least carefully inspected,
tested,
and
subjected to static analysis
before being made reusable,
this is the
best
known form of defect
prevention. Defect potentials in
certified reus-
able
code modules are only a
fraction of the 15 per KLOC
normally
encountered
during custom development;
sometimes only about
1/100th
as
many defects are
encountered.
However,
and this is an important
point, using uncertified
reusable code
can
be both hazardous and
expensive. If the defect
potentials in uncer-
tified
reusable code are more
than about 1 per KLOC,
and the reused
code
is plugged into more than
ten different applications,
the combined
debugging
costs will be so high that this
example of reuse would have
a
negative
return on investment.
Although
certified reuse is the most
effective form of defect
prevention
and
counts as a best practice, it is
also the rarest. Uncertified
sources of
reuse
outnumber certified sources by at least
50 to 1. Reuse of certified
code
and other materials would
class as a best practice. But
reuse of
materials
that are uncertified must be
classed as a hazardous
practice.
It
is much harder for software
engineers to debug someone
else's
unfamiliar
code than it is to debug
their own. Every single
time a reused
code
module is utilized for a new
application, there is a good
chance that
the
same errors will be encountered.
Thus, uncertified reuse is
hazard-
ous
and can be more expensive
than custom development of
the same
module--hence,
the reason the uncertified
reuse can have a
significant
negative
return on investment
(ROI).
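
A rough cost sketch makes the hazard concrete. The figures below (module size, defect density, number of reusing applications, and repair cost) are illustrative assumptions chosen only to show how the combined debugging cost scales with every application that picks up an uncertified module.

    public class UncertifiedReuseCost {

        public static void main(String[] args) {
            int moduleKloc = 2;                    // size of the reused module
            double defectsPerKloc = 1.5;           // uncertified: above 1 per KLOC
            int applicationsReusingModule = 10;    // module plugged into 10 systems
            double costPerDefectRepair = 1500.0;   // assumed average repair cost

            double defectsPerCopy = moduleKloc * defectsPerKloc;

            // The same defects tend to be rediscovered and repaired in every
            // application that reuses the module, so the cost multiplies.
            double totalDebuggingCost = defectsPerCopy
                                        * applicationsReusingModule
                                        * costPerDefectRepair;

            System.out.printf("Combined debugging cost: $%.0f%n", totalDebuggingCost);
        }
    }
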
Code
reuse comes from many
sources, including commercial
vendors,
legacy
applications, object-oriented class
libraries, corporate
reuse
libraries,
public-domain and open-source
libraries, and a number
of
others.
While reusable code is
fairly plentiful, something
that is not
plentiful
is data on the repair
frequencies of reusable materials.
(See
the
section on certifying reusable
materials earlier in this
book for addi-
tional
information.)
As
mentioned elsewhere in the
book, code reuse by itself
is only part
of
the reusability picture.
Reusable designs, data
structures, test
cases,
tutorial
information, work breakdown
structures, and HELP text
are also
reusable
and should be packaged
together with the code they
support.
Patterns as defect prevention  Programmers and software engineers who have developed large numbers of software applications tend to be aware that certain sequences of code occur many times in many applications.
Some
of these sequences include validating
inputs to ensure that
error
conditions
such as having character
data entered into a numeric
field is
rejected,
or that text and numeric
strings do not contain more
characters
than
specified by the application's
design.
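
As a concrete illustration, the small Java sketch below shows the two validation patterns just mentioned; the method names are illustrative rather than taken from any particular application.

    public class InputValidation {

        // Pattern 1: a "numeric" field must contain only digits.
        public static boolean isNumericField(String value) {
            if (value == null || value.length() == 0) {
                return false;
            }
            for (int i = 0; i < value.length(); i++) {
                if (!Character.isDigit(value.charAt(i))) {
                    return false;   // character data in a numeric field is rejected
                }
            }
            return true;
        }

        // Pattern 2: a text field must not exceed the length given in the design.
        public static boolean fitsField(String value, int maxLength) {
            return value != null && value.length() <= maxLength;
        }

        public static void main(String[] args) {
            System.out.println(isNumericField("12345"));     // true
            System.out.println(isNumericField("12A45"));     // false
            System.out.println(fitsField("SHORT TEXT", 40)); // true
        }
    }
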
Patterns
gained via personal
experience are of course
reusable even
if
informal and personal.
However, it has become clear
that this kind
of
knowledge occurs so often
that it could be written
down, illustrated
graphically,
and then used to train
new software engineers as
they learn
trade
craft.
Pattern-based
development has the
potential of lowering defect
poten-
tials
of young and inexperienced
developers by more than 50
percent.
Once
standard patterns are widely
published and available,
they can also
serve
to facilitate career changes
from one kind of software to
another.
For
example, there are very
different kinds of patterns
associated with
embedded
applications than with information
technology applications.
What
is lacking for pattern-based
development circa 2009 is an
effec-
tive
taxonomy that can be used to
catalog the patterns and
aid in select-
ing
the appropriate set of
patterns. Also, there is no
exact knowledge of
how
many patterns are likely to
be useful and valuable. In
the future,
pattern
usage will no doubt be classed as a
best practice, although
doing
so
in 2009 is probably a few
years premature.
Individual
software engineers working in a
narrow range of
applica-
tions
probably utilize from 25 to 50
common patterns centering in
input
and
output validation, error
handling, and perhaps
security-related
topics.
But when all types and
forms of software are
included, such as
financial
applications, embedded applications,
web applications,
operat-
ing
systems, compilers, avionics,
and so on, the total
number of useful
patterns
could easily top 1000.
This is too large a number
to be listed
randomly,
so patterns need to be organized if
they are to become
useful
tools
of the trade.
Inspections as defect prevention  Participation in formal inspections turns
out to be equally effective as a
defect-prevention method and
a
defect-removal
method. Participants in formal
inspections spontane-
ously
avoid making the kinds of
mistakes that are found
during the
inspection
sessions. Therefore, after participating
in a number of inspec-
tions,
coding defects tend to be
reduced by more than 80
percent com-
pared
with the volumes encountered
prior to starting to use
inspections.
As
a result, formal inspections
get double counted as best
practices: they
are
highly effective for both
defect prevention and defect
removal.
Inspections
turn out to be so effective in
terms of defect
prevention
that
long-range usage of inspections
has a tendency to become
boring
for
the participants due to a
lack of interesting bugs or
defects after
about
a year of inspections. (Unfortunately,
some companies stop
using
inspections,
so defect volumes begin to
creep upwards again.)
One
other useful aspect of
inspections is that when
novices inspect
the
work of experts, they
spontaneously learn improved
programming
skills.
Conversely, when experts
inspect the work of novices,
they can
provide
a great deal of useful
advice as well as find a
great many bugs
or
defects. Therefore, it is useful to
have several experts or top
software
engineers
as participants in inspections.
Automated static analysis as defect prevention  Static analysis is a fairly new technology that is distinct from testing. Automated static analysis tools have embedded rules
and logic that are
set up to discover
common
forms of defects in source code.
These tools are quite
effective
and
have defect removal
efficiency levels that top
85 percent. A caveat
is
that only about 50 languages
out of 2500 are supported,
and these
are
primarily modern languages
such as C, C#, C++, Java,
and the
like.
Older and obscure languages
such as MUMPS, Coral, Chill,
and
the
like are not supported.
However, with almost 100
static analysis
tools
available, there are tools
that can handle some
older or special-
ized
languages such as ABAP, Ada,
COBOL, and PL/I. Some of
the
tools
have extensible rules, so in
theory all of the 2500
languages in
existence
might gain access to static
analysis, although this is
unlikely
to
occur.
Because
static analysis tools are
effective at finding bugs in
source
code,
and the static analysis
tools are usually run by
programmers, they
have
a double benefit of also
acting as defect prevention
agents. In other
words,
programmers who carefully
respond to the defects
identified by
automated
static analysis tools will
spontaneously avoid making
the
same
defects in the
future.
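
As a simple illustration of the kind of defect involved, the sketch below shows a lookup written in a form that most static analysis tools would flag as a possible null dereference, together with the guarded form they would typically recommend. The example is hypothetical and not drawn from any specific tool's rule set.

    import java.util.HashMap;
    import java.util.Map;

    public class LookupExample {
        private final Map<String, String> codes = new HashMap<String, String>();

        // The pattern a static analyzer would flag: Map.get() may return null,
        // so calling trim() on the result can throw NullPointerException.
        public String riskyLookup(String key) {
            return codes.get(key).trim();        // possible null dereference
        }

        // The guarded form: the null case is handled explicitly.
        public String safeLookup(String key) {
            String value = codes.get(key);
            return (value == null) ? "" : value.trim();
        }
    }
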
As
of 2009, usage of static
analysis counts as a best
practice for sup-
ported
programming languages. The
evidence is already significant
for
defect
removal and is increasing
for defect
prevention.
Static-analysis
tools are widely used by
the open-source
development
community
with good results. Due to the
power and utility of
static
analysis,
usage is expanding and this
method should become a
stan-
dard
activity; in fact, static
analysis should be included in
every pro-
gramming
development and maintenance
environment and should be
a
normal
part of all development and
maintenance methodologies.
Test-based development (TBD) as defect prevention  The extreme programming (XP) method includes developing test cases prior to developing
source code. Indeed, the
test cases are used as an
adjunct to the
requirements
and design of software
applications.
This
method of early test-case
development focuses attention on
qual-
ity,
and therefore TBD gets
double credit as a best
practice for both
defect
prevention
and defect removal. Because
TBD is fairly new,
empirical
data
based on large numbers of
trials is not yet available.
The rather
lax
measurement practices of the
Agile community add to the
problem
of
ascertaining the actual
effectiveness of TBD.
However,
from anecdotal evidence, it
appears that TBD may
reduce
defect
potentials by perhaps 30 percent
and raise unit test
defect removal
efficiency
from around 35 percent up to
perhaps 50 percent. Both
results
are
steps in the right
direction, but additional
data on TBD is needed.
TBD
is a candidate for a best
practice and no doubt will be
classed as
one
when additional quantitative
data becomes
available.
High-level languages as defect prevention  One of the claimed advantages of high-level programming languages is
that they reduce defect
poten-
tials.
A related claim is that if
defects do occur, they are
easier to find.
Both
claims appear to be valid,
but the situation is
somewhat compli-
cated,
and there are exceptions to
general rules about the
effectiveness
of
high-level languages.
Any
reduction in source code
volumes will obviously reduce
chances
for
errors. If a specific function
requires 1000 lines of code
in assembly
language,
but can be done with only
150 Java statements, the
odds are
good
that fewer defects will
occur with Java. Even if
both versions have
a
constant ten bugs per
KLOC, the larger assembly
version might have
10
bugs, while the smaller
Java version might have
only 1 or 2.
However,
some high-level programming
languages have fairly
com-
plex
syntax and therefore make it
easy to introduce errors by
accident.
The
APL programming language is an example of
a language that is
very
high level, but also
difficult to read and
understand, and
therefore
difficult
to debug, and especially so if
the person attempting to
debug
is
not the original
programmer.
Observations
indicate the languages with
regular syntax,
mnemonic
labels,
and commands that are
amenable to human understanding
will
have
somewhat fewer coding
defects than languages of
the same level,
but
with arcane commands and
complicated syntax that
include many
nested
commands.
What
would be useful and
interesting would be controlled
studies
by
academic institutions that
measured both defect
densities and
debugging
times for implementing
standard problems in
various
languages.
It would be very interesting to
see defect volumes
and
debugging
times compared for popular
languages such as C, C#,
C++,
Objective
C, Java, JavaScript, Lua,
Ruby, Visual Basic, and
perhaps
50
more. However, as of 2009,
this kind of controlled
study does not
seem
to exist.
As
of 2009, the plethora of
programming languages and
their negative
impact
on maintenance costs make
best practice status for
any specific
language
somewhat questionable.
Prototypes
as defect prevention For
large and complex
applications, it
may
be necessary to try out a number of
alternative code sequences
before
selecting a best-case alternative for
the final versions.
Prototypes
are
useful in reducing defects in
the final version by
allowing software
engineers
to experiment with alternatives in a
benign fashion.
As
a general rule prototypes
are created mainly for
the most trouble-
some
and complicated pieces of
work. As a result, the size
of typical
prototypes
is only about 5 percent to
perhaps 10 percent of the
size of
the
total application. This
practice of concentrating on the
toughest
problems
makes prototypes useful, and
their compact size keeps
them
from
getting to be expensive in their
own right.
Prototypes
come in two flavors:
disposable and evolutionary. As
the
name
implies, disposable prototypes
are used to try out
algorithms and
code
sequences and then discarded.
Evolutionary prototypes grow
into
the
finished application.
Because
prototypes are usually
developed at high speed in an
experi-
mental
fashion, the disposable
prototypes are somewhat
safer than evo-
lutionary
prototypes. Prototypes may
contain more bugs or defects
than
polished
work, and attempting to
convert them into a finished
product
may
lead to higher than expected
bug counts.
Disposable
prototypes used to try out
alternative solutions or to
experiment
with difficult programming problems
would be defined as
best
practices. However, evolutionary
prototypes that are
carelessly
developed
in the interest of speed are
not best practices, but
instead
somewhat
hazardous.
Code structure as defect prevention  Professor Edsger Dijkstra published one
of the most famous letters
in the history of software
engineering
entitled
"Go-to statements considered
harmful." The letter to the
editor
was
published in August 1968 in
The
Communications of the
ACM.
The
thesis of this letter was
that excessive use of
branches or "go to"
statements
made the structure of
software applications so complex
that
errors
of incorrect branch sequences might
occur that were very
difficult
to
identify and remove.
This
letter triggered a revolution in
programming style that came
to
be
known as structured
programming. Under
the principles of
struc-
tured
programming, branches were
reduced and programmers
began to
realize
that complex loops and
clever coding sequences introduced
bugs
and
made the code harder to
test and validate.
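
Java has no go-to statement, so the minimal sketch below illustrates the related point about tangled control flow: the two methods implement the same validation rules, first with the kind of deeply nested branching that structured programming discourages, and then in a structured form with one check per rule. The example is illustrative only.

    public class OrderValidator {

        // Harder to follow: nested conditions emulate the tangled control flow
        // that structured programming tries to avoid.
        public static boolean isValidNested(String id, int quantity, double price) {
            boolean valid = false;
            if (id != null) {
                if (id.length() > 0) {
                    if (quantity > 0) {
                        if (price >= 0.0) {
                            valid = true;
                        }
                    }
                }
            }
            return valid;
        }

        // Structured version: each rule is checked once, in a straight line,
        // and the method exits as soon as a rule fails.
        public static boolean isValidStructured(String id, int quantity, double price) {
            if (id == null || id.length() == 0) return false;
            if (quantity <= 0) return false;
            return price >= 0.0;
        }
    }
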
As
it happens another pioneering
software engineer, Dr. Tom
McCabe,
developed
a way of measuring code
structure that was published
in
December
1976 in IEEE
Transactions on Software Engineering. The
measures devel-
oped
by Dr. McCabe were those of
"cyclomatic complexity" and
"essential
complexity."
Cyclomatic
complexity
is based on graph theory and
is a formal way
of
evaluating the complexity of a
graph that describes the
flow of control
through
a software application. The
formula for calculating
cyclomatic
complexity
is "edges nodes + two."
Essential
complexity
is also based on graph
theory, only it
eliminates
redundant
or duplicate paths through
code.
In
terms of cyclomatic complexity, a
code segment with no
branches
has
a complexity score of 1, which indicates
that the code executes in
a
linear
fashion with no branches or go-to
statements. From a
psychologi-
cal
standpoint, cyclomatic complexity
levels of less than 10 are
usually
perceived
as being well structured.
However, as cyclomatic
complexity
levels
rise to greater than 20,
the code segments become
increasingly
difficult
to understand or to follow from
end to end without
errors.
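
The formula itself is easy to apply once the control-flow graph has been counted, as the minimal sketch below shows; the edge and node counts are illustrative rather than taken from a real analyzer.

    public class CyclomaticComplexity {

        // McCabe's formula for a single connected control-flow graph:
        // V(G) = edges - nodes + 2.
        public static int complexity(int edges, int nodes) {
            return edges - nodes + 2;
        }

        public static void main(String[] args) {
            // Straight-line code, e.g., 4 nodes joined by 3 edges: V(G) = 1.
            System.out.println(complexity(3, 4));   // prints 1

            // A method with several if/else branches might have 14 edges and
            // 10 nodes: V(G) = 6, still under the threshold of 10 noted above.
            System.out.println(complexity(14, 10)); // prints 6
        }
    }
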
There
is some empirical evidence
that code with cyclomatic
complex-
ity
levels of less than 10 has
only about 40 percent as
many errors as
code
with cyclomatic complexity levels
greater than 20. Code with
a
cyclomatic
complexity level of 1 seems to
have the fewest errors, if
other
factors
are held constant, such as
the programming languages
and the
experience
of the developer.
One
interesting study in IBM found a
surprising result: that
code
defects
were sometimes higher for
the work of senior or
experienced pro-
grammers
compared with the same
volume of code written by
novices
or
new programmers. However,
the actual cause of this
anomaly was
that
the experts were working on
very difficult and complex
applica-
tions,
while the novices were
doing only simple routines
that were easy
to
understand. In any case, the
study indicated that problem
difficulty
has
a significant impact on defect
density levels.
The
importance of cyclomatic and
essential complexity on code
defects
led
to the development of a number of
commercial tools. Many
tools
available
circa 2009 can calculate
cyclomatic and essential
complexity
of
code in a variety of
languages.
In
the 1980s, several tools on
the market were aimed
primarily at
COBOL
and not only evaluated
code complexity, but also
could auto-
matically
restructure the code and
reduce both cyclomatic and
essential
complexity.
These tools asserted, with
some evidence to back up
the
assertions,
that the revised code with
low complexity levels could
be
modified
and maintained with less
effort than the original
code.
Use
of structured programming techniques
and keeping
cyclomatic
complexity
levels low would both be
viewed as best practices.
Code with
low
complexity levels and few
branches tends to have fewer
defects, and
the
defects that are present
tend to be easier to find.
Therefore, struc-
tured
programming counts as a best
practice for defect
prevention.
Segmentation as defect prevention  More than 50 years of empirical data has proven conclusively that defect potentials correlate almost perfectly
with
application size measured
using both lines of code
and function
points.
Because size and defects
are closely coupled, it is
reasonable
to
ask, Why
not decompose large systems
into a number of
smaller
segments?
Unfortunately,
this is not as easy as it
sounds. To make an
analogy,
since
constructing an 80,000-ton cruise
ship is known to be
expensive,
why
not decompose the ship
into 80,000 small boats
that are cheap to
build?
Obviously, the features and
user requirements of 80,000
small
boats
are not the same as
those of one large 80,000-ton
cruise ship.
As
of 2009, there are no proven
and successful methods for
segment-
ing
or decomposing large systems
into small independent
components.
As
it happens, the Agile method
of dividing a system into
segments or
sprints
that
can be developed sequentially
has shown itself to be
fairly
successful.
But most of the Agile
applications are below
10,000 function
points
and are comparatively simple
in architecture.
There
have not yet been
any Agile projects that
tackle something of
the
size of Microsoft Vista at
about 150,000 function
points or a large
ERP
package at perhaps 300,000
function points. Indeed, if
Agile sprints
were
used for these applications
and team sizes were in
the range of
average
Agile projects (less than
ten people) then probably
150 sprints
would
be needed for Vista and
300 would be needed for an
ERP pack-
age.
Assuming one month per
sprint, the schedule would
be perhaps 12
years
for Vista and 25 years
for the ERP package.
Multiple teams would
speed
things up, but interfaces
between the code of each
team would
add
complexity and also add
defects.
The
bottom line is that
segmentation into small
independent pack-
ages
or components is effective when it
can be done well, but
not
always
possible given the feature
sets and architecture of
many large
systems.
Thus best practice status
cannot be assigned to
segmenta-
tion
as of 2009, due to the lack
of standard and effective
methods for
segmentation.
For
large applications, segmentation is
most common for major
fea-
tures,
but each of these features
may themselves be in the
range of
10,000
function points or more.
There is not yet any
proven way to
divide
a massive system of 150,000
function points or 15 million
lines
of
code into perhaps 15,000
small independent pieces.
About the best
that
occurs circa 2009 is to
divide these massive systems
into perhaps
ten
large segments.
Methodologies
and measurements as defect
prevention The
Personal
Software
Process (PSP) and Team
Software Process (TSP)
developed
by
Watts Humphrey feature
careful recording of all
defects found during
development,
including the normally
invisible defects found
privately
via
desk checking and unit
testing.
The
act of recording specific
defects tends to embed them
in the minds
of
software engineers and
programmers. The result is
that after several
projects
in succession, coding defects
decline by perhaps 40 percent
since
they
are spontaneously
avoided.
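
The recording itself need not be elaborate. The sketch below shows the kind of personal defect record that PSP/TSP-style logging implies; the field names are illustrative assumptions and not the official PSP defect-recording standard.

    import java.util.ArrayList;
    import java.util.List;

    public class PersonalDefectLog {

        // One logged defect found privately during development.
        public static class DefectRecord {
            final String phaseInjected;   // e.g., "design", "code"
            final String phaseRemoved;    // e.g., "desk check", "unit test"
            final String type;            // e.g., "logic", "interface"
            final int fixMinutes;

            DefectRecord(String phaseInjected, String phaseRemoved,
                         String type, int fixMinutes) {
                this.phaseInjected = phaseInjected;
                this.phaseRemoved = phaseRemoved;
                this.type = type;
                this.fixMinutes = fixMinutes;
            }
        }

        private final List<DefectRecord> records = new ArrayList<DefectRecord>();

        public void log(DefectRecord record) {
            records.add(record);
        }

        // A simple summary of the kind that supports trend analysis over
        // several projects in succession.
        public int countRemovedDuring(String phase) {
            int count = 0;
            for (DefectRecord r : records) {
                if (r.phaseRemoved.equals(phase)) {
                    count++;
                }
            }
            return count;
        }
    }
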
Measurements
and methodologies are
therefore useful in terms
of
defect
prevention because they tend to
focus attention on defects
and so
trigger
reductions over time. The
methods that record defects
and focus
on
quality are classed as best
practices.
One
unusual aspect of TSP is
that the results seem to
improve with
application
size. In other words, TSP
operates successfully for
large
systems
in excess of 10,000 function
points. This is a fairly
rare occur-
rence
among development
methods.
Pair
programming as defect prevention
The
idea of pair programming is
for
two software engineers or
programmers to share one
workstation.
They
take turns coding, and
the other member of the
pair observes the
code
and makes comments and
suggestions as the coding
takes place.
The
pair also has discussions on
alternatives prior to actually
doing the
code
for any module or
segment.
The
method of pair programming
has some experimental data
that
suggests
it may be effective in terms of
both defect removal and
defect
prevention.
However, the pair
programming method has so
little usage
on
actual software projects
that it is not possible to
evaluate these
claims
as of 2009 on large-scale
applications.
On
the surface, pair
programming would seem to
come very close to
doubling
the effort required to
complete any given code
segment. Indeed,
due
to normal human tendencies to
chat and discuss social
topics, there
is
some reason to suspect that
pair programming would be
more than
twice
as expensive as individual
programming.
Until
additional information becomes
available from actual
projects
rather
than from small experiments,
there is not enough data to
judge
the
impact of pair programming in
terms of defect removal or
defect
prevention.
Other methods as defect prevention  The methods cited earlier in this chapter
have been used enough so
that their effectiveness in
terms of
code
defect prevention can be
hypothesized. Other methods
seem to
have
some benefits in terms of
defect prevention, but they
are harder to
judge.
One of these methods is Six
Sigma as it applies to software.
The
Six
Sigma approach does include
measurements of defects and
analysis
of
causes. However, Six Sigma is usually a
corporate approach that
is
not
applied to specific projects, so it is
harder to evaluate. Other
code
defect
prevention techniques that
may be beneficial but for
which the
author
has no solid data include
quality function deployment
(QFD),
root-cause
analysis, the Rational
Unified Process (RUP), and
many of
the
Agile development
variations.
Combinations and synergies among defect prevention methods  Although the
methods cited earlier may
occur individually, they are
often used in
combinations
that sometimes appear
synergistic. For example,
struc-
tured
coding is often used with
TSP, with inspections, and with
static
analysis.
The
most frequent combination is
the pairing of high-level
program-
ming
languages with the concepts of
structured programming.
The
combination
that tends to yield the
highest overall levels of
defect pre-
vention
would be methodologies such as
TSP teamed with
high-level
programming
languages, certified reusable
code, patterns,
prototypes,
static
analysis, and
inspections.
Overriding
all other aspects of defect
prevention and defect
removal,
individual
experience and skill levels
of the software engineers
continue
to
be a dominant factor. However, as of
2009, the software
engineering
field
lacks standard methods for
evaluating human performance; it
has
no
licensing or certification, no board
specialties, and no methods
of
judging
professional malpractice. Therefore,
expertise among
software
engineers
is important but difficult to
evaluate.
Summary
of Observations
on
Defect Prevention
Because
of the difficulty and
uncertainty of measuring defect
preven-
tion,
the suite of defect
prevention methods lacks the
large volumes of
solid
statistical data associated with
defect removal.
Personal
defect prevention is especially
difficult to study
because
most
of the activities are
private and therefore seldom
have records or
statistical
information available, other
than data kept by
volunteers.
Long-range
measurements over time and
involving hundreds of
appli-
cations
and software engineers give
some strong indications of
what
works
in terms of defect prevention,
but the results are
still less than
precise
and will probably stay that
way.
Forms
of Programming Defect
Removal
There
is very good data available
on the public forms of
defect removal
such
as formal inspections, function
test, regression test,
independent
verification
and validation, and many
others. But private defect
removal
is
another story. The phrase
private
defect removal refers
to activities
that
software engineers or programmers
perform by themselves
without
witnesses
and usually without keeping
any written records.
The
major forms of private
defect removal include, but
are not limited
to:
1.
Desk checking
2.
Debugging using automated
tools
3.
Automated static
analysis
4.
Subroutine testing
5.
Unit testing (manual)
6.
Unit testing (automated)
Since
most of these defect removal
methods are used in private,
data
to
judge their effectiveness
comes from either volunteers
who keep
records
of bugs found, or from
practitioners of methods that
include
complete
records of all defects, such
as PSP and TSP.
Automated
static analysis is a method
that happens to be used
both
privately
by individual programmers on their
own code, and also
pub-
licly
by open-source developers who
are working collaboratively
on
large
applications such as Firefox,
Linux, and the like.
Therefore, static
analysis
has substantial data
available for its public
uses, and it can be
assumed
that private use of static
analysis will be equally
effective.
Desk checking for defect removal  In the early days of programming and computing,
the time lag between
writing source code and
getting it
assembled
or compiled was sometimes as
much as 24 hours. When
pro-
gram
source code was punched
into cards and the
cards were then
put
in
a queue for assembly or
compilation, many hours
would go by before
the
code could be executed or
tested.
In
these early days of
programming between the late
1960s and the
1970s,
desk checking or carefully
reading the listing of a
program to
look
for errors was the
most common method of
personal defect
removal.
Desk
checking was also a
technical necessity because errors in a
deck
of
punch cards could stop
the assembly or compilation
process and add
perhaps
another 24 hours before
testing could
commence.
Today
in 2009, code segments can
be compiled or interpreted
instantly,
and
can be executed instantly as
well. Indeed, they can be
executed
using
programming environments that
include debugging tools
and
automated
static analysis. Therefore,
desk checking has declined
in
frequency
of usage due to the
availability of personal workstations
and
personal
development environments.
Although
there is not much in the
way of recent data on the
effective-
ness
of desk checking, historical
data from 30 years ago
indicates about
40
percent to just over 60
percent in terms of defect
removal efficiency
levels.
Today
in 2009, desk checking is
primarily reserved for a
small subset
of
very tricky bugs or defects
that have not been
successfully detected
and
removed via other methods.
These include security
vulnerabilities,
performance
problems, and sometimes
toxic requirements that
have
slipped
into source code. These
are hard to detect via
static analysis or
normal
testing because they may not
involve overt code errors
such as
branches
to incorrect locations or boundary
violations.
These
special and unique bugs
compose only about 5 percent
of total
numbers
of bugs likely to be found in
software applications. Desk checking is actually close to 70 percent efficient in
dealing with these very
troublesome
bugs
that have eluded other
methods. (The reason that
desk checking is
not
higher is that sometimes software
engineers don't realize
that
a
particular code practice is
wrong. This is why proofreading of
manu-
scripts
is needed. Authors cannot
always see their own
mistakes.)
While
these subtle bugs can be
detected using formal
inspections,
formal
inspections do not occur on
more than about 10 percent
of soft-
ware
applications and require
between three and eight
participants.
Desk
checking, on the other hand,
is a one-person activity that
can be
performed
at any time with no formal
preparation or training.
Desk
checking in 2009 is a supplemental
method that may not
be
needed
for every software project.
It is effective for a number of
subtle
bugs
and might be viewed as a
best practice on an as-needed
basis.
Automated debugging for defect removal  Software engineers and programmers
circa 2009 have access to
hundreds of debugging tools.
These
tools
normally support either
specific programming languages
such as
Java
and Ruby or specific
operating systems such as
Linux, Leopard,
Windows
Vista, and many others. In
any case, a great many
debugging
tools
are available.
The
features of debugging tools
vary, but all of them
allow the execu-
tion
of code to be stopped at various
places; they allow changes
to code;
and
they may include features to
look for common problems
such as
buffer
overflows and branching
errors. Beyond that, the
specialized
debugging
tools have a number of
special features that are
relevant to
specific
languages or operating
systems.
Debugging
tools are so common that
usage is a standard
practice
and
therefore would be classed as a
best practice. That being
said,
none
are 100 percent effective,
and quite a few bugs
can escape. In fact,
given
the numbers of bugs found
later via inspections,
static analysis,
and
testing, the average
efficiency of program debugging is
only about
30
percent or less.
Automated
static analysis for defect
removal Static
analysis tools examine
source
code and the paths
through the code and
look for common
errors.
Some
of these tools have built-in
sets of rules, while others
have exten-
sible
rule sets.
A
keyword search of the Web
using "automated static
analysis" turns
up
more than 100 such
tools including Axivion,
CAST, Coverity,
Fortify,
GrammaTech,
Klocwork, Lattix, Ounce,
Parasoft, ProjectAnalyzer,
ReSharper,
SoArc, SofCheck, Viva64,
Understand, Visual Studio
Team
System,
and XTRAN.
Individually,
each static analysis tool
supports up to 30 languages.
For
common
languages such as Java and
C, dozens of static analysis
tools
are
available; for older
languages such as Ada,
Jovial, and PL/I,
there
are
only a few static analysis
tools. For very specialized
languages such
as
ABAP used for writing
code in SAP environments,
there are only one
or
two static analysis
tools.
Without
doing an exhaustive search, it
appears that out of the
current
total
of 2500 programming languages
developed to date, static
analysis
tools
are available for perhaps 50
programming languages.
However,
some
of these static analysis
tools support extensible
rules, so it is theo-
retically
possible to create rules for
examining all of the 2500
languages.
This
is unlikely to occur, due to
economic reasons for obscure
languages
or
those not used for
business or scientific
applications.
As
a class, static analysis
tools seem to be effective
and can find
per-
haps
85 percent of common programming
errors. Therefore, usage
of
static
analysis tools can be viewed
as a best practice, and one that is rapidly becoming a standard practice as well.
However,
static analysis tools only
find coding problems and do
not
find
toxic requirements, performance
problems, user interface
problems,
and
some kinds of security
vulnerabilities. Therefore, additional
forms
of
defect removal are
needed.
Some
static analysis tools
provide additional features
besides defect
detection.
Some are able to assist in
translating older languages
into
newer
languages, such as turning
COBOL into Java if
desired.
It
is also possible to raise
the level of static analysis
and examine the
meta-languages
underlying several forms of
requirements and
design
documentation
such as those created via
the unified modeling
language
(UML).
Indeed, it is theoretically possible to
use a form of
extended
static
analysis to create test
suites.
Because
static analysis and formal
code inspections usually
find many
of
the same kinds of bugs,
normally either one form or
the other is uti-
lized,
but not both. Static
analysis and inspections
have roughly the
same
levels of defect removal
efficiency, but static
analysis is cheaper
and
quicker. However, code
inspections can find more
subtle problems
such
as performance issues or security
vulnerabilities. These are
not
code
"bugs" per se, but
they do cause
trouble.
If
static analysis and code
inspections are both
utilized, which
occurs
for
mission-critical applications such as
some medical instruments
and
some
kinds of security and
military software, static
analysis would nor-
mally
come before code
inspections.
A
small number of issues
identified by static analysis
tools turn out
to
be false positives, or code
segments identified as bugs
which turn out
to
be correct. However, a few
false positives is a small
price to pay for
such
a high level of defect
removal efficiency.
Subroutine testing for defect removal  Testing comes in many flavors and covers
many different sizes of code
volumes. The phrase
subroutine
testing
refers
to a small collection of perhaps up to
ten source code
instructions
that produces an output or
performs an action that
needs
to
be verified. Subroutine testing is
usually the lowest level of
testing
in
terms of code
volumes.
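
A minimal example of what subroutine testing looks like in practice: a routine of a few statements is executed directly and its outputs are checked for validity. The subroutine and its checks below are illustrative and disposable, as subroutine test cases usually are.

    public class SalesTaxSubroutine {

        // The subroutine under test: a few statements producing one output.
        public static long salesTaxCents(long amountCents, double ratePercent) {
            return Math.round(amountCents * ratePercent / 100.0);
        }

        public static void main(String[] args) {
            // Ad hoc checks of the outputs, the usual mode of subroutine
            // testing; run with "java -ea" to enable the assertions.
            assert salesTaxCents(10000, 7.5) == 750 : "7.5 percent tax on $100.00";
            assert salesTaxCents(999, 6.0) == 60 : "6 percent tax on $9.99";
            System.out.println("subroutine checks passed");
        }
    }
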
By
contrast, unit testing would
normally include perhaps 100
instruc-
tions
or more, while the "public"
forms of testing such as
function testing
and
regression testing may deal
with thousands of instructions.
As
the volume of source code
increases, paths through the
code
increase,
and therefore more and
more test cases are
needed to actu-
ally
cover 100 percent of the
code. Indeed, for very large
systems,
100
percent coverage appears to be
impossible, or at least very
rare.
Subroutine
testing is a standard practice
and also a best
practice
because
it eliminates a significant number of
problems. However,
the
defect
removal efficiency of subroutine
testing is only 30 percent
to
perhaps
40 percent. This is because the
code volumes are too
small for
detecting
many kinds of bugs such as
branching errors.
Subroutine
testing may or may not
use actual formal test
cases. The
usual
mode is to execute the code
and check the outputs
for validity.
Subroutine
test cases, if any, are
normally disposable.
Manual unit testing for defect removal  Unit testing of complete modules is
the largest form of testing
that is normally private or
carried out by
individual
programmers without the
involvement of other
personnel
such
as test specialists or software
quality assurance.
Manual
unit testing is the first
and oldest kind of formal
testing.
Indeed,
in the 1960s and early
1970s, when many
applications only
contained
100 code statements or so,
unit testing was often
the only
form
of testing performed.
The
phrase unit
testing refers
to testing a complete module of
perhaps
100
code statements that
performs a discrete function with
inputs, out-
puts,
algorithms, and logic that
need to be validated.
Unit
testing can combine "black
box" testing and "white
box" test-
ing.
The phrase black
box means
that the internal code of a
module
is
hidden, so only inputs and
outputs are visible. Black
box testing
therefore
tests input and output
validity. The phrase
white
box means
that
internal code is revealed, so
branches and control flow
through
an
application can be tested.
Combining the two forms of
testing should
in
theory test everything.
However, code coverage
seldom hits 100
per-
cent,
and for large applications
that are high in cyclomatic
complexity
it
may drop below 50
percent.
Unit
testing tends to look at
limits, ranges of values,
error-handling,
and
security-related issues. Unfortunately,
unit testing is only in
the
range
of perhaps 30 percent to 50 percent
efficient in finding bugs.
For
example,
unit testing is not able to
find many performance-related
issues
because
they typically involve
longer paths and multiple
modules.
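
The sketch below shows what a small set of unit test cases of this kind might look like, written in the JUnit 4 style (the JUnit library is assumed to be available; the module under test is inlined and purely illustrative). It exercises a normal value and a limit and error-handling case of the sort unit testing concentrates on.

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class DiscountModuleTest {

        // The module under test, inlined so the example is self-contained.
        static double discountedPrice(double price, double percentOff) {
            if (percentOff < 0.0 || percentOff > 100.0) {
                throw new IllegalArgumentException("percentOff out of range");
            }
            return price * (100.0 - percentOff) / 100.0;
        }

        // Black box case: only an input and its expected output are checked.
        @Test
        public void tenPercentOffOneHundred() {
            assertEquals(90.0, discountedPrice(100.0, 10.0), 0.0001);
        }

        // Limit and error-handling case: an out-of-range discount is rejected.
        @Test(expected = IllegalArgumentException.class)
        public void rejectsOutOfRangeDiscount() {
            discountedPrice(100.0, 150.0);
        }
    }
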
For
modules that tend to include
a number of branches or
complex
flows,
unit testing begins to
encounter problems with test
coverage. As
cyclomatic
complexity levels go up, it
takes more and more
test cases
to
cover every path. In fact,
100 percent coverage almost
never occurs
when
cyclomatic complexity levels
get above 5, even for
modules with
only
100 code statements.
Unit
testing is a standard activity
for software engineering
and
therefore
counts as a best practice in
spite of the somewhat low
defect
removal
efficiency. Without unit
testing, the later stages of
testing such
as
function testing, stress
testing, component testing,
and system test-
ing
would not be
possible.
The
test cases created for
unit testing are normally
placed in a formal
test
library so that they can be
used later for regression
testing. Since
the
test cases are going to be
long-lived and used
repeatedly, they need
proper
identification as to what applications
and features they test,
what
functions
they test, when they
were created, and by whom.
There will
also
be accompanying test scripts
that deal with invoking and
executing
the
test cases. The specifics of
formal test case design
are outside the
scope
of this book, but such
topics are covered in many
other books.
Unit
testing can be used in
conjunction with other forms of
defect
removal
such as formal code
inspections and static
analysis. Usually,
static
analysis would be performed
prior to unit testing, while
code
inspections
would be performed after
unit testing. This is because
static
analysis
is quick and inexpensive and
finds many bugs that
might be
found
via unit testing. Unit
testing is done prior to
code inspections for
the
same reason; it is faster
and cheaper. However, code
inspections are
very
effective at finding subtle
issues that elude both
static analysis and
unit
testing, such as security
vulnerabilities and performance
issues.
Using
code inspections, static
analysis, and unit testing
for the same
code
is a fairly rare occurrence
that most often occurs on
mission-critical
applications
such as weapons systems,
medical instruments, and
other
software
applications where failure
might cause death or
destruction.
Manual
unit testing was a normal
and standard activity for
more than
40
years and is still very
widespread. However, performance of unit testing varies
from
"poorly performed" to "extremely
good." Because of the
inconsistencies
in
methods of carrying out unit
testing and in testing
results, the ranges
are
too wide to say that
unit testing per se is a
best practice. Careful
unit
testing
with both black box and
white box test cases
and thoughtful consid-
eration
to test coverage would be considered a
best practice. Careless
unit
testing
with hasty test cases and
partial coverage would rank no
better
than
marginally adequate and
would not be a best
practice.
Testing
is a teachable skill, and
there are many classes
available by
both
academia and commercial test
companies. There are also
several
forms
of certification for test
personnel. It would be useful to know whether formal test training and certification elevate test defect removal efficiency by significant amounts. There is
considerable anecdotal
evidence
that
certification is beneficial, but
more large-scale surveys and
studies
are
needed on this topic.
Automated unit testing for defect removal
While manual unit testing has been part of software engineering since the 1960s, automated unit testing is
newer
and started to occur only in
the 1980s in response to larger and
more
complex
applications plus the
arrival of graphical user
interfaces (GUI),
which
greatly expanded the nature
of software inputs and
outputs.
The
phrase "automated unit
testing" is somewhat ambiguous
circa
2009.
The most common usage of
the term implies manual
creation of
unit
test cases combined with a
framework or scaffold that
allows them
to
be run automatically on a regular
basis without explicit
actions by
software
engineers.
Automated
unit testing has been
adopted by the Agile and
extreme
programming
(XP) communities together with
the corollary idea of
cre-
ating
test cases before creating
code. This combination seems
to be fairly
effective
in terms of defect removal
and also pays off with
improved
defect
prevention by focusing the
attention of software engineers
on
quality
topics.
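The sketch below shows the test-before-code style in miniature, using Python's standard unittest module as the scaffold; the late_fee function and its business rules are hypothetical examples, not taken from the text. In practice a continuous-integration scaffold runs such a suite automatically on every build, which is what is usually meant by automated unit testing.

# Minimal test-first sketch using Python's standard unittest module.
# The late_fee() function and its business rules are hypothetical.
import unittest

def late_fee(balance: float, days_overdue: int) -> float:
    """Charge 1.5 percent of the balance once an invoice is over 30 days late."""
    if balance <= 0 or days_overdue <= 30:
        return 0.0
    return round(balance * 0.015, 2)

class LateFeeTests(unittest.TestCase):
    # In the XP style, these tests are written before late_fee() itself.
    def test_no_fee_within_grace_period(self):
        self.assertEqual(late_fee(1000.0, 30), 0.0)

    def test_fee_after_grace_period(self):
        self.assertEqual(late_fee(1000.0, 31), 15.0)

    def test_no_fee_for_zero_balance(self):
        self.assertEqual(late_fee(0.0, 90), 0.0)

if __name__ == "__main__":
    unittest.main()  # a build server would execute this on every check-in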
The
phrase automated
unit testing deals
mainly with test case
execu-
tion
and recording of defects
that are encountered: most
of the test cases
are
still created by hand.
However, it is theoretically possible to
envision
automated
test case creation as
well.
Recall
from Chapter 7 that during
requirements gathering and
analy-
sis,
seven fundamental topics and
30 supplemental topics need to
be
considered.
As it happens, these same 37
issues also need to be
tested.
A
form of static analysis
elevated to execute against
requirements and
specification
meta-languages should, in theory, be
able to produce a
suite
of test cases as a
byproduct.
Some
forms of test automation are
aimed at web applications;
others are
aimed
at embedded applications; and
still others are aimed at
information
technology
products. Automated testing is an
emerging technology that
as
of
2009 is still rapidly
evolving.
There
is a shortage of solid empirical
data that compares
automated
unit
testing and manual unit
testing in a side-by-side fashion
for appli-
cations
of similar size and
complexity. Anecdotal information
gives an
edge
to automated testing for speed
and convenience. However,
the most
critical
metric for testing is that
of defect removal efficiency. As
this
book
is written, there is not
enough solid data that
compares automated
unit
testing to the best forms of
manual unit testing to judge
whether
automated
unit tests have higher
levels of defect removal
efficiency
than
manual unit tests.
As
additional data becomes
available, there is a good chance
that
automatic
unit testing will enter the
best practice class. As of
2009, the
data
shows some effort and
cost benefits, but defect
removal efficiency
benefits
remain uncertain.
Defect removal for legacy applications
About 40 percent of the software engineers in the world are faced with
performing maintenance on
aging
legacy
applications that they did
not create themselves.
Although the
legacy
applications may be old,
they are far from
trouble free, and
they
still
contain latent bugs or
defects.
This
situation brings up a number of
questions about defect
removal for
legacy
code where the original
developers are gone, the
specifications may
be
missing or out of date,
comments may be sparse or
incorrect, regression
tests
are of unknown completeness,
and the code itself
may be in a dead
language
or one the current maintenance
team has not
used.
Fortunately,
a number of companies and
tools have addressed
the
issues
of maintaining aging legacy
code. Some of these
companies have
developed
"maintenance workbenches" that
include features such
as:
1.
Automated static
analysis
2.
Automated test coverage
analysis
3.
Automated function point
calculations
4.
Automated cyclomatic and
essential complexity
calculations
5.
Automated debugging support
for many (but not
all) languages
6.
Automated data mining for
business rules
7.
Automated translation from
dead languages to newer
languages
With
aging legacy applications
being written in as many as
2500 dif-
ferent
programming languages, no single
tool can provide universal
sup-
port.
However, for legacy code
written in the more common
languages
such
as Ada, COBOL, C, PL/I, and
the like, a number of
maintenance
tools
are available.
Usage
of maintenance workbenches as a class
counts as a best
prac-
tice,
but there are too
many tools and variations to
identify specific
workbenches.
Also, these tools are
evolving fairly rapidly, and
new fea-
tures
occur frequently.
Synergies and combinations of personal defect removal
The methods discussed in this section are used in combination rather than alone.
Debugging,
automated static analysis,
and unit testing form
the most
common
combination. The combined
effectiveness of these three
meth-
ods
can top 97 percent in terms
of defect removal efficiency
when per-
formed
by experienced software engineers.
The combined results
can
also
drop below 85 percent when
performed by novices.
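If the three methods were statistically independent, their combined efficiency could be estimated by multiplying the fractions of defects that survive each step, as in the brief sketch below; the individual percentages shown are placeholders, and in reality the methods overlap considerably, which is one reason measured combined results fall in the 85 to 97 percent range rather than higher.

# Illustrative sketch (not from the text): combined defect removal
# efficiency under an independence assumption. Individual efficiencies
# are hypothetical placeholders.
def combined_efficiency(step_efficiencies):
    surviving = 1.0
    for e in step_efficiencies:
        surviving *= (1.0 - e)   # fraction of defects that slip past this step
    return 1.0 - surviving

# e.g., static analysis 0.85, unit testing 0.40, debugging 0.35 (placeholders)
print(round(combined_efficiency([0.85, 0.40, 0.35]), 3))  # about 0.94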
Summary
and Conclusions on
Personal
Defect Removal
Although
personal defect removal
activities are private and
therefore
difficult
to study, they have been
the frontline of defense
against soft-
ware
defects for more than 50
years. That being said,
the fact that
soft-
ware
defects emerge and are
still present when software
is delivered
indicates
that none of the personal
defect removal methods are
100
percent
effective.
However,
some of the newer defect
removal tools such as
automated
static
analysis are improving the
situation and adding rigor
to the suite
of
personal defect removal
tools and methods.
Since
individual software engineers
can keep records of the
bugs they
find,
it would be useful and
valuable if personal defect
removal effi-
ciency
levels could be elevated up to
more than 90 percent before
the
public
forms of defect removal
begin.
Personal
defect removal will continue to
have a significant role
as
software
engineering evolves from a
craft to a true engineering
dis-
cipline.
Knowing the most effective
and efficient ways for
preventing
and
removing defects is a sign of
software engineering
professionalism.
Lack
of defect measures and
unknown levels of defect
removal efficiency
imply
amateurishness, not
professionalism.
Economic
Problems of the
"Lines
of Code" Metric
Introduction
Any
discussion of programming and
code development would be
incom-
plete
without considering the
famous lines of code (LOC)
metric, which
has
been used to measure both
productivity and quality
since the dawn
of
the computer era.
The
LOC metric was first
introduced circa 1960 and
was used for
economic,
productivity, and quality
studies. At first the LOC
metric was
reasonably
effective for all three
purposes.
As
additional higher-level programming
languages were created,
the
LOC
metric began to encounter
problems. LOC metrics were
not able to
measure
noncoding activities such as
requirements and design,
which
were
becoming increasingly
expensive.
These
problems became so severe
that a controlled study in
1994
that
used both LOC metrics
and function point metrics
for ten versions
of
the same application coded in
ten languages reached an
alarming
conclusion:
LOC metrics violated the
standard assumptions of
economic
productivity
so severely that using LOC
metrics for studies
involving
more
than one programming
language constituted professional
mal-
practice!
Such
a strong statement cannot be
made without examples and
case
studies
to show the LOC problems.
Following is a chronology of the
use
of
LOC metrics that shows
when and why the metric
began to cease
being
useful and start being
troublesome. The chronology
runs from
1960
to the present day, and it
projects some ideas forward
to 2020.
Lines
of Code Metrics Circa
1960
The
lines of code (LOC) metric
for software projects was
first introduced
circa
1960 and was used
for economic, productivity,
and quality studies.
The
economics of software applications
were measured using
"dollars
per
LOC." Productivity was
measured in terms of "lines of
code per time
unit."
Quality was measured in
terms of "defects per KLOC"
where "K"
was
the symbol for 1000
lines of code. The LOC
metric was reasonably
effective
for all three
purposes.
When
the LOC metric was
first introduced, there was
only one pro-
gramming
language, basic assembly
language. Programs were
small
and
coding effort composed about
90 percent of the total
work. Physical
lines
and logical statements were
the same thing for
basic assembly
language.
In
this early environment, the
LOC metric was useful
for economic,
productivity,
and quality analyses. The
LOC metric worked fairly
well
for
a single language where
there was little or no
reused code and
where
there
were no significant differences
between counts of physical
lines
and
counts of logical statements. But
the golden age of the LOC
metric,
where
it was effective and had no
rivals, only lasted about
ten years.
However,
this ten-year span was
time enough so that the
LOC metric
became
firmly embedded in the
psychology of software engineering.
Once
an
idea becomes firmly fixed, it
tends to stay in place until
new evidence
becomes
overwhelming. Unfortunately, as the
software industry
changed
and
evolved rapidly, the LOC
metric did not change. As
time passed,
the
LOC metric became less
and less useful until by
about 1980 it had
become
extremely harmful without
very many people realizing
it. Due to
cognitive
dissonance, the LOC metric
was used but not
examined criti-
cally
in the light of changes in
other software engineering
methods.
Lines
of Code Metrics Circa
1970
By
1970, basic assembly had
been supplanted by
macro-assembly.
The
first generation of higher-level
programming languages such
as
COBOL,
FORTRAN, and PL/I was
starting to be used. Basic assembly language was beginning to drop out of use as better
alterna-
tives
became available. This was
perhaps the first instance
of a long
series
of programming languages that
died out, leaving a train of
aging
legacy
applications that would be
difficult to maintain as programmers familiar with the dead languages, and compilers for them, stopped being available.
The
first known problem with LOC
metrics was in 1970, when
many
IBM
publication groups exceeded
their budgets for that
year. It was
discovered
(by the author) that
technical publication group
budgets
had
been based on 10 percent of
the budgets assigned to
programming
or
coding.
The
publication projects based on
code budgets for assembly
language
did
not overrun their budgets,
but manuals for the
projects coded in
PL/S
(a derivative of PL/I) had
major overruns. This was
because PL/S
reduced
coding effort by half, but
the technical manuals were
as big as
ever.
Therefore, when publication
budgets were set at 10
percent of code
budgets,
and coding costs declined by
50 percent, all of the
publication
budgets
for PL/S projects were
exceeded.
The
initial solution to this
problem at IBM was to give a
formal math-
ematical
definition to language levels.
The level
was
defined as the
number
of statements in basic assembly
language needed to equal
the
functionality
of 1 statement in a higher-level
language. Thus, COBOL
was
a level 3 language because it took
three basic assembly
statements
to
equal one COBOL statement.
Using the same rule,
SMALLTALK is
a
level 18 language.
For
several years before
function points were
invented, IBM used
"equivalent
assembly statements" as the
basis for estimating
noncode
work
such as user manuals.
(Indeed, a few companies
still use equiva-
lent
assembly language even in
2009.)
Thus,
instead of basing a publication
budget on 10 percent of
the
effort
for writing a program in
PL/S, the budget would be
based on 10
percent
of the effort if the code
were basic assembly
language. This
method
was crude but reasonably
effective. This method
recognized that
not
all languages required the
same number of lines of code
to deliver
specific
functions.
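The arithmetic of the equivalent-assembly method is simple, as the sketch below shows; the COBOL and SMALLTALK levels come from the text, the PL/S level of roughly 2 is inferred from the statement that PL/S cut coding effort in half, and the 10 percent publication rule is the one that caused the original overruns.

# Illustrative sketch of IBM's "equivalent assembly statements" method.
# Language level = basic assembly statements per statement in the
# higher-level language (COBOL = 3, SMALLTALK = 18 per the text;
# PL/S = 2 is an inference).
LANGUAGE_LEVEL = {"ASSEMBLY": 1, "PL/S": 2, "COBOL": 3, "SMALLTALK": 18}

def equivalent_assembly_statements(statements: int, language: str) -> int:
    return statements * LANGUAGE_LEVEL[language]

# A 5,000-statement PL/S program is budgeted for documentation as if it
# were a 10,000-statement assembly program, so the manuals' budget no
# longer shrinks just because the code did.
pl_s_program = 5000
doc_budget_base = equivalent_assembly_statements(pl_s_program, "PL/S")
print(doc_budget_base)  # 10000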
However,
neither IBM customers nor IBM
executives were
comfort-
able
with the need to convert the
sizes of modern languages
into the
size
of an antique language for
cost-estimating purposes. Therefore,
a
better
form of metric was felt to
be necessary.
The
documentation problem plus
dissatisfaction with the
equivalent
assembler
method were two of the
reasons IBM assigned Allan
Albrecht
and
his colleagues to develop
function point metrics.
Additional very
powerful
programming languages such as APL
were starting to
appear,
and
IBM wanted both a metric and
an estimating method that
could
deal
with noncoding work as well as
coding in an accurate
fashion.
The
use of macro-assembly language
had introduced code reuse,
and
this
caused measurement problems,
too. It raised the issue of
how to
count
reused code in software
applications, or how to count
any other
reused
material for economic
purposes.
The
solution here was to
separate productivity into
two discrete topics:
1.
Development
productivity
2.
Delivery
productivity
The
former, development productivity,
dealt with the code and
materi-
als
that had to be constructed
from scratch in the
traditional way.
The
latter, delivery productivity,
dealt with the final
application as
delivered,
including reused material.
For example, using
macro-assem-
bly
language, a productivity rate
for development
productivity might
be
300 lines of code per
month. But due to reusing
code in the form of
macro
expansions, delivery
productivity might
be as high as 750
lines
of
code per month.
This
is an important business distinction
that is not well
understood
even
in 2009. The true goal of
software engineering is to improve
the
rate
of delivery productivity. Indeed, it is
possible for delivery
productiv-
ity
to rise while development
productivity declines!
This
might occur by carefully
crafting a reusable code
module and
certifying
it to zero-defect quality levels.
Assume a 500-line code
module
is
developed for widespread
reuse. Assume the module
was carefully
developed,
fully inspected, examined
via static analysis, and
fully tested.
The
module was certified to be of
zero-defect status.
This
kind of careful development
and certification might
yield a net
development
productivity rate of only
100 lines of code per
month, while
normal
development for a single-use
module would be closer to
500 lines
of
code per month. Thus, a
total of five months instead
of a single month
of
development effort went to
creating the module. This is
of course a
very
low rate of development
productivity.
However,
once the module is certified
and available for reuse,
assume
that
utilizing it in additional applications
can be done in only one
hour.
Therefore,
every time the module is
utilized, it saves about one
month
of
custom development!
If
the module is utilized in
only five applications, it will
have paid for
its
low development productivity.
Every time this module is
used, its
effective
delivery productivity rate is
equal to 500 lines of code
per hour,
or
about 66,000 lines of code
per month!
Thus,
while the development
productivity
of the module dropped
down
to
only 100 lines of code
per month, the delivery
productivity
rate is
equivalent
to 66,000 lines of code per
month. The true economic
value
of
this module does not
reside in how fast it was
developed, but rather
in
how many times it can be
delivered in other applications because
it
is
reusable.
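The arithmetic of this example can be restated in a few lines; the sketch below simply reproduces the figures from the text, with about 132 working hours per month assumed in order to arrive at the text's figure of roughly 66,000 LOC per month.

# Reproduces the reusable-module arithmetic from the text.
MODULE_LOC = 500            # size of the certified reusable module
DEVELOPMENT_MONTHS = 5      # careful development, inspection, static analysis, testing
REUSE_HOURS_PER_USE = 1     # effort to place the module in a new application
HOURS_PER_MONTH = 132       # assumption used to match the text's ~66,000 figure

development_rate = MODULE_LOC / DEVELOPMENT_MONTHS           # 100 LOC per month
delivery_rate_per_hour = MODULE_LOC / REUSE_HOURS_PER_USE    # 500 LOC per hour
delivery_rate_per_month = delivery_rate_per_hour * HOURS_PER_MONTH

print(development_rate)         # 100.0 LOC per month while building the module
print(delivery_rate_per_month)  # 66000.0 effective LOC per month on each reuse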
To
be successful, reused code
needs to approach or achieve
zero-defect
status.
It does not matter what
the development speed is, if once
com-
pleted
the code can then be
used in hundreds of
applications.
As service-oriented architecture (SOA) and software as a service (SaaS) gain ground, their goal is to make dramatic improvements in the ability to deliver software features.
Development speed is comparatively
unimportant
so long as quality approaches
zero-defect levels.
Returning
to the historical chronology,
another issue shared
between
macro-assembly
language and other new
languages was the
difference
between
physical lines of code and
logical statements. Some
languages,
such
as Basic, allowed multiple
statements to be placed on a
physical
line.
Other languages, such as
COBOL, divided some logical
statements
into
multiple physical lines. The
difference between a count of
physical
lines
and a count of logical
statements could differ by as
much as 500
percent.
For some languages, there
would be more physical lines
than
logical
statements, but for other
languages, the reverse was
true. This
problem
was never fully resolved by
LOC users and remains
trouble-
some
even in 2009.
Due
to the increasing power and
sophistication of high-level
program-
ming
languages such as C++,
Objective C, SMALLTALK, and
the like,
the
percentage of project effort
devoted to coding was
dropping from
90
percent down to about 50
percent. As coding effort
declined, LOC metrics
were
no longer effective for economic,
productivity, or quality
studies.
After
function point metrics were
developed circa 1975, the
defini-
tion
of language level was
expanded to include the
number of logical
code
statements equivalent to 1 function
point. COBOL, for
example,
requires
about 105 statements per
function point in the
procedure and
data
divisions.
This
expansion is the mathematical
basis for backfiring,
or
direct
conversion
from source code to function
points. Of course,
individual
programming
styles make backfiring a
method with poor accuracy
even
though
it remains widely used for
legacy applications where
code exists
but
specifications may be
missing.
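A minimal sketch of backfiring as just described appears below; only the COBOL ratio of about 105 logical statements per function point comes from the text, the other ratios are illustrative placeholders of the kind published in the consulting tables mentioned next, and the caveat about poor accuracy applies to any such conversion.

# Illustrative backfiring sketch: counted logical statements divided by a
# published statements-per-function-point ratio yield approximate size in
# function points. Only the COBOL ratio (105) is from the text; the other
# values are placeholders.
STATEMENTS_PER_FUNCTION_POINT = {
    "COBOL": 105,      # from the text
    "ASSEMBLY": 320,   # placeholder
    "JAVA": 53,        # placeholder
}

def backfired_function_points(logical_statements: int, language: str) -> float:
    return logical_statements / STATEMENTS_PER_FUNCTION_POINT[language]

print(round(backfired_function_points(52500, "COBOL")))  # roughly 500 function points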
There
are tables available from
several consulting companies
such as
David
Consulting, Gartner Group,
and Software Productivity
Research
(SPR)
that provide values for
source code statements per
function point
for
hundreds of programming
languages.
In
1978, A.J. Albrecht gave a
public lecture on function
point metrics
at
a joint IBM/SHARE/GUIDE conference in
Monterey, California.
Soon
after
this, function points
started to be published in the
software litera-
ture.
IBM customers soon began to use
function points, and this
led to
the
formation of a function point
user's group, originally in
Canada.
Lines
of Code Metrics Circa
1980
By
about 1980, the number of
programming languages had
topped 50,
and
object-oriented languages were
rapidly evolving. As a result,
soft-
ware
reusability was increasing
rapidly.
Another
issue that surfaced circa
1980 was the fact
that many appli-
cations
were starting to use more
than one programming language,
such
as
COBOL and SQL. The
trend for using multiple
languages in the same
application
has become the norm
rather than the exception.
However,
the
difficulty of counting lines of
code with accuracy was
increased when
multiple
languages were used.
About
the middle of this decade,
function point users
organized and
created
the nonprofit International
Function Point Users Group
(IFPUG).
Originally
based in Canada, IFPUG moved to the
United States in the
mid-1980s.
Affiliates in other countries
soon were formed, so that by
the
end
of the decade, function
point user groups were in a
dozen countries.
In
1985, the first commercial
software cost-estimating tool
based on
function
points reached the market,
SPQR/20. This tool supported
esti-
mates
for 30 common programming
languages and also could be
used
for
combinations of more than one
programming language.
This
tool included sizing and
estimating of paper documents
such as
requirements,
design, and user manuals. It
also estimated
noncoding
tasks
including testing and
project management.
Because
LOC metrics were still
widely used, the SPQR/20
tool
expressed
productivity and quality
results using both function
points
and
LOC metrics. Because it was
easy to switch from one
language
to
another, it was interesting to
compare the results using
both func-
tion
point and LOC metrics
when changing from
macro-assembly to
FORTRAN
or Ada or PL/I or
Java.
As
the level of a programming
language goes up, economic
productiv-
ity
expressed in terms of function
points per staff month
also goes up,
which
matches standard economics. But as
language levels get
higher,
productivity
expressed in terms of lines of
code per month drops
down.
This
reversal by LOC metrics
violates all rules of
standard economics
and
is a key reason for
asserting that LOC metrics
constitute profes-
sional
malpractice.
It
is a well-known law of manufacturing
economics that when
a develop-
ment
cycle includes a high percentage of fixed
costs, and there is a
decline
in
the number of units
manufactured, the cost per
unit will go up.
If
a line of code is considered to be a
manufacturing unit and there
is a
switch
from a low-level language to a
high-level language, the
number
of
units will decline. But the
paper documents in the form
of require-
ments,
specifications, and user
documents do not decline.
Instead they
stay
almost constant and have
the economic effect of fixed
costs. This
of
course will raise the cost
per unit. Because this
situation is poorly
understood,
two examples will clarify
the situation.
Case A   Suppose we have an application that consists of 1000 lines of code in basic assembly language. (We can also assume that the application is 5 function points.) Assume
the development personnel
are paid
at
a rate of $5000 per staff
month.
Assume
that coding took 1 staff
month and production of
paper docu-
ments
in the form of requirements,
specifications, and user
manuals
also
took 1 staff month. The
total project took 2 staff
months and cost
$10,000.
Productivity expressed as LOC
per staff month is 500.
The cost
per
LOC is $10.00. Productivity
expressed in terms of function
points
per
staff month is 2.5. The
cost per function point is
$2000.
Case
B Assume
that we are doing the
same application using the
Java
programming
language. Instead of 1000
lines of code, the Java
version
only
requires 200 lines of code.
The function point total
stays the same
at
5 function points. Development
personnel are also paid at
the same
rate
of $5000 per staff
month.
In
Case B suppose that coding
took only 1 staff week,
but the produc-
tion
of paper documents remained
constant at 1 staff
month.
Now
the entire project took
only 1.25 staff months
instead of 2 staff
months.
The cost was only
$6250 instead of $10,000.
Clearly economic
productivity
has improved, since we did
the same job as Case A with
a
savings
of $3750. We delivered exactly
the same functions to users,
but
with
much less code and
therefore much less effort,
so true economic
productivity
increased.
When
we measure productivity for
the entire project using
LOC met-
rics,
our rate has dropped
down to only 160 LOC
per month from
the
500
LOC per month shown
for Case A!
Our
cost per LOC has
soared up to $31.25 per LOC.
Obviously, LOC
metrics
cannot measure true economic
productivity. Also obviously,
LOC
metrics
penalize high-level languages. In
fact, many studies have
proven
that
the penalty exacted by LOC
metrics is directly proportional to
the
level
of the programming language, with
the highest-level
languages
looking
the worst!
Since
the function point totals of
both Case A and Case B
versions are
the
same at 5 function points,
Case B has a productivity
rate of 4 func-
tion
points per staff month.
The cost per function
point is only $1250.
These
improvements match the rules
of standard economics, because
the
faster and cheaper version
has better results than
the slower more
expensive
version.
What
has happened of course is
that the paperwork portion
of the
project
did not decline even
though the code portion
declined substan-
tially.
This is why LOC metrics are
professional malpractice if
applied
to
compare projects that used
different programming languages.
They
move
in the opposite direction
from standard economic
productivity
rates
and penalize high-level
languages. Table 8-7
summarizes both
Case
A and Case B.
As
can be seen by looking at Cases A and B
when they are side by
side,
LOC
metrics actually reverse the
terms of the economic
equation and
make
the large, slow, costly
version look better than
the small, quick,
cheap
version.
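The reversal is easy to verify by recomputing the figures of Table 8-7 from the raw inputs of Case A and Case B; the sketch below is purely illustrative and uses only numbers stated in the text.

# Recomputes the Case A / Case B figures (coding plus paperwork).
MONTHLY_RATE = 5000.0
FUNCTION_POINTS = 5.0

cases = {
    "Case A (assembly)": {"loc": 1000, "coding_months": 1.00, "paper_months": 1.00},
    "Case B (Java)":     {"loc": 200,  "coding_months": 0.25, "paper_months": 1.00},
}

for name, c in cases.items():
    months = c["coding_months"] + c["paper_months"]
    cost = months * MONTHLY_RATE
    print(name,
          f"effort={months} months",
          f"cost=${cost:,.2f}",
          f"LOC/month={c['loc'] / months:.0f}",
          f"$/LOC={cost / c['loc']:.2f}",
          f"FP/month={FUNCTION_POINTS / months:.2f}",
          f"$/FP=${cost / FUNCTION_POINTS:,.2f}")

The LOC-based columns move in the wrong direction while the function point columns match ordinary cost accounting, which is the whole point of the comparison.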
It
might be said that the
reversal of productivity with LOC
metrics
is
because paperwork was aggregated with
coding. But even when
only
coding
by itself is measured, LOC
metrics still violate
standard eco-
nomic
assumptions.
TABLE 8-7   Comparing Low-Level and High-Level Languages

                               Case A         Case B         Difference
Language                       Assembly       Java
Lines of code (LOC)            1000           200            800
Function points                5.00           5.00           0
Monthly compensation           $5,000.00      $5,000.00      $0.00
Paperwork effort (months)      1.00           1.00           0
Coding effort (months)         1.00           0.25           0.75
Total effort (months)          2.00           1.25           0.75
Project cost                   $10,000.00     $6,250.00      $3,750.00
LOC per month                  500            160            340
Cost per LOC                   $10.00         $31.25         $21.25
Function points per month      2.50           4.00           1.5
Cost per function point        $2,000.00      $1,250.00      $750.00
The
1000 LOC of assembly code
was done in 1 month at a
rate of 1000
LOC
per month. The pure
coding cost was $5000 or
$5.00 per LOC.
The
200 LOC of Java code was
done in 1 week, or 0.25
month.
Converted
into a monthly rate, that is
only 800 LOC per
month. The
coding
cost for Java was
$1250, so the cost per
LOC was $6.25.
Thus,
Java costs more per
LOC than assembly, even
though Java took
only
one-fourth the time and
one-fourth the cost! When
you try to
measure
the two different languages
using LOC, assembly looks
better
than
Java, which is definitely a
false conclusion. Table 8-8
shows the
comparison
between assembly and Java
for coding only.
In
real economic terms, the
Java code only cost
$1250 while the
assem-
bly
code cost $5000. Obviously,
Java has better economics
because the
same
job was done for a
savings of $3750.
But
the Java LOC production
rate is lower than assembly,
and the
cost
per LOC has jumped
from $5.00 to $6.25! From an
economic stand-
point,
variations in LOC per month
and cost per LOC
are unimportant
if
there is a major difference in
how much code is needed to
complete
an
application.
Unfortunately,
LOC metrics end up as
professional malpractice no
matter
how you use them if
you are trying to measure
economic pro-
ductivity
between unlike programming
languages. By contrast, the
Java
code's
cost per function point
was $250, while the
assembly code's cost
per
function point was $1000,
and this matches the
assumptions of
standard
economics.
Function
point production for Java
was 20 function points per
staff
month
versus only 5 function
points per staff month
for assembly. Thus,
function
points match the assumptions
of standard economics
while
LOC
metrics violate standard
economics.
TABLE 8-8   Comparing Coding for Low-Level and High-Level Languages

                               Case A         Case B         Difference
Language                       Assembly       Java
Lines of code (LOC)            1000           200            800
Function points                5.00           5.00           0
Monthly compensation           $5,000.00      $5,000.00      $0.00
Coding effort (months)         1.00           0.25           0.75
Coding cost                    $5,000.00      $1,250.00      $3,750.00
LOC per month                  1000           800            200
Cost per LOC                   $5.00          $6.25          $1.25
Function points per month      5              20             15
Cost per function point        $1,000.00      $250.00        $750.00

Returning to the main thread, within a few years, all other commercial software estimating tools would also support function point metrics, so that CHECKPOINT, COCOMO, KnowledgePlan, Price-S, SEER, SLIM, SPQR/20, and others could express estimates in terms of both function points and LOC metrics.
By
the end of this decade,
coding effort was below 35
percent of total
project
effort, and LOC was no
longer valid for either
economic or qual-
ity
studies. LOC metrics could
not quantify requirements
and design
defects,
which now outnumbered coding
defects. LOC metrics could
not
be
used to measure any of the
noncoding activities such as
require-
ments,
design, documentation, or project
management.
The
response of the LOC users to
these problems was
unfortunate:
they
merely stopped measuring
anything but code production
and
coding
defects. The bulk of all
published reports based on
LOC metrics
cover
less than 35 percent of
development effort and less
than 25 per-
cent
of defects, with almost no data
being published on
requirements
and
design defects, rates of
requirements creep, design
costs, and other
modern
problems.
The
history of the LOC metric
provides an interesting example
of
Dr.
Leon Festinger's theory of
cognitive dissonance. Once an
idea
becomes
entrenched, the human mind
tends to reject all evidence
to
the
contrary. Only when the
evidence becomes overwhelming will
there
be
changes of opinion, and such
changes tend to occur
rapidly.
Lines
of Code Metrics Circa
1990
By
about 1990, not only
were there more than
500 programming lan-
guages
in use, but some
applications were written in 12 to 15
different
languages.
There were no international
standards for counting code,
and
many
variations were used
sometimes without being
defined.
In
1991, the first edition of
the author's book Applied
Software
Measurement
included
a proposed draft standard
for counting lines
of
code based on counting
logical statements. One year
later, Bob Park
from
the Software Engineering
Institute (SEI), also
published a pro-
posed
draft standard, based instead on counting physical lines.
A
survey of software journals by
the author in 1993 found
that about
one-third
of published articles used
physical lines, one-third
used logical
statements,
and the remaining third
used LOC metrics without
even
bothering
to say how they were
counted. Since there is
about a 500 per-
cent
variance between physical
LOC and logical statements
for many
languages,
this was not a good
situation.
The
technical journals that deal
with medical practice and
engineer-
ing
often devote as much as 50
percent of the text to
explaining and
defining
the measurement methods used
to derive the results. The
soft-
ware
engineering journals, on the
other hand, often fail to
define the
measurement
methods at all.
The
software journals seldom
devote more than a few
lines of text to
explaining
the nature of the
measurements used for the
results. This is
one
of several reasons why the
term "software engineering" is
something
of
an oxymoron. In fact it is not
even legal to use the
term "software
engineering"
in some states and
countries, because software
develop-
ment
is not a recognized or licensed engineering discipline.
But
there was a worse problem
approaching than ambiguity in
count-
ing
lines of code. The arrival
of Visual Basic introduced a
class of pro-
gramming
languages where counting
lines of code was not
even possible.
This
is because a lot of Visual Basic
"programming" was not done
with
procedural
code, but rather with
buttons and pull-down
menus.
Of
the approximately 2500
programming languages and
dialects in
existence
circa 2009, there are
only effective published
counting rules
for
about 150. About another
2000 are similar to other
languages and
could
perhaps share the same
counting rules. But for at
least 50 lan-
guages
that use graphics or visual
means to augment procedural
code,
there
are no code counting rules
at all. Unfortunately, some of
the lan-
guages
without code counting rules
tend to be most recent
languages
that
are used for web
site development.
In
1994, a controlled study was
done that used both
LOC metrics
and
function points for ten
versions of the same
application written in
ten
different programming languages,
including four
object-oriented
languages.
The
study was published in
American
Programmer in
1994. This
study
found that LOC metrics
violated the basic concepts
of economic
productivity
and penalized high-level and
OO languages due to the
fixed
costs
of requirements, design, and
other noncoding activities.
This was
the
first published study to
state that LOC metrics
constituted profes-
sional
malpractice if used for
economic studies where more
than one
programming
language was
involved.
By
the 1990s most consulting
studies that collected
benchmark and
baseline
data used function points.
There are no large-scale
benchmarks
based
on LOC metrics. The
International Software
Benchmarking
Standards
Group (ISBSG) was formed in
1997 and only publishes
data
in
function point form.
Consulting companies such as
SPR and the
David
Consulting Group also use
function point
metrics.
By
the end of the decade,
some projects were spending
less than 20 per-
cent
of the total effort on
coding, so LOC metrics could
not be used for
the
80
percent of effort outside
the coding domain. The
LOC users remained
blindly
indifferent to these problems
and continued to measure
only
coding,
while ignoring the overall
economics of complete
development
cycles
that include requirements,
analysis, design, user
documentation,
project
management, and many other
noncoding tasks.
By
the end of the decade,
noncoding defects in requirements
and
design
outnumbered coding defects
almost 2 to 1. But since
noncode
defects
could not be measured with
LOC metrics, the LOC
literature
simply
ignores them.
Indeed,
still in 2009, debates occur
about the usefulness of the
LOC
metric,
but the arguments
unfortunately are not
solidly grounded in
manufacturing
economics. The LOC
enthusiasts seem to ignore
the
impact
of fixed costs on software
development.
The
main argument of the LOC
enthusiasts is that development
effort
has
a solid statistical correlation to
size measured in terms of
lines of
code.
This is true, but irrelevant
in terms of standard
economics.
If
it takes 1000 lines of C
code to deliver ten function
points to custom-
ers
and the cost was
$10,000, then the cost
per LOC is $10.00.
Assuming
one
month of programming effort,
the productivity rate using
LOC is
1000
LOC per month.
If
the same ten function
points were delivered to
customers in
Objective
C, there might be only 250
lines of code and the
cost might
be
only $2500. The effort
might take only one week
instead of a whole
month.
But the cost per LOC is
unchanged at $10.00 and the
LOC pro-
ductivity
rate is also unchanged at
1000 LOC per
month.
With
LOC metrics, both versions
appear to have identical
productivity
rates
of 1000 LOC per month,
but these are development
rates,
not deliv-
ery
rates.
Since the functionality is
the same for both C
and Objective C
versions,
it is important that the
cost per function point
for C was $1000,
while
for Objective C the cost
per function point was
only $250.
Measured
in terms of function points
per month, the rate
for C was
10,
while the rate for
Objective C increased to 40.
Thus, when measured
correctly,
the economic value of
high-level languages and
delivery rates
are
clearly revealed, while the
LOC metric does not
show either eco-
nomic
or delivery productivity at
all.
Lines
of Code Metrics Circa
2000
By
the end of the century,
the number of programming
languages had
topped
2000 and continues to grow
at more than one new
program-
ming
language per month. Current
rates of new programming
language
development
may approach 100 new
languages per year.
Web
applications are mushrooming,
and all of these are
based on very
high-level
programming languages and
substantial reuse. The
Agile
methods
are also mushrooming and
also tend to use high-level
pro-
gramming
languages. Software reuse in
some applications now tops
80
percent.
LOC metrics cannot be used
for most web applications
and are
certainly
not useful for measuring
Scrum sessions and other
noncoding
activities
that are part of Agile
projects.
Function
point metrics had become
the dominant metric for
serious
economic
and quality studies. But two
new problems appeared
that
have
kept function point metrics
from actually becoming the
industry
standard
for both economic and
quality studies.
The
first problem is that some
software applications are
now so large
(greater
than 300,000 function
points) that normal function
point analy-
sis
is too slow and too
expensive to be used.
There
are gaps at both ends of
normal function point
analysis. Above
15,000
function points, the costs
and schedule for counting
function point
metrics
become so high that large
projects are almost never
counted.
(Function
point analysis operates
between 400 and 600
function points
per
day per counter. The
approximate cost is about
$6.00 per function
point
counted.)
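Applying the counting speed and cost figures just quoted makes the scale problem obvious; the sketch below uses the text's 400 to 600 function points per day and roughly $6.00 per function point against a hypothetical 300,000-function point system.

# Applies the text's counting-speed and cost figures to a very large system.
APP_SIZE_FP = 300_000                         # hypothetical very large application
COUNT_RATE_LOW, COUNT_RATE_HIGH = 400, 600    # function points counted per day
COST_PER_FP = 6.00                            # approximate counting cost from the text

slowest_days = APP_SIZE_FP / COUNT_RATE_LOW
fastest_days = APP_SIZE_FP / COUNT_RATE_HIGH
total_cost = APP_SIZE_FP * COST_PER_FP

print(f"Counting effort: {fastest_days:.0f} to {slowest_days:.0f} counter-days")
print(f"Counting cost:   ${total_cost:,.0f}")

Several counter-years of effort and nearly two million dollars merely to size the application explain why systems of this magnitude are almost never counted with normal function point analysis.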
At
the low end of the
scale, the counting rules
for function points
do
not
operate below a size of
about 15 function points.
Thus, small changes
and
bug repairs cannot be
counted. Individually, such
changes may be as
small as 1/50th of a function point
and are rarely larger
than 10 function
points.
But large companies can make
30,000 or more changes per
year,
with
a total size that can
top 100,000 function
points.
The
second problem is that the success of
the original function
point
metric
has triggered an explosion of
function point clones. As of
2009,
there
are at least 24 function
point variations. This makes
benchmark
and
baseline studies difficult, because
there are very few
conversion
rules
from one variation to
another.
In
addition to standard IFPUG function
points, there are also
Mark
II
function points, COSMIC
function points, Finnish
function points,
Netherlands
function points, story
points, feature points,
web-object
points,
and many others.
Although
LOC metrics continue to be
used, they continue to have
such
major
errors that they constitute
professional malpractice for
economic
and
quality studies where more
than one language is involved, or
where
non-coding
issues are
significant.
There
is also a psychological problem.
LOC usage tends to fixate
atten-
tion
on coding and make the
other kinds of software work
invisible. For
large
software projects there may
be many more noncode workers
than
programmers.
There will be architects, designers,
database administra-
tors,
quality assurance, technical
writers, project managers,
and many
other
occupations. But since none of
these can be measured using
LOC
metrics,
the LOC literature ignores
them.
Lines
of Code Metrics Circa
2010
It
would be nice to predict an
optimistic future, but the
recession has
changed
the nature of industry and
the future is now
uncertain.
If
current trends continue, within a
few more years the
software
industry
will have more than 3000
programming languages, of
which
about
2900 will be obsolete or nearly
dead languages. The
industry
will
have more than 20 variations
for counting lines of code,
more than
50
variations for counting
function points, and
probably another 20
unreliable
metrics such as story
points, use-case points, cost
per defect,
or
using percentages of unknown
numbers. (The software
industry loves
to
make claims such as "improve
productivity by 10 to 1" without
defin-
ing
either the starting or the
ending point.)
Future
generations of sociologists will no doubt
be interested in why
the
software industry spends so
much energy on creating
variations of
things,
and so little energy on
fundamental issues. No doubt
large proj-
ects
will still be cancelled, litigation
for failures will still be
common,
software
quality will still be bad,
software productivity will remain
low,
security
flaws will be alarming, and
the software literature will
con-
tinue
to offer unsupported claims
without actually presenting
quanti-
fied
data.
What
the software industry needs
is actually fairly
straightforward:
1.
Measures of defect potentials
from all sources expressed
in terms of
function
points; that is,
requirements defects, design
defects, code
defects,
document defects, and bad
fixes.
2.
Measures of defect removal
efficiency levels for all
forms of inspec-
tion,
static analysis, and
testing.
3.
Activity-based productivity benchmarks
from requirements
through
delivery
and then for maintenance
and customer support
from
delivery
to retirement using function
points.
4.
Certified sources of reusable
material near the
zero-defect level.
5.
Much improved security
methods to guard against
viruses, spyware,
and
hacking.
6.
Licenses and board-certification
for software engineering
specialties.
But
until measurement becomes
both accurate and
cost-effective,
none
of these are likely to
occur. An occupation that will
not measure
its
own performance with accuracy is
not a true
profession.
Lines
of Code Circa
2020
If
we look forward to 2020,
there are best-case and
worst-case scenarios
to
consider.
The
best-case scenario for lines
of code metrics is that
usage dimin-
ishes
even faster than it has
been and that economic
productivity based
on
delivery becomes the
industry focus rather than
development and
lines
of code. For this scenario
to occur, the speed of function
point analy-
sis
needs to increase and the
cost per function point
counted needs to
decrease
from about $6.00 per
function point counted to
less than $0.10
per
function point counted,
which is technically possible
and indeed
occurs
in 2009, although the
high-speed methods are not
yet widely
deployed
since they are so
new.
If
these changes occur, then
function point usage will
increase at least
tenfold,
and many new kinds of
economic studies can be
carried out.
Among
these will be measurement of entire
portfolios that might
top
10
million function points.
Corporate backlogs could be
sized and pri-
oritized,
and some of these exceed 1
million function points.
Risk/value
analyses
for major software
applications could become
both routine
and
professionally competent. It will also be
possible to do economic
analyses
of interesting new technologies
such as the Agile
methods,
service-oriented
architecture (SOA), software as a
service (SaaS), and
of
course total cost of
ownership (TCO).
Under
the best-case scenario, software
engineering would evolve
from
a
craft or art form into a
true engineering discipline.
Reliable measures
of
all activities and tasks
will lead to greater success rates on
large soft-
ware
applications. The goal of
software engineering should be to
become
a
true engineering discipline with
recognized specialties, board
certifica-
tion,
and accurate information on
productivity, quality, and
costs. But
that
cannot be accomplished when
project failures outnumber
successes
for
large applications.
So
long as quality and
productivity are ambiguous
and uncertain, it
is
difficult to carry out
multiple regression studies
and to select really
effective
tools and methods. LOC
metrics have been a major
barrier to
economic
and quality studies for
software.
The
worst-case scenario is that
LOC metrics continue at
about the
same
level as 2009. The software
industry will continue to ignore
eco-
nomic
productivity and remain
fixated on the illusory
"lines of code per
month"
metric. Under the worst-case
scenario, "software
engineering"
will
remain an oxymoron. Trial-and-error
methods will continue to
dom-
inate,
in part because effective tools
and methodologies cannot
even be
studied
using LOC metrics. Under
the worst-case scenario,
failures and
project
disasters will remain common
for large software
applications.
Function
point analysis will continue to
serve an important role
for
economic
studies, benchmarks, and
baselines, but only for
about 10
percent
of software applications of medium
size. The cost per
function
point
under the worst-case
scenario will remain so high
that usage
above
15,000 function points will
continue to be very rare.
There will
probably
be even more function point
variations, and the chronic
lack
of
conversion rules from one
variation to another will make
large-scale
international
economic studies almost
impossible.
Summary
and Conclusions
The
history of lines of code
metrics is a cautionary tale
for all people
who
work in software. The LOC
metric started out well
and was fairly
effective
when there was only one
programming language and
coding
was
so difficult it constituted 90 percent of
the total effort for
putting
software
on a computer.
But
the software industry began
to develop hundreds of
program-
ming
languages. Applications started to
use multiple
programming
languages,
and that remains the
norm today. Applications
grew from
less
than 1000 lines of code up
to more than 10 million
lines of code.
Coding
is the major task for
small applications, but for
large systems,
the
work shifts to defect
removal and production of
paper documents
in
the forms of requirements,
specifications, user manuals,
test plans,
and
many others.
The
LOC metric was not
able to keep pace with either
change. It does
not
work well when there is
ambiguity in counting code,
which always
occurs
with high-level languages and
multiple languages in the
same
application.
It does not work well
for large systems where
coding is only
a
small fraction of the total
effort.
As
a result, LOC metrics became
less and less useful
until sometime
around
1985 they started to become
actually harmful. Given the
errors
and
misunderstandings that LOC
metrics bring to economic,
productiv-
ity,
and quality studies, it is
fair to say that in many
situations usage
of
LOC metrics can be viewed as
professional malpractice if more
than
one
programming language is part of
the study or the study
seeks to
measure
real economic
productivity.
The
final point is that
continued usage of LOC
metrics is a significant
barrier
that is delaying the
progress of software engineering
from a
craft
to a true engineering discipline. An
occupation that cannot
even
measure
its own work with accuracy
is hardly qualified to be
called
engineering.
Readings
and References
Barr,
Michael and Anthony Massa.
Programming
Embedded Systems: With C and
GNU
Development
Tools. Sebastopol,
CA: O'Reilly Media,
2006.
Beck,
K. Extreme
Programming Explained: Embrace
Change. Boston,
MA: Addison
Wesley,
1999.
Bott,
Frank, A. Coleman, J. Eaton,
and D. Rowland. Professional
Issues in Software
Engineering,
Third
Edition.
London
and New York: Taylor &
Francis, 2000.
Cockburn,
Alistair. Agile
Software Development. Boston,
MA: Addison Wesley,
2001.
Cohen,
D., M. Lindvall, & P. Costa,
"An Introduction to agile
methods." Advances
in
Computers.
New
York: Elsevier Science
(2004): 166.
Garmus,
David and David Herron.
Function
Point Analysis. Boston:
Addison Wesley,
2001.
Garmus,
David and David Herron.
Measuring
the Software Process: A Practical
Guide
to
Functional Measurement. Englewood
Cliffs, NJ: Prentice Hall,
1995.
Glass,
Robert L. Facts
and Fallacies of Software
Engineering (Agile
Software
Development).
Boston:
Addison Wesley, 2002.
van Vliet, Hans.
Software
Engineering Principles and
Practices,
Third
Edition.
London, New York: John
Wiley & Sons,
2008.
Highsmith,
Jim. Agile
Software Development Ecosystems.
Boston,
MA: Addison Wesley,
2002.
Humphrey,
Watts. PSP:
A Self-Improvement Process for Software
Engineers. Upper
Saddle
River, NJ: Addison Wesley,
2005.
Humphrey,
Watts. TSP--Leading
a Development Team. Boston,
MA: Addison Wesley,
2006.
Hunt,
Andrew and David Thomas.
The
Pragmatic Programmer. Boston,
MA: Addison
Wesley,
1999.
Jeffries,
R., et al. Extreme
Programming Installed. Boston,
MA: Addison Wesley,
2001.
Jones,
Capers. Applied
Software Measurement, Third
Edition. New York, NY:
McGraw-
Hill,
2008.
Jones,
Capers. Conflict
and Litigation Between
Software Clients and
Developers,
Version
6. Burlington, MA: Software Productivity
Research, June 2006. 54
pages.
Jones,
Capers. Estimating
Software Costs, Second
Edition. New York, NY:
McGraw-Hill,
2007.
Jones,
Capers. Software
Assessments, Benchmarks, and Best
Practices. Boston,
MA:
Addison
Wesley Longman, 2000.
Jones,
Capers. "The Economics of
Object-Oriented Software." American
Programmer
Magazine,
October
1994: 29-35.
Kan,
Stephen H. Metrics
and Models in Software
Quality Engineering, Second
Edition.
Boston,
MA: Addison Wesley Longman,
2003.
Krutchen,
Phillippe. The
Rational Unified Process--An
Introduction. Boston,
MA:
Addison
Wesley, 2003.
Larman,
Craig and Victor Basili.
"Iterative and Incremental
Development--A
Brief
History."
IEEE
Computer Society, June
2003: 47-55.
Love,
Tom. Object
Lessons. New
York, NY: SIGS Books,
1993.
Marciniak,
John J. (Ed.) Encyclopedia
of Software Engineering. (2
vols.) New York,
NY:
John
Wiley & Sons,
1994.
McConnell,
Steve. Code
Complete. Redmond,
WA: Microsoft Press,
1993.
------
Software Estimation--Demystifying the
Black Art. Redmond,
WA: Microsoft
Press,
2006.
Mills,
H., M. Dyer, & R. Linger.
"Cleanroom Software Engineering."
IEEE
Software, 4,
5
(Sept.
1987): 19-25.
Morrison,
J. Paul. Flow-Based
Programming. A New Approach to
Application
Development.
New
York, NY: Van Nostrand
Reinhold, 1994.
Park,
Robert E. SEI-92-TR-20:
Software Size Measurement: A
Framework for
Counting
Software
Source Statements. Pittsburgh,
PA: Software Engineering
Institute, 1992.
Pressman,
Roger. Software
Engineering--Practitioner's Approach, Sixth
Edition. New
York,
NY: McGraw-Hill,
2005.
Putnam,
Lawrence and Ware Myers.
Industrial
Strength Software--Effective
Management
Using Measurement. Los
Alamitos, CA: IEEE Press,
1997.
------
Measures for Excellence--Reliable
Software On-Time Within
Budget. Englewood
Cliffs,
NJ: Yourdon Press, Prentice
Hall, 1992.
Sommerville,
Ian. Software
Engineering, Seventh
Edition. Boston, MA: Addison
Wesley,
2004.
Stapleton,
J. DSDM--Dynamic
System Development Method in
Practice. Boston,
MA :
Addison
Wesley, 1997.
Stephens
M. and D. Rosenberg. Extreme
Programming Refactored: The Case
Against
XP.
Berkeley,
CA: Apress L.P.,
2003.