
Programming and Code Development: How Many Programming Languages Are Really Needed?

Programming and Code Development
This chapter has an unusual slant compared with other books on soft-
ware engineering. Among other topics, it deals with 12 important ques-
tions that are not well covered in the software engineering literature:
1. Why do we have more than 2500 programming languages?
2. Why does a new programming language appear more than once a
month?
3. How many programming languages are really needed by software
engineering?
4. Why do most modern applications use between 2 and 15 different
programming languages?
5. How many applications being maintained are written in "dead"
programming languages with few programmers?
6. How many programmers use major languages; how many use minor
languages?
7. Should there be a national translation center that maintains com-
pilers and tools for dead programming languages and that can con-
vert antique languages into modern languages?
8. What are the major kinds of bugs found in source code?
9. How effective are debuggers and static analysis tools compared
with inspections?
Chapter Eight
10. How effective are various kinds of testing in terms of bug
11. How effective is reusable code in terms of quality, security, and
12. Why has the "lines of code" metric stopped being effective for soft-
ware economic studies?
These 12 topics are not the only topics that are important about pro-
gramming, but they are not discussed often in software engineering
journals or books. Following are discussions of the 12 topics.
A Short History of Programming and
Language Development
It is interesting to consider the history of programming and the devel-
opment of programming languages. The early history of mechanical
computers driven by gears, cogs, and later punched cards is interest-
ing, but not relevant. However, these devices did embody the essence of
computer programming, which is to control the behavior of a mechanical
device by means of discrete instructions that could be varied in order to
change the behavior of the machine.
The pioneers of computer design include Charles Babbage, Ada
Lovelace, Herman Hollerith, Alan Turing, John von Neumann, Konrad
Zuse, J. Presper Eckert, John Mauchly, and a number of others. John
Backus, Konrad Zuse, and others contributed to the foundations of
programming languages. David Parnas and Edsger Dijkstra contributed
to the development of structured programming that minimized the ten-
dency of code branching to form "spaghetti bowls" of so many branches
that the code became nearly unreadable.
Ada Lovelace was an associate of Charles Babbage. In 1842 and 1843,
she described a method of calculating Bernoulli numbers for use on the
Babbage analytical engine. Her work is often cited as the world's first
computer program, although there is some debate about this.
In the years during and prior to World War II, a number of companies
in various countries built electro-mechanical computing devices, pri-
marily for special purposes such as calculating trajectories or handling
mathematical tasks.
The earliest models were "programmed" in part by changing wire con-
nections or using plug boards. But during World War II, computing devices
were developed with memory that could store both data and instructions.
The ability to have language instructions stored in memory opened the
gates to modern computer programming as we know it today.
Konrad Zuse of Germany built the Z3 computer in 1941 and later
designed what seems to be the first high-level language, Plankalkül,
in 1948, although no compiler was created and the language was not
The earliest "languages" that were stored in computers were binary
codes or machine languages, which obviously were extremely difficult to
understand, code, or modify. The difficulty of working with machine codes
directly led to languages that were more amenable to human under-
standing but capable of being translated into machine instructions.
The earliest of these languages were termed assembly languages and
usually had a one-to-one correspondence between the human-readable
instructions (called source code) and the executable instructions (called
object code).
The idea of developing languages that humans could use to describe
various algorithms or data manipulation steps proved to be so useful that
very shortly a number of more specialized languages were developed.
In these languages the human portions were optimized for certain
kinds of problems, and the work of translating the languages into
machine code was left to the compilers. Incidentally, the main difference
between an assembler and a compiler is that assemblers tend to have a
one-to-one ratio between source code and object code, while compilers
have a one-to-many ratio. In other words, one statement in a compiled
language might generate a dozen or more machine instructions.
The ability to translate a single source instruction into many object
instructions led to the concept of high-level programming languages. In
general, the higher the level of a programming language, the more object
code can be created from a single source code statement.
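The one-to-many relationship can be observed in a small way with Python's standard `dis` module. This is only an analogy, assuming the CPython interpreter: the instructions counted here are bytecodes for a virtual machine rather than object code for real hardware, and the exact count varies between interpreter versions. The statement and its variable names are invented for illustration.

```python
import dis

# One high-level source statement (the names are hypothetical).
SOURCE = "total = price * quantity + tax"


def count_instructions(source: str) -> int:
    """Compile one statement and count the bytecode instructions it yields."""
    code = compile(source, "<example>", "exec")
    return sum(1 for _ in dis.get_instructions(code))


instruction_count = count_instructions(SOURCE)
print(f"1 source statement -> {instruction_count} bytecode instructions")

# Show the individual instructions generated for the single statement.
dis.dis(compile(SOURCE, "<example>", "exec"))
```

An assembly language, by contrast, would need roughly one source line for each of the loads, arithmetic operations, and stores listed in the output.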
Both assembly and compilation were handled by special translation
programs as batch activities. The source code could not be run immedi-
ately. Sometimes translation might be delayed for hours if the computer
was being used for other work and other computers were not available.
These delays led to another form of code translation. Programming
language translators called interpreters were soon developed, which
allowed source code to be converted into object code immediately.
In the early days of computing and programming, software was used
primarily for a narrow range of mathematical calculations. But the
speed of digital computers soon gave rise to wider ranges of applica-
tions. When computers started to be aimed at business problems and
to manipulate text and data, it became obvious that if the source code
included some of the language and vocabulary of the problem domain,
then programming languages would be easier to learn and use. The use
of computers to control physical devices opened up yet another need for
languages optimized for dealing with physical objects.
As a result, scores of domain-specific programming languages were
developed that were aimed at tasks such as list processing, business
applications, astronomy, embedded applications, and a host of others.
Why Do We Have More than 2500
Programming Languages?
The concept of having source code optimized for specific kinds of busi-
ness or technical problems is one of the factors that led to the enormous
proliferation of programming languages.
There are some technical advantages for having programming lan-
guages match the vocabulary of various problem domains. For one thing,
such languages are easy to learn for programmers who are familiar with
the various domains.
It is actually fairly easy to develop a new programming language.
As computers began to be used for more and more kinds of problems,
the result was more and more programming languages. Developing a
new programming language that attracted other programmers also had
social and prestige value.
As a result of these technical and social reasons, the software industry
developed new programming languages with astonishing frequency.
Today, as of 2009, no one really knows the actual number of program-
ming languages and dialects, but the largest published lists of program-
ming languages now contain 2500 languages (The Language List by Bill
Kinnersley, http://people.ku.edu).
The author's former company, Software Productivity Research, has
been keeping a list of common programming languages since 1984, and
the current version contains more than 700 programming languages.
New programming languages continue to come out at a rate of two or
three per calendar month; some months, more than 10 languages have
arrived. There is no end in sight.
One reason for the plethora of languages is that a new language can
be developed by a single software engineer in only a month or two. In
fact, with compiler-compilers, a new programming language can evolve
from a vague idea to compiled code in 60 days or less.
In 1984, the author's first commercial software estimating tool was
put on the market. The first release of the tool could perform cost and
quality estimates for 30 different programming languages, but the tool
itself could handle other languages using the same logic and algorithms.
Therefore, we made a statement to customers that our tool could sup-
port cost estimates for "all known programming languages."
Having made the claim, it was necessary to back it up by assembling
a list of all known programming languages and their levels. At the time
the claim was made in 1984, the author hypothesized that the list might
include 50 languages. However, when the data was collected, it was
discovered that the set of "all known programming languages" included
about 250 languages and dialects circa 1984.
It was also discovered while compiling the list that new languages
were popping up about once a month; sometimes quite a few more.
It became obvious that keeping track of languages was not going to be
quick and easy, but would require continuous effort.
Today, as of 2009, the current list of languages maintained by Software
Productivity Research has grown to more than 700 programming languages,
and the frequency with which new languages come out seems
to be increasing from about one new language per month up to perhaps
two or even four and occasionally ten new languages per month.
An approximate chronology of significant programming languages is
shown in Table 8-1.
Table 8-1 is only a tiny subset of the total number of programming
languages. It is included just to give readers who may not be practicing
programmers an idea of the wide variety of languages in existence.
Those familiar with programming concepts can see from the list that
programming language design took two divergent paths:
1. Specialized languages that were optimal for narrow sets of problems,
such as FORTRAN, Lisp, ASP, and SQL
2. General-purpose languages that could be used for a wide range of
problems, such as Ada, Objective C, PL/I, and Ruby
It is of sociological interest that the vast majority of special-purpose
languages were developed by one or perhaps two individuals.
For example, Basic was developed by John Kemeny and Thomas Kurtz;
C was developed by Dennis Ritchie; FORTRAN was developed by John
Backus; Java was developed by James Gosling; and Objective C was
developed by Brad Cox and Tom Love.
The general-purpose languages were usually developed by commit-
tees. For example, COBOL was developed by a famous committee with
major inputs from Grace Hopper of the U.S. Navy. Other languages
developed by committees include Ada and PL/I. However, some general-
purpose languages were also developed by individuals or colleagues,
such as Ruby and Objective C.
For reasons that are perhaps more sociological than technological, the
attempts at building general-purpose languages such as PL/I and Ada
have not been as popular with programmers as many of the special-
purpose languages.
This is a topic that needs both sociological and technical research,
because PL/I and Ada appear to be well designed, robust, and capable
of tackling a wide variety of applications with good results.
Another major divergence in programming languages occurred during
the late 1970s, although research had started earlier. This is the split
between object-oriented languages such as SMALLTALK, C++, and
Objective C and languages that did not adopt OO methods and termi-
nology, such as Basic, Visual Basic, and XML.
Table 8-1. Chronology of Programming Language Development
Assembly languages
FORTRAN (Formula Translator)
Lisp (List Processing)
COBOL (Common Business-Oriented Language)
JOVIAL (Jules' Own Version of the International Algorithmic Language)
RPG (formerly Report Program Generator)
ALGOL (Algorithmic Language)
APL (A Programming Language)
Basic (Beginner's All-purpose Symbolic Instruction Code)
SQL (Structured Query Language)
Quick Basic
Objective C
Visual Basic
HTML (Hypertext Markup Language)
XML (Extensible Markup Language)
ASP (Active Server Pages)
Today in 2009, more than 50 percent of active programming languages
tend to be in the object-oriented camp, while the other languages are
procedural languages, functional languages, or use some other method
of operation.
Yet another dichotomy among programming languages is whether
they are typed or un-typed. The term typed means that operations in
a language are restricted to only specific data types. For example, a
typed language would not allow mathematical operations against char-
acter data. Examples of typed languages include Ruby, SMALLTALK,
and Lisp.
The opposite case, or un-typed languages, means that operations can
be performed against any type of data. Examples of un-typed languages
include assembly language and Forth.
The terms typed and un-typed are somewhat ambiguous, as are the
related terms of strongly typed and weakly typed. Over and above ambi-
guity, there is some debate as to the virtues and limits of typed versus
un-typed languages.
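The distinction can be sketched in Python, which is typed in the sense used here because it checks operand types when an operation runs. The un-typed case is only simulated: the `struct` module reinterprets four raw bytes as an integer, roughly the way assembly language treats memory as bare words. The function names are hypothetical and chosen for this illustration.

```python
import struct


def add_checked(a, b):
    """Typed behavior: arithmetic on mismatched types is refused."""
    try:
        return a + b
    except TypeError as exc:
        return f"rejected: {exc.__class__.__name__}"


def add_untyped(raw_a: bytes, raw_b: bytes) -> int:
    """Un-typed behavior: any 4 bytes are treated as a little-endian integer."""
    (x,) = struct.unpack("<I", raw_a)
    (y,) = struct.unpack("<I", raw_b)
    return x + y


print(add_checked(2, 3))        # ordinary arithmetic succeeds
print(add_checked("abcd", 3))   # arithmetic against character data is refused

# The same character data, viewed as raw bytes, can be "added" without complaint.
print(add_untyped(b"abcd", struct.pack("<I", 3)))
```

The last call produces a number with no meaningful relation to the text "abcd", which is the kind of result that typed languages are designed to prevent.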
Exploring the Popularity of Programming Languages
There are a number of ways of studying the usage and popularity of
programming languages. These include
1. Statistical analysis of web searches for specific languages
2. Statistical analysis of books and articles published about specific
languages
3. Statistical analysis of citations in the literature about specific
languages
4. Statistical analysis of job ads for programmers that cite language
5. Surveys and statistical analysis of languages in legacy applications
6. Surveys and statistical analysis of languages used for new applications
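As a rough sketch of the job-ad approach, the fragment below counts how many advertisements cite each language. The ad texts and the language list are hypothetical placeholders, not survey data; a real study would need thousands of ads and more careful handling of names such as C, C++, and C#.

```python
import re
from collections import Counter

# Hypothetical job-ad snippets, invented for illustration only.
JOB_ADS = [
    "Senior developer: Java and SQL required, COBOL a plus",
    "Maintenance programmer for COBOL and PL/I payroll systems",
    "Web developer: JavaScript, PHP, SQL",
    "Embedded engineer: C and Assembly",
]

# Hypothetical list of language names to search for.
LANGUAGES = ["Java", "JavaScript", "COBOL", "SQL", "PHP", "PL/I", "C", "Assembly"]


def count_mentions(ads, languages):
    """Count the ads that cite each language at least once."""
    counts = Counter()
    for ad in ads:
        for lang in languages:
            # Require a whole-token match so "Java" does not match "JavaScript"
            # and "C" does not match "COBOL", "C++", or "C#".
            pattern = r"(?<![\w+#/])" + re.escape(lang) + r"(?![\w+#/])"
            if re.search(pattern, ad):
                counts[lang] += 1
    return counts


mention_counts = count_mentions(JOB_ADS, LANGUAGES)
for language, ads_citing in mention_counts.most_common():
    share = 100.0 * ads_citing / len(JOB_ADS)
    print(f"{language:<12}{ads_citing:>3} ads  ({share:.0f}% of sample)")
```

The same counting logic could be pointed at book titles or literature citations, although each source would bias the ranking in its own way.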
A company called Tiobe publishes a monthly analysis of programming
language popularity that ranks 100 different programming languages.
Since this section is being written in May 2009, the 20 most popular lan-
guages for this month from the Tiobe rankings are listed in Table 8-2.
Older readers may wonder where COBOL, FORTRAN, PL/I, and Ada
reside. They are further down the Tiobe list in languages 21 through 40.
Since new languages pop up at a rate of more than one per month,
language popularity actually fluctuates rather widely on a monthly
basis. As interesting new programming languages appear, their popu-
larity goes up rapidly. But based on their utility or lack of utility over
longer periods, they may drop down again just as fast.
Table 8-2. Popularity Ranking of Programming Languages as of May 2009
Visual Basic
RPG (OS/400)
The popularity of programming languages bears a certain resemblance
to the popularity of prime-time television shows. Some new shows such
as Two and a Half Men surface, attract millions of viewers, and may last
for a number of seasons. A few shows such as Seinfeld become so popular
that they go into syndication and continue to be aired long after produc-
tion stops. But many shows are dropped after a single season.
It is interesting that the life expectancy of programming languages
and the life expectancy of television shows are about the same. Many
programming languages have active lives that span only a few "seasons"
and then disappear. Other languages become standards and may last for
many years. However, when all 2500 languages are considered, the aver-
age active life of a programming language when it is being used for new
development is less than five years. Very few programming languages
attract development programmers after more than ten years.
Some of the languages that are in the class of Seinfeld or I Love Lucy
and may last more than 25 years under syndication include
Objective C
Visual Basic
In a programming language context, the term syndication means that
the language is no longer under the direct control of its originator, but
rather control has passed to a user group or to a commercial company,
or that the language has been put in the public domain and is available
via open-source compilers.
It would be interesting and valuable if there were benchmarks and
statistics kept of the numbers of applications written in these long-lived
programming languages. No doubt C and COBOL have each been used
for more than 1 million applications on a global basis.
In fact, continuing with the analogy of the entertainment business,
it might be interesting to have awards for languages that have been
used for large numbers of applications. Perhaps "silver" might go for
100,000 applications, "gold" for 1 million applications, and "platinum"
for 10 million applications.
If such an award were created, a good name for it might be the
"Hopper," after Admiral Grace Hopper, who did so much to advance
programming languages and especially COBOL. In fact, COBOL is prob-
ably the first programming language in history to achieve the 1-million-
application plateau.
Although the idea of awards for various numbers of applications is
interesting, that would mean that statistics were available for ascer-
taining how many applications were created in specific languages or
combinations of languages. As of 2009, the software industry does not
keep such data.
The choice of which language should be used for specific kinds of
applications is surprisingly subjective. A colleague at IBM was asked
in a meeting if he programmed in the APL language. His response was,
"No, I'm not of that faith."
It would be technically possible to develop a standard method of
describing and cataloging the features of programming languages.
Indeed, with more than 2500 languages in existence, such a catalog is
urgently needed. Even if the catalog only started with 100 of the most
widely used languages, it would provide valuable information.
The full set of topics included to create an effective taxonomy of pro-
gramming languages is outside the scope of this book, but might contain
factors such as:
Language name                 Name of language
                              Object-oriented, functional, procedural, etc.
                              Year of creation, names of inventors
                              URLs of distributors of language compilers
Current version               Version number of current release; 1, 2, or
                              URLs or addresses of maintenance organizations
User associations             Names, URLs, and locations of user groups
Tutorial materials            Books and learning sources about the language
Reviews or critiques          Published reviews of language in refereed
Legal status                  Public domain, licensed, patents, etc.
Language definition           Whether it is formal, informal
Language syntax               Description of syntax
Language typing               Strongly typed, weakly typed, un-typed, etc.
Problem domains               Mathematics, web, embedded, graphics, etc.
Hardware platforms            Hardware language was intended to support
OS platforms                  Operating systems language compilers work
Intended uses                 Targeted application types
Known limitations             Performance, security, problem domains, etc.
                              Variations of the basic language
Companion languages           .NET, XML, etc. (languages used jointly)
                              Commands added by language users
                              Logical statements relative to assembly
Backfire level                Logical statements per function point
Reuse sources                 Certified modules, uncertified, etc.
Security features             Intrinsic security features, such as in
                              the E language
Debuggers available           Names of debugging tools
Static analysis available     Names of static analysis tools
Development tools available   Names of development tools
Maintenance tools available   Names of maintenance tools
Applications to date          Approximately 100, 1000, 10,000, 100,000, etc.
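One way such a catalog entry might be represented is sketched below. The class name, field names, and helper function are assumptions made for illustration; only a subset of the factors listed above is modeled, and the sample entry is deliberately left mostly empty rather than filled with guessed values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LanguageProfile:
    """One catalog entry; fields mirror a subset of the factors listed above."""
    name: str
    current_version: str = ""
    legal_status: str = ""             # public domain, licensed, patents, etc.
    language_typing: str = ""          # strongly typed, weakly typed, un-typed
    problem_domains: List[str] = field(default_factory=list)
    os_platforms: List[str] = field(default_factory=list)
    companion_languages: List[str] = field(default_factory=list)
    debuggers: List[str] = field(default_factory=list)
    static_analysis_tools: List[str] = field(default_factory=list)
    applications_to_date: int = 0      # order of magnitude only


def missing_fields(profile: LanguageProfile) -> List[str]:
    """Return the names of fields that are still empty in a catalog entry."""
    return [key for key, value in vars(profile).items() if value in ("", [], 0)]


# A partially completed entry; everything not supplied stays empty.
entry = LanguageProfile(name="COBOL", problem_domains=["business applications"])
print("Still to be researched:", ", ".join(missing_fields(entry)))
```

Listing the empty fields makes the amount of research needed per language visible, which matters if the catalog is to start with 100 languages and grow from there.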
Given the huge number of programming languages, it is surprising
that no standard taxonomy exists. Web searches reveal more than a dozen
topics when using search arguments such as "taxonomies of program-
ming languages" or "categories of programming languages." However,
these vary widely, and some contain more than 50 different descriptive
forms, but seem to lack any fundamental organizing principle.
Returning now to the main theme, somewhat alarmingly, the life
expectancy of many software applications is longer than the active life
of the languages in which they were written. An example of this is the
patient-record system maintained by the Veterans Administration. It is
written in the MUMPS programming language and has far outlived
MUMPS itself.
It is obvious to students of software engineering economics that if
programming languages have an average life expectancy of only 5 years,
but large applications last an average of 25 years, then software mainte-
nance costs are being driven higher than they should be due to the very
large number of aging applications that were coded in programming
languages that are now dead or dying.
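The arithmetic behind this concern can be made explicit. The sketch below uses the two averages quoted in the text and assumes, as a best case, that an application is started in a brand-new language; the function name and that starting assumption are illustrative only.

```python
LANGUAGE_ACTIVE_LIFE_YEARS = 5   # average active life of a language (from the text)
APPLICATION_LIFE_YEARS = 25      # average life of a large application (from the text)


def years_in_dead_language(app_life, lang_life, lang_age_at_start=0):
    """Years an application is maintained after its language leaves active use."""
    remaining_language_life = max(lang_life - lang_age_at_start, 0)
    return max(app_life - remaining_language_life, 0)


dead_years = years_in_dead_language(APPLICATION_LIFE_YEARS, LANGUAGE_ACTIVE_LIFE_YEARS)
dead_share = dead_years / APPLICATION_LIFE_YEARS
print(f"{dead_years} of {APPLICATION_LIFE_YEARS} years "
      f"({dead_share:.0%}) are spent in a dead or dying language")
```

If the language is already a few years old when coding begins, the share of the application's life spent in a dead language is higher still.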
How Many Programming Languages
Are Really Needed?
The plethora of programming languages raises basic questions that
need to be addressed by the software engineering literature: How many
programming languages does software engineering really need?
Having thousands of programming languages raises a corollary ques-
tion: Is the existence of more than 2500 programming languages a good
thing or a bad thing?
The argument that asserts having thousands of languages is a good
thing centers around the fact that languages tend to be optimized for
unique classes of problems. As new problems are encountered, they
demand new programming languages, or at least that is a hypothesis.
The argument that asserts having thousands of languages is a bad
thing centers around economics. Maintenance of legacy applications
written in dead languages is an expensive nightmare. The constant
need to train development programmers in the latest cult language
is expensive. Many useful tools such as static analysis tools and auto-
mated test tools support only a small subset of programming languages,
and therefore may require expensive modifications for new languages.
Accumulating large volumes of certified reusable code is more difficult
and expensive if thousands of languages have to be dealt with.
The existence of thousands of programming languages has created a
new subindustry within software engineering. This new subindustry is
concerned with translating dead or dying languages into new living lan-
guages. For example, it is now possible to translate the MUMPS language
circa 1967 into the C or Java languages and to do so automatically.
A corollary subindustry is that of renovation or periodically perform-
ing special maintenance activities on legacy applications to clean out
dead code, remove error-prone modules, and reduce the inevitable
increase in cyclomatic and essential complexity that occurs over time
due to repeated small changes.
Linguists and those familiar with natural human languages are
aware that translation from one language to another is not perfect. For
example, some Eskimo dialects include more than 30 different words
for various kinds of snow. It is hard to get an exact translation into a
language such as English that developed in a temperate climate and
has only a few variations on "snow."
Since many programming languages have specialized constructs for
certain classes of problem, the translation into other languages may lead
to awkward constructs that might be difficult for human programmers
to understand or deal with during maintenance and enhancement work.
Even so, if the translation opens up a dead language to a variety of static
analysis and maintenance tools, the effort is probably worthwhile.
To deal with the question of how many programming languages are
needed, it is useful to start by considering the universe of problem areas
that need to be put onto computers. There seem to be ten discrete prob-
lem areas, divided into two different major kinds of processing, as shown
in Table 8-3.
These two general categories reflect the major forms of software that
actually exist today: (1) software that processes information, and (2)
software that controls physical devices or deals with physical properties
such as sound or light or music.
These two broad categories might lead to the conclusion that per-
haps two programming languages would be the minimum number that
would be able to address all problem areas. One language would be
optimized for information systems, and another would be optimized
for dealing with physical devices and electronic signals. However, the
track records of general-purpose languages such as PL/I and Ada have
not indicated much success for languages that attempt to do too many
things at once.
Table 8-3. Problem Domains of Software Applications
Logical and Mathematical Problem Areas
  Mathematical calculations
  Logic and algorithmic expressions
  Numerical data
  Text and string data
  Time and dates
Physical Problem Areas
  Sensor-based electronic signals
  Audible signals and music
  Static images
  Dynamic or moving images
Few problems are "pure" and deal with only one narrow topic. In fact,
most applications deal with hybrid problem domains. This leads to a
possible conclusion that programming languages may reflect the permu-
tations of problem areas rather than the problem areas individually.
If the permutations of all ten problem areas were considered, then we
might eventually end up with 3,628,800 programming languages. This
is even more unlikely to occur than having one "superlanguage" that
could tackle all problem areas.
From examining samples of both information processing applications
and embedded and systems software applications, a provisional hypoth-
esis is that about four different problem areas occur in typical software
applications. The permutation of four topics out of a total of ten topics
leads to the hypothesis that the software engineering domain will even-
tually end up with about 5,040 different programming languages.
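Both figures can be checked with Python's standard `math` module (version 3.8 or later for `math.perm` and `math.comb`). The constant names are invented for this sketch. The final line is an aside that is not in the text: if the order in which an application's problem areas are listed were assumed not to matter, combinations rather than permutations would apply, and the count would be far smaller.

```python
import math

PROBLEM_AREAS = 10   # problem areas cited for Table 8-3
AREAS_PER_APP = 4    # provisional hypothesis from the text

# Orderings of all ten problem areas: 10!
all_orderings = math.factorial(PROBLEM_AREAS)

# Ordered selections of four areas out of ten: 10 * 9 * 8 * 7
ordered_selections = math.perm(PROBLEM_AREAS, AREAS_PER_APP)

# Assumption not made in the text: unordered selections of four areas.
unordered_selections = math.comb(PROBLEM_AREAS, AREAS_PER_APP)

print(f"10!      = {all_orderings:,}")
print(f"P(10, 4) = {ordered_selections:,}")
print(f"C(10, 4) = {unordered_selections:,}")
```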
Since we already have about 2500 programming languages and dialects
in 2009, there may yet be another 2500 languages still to be developed
in the future. At the current rate of roughly 100 new languages per
year, it can be projected that new languages will keep arriving at
about the same rate for another 25 years. From an economic standpoint,
this does not seem to be a very cost-effective engineering solution.
Assuming that the software engineering community does reach 5040
languages, the probable distribution of those languages would be
4800 languages would be dead or dying, with few programmers
200 languages would be in legacy applications and therefore need
40 languages would be new and gathering increasing numbers of
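The projection and the distribution can be worked through with the figures quoted in the text. The variable names and category labels are shorthand chosen for this sketch.

```python
PROJECTED_TOTAL = 5040   # hypothesized eventual number of languages
EXISTING_2009 = 2500     # languages and dialects as of 2009
NEW_PER_YEAR = 100       # approximate rate of new languages

# Languages still to come, and the years needed at the current rate.
remaining = PROJECTED_TOTAL - EXISTING_2009
years_to_reach_total = remaining / NEW_PER_YEAR

# Probable distribution once the total is reached.
distribution = {
    "dead or dying": 4800,
    "in legacy applications": 200,
    "new": 40,
}

print(f"{remaining} more languages, about {years_to_reach_total:.0f} years")
for category, count in distribution.items():
    print(f"{category:<24}{count:>5}  ({count / PROJECTED_TOTAL:.1%})")
```

Under these figures, more than 95 percent of all languages would be dead or dying, which is the economic core of the argument.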
A technical alternative to churning out another 2500 specialized lan-
guages for every new kind of problem that surfaces would be to consider
building polymorphic compilers that would support any combination of
problem areas.
Creating a National Programming
Language Translation Center
When considering alternatives to churning out another 2500 program-
ming languages, it might be of value to create a formal programming
language translation center stocked with the language definitions of all
known programming languages.
This center could provide guidance in the translation of dead or dying
languages into modern active languages. Some companies already per-
form translation, but out of today's total of 2500 languages, only a few
are handled with technical and linguistic accuracy. Automated transla-
tion as of 2009 probably only handles 50 languages out of 2500 total
Given the huge number of existing programming languages and the
rapid rate of creation of new programming languages, such a transla-
tion center would probably require a full-time staff of at least 50 person-
nel. This would mean that only very large companies such as IBM or
Microsoft or large government agencies such as Homeland Security or the
Department of Defense would be likely to attempt such an activity.
Over and above translation, the national programming language
translation center could also perform careful linguistic analyses of all
current languages in order to identify the main strengths and weak-
nesses of current languages. One obvious weakness of most languages
is that they are not very secure.
Another function of the translation center would be to record demo-
graphic information about the numbers and kinds of applications that
use various languages. For example, the languages used for financial
systems, for weapons systems, for medical applications, for taxation
systems, and for patient records have economic and even national
importance. It would be useful to keep records of the programming
languages used for such vital applications. Obviously, maintenance and
restoration of these vital applications has major business and national
Table 8-4 is a summary of 40 kinds of software applications that
have critical importance to the United States. Table 8-4 also shows the
various programming languages used in these 40 kinds of applications.
A major function of a code translation center would be to accumu-
late more precise data on critical applications and the languages used
in them.
Both columns of Table 8-4 need additional research. There are no
doubt more kinds of critical applications than the 40 listed here. Also, in
order to fit on a printed page, the second column of the table is limited
to about six or seven programming languages. For many of these criti-
cal applications, there may be 50 or more languages in use at national
The North American Industry Classification System (NAICS) codes of the
Department of Commerce identify at least 250 industries that the
author knows create software in substantial volumes. However, the 40
industries shown in Table 8-4 probably contain almost 50 percent of
applications critical to U.S. business and government operations.
Table 8-4. Programming Languages Used for Critical Software Applications
Critical Software               Programming Languages
 1. Air traffic control         Ada, Assembly, C, Jovial, PL/I
 2. Antivirus & security        ActiveX, C, C++, Oberon7
 3. Automotive engines          C, C++, Forth, Giotto
 4. Banking applications
 5. Broadband
 6. Cell phones                 C, C++, C#, Objective C
 7. Credit cards                ASP.NET, C, COBOL, Java, Perl, PHP, PL/I
 8. Credit checking
 9. Credit unions
10. Criminal records
11. Defense applications        Ada, Assembly, C, CMS2, FORTRAN, Java, Jovial, SPL
12. Electric power              Assembly, C, DCOPEJ, Java, Matpower
13. FBI, CIA, NSA, etc.         Ada, APL, Assembly, C, C++, FORTRAN, Hancock
14. Federal taxation            C, COBOL, Delphi, FORTRAN, Java, SQL
15. Flight controls             Ada, Assembly, C, C++, C#, LabView
16. Insurance
17. Mail and shipping           COBOL, dBase2, PL/I, Python, SQL
18. Manufacturing               AML, APT, C, Forth, Lua, RLL
19. Medical equipment           Assembly, Basic, C, CO, CMS2, Java
20. Medical records
21. Medicare                    Assembly, COBOL, Java, PL/I, dBase2, SQL
22. Municipal taxation          C, COBOL, Delphi, Java
23. Navigation                  Assembly, C, C++, C#, Lua, Logo, MatLab
24. Oil and Energy
25. Open-source software        C, C++, JavaScript, Python, Suneido, XUL
26. Operating systems, large    Assembly, C, C#, Objective C, PL/S, VB
27. Operating systems, small    C, C++, Objective C, OSL, SR
28. Pharmaceuticals             C, C++, Java, PASCAL, SAS, Visual Basic
29. Police records              C, COBOL, dBase2, Hancock, SQL
30. Satellites                  C, C++, C#, Java, Jovial, PHP, Pluto
31. Securities trading          ABAP, C#, COBOL, dBase2, Java, SQL
32. Social Security             Assembly, COBOL, PL/I, dBase2, SQL
33. State taxation              C, COBOL, Delphi, FORTRAN, Java, SQL
34. Surface transportation
35. Telephone switching
36. Television broadcasts       C, C++, C#, Java, Forth
37. Voting equipment            Ada, C, C++, Java
38. Weapons systems             Ada, Assembly, C, C++, Jovial
39. Web applications            AppleScript, ASP, CMM, Dylan, E, Perl, PHP, .NET
40. Welfare (State)
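The kind of demographic record keeping proposed for the translation center can be sketched with a few rows copied from Table 8-4. Only six of the 40 rows are included, so the counts below describe this sample and nothing more; the variable and function names are hypothetical.

```python
from collections import Counter

# Six rows taken from Table 8-4 (a sample, not the whole table).
CRITICAL_APPLICATIONS = {
    "Air traffic control": ["Ada", "Assembly", "C", "Jovial", "PL/I"],
    "Defense applications": ["Ada", "Assembly", "C", "CMS2", "FORTRAN",
                             "Java", "Jovial", "SPL"],
    "Federal taxation": ["C", "COBOL", "Delphi", "FORTRAN", "Java", "SQL"],
    "Flight controls": ["Ada", "Assembly", "C", "C++", "C#", "LabView"],
    "Social Security": ["Assembly", "COBOL", "PL/I", "dBase2", "SQL"],
    "Weapons systems": ["Ada", "Assembly", "C", "C++", "Jovial"],
}


def language_usage(applications):
    """Count how many application areas use each language."""
    usage = Counter()
    for languages in applications.values():
        usage.update(set(languages))
    return usage


def applications_using(applications, language):
    """List the application areas that depend on a given language."""
    return sorted(name for name, langs in applications.items() if language in langs)


usage = language_usage(CRITICAL_APPLICATIONS)
for language, areas in usage.most_common(5):
    print(f"{language:<10} used in {areas} of {len(CRITICAL_APPLICATIONS)} areas")

print("Areas depending on Jovial:", applications_using(CRITICAL_APPLICATIONS, "Jovial"))
```

A query such as the last one shows which critical applications would be affected if compilers or programmers for a given language became unavailable.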
As a result of the importance of these 40 software application areas
to U.S. business and government operations, they probably
receive almost 75 percent of cyberattacks in the form of viruses,
spyware, search-bots, and denial-of-service attacks. These 40 industries
need to focus on security. Even a cursory examination of the programming
languages used by these industries reveals that few of them are
particularly resistant to viruses or malware attacks.
For all 40, maintenance is expensive, and for many it is growing
progressively more expensive due to the difficulty of simultaneously
maintaining applications written in so many different programming
languages.
As a technical byproduct of translation from older languages to new
languages, one value-added function of a national programming lan-
guage translation center would be to eliminate security vulnerabilities
at the same time the older languages are being translated.
If the language translation center operated as a profit-making
business, it might well grow into a good-sized company. Billing
at the same rate as Y2K companies (about $1.00 per logical
statement), a national translation center might clear $75 million per year,
assuming accurate and competent translation technology.
What the author suggests is that rather than continue to develop
random programming languages at random but rapid intervals, there is
a need to address programming languages at a fundamental linguistic
level.
A study team that included linguists, software engineers, and domain
specialists might be able to address the problems of the most effective
ways of expressing the ten problem areas and their permutations. The
goal would be to understand the minimum set of programming lan-
guages capable of handling any combination of problem areas.
If economists were added to the study team, they would also be able to
address the financial impact of attempting to maintain and occasionally
renovate applications written in hundreds of dead and dying program-
ming languages.
Why Do Most Applications Use Between 2
and 15 Programming Languages?
A striking phenomenon of software engineering is the presence of mul-
tiple programming languages in the same applications. This is not a
new trend, and many older applications used combinations such as
COBOL and SQL. More recent combinations might include Java and
A similar phenomenon is the fact that many programming lan-
guages are themselves combinations of two or more other programming
languages. For example, the Objective C language combines features
from SMALLTALK and C. The Ruby language combines features from
Ada, Eiffel, Perl, and Python among others.
Recall that a majority of programming languages are somewhat
specialized, and these seem to be more popular than general-purpose
languages. A hypothesis that explains why applications use several
different programming languages is that the "problem space" of the
application is broader than the "solution space" of individual program-
ming languages.
It was mentioned earlier that many applications include at least
four of the ten problem areas cited in Table 8-3. However, many pro-
gramming languages seem to be optimized only for one to three of the
problem areas. This creates a situation where multiple programming
languages are needed to implement all of the problem areas in the
application.
Of course, using any of the more general-purpose languages such as
Ada or PL/I would reduce the number of languages, but for sociological
reasons, these general-purpose languages have not been as popular as
the more specialized languages.
The implications of having many different languages in the same
application are that development is more difficult, debugging is
more difficult, static analysis is more difficult, and code inspection is
more difficult. After release, maintenance and enhancement tasks are more
difficult as well.
Table 8-5 illustrates how both development and maintenance costs
go up as the number of languages in an application increases. The costs
show the rate of increase compared with a single language.
Both development and maintenance costs increase as numbers of pro-
gramming languages in the same application increase, but maintenance
is more severely impacted.
[Table 8-5: Impact of Multiple Languages on Costs. Columns: Languages in
Application, Development Costs, Maintenance Costs]
How Many Programmers Use Various
Programming Languages?
There is no real census of either languages used in applications or
number of programmers. While the Department of Commerce and the
Bureau of Labor Statistics do issue reports on such topics in the United
States, their statistics are known to be inaccurate.
A survey done by the author and his colleagues a few years ago found
that the human resources organizations in most large corporations did
not know how many programmers or software engineers were actually
employed. Since government statistics are based on reports from HR
organizations, and those organizations do not have accurate counts,
they cannot provide good data to the government.
One reason government statistics probably understate the
numbers of programmers and software engineers is ambiguous
job titles. For example, some large companies use titles such as
"member of the technical staff" as an umbrella title that might include
software engineers, hardware engineers, systems analysts, and perhaps
another dozen occupations.
Another problem with knowing how many software engineers there
are is the fact that many personnel working on embedded applications
are not software engineers or computer scientists by training, but rather
electrical engineers, aeronautical engineers, telecommunications engi-
neers, or some other type of engineer.
Because the status of these older forms of engineering is higher than
the status of software engineering, many people working on embed-
ded software refuse to be called software engineers and insist on being
identified by their true academic credentials.
The study carried out by the author and his colleagues was intended to
derive information on the number of software specialists (quality
assurance, database administration, etc.) employed by large
software-intensive companies such as IBM, AT&T, Hartford Insurance, and
so forth.
The study included on-site visits and discussions with both HR
organizations and local software managers and executives. It was during
the discussions with local software managers and executives that it was
discovered that not a single HR organization actually had good statistics
on software engineering populations.
Based on on-site interviews with client companies and then extrapola-
tion from their data to national levels, the author assumes that the U.S.
total of software engineers circa 2009 is about 2.5 million. Government
statistics as of 2009 indicate around 600,000 programmers, but these
statistics are low for reasons already discussed. Additionally, the govern-
ment statistics also tend to omit one-person companies and individual
programmers who develop applets or single applications.
About 60 percent of these software engineers work in maintenance
and enhancement tasks, and 40 percent work as developers on new
applications. There are of course variations. For example, many more
developers than maintenance personnel work on web applications,
because all of these applications are fairly new. But for traditional
mainframe business applications and ordinary embedded and systems
software applications, maintenance workers outnumber development
workers by a substantial margin.
Table 8-6 shows the approximate numbers of software engineers by
language for the United States. However, the data in Table 8-6 is
hypothetical and not exact. One reason the data is not exact
is that many software engineers know more than one programming
language and work with more than one programming language.
However, Table 8-6 does illustrate a key point: The most common lan-
guages for software development are not the same as the most common
languages for software maintenance. This situation leads to a great deal
of trouble for the software industry.
[Table 8-6: Estimated Number of Software Engineers by Language]
The most obvious problem illustrated by Table 8-6 is that it is difficult
to get development personnel to work on maintenance tasks because of
the perception that older languages are not as glamorous as modern
languages.
A second problem is that due to the differences in programming
languages between maintenance and new development, two different sets
of tools are likely to be needed. The developers are interested in using
modern tools including static analysis, automated testing, and other
fairly new innovations.
However, many of these new tools do not support older languages,
so the software maintenance community needs to be equipped with
maintenance workbenches that include tools with different capabilities.
For example, tools that analyze cyclomatic and essential complexity
are used more often in maintenance work than in new development.
Tools that can trace execution flow are also used more widely in
maintenance work than in development. Another new kind of tool that
supports maintenance more than development can "mine" legacy code and
extract hidden business rules. Yet another kind of tool that supports
maintenance work can parse the code and automatically
generate function point totals.
It is fairly easy for programmers to learn new languages, but nobody can
possibly learn 2500 programming languages. An average programmer in
the U.S. is probably fairly expert in one language and fairly knowledgeable
in three others. Some may know as many as ten languages. The plethora
of languages obviously introduces major problems in academic training
and in ways of keeping programmers current in their skill sets.
The bottom line is that development and maintenance tool suites are
often very different, and this is due in large part to the differences in
programming languages used for development and for maintenance.
Since the great majority of languages widely used for development
today in 2009 will fall out of service in less than ten years, the software
industry faces some severe maintenance challenges.
Languages used for new development are surfacing at rates of
more than two per month. Most of these languages will be short-lived.
However, some of the applications created in these ephemeral languages
will last for many years. As a result, the set of programming languages
associated with legacy applications that need maintenance is growing
larger at rates that sometimes might top 50 languages per year!
A major economic problem associated with having thousands of
programming languages is that the plethora of languages is driving
up maintenance costs. Ironically, one of the major claims of new pro-
gramming languages is that "they improve programming productivity."
Assuming that such claims are true at all, they are only true for new
development. Every single new language is eventually going to add to
the U.S. software maintenance burden. This is because programming
languages have shorter life expectancies than the applications created
with them. One by one, today's "new" languages will drop out of use
and leave behind hundreds of aging legacy applications with declining
numbers of trained programmers, few effective tools, and sometimes
not even working compilers.
What Kinds of Bugs or Defects
Occur in Source Code?
In 2008 and 2009, a major new study was performed that identified the
25 most common and serious software bugs or defects. The study was
sponsored by the SANS Institute, with the cooperation of MITRE and
about 30 other organizations.
This new study is deservedly attracting a great deal of attention. In
the history of software quality and security, it will no doubt be ranked
as a landmark report. Indeed, all software engineering groups should
get copies of the report and make it required reading for software
engineers, quality assurance personnel, and also for software managers and
executives.
Access to the report can be had via either the SANS Institute or
MITRE web sites. The relevant URLs are
In spite of the fact that software engineering is now a major occupa-
tion and millions of applications have been coded, only recently has
there been a serious and concentrated effort to understand the nature
of bugs and defects that exist in source code. The SANS report is signifi-
cant because the list of 25 serious problems was developed by a group
of some 40 experts from major software organizations. As a result, it is
obvious that the problems cited are universal programming problems
and not issues for a single company.
Over the years, many large companies such as IBM, AT&T, Microsoft,
and Unisys have had very sophisticated defect tracking and monitor-
ing systems. These same companies have also used root-cause analy-
sis. Some of the results of these internal defect tracking systems have
been published, but they usually were not perceived as having general
A number of common problems have long been well understood: buffer
overflows, branches to incorrect locations, and omission of error han-
dling are well known and avoided by experienced software engineers.
But that is not the same as attempting a rigorous analysis and quanti-
fication of coding defects.
The SANS report is a very encouraging example of the kind of prog-
ress that can be made when top experts from many companies work
together in a cooperative manner to explore common problems. The
SANS study group included experts from academia, government, and
commercial companies. It is also encouraging that these three kinds of
organizations were able to cooperate successfully. The normal relation-
ship among the three is often adversarial rather than cooperative, so
having all three work together and produce a useful report is a fairly
rare occurrence.
Hopefully, the current work will serve as a model of future collabora-
tion that will deal with other important software issues. Some of the
additional topics that might do well in a collaborative mode include:
1. Defect removal methods
2. Economic analysis of software development
3. Economic analysis of software maintenance
4. Software metrics and measurement
5. Software reusability
Some of the organizations that participated in the SANS study include,
in alphabetical order:
Aspect Security
Breach Security
Homeland Security
National Security Agency
Purdue University
Red Hat
University of California
This is only a partial list, but it shows that the study included aca-
demia, commercial software organizations, and government agencies.
The overall list of 25 security problems was subdivided into three
larger topical areas. Readers are urged to review the full report, so only
a bare list of topics is included here:
1. Poor input validation
2. Poor encoding of output
3. SQL query structures
4. Web page structures
5. Operating system command structures
6. Open transmission of sensitive data
7. Forgery of cross-site requests
8. Race conditions
9. Leaks from error messages
Resource Management
10. Unconstrained memory buffers
11. Control loss of state data
12. Control loss of paths and file names
13. Hazardous paths
14. Uncontrolled code generation
15. Reusing code without validation
16. Careless resource shutdown
17. Careless initialization
18. Calculation errors
Defense Leakages
19. Inadequate authorization and access control
20. Inadequate cryptographic algorithms
21. Hard coding and storing passwords
22. Unsafe permission assignments
23. Inadequate randomization
24. Excessive issuance of privileges
25. Client/server security lapses
The complete SANS list contains detailed information about each of
the 25 defects and also supplemental information on how the defects
are likely to occur, methods of prevention, and other important issues.
This is why readers are urged to examine the full SANS list.
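To make one entry in the list concrete, the following minimal Python
sketch illustrates the kind of problem grouped under "SQL query
structures" and "Poor input validation." It is not taken from the SANS
report. The table, the function names find_user_unsafe and
find_user_safe, and the sample data are all hypothetical, and the sketch
uses only Python's standard sqlite3 module.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: user input is pasted directly into the SQL text,
    # so quote characters in the input can change the query itself.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Safer: the value is bound as a parameter and is never parsed as SQL.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


# Hypothetical in-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("alice",), ("bob",)])

# Input crafted so that the WHERE clause is always true in the unsafe version.
attack = "nobody' OR '1'='1"
unsafe_rows = find_user_unsafe(conn, attack)
safe_rows = find_user_safe(conn, attack)
```

With the hypothetical attack string, the unsafe version returns every
row in the table, while the parameterized version returns none.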
As of 2009, these 25 problems may occur in more than 85 percent of
all operational software applications. One or more of these 25 problems
can be cited in more than 95 percent of all successful malware attacks.
Needless to say, the SANS list is a very important document that needs
widespread distribution and extensive study.
The SANS report is a valuable resource for companies involved in
testing, static analysis, inspections, and quality assurance. It provides
a very solid checklist of topics that need to be validated before code can
be safely released to the outside world.
Logistics of Software Code Defects
While the SANS report does an excellent job of identifying serious soft-
ware and code defects, once the defects are present in the code and the
code is in the hands of users, some additional issues need discussion.
Following is a list of topics that discuss logistical issues associated with
software defects:
1. Defect: A problem caused by human beings that causes a software
application to either stop running or to produce incorrect results.
Defects can be errors of commission, where developers did some-
thing wrong, or errors of omission, where developers failed to antici-
pate a specific condition.
2. Defect severity level (IBM definition): Severity 1, software stops
working; Severity 2, major features disabled or incorrect; Severity
3, minor problem; Severity 4, cosmetic error with no operational
impact.
3. Invalid defect: A problem reported as a defect but which upon
analysis turns out to be caused by something other than the soft-
ware itself. Hardware problems, user errors, and operating system
errors mistakenly reported as application errors are the most
common invalid defects. These total as many as 15 percent of valid
defect reports.
4. Abeyant defect (IBM term): A defect reported by a specific
customer that cannot be replicated on any other version of the software
except the one being used by the customer. Usually, abeyant defects
are caused by some unique combination of hardware devices and
other applications that run at the same time as the software against
which the defect was reported. These are rare but very difficult to
track down and repair.
5. False positive: A code segment initially identified by a static
analysis tool or a test case as a potential defect. Upon further
analysis, the code turns out to be correct.
6. Secondhand defects: A defect in an application that was not
caused by any overt mistakes on the part of the development team
itself, but instead was caused by errors in a compiler or tool used
by the development team. Errors in code generators and automatic
test tools are examples of secondhand defects. The developers used
the tools in good faith, but as a result, bugs were created. An exam-
ple of a secondhand defect was a compiler error that incorrectly
handled an instruction. The code was compiled and executed, but
the instruction did not operate as defined in the language specifica-
tion. It was necessary to review the machine language listings to
find this secondhand defect since it was not visible in the source
code itself.
7. Undetected defects: These are similar to secondhand defects,
but turn out to be due to either incomplete test coverage or to gaps
in static analysis tools. It is widely known that test suites almost
never touch 100 percent of the code in any application, and some-
times less than 60 percent of the code in large applications. To
minimize the impact of undetected defects and partial test cover-
age, it is necessary to use test coverage analysis tools. Major gaps
in coverage may need special testing or formal inspections.
8. Data defects: Defects that are not in source code or applications,
but which reside in the data that passes through the application. A
very common example of a data defect would be an incorrect mail-
ing address. Data errors are numerous and may be severe, and they
are also difficult to eliminate. Data defects probably outnumber
code defects, and their status in terms of liability is ambiguous.
More serious examples of data defects are errors in credit reports,
which can lower credit ratings without any legitimate reason and
also without any overt defects in software. Data defects are noto-
riously difficult to repair, in part because there are no effective
quality assurance organizations involved with data defects. In fact,
there may not even be any reporting channels.
9. Externally caused defects: A defect that was not originally a
defect, but became one due to external changes such as new tax
laws, changes in pension plans, and other government mandates
that trigger code changes in software applications. An example
would be a change in state sales taxes from 6 percent to 7.5 per-
cent, which necessitates changes in many software applications.
Any application that does not make the change will end up with a
defect even though it may have run successfully for years prior to
the external change. Such changes are frequent but unpredictable
because they are based on government actions.
10. Bad fixes: About 7 percent of attempts to repair a software code
defect accidentally contain a new defect. Sometimes there are sec-
ondary and even tertiary bad fixes. In one lawsuit against a soft-
ware vendor, four consecutive attempts to fix a bug in a financial
application added new defects and did not fix the original defect.
The fifth attempt finally got it right.
11. Legacy defects: These are defects that surface today, but which
may have been hidden in software applications for ten years or
more. An example of a legacy defect was a payroll application that
stopped calculating overtime payments correctly. What happened
was that overtime began to exceed $10.00 per hour, and the field
had been defined with $9.99 as the maximum amount. The problem
was more than ten years old when it first occurred and was identi-
fied. (The original developers of the application were no longer even
employed by the company at the time the problem surfaced.)
12. Reused defects: Between 15 percent and 50 percent of software
applications are based on reused code either acquired commercially
or picked up from other applications. Due to the lack of certifica-
tion of reusable materials, many bugs or errors are in reused code.
Whether liability should be assigned to the developer or to the user
of reused material is ambiguous as of 2009.
13. Error-prone modules (IBM term): Studies of IBM software
discovered that bugs or defects were not randomly distributed but
tended to clump in a small number of places. For example, in the
IMS database product, about 35 modules out of 425 were found to
contain almost 60 percent of total customer-reported bugs. Error-
prone modules are fairly common in large software applications. As
a rule of thumb, about 3 percent of the modules in large systems
are candidates for being classified as error-prone modules.
14. Incident: An incident is an abrupt stoppage of a software application
for unknown reasons. However, when the software is restarted,
it operates successfully. Incidents are not uncommon, but their ori-
gins are difficult to pin down. Some may be caused by momentary
power surges or power outages; some may be caused by hardware
problems or even cosmic rays; and some may be caused by soft-
ware bugs. Because incidents are usually random in occurrence and
cannot be replicated, it is difficult to study them.
15. Security vulnerabilities: These are code segments that are
frequently used by viruses, worms, and hackers to gain access to
software applications. Error handling routines and buffer overflows
are common examples of vulnerabilities. As of 2009, these are not
usually classified as defects because they are only channels for
malicious attacks. However, given the alarming increase in such
attacks, there may be a need to reevaluate how to classify security
vulnerabilities.
16. Malicious software engineers: From time to time software
engineers become disgruntled with their colleagues, their manag-
ers, or the companies that they work for. When this situation occurs,
some software engineers deliberately insert malicious code into the
applications that they are developing. This situation is most likely
to occur in the time interval between a software engineer receiv-
ing a layoff notice and the actual day of departure. While only a
few software engineers cause deliberate harm, the situation may
become more prevalent as the recession deepens and lengthens. In
any case, the fact that software engineers can deliberately perform
harmful acts is one of the reasons why software engineers who work
for the Internal Revenue Service have their tax returns examined
manually. Of course, not only malicious code can occur, but also
other harmful kinds of coding might be used by software engineer-
ing employees, such as diverting funds to personal accounts.
17. Defect potentials: This term originated in IBM circa 1973 and is
included in all of my major books. The term defect potential refers
to the sum total of possible defects that are likely to be encoun-
tered during software development. The total includes five sources of
defects: (1) requirements defects, (2) design defects, (3) code defects,
(4) document defects, and (5) bad fixes or secondary defects. Current
U.S. averages for defect potentials are about 5.0 per function point. A
rule of thumb for predicting defect potentials is to raise the size of the
application in function points to the 1.25 power. This gives a useful
approximation of total defects that are likely to occur for applications
between about 100 function points and 10,000 function points.
18. Defect removal efficiency: This term also originated in IBM
circa 1973. It refers to the ratio of defects detected to defects pres-
ent. If a unit test finds 30 bugs out of a total of 100 bugs, it is 30
percent efficient. Most forms of testing are less than 50 percent
efficient. Static analysis and formal inspections top 80 percent in
defect removal efficiency.
19. Cumulative defect removal efficiency: This term also
originated in IBM circa 1973. It refers to the aggregate total of defects
removed by all forms of inspection, static analysis, and testing. If a
series of removal operations that includes requirement, design, and
code inspections; static analysis; and unit, new function, regression,
performance, and system tests finds 950 defects out of a possible
1000, the cumulative efficiency is 95 percent. Current U.S. averages
are only about 85 percent. Cumulative defect removal efficiency is
calculated at a fixed point in time, usually 90 days after software
is released to customers.
20. Performance issues: Some applications have stringent
performance criteria. An example might be the target-seeking guidance
system in a Patriot missile; another example would be the embed-
ded software inside antilock brakes. If the software fails to achieve
its performance targets, it may be unusable or even hazardous.
However, performance issues are not usually classified as defects
because no incorrect code is involved. What is involved are execu-
tion paths that are too long or that include too many calls and
branches. Even though there may be no overt errors, there are sub-
stantial liabilities associated with performance problems.
21. Cyclomatic and essential complexity: These are mathematical
expressions that provide a quantitative basis for judging the
complexity of source code segments. The metrics were invented by
Dr. Tom McCabe and are sometimes called McCabe complexity
metrics. Calculations are based on graph theory, and the general
formula is "edges - nodes + 2." Practically speaking, cyclomatic
complexity levels less than ten indicate low complexity when the code
is reviewed by software engineers. Cyclomatic complexity levels
greater than 20 indicate very complex code. The metrics are significant
because of correlations between defect densities and cyclomatic
complexity levels. Essential complexity is similar, but uses
mathematical techniques to simplify the graphs by removing redundancy.
22. Toxic requirement: This is a new term introduced in 2009 and
derived from the financial phrase toxic assets. A toxic requirement
is defined as an explicit user requirement that is harmful and will
cause serious damages if not removed. Unfortunately, toxic require-
ments cannot be removed by means of regular testing because once
toxic requirements are embedded in requirements and design docu-
ments, any test cases created from those documents will confirm the
error rather than identify it. Toxic requirements can be removed
by formal inspections of requirements, however. An example of a
toxic requirement is the famous Y2K problem, which originated
as a specific user requirement. A more recent example of a toxic
requirement is the file handling of the Quicken financial software
application. If a backup file is "opened" instead of being "restored,"
then Quicken files can lose integrity.
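Several items in the list above (17, 18, 19, and 21) are defined by
simple arithmetic. The following Python sketch restates them under
stated assumptions: the function names are hypothetical, and the
"moderate" band between 10 and 20 is an assumed label, since the text
names only the low and very complex ranges.

```python
def defect_potential(function_points):
    """Item 17 rule of thumb: function points raised to the 1.25 power.

    The text describes this as a useful approximation only between
    about 100 and 10,000 function points.
    """
    return function_points ** 1.25


def removal_efficiency(defects_found, defects_present):
    """Items 18 and 19: percentage of the defects present that were found."""
    return 100.0 * defects_found / defects_present


def cyclomatic_complexity(edges, nodes):
    """Item 21: the general formula, edges - nodes + 2."""
    return edges - nodes + 2


def complexity_band(cyclomatic):
    """Thresholds from item 21; 'moderate' is an assumed label for 10 to 20."""
    if cyclomatic < 10:
        return "low"
    if cyclomatic > 20:
        return "very complex"
    return "moderate"


# Worked values taken from the text.
unit_test_dre = removal_efficiency(30, 100)      # 30 of 100 bugs found
cumulative_dre = removal_efficiency(950, 1000)   # 950 of 1000 defects found

# A single if/else has 4 nodes and 4 edges in its control-flow graph.
if_else_complexity = cyclomatic_complexity(edges=4, nodes=4)

# Rule-of-thumb defect potential for a 1000 function point application.
potential_1000_fp = defect_potential(1000)
```

For a 1000 function point application the rule of thumb gives roughly
5,600 potential defects, or about 5.6 per function point, which is in
the same range as the 5.0 average cited in item 17.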
Summary and Conclusions
on Software Defects
As discussed earlier in this book, the current U.S. average for software
defect volumes is about 5.0 per function point. (This total includes
requirements defects, design defects, coding defects, documentation
defects, and bad fixes or secondary defects.)
Cumulative defect removal is only about 85 percent. As a result, soft-
ware applications are routinely delivered with about 0.75 defect per
function point. Note that at the point of delivery, all of the early defects
in requirements and design have found their way into the code. In other
words, while the famous Y2K problem originated as a requirements
defect, it eventually found its way into source code. No programming
language was immune, and therefore the Y2K problem was endemic
across thousands of applications written in all known programming
languages.
For a typical application of 1000 function points, 0.75 released defect
per function point implies about 750 delivered defects. Of these, about
20 percent will be high-severity defects: 150 high-severity defects will
probably be in the code when users get the first releases.
Five important kinds of remedial actions can improve this situation:
1. Measurement of defect volumes by 100 percent of software
organizations.
2. Measurement of defect removal efficiency for every kind of inspec-
tion, static analysis, and test stage used.
3. Reducing defect potentials by means of effective defect prevention
methods such as joint application design (JAD), quality function
deployment (QFD), and others.
4. Raising defect removal efficiency levels by means of formal inspec-
tions, static analysis, and improved testing.
5. Examining the effects of quality on defect removal costs and also on
total development costs and schedules, plus maintenance costs.
The combination of these five key activities can lower defect
potentials to less than 3.0 defects per function point and raise defect
removal efficiency levels higher than 95 percent on average, with
mission-critical applications hitting 99 percent.
An achievable goal for the software industry would be averages
of less than 3.0 defects per function point, defect removal efficiency
levels of more than 95 percent, and delivered defect volumes of less than
0.15 defect per function point.
The combined results from better measurement, better defect
prevention, and better defect removal would reduce delivered defects for
a 1000-function point application from 750 down to only 150. Of these
150, only about 10 percent would be high-severity defects. Thus, instead
of the 150 high-severity defects that normally occur today, only 15
high-severity defects might occur. This is an improvement of a full order of
magnitude.
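The arithmetic behind these figures can be checked with a short Python
sketch. The function name delivered_defects and its parameters are
hypothetical. The inputs are the averages quoted in this section: 5.0
defects per function point at 85 percent removal with 20 percent high
severity, versus 3.0 defects per function point at 95 percent removal
with 10 percent high severity.

```python
def delivered_defects(function_points, potential_per_fp,
                      removal_efficiency, high_severity_share):
    """Return (delivered defects, high-severity delivered defects).

    Delivered defects are the defect potential that removal did not catch:
    size * potential per function point * (1 - removal efficiency).
    """
    total_potential = function_points * potential_per_fp
    delivered = total_potential * (1.0 - removal_efficiency)
    high_severity = delivered * high_severity_share
    # Round to whole defects; the inputs are only approximate averages.
    return round(delivered), round(high_severity)


# Current U.S. averages as quoted in the text.
current = delivered_defects(1000, 5.0, 0.85, 0.20)

# The achievable goal described in the text.
target = delivered_defects(1000, 3.0, 0.95, 0.10)
```

Here current evaluates to (750, 150) and target to (150, 15), matching
the delivered and high-severity defect counts given in the text.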
Even better, empirical data indicates that applications at the high
end of the quality spectrum have shorter development schedules, lower
development costs, and much lower maintenance costs.
Indeed, the main reason for both schedule slippages and cost overruns
is excessive defect volumes at the start of testing.
Most projects are on schedule and within budget until testing starts,
at which time excessive defects stretch out testing by several hundred
percent compared with plans and cost estimates.
The technologies to achieve better quality results actually exist today
in 2009, but are not widely deployed. That means that poor awareness
of quality and of its economic value is a critical weakness of
the software industry circa 2009.
Preventing and Removing Defects from
Application Source Code
During development of software applications, the number of defects
encountered averages about 1.75 per function point or
17.5 per KLOC for languages where the ratio of lines of code to function
points is about 100. As pointed out earlier in this book, defect volumes
vary by the level of the programming languages, and they also vary by
the experience and skill of the programming team.
The minimum quantity of defects in source code will be about 0.5 per
function point or 5 per KLOC, while the maximum quantity will top
3.5 defects per function point or 35 defects per KLOC, assuming the
same level of programming language.
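The conversion between the two units used above is a single ratio. The following sketch assumes, as the text does, about 100 lines of code per function point; the function name is a hypothetical one chosen for illustration.

```python
def defects_per_kloc(defects_per_fp: float, loc_per_fp: float = 100.0) -> float:
    """Convert defects per function point into defects per 1000 lines of code."""
    function_points_per_kloc = 1000.0 / loc_per_fp
    return defects_per_fp * function_points_per_kloc


# The three densities quoted in the text, at 100 lines of code per function point.
for label, per_fp in [("minimum", 0.5), ("average", 1.75), ("maximum", 3.5)]:
    print(f"{label}: {per_fp} per function point = {defects_per_kloc(per_fp):g} per KLOC")
```

Changing `loc_per_fp` shows why the per-KLOC figure only holds for languages at the same level: the same density per function point maps to a different density per KLOC when a language needs more or fewer lines per function point.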
However, in spite of wide ranges of potential defects, there are still
more coding defects than any other kind of defect. Defect removal effi-
ciency against coding defects is in the range of 80 percent to 99 per-
cent. Some coding defects will slip through even in the best of cases,
although it is certainly better to approach 99 percent than it is to lag
at 80 percent.
For coding defects as with all other defect sources, two channels need
to be included in order to improve code quality:
1. Defect prevention, or methods that can lower defect potentials.
2. Defect removal, or methods that can seek out, find, and eliminate
coding defects.
The available forms of defect prevention for coding defects include
certified reusable code modules, use of patterns or standard coding
approaches for common situations, use of structured programming
methods, use of higher-level programming languages, constructing
prototypes prior to formal development, dividing large applications
into small segments (as does Agile development), participation in
code inspections, test-based development, and usage of static analysis
tools. Pair programming is also reported to have some efficacy in terms
of defect prevention, but this method has very low usage and very
little data.
The available forms of defect removal for coding defects include desk
checking, pair programming, debugging tools, code inspections, static
analysis tools, and 17 kinds of conventional testing plus automated unit
testing and regression testing.
Defect removal by individual software engineers is difficult to study.
Desk checking, debugging, and unit testing are usually private activi-
ties with no observers and no detailed records kept. Most corporate
defect-tracking systems do not start to collect data until public defect
removal begins with formal inspections, function tests, and regression
tests. What happens before these public events is usually invisible.
There are some exceptions, however.
At one point, IBM asked for volunteers who were willing to record the
numbers of bugs they found in their own code by themselves. The pur-
pose of the study was to find out what was the actual defect removal effi-
ciency from these normally invisible forms of defect removal. Obviously,
the data was not used in any punitive fashion and was kept confidential,
other than to produce some statistical reports.
More recently the Personal Software Process (PSP) and Team
Software Process (TSP) methods developed by Watts Humphrey have
also included defect recording throughout the code development cycle.
Unfortunately, the Agile development method has moved in the other
direction and usually does not record private defect removal. Indeed,
many Agile projects do not record defect data at all, which is a mistake
because it reduces the ability of the Agile method to prove its value in
terms of quality.
The public forms of defect removal are discussed in this book in
Chapter 9, which deals with quality. The emphasis in this chapter is
more on the private forms of defect removal, which are seldom covered
in the software engineering literature.
Private defect removal lacks the large volumes of data associated with
some of the public forms such as formal inspections, static analysis, and
the test stages that involve other players such as test specialists and soft-
ware quality assurance. But for the sake of completeness, the topics of pri-
vate defect prevention and private defect removal need to be included.
Before discussing the effectiveness of either defect prevention or
defect removal, it should be noted that individual software engineers
or programmers vary widely in experience and skills.
In one controlled study at IBM where a number of programmers were
asked to implement the same trial example, the quantity of code pro-
duced varied by about 6 to 1 between the bulkiest solution and the most
concise solution for the same specification.
Similar studies showed about a 10 to 1 variation in the amount of
time a sample of programmers needed to code and debug a standard
problem statement.
These wide variations in individual performance mean that individ-
ual human variations in a population of software engineers probably
account for more divergence in results than do methods, tools, or factors
that can be studied objectively.
Forms of Programming Defect Prevention
It is much more difficult to measure or quantify defect prevention than
it is to measure defect removal. With defect removal, it is possible to
accumulate statistics on numbers of defects found and their severity levels.
Once the project is released to customers, defect counts continue.
After 90 days of usage, it is possible to combine the internally discov-
ered defects with the customer-reported defects and to calculate defect
removal efficiency. If development personnel found 85 defects and cus-
tomers reported 15 defects, the removal efficiency is 85 percent. Such
data is easy to collect, valuable, and fairly accurate, except for some
invisible defects found via private removal actions such as desk check-
ing and unit test.
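The calculation described above is small enough to state as code. This is a minimal sketch; the function name and the guard against an empty record are assumptions added for illustration.

```python
def removal_efficiency(found_internally: int, reported_by_customers: int) -> float:
    """Share of all known defects that were removed before release.

    Customer-reported defects are those logged in the first 90 days of usage.
    """
    total = found_internally + reported_by_customers
    if total == 0:
        raise ValueError("no defects recorded; removal efficiency is undefined")
    return found_internally / total


# Example from the text: 85 defects found in development, 15 reported by customers.
print(f"{removal_efficiency(85, 15):.0%}")
```

Defects removed privately (desk checking, unit test) never reach the `found_internally` count unless engineers record them, which is the measurement gap the text describes.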
For defect prevention, there is no easy way to measure the absence of
defects. The methods available for exploring defect prevention require
collecting data from a fairly large number of projects, where some of
them utilized a specific defect prevention method and others did not.
For example, assume you measure a sample of 50 projects that used
structured coding methods and another 50 projects that did not use
structured programming methods. Assume the 50 projects that used
structured programming averaged 10 coding defects per KLOC or 1
per function point. Assume the 50 projects that did not use structured
programming averaged 20 coding defects per KLOC or 2 per function
point. This kind of analysis allows you to make a hypothesis that the
structured coding prevents about 50 percent of coding defects, but it is
still only a hypothesis and not proof.
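The comparison in this example can be sketched as follows. The sample values are hypothetical and were chosen only so that the two means match the 10 and 20 defects per KLOC used in the text; a real study would use the 50 projects in each group.

```python
from statistics import mean


def prevention_estimate(with_method, without_method) -> float:
    """Relative reduction in mean defect density between two project samples.

    The result is only a hypothesis about the method's effect: it does not
    control for other factors such as language level or team experience.
    """
    baseline = mean(without_method)
    treated = mean(with_method)
    return (baseline - treated) / baseline


# Hypothetical coding defects per KLOC for a handful of projects in each group.
structured = [8, 10, 12, 9, 11]       # mean is 10
unstructured = [18, 22, 20, 19, 21]   # mean is 20

print(f"estimated prevention: {prevention_estimate(structured, unstructured):.0%}")
```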
Further, real-life situations are seldom simple and easy to deal with.
There may be numerous other factors at play, such as usage of static
analysis, usage of higher-level languages, usage of inspections, variations
in programming experience, complexity of the problems, and so forth.
The many different factors that can influence defect prevention mean
that exact knowledge of the effectiveness of any specific factor is some-
what subjective at best, and will probably stay that way.
Academic institutions can perform controlled experiments with stu-
dents where they measure the effectiveness of a single variable, but
such studies are fairly rare concerning defect prevention.
However, from long-range observations involving hundreds of soft-
ware personnel and hundreds of software projects over a multi-year
time span, some objective factors about defect prevention have reason-
ably strong support:
Code reuse as defect prevention If reusable code is available that has
been certified to zero-defect levels, or at least carefully inspected, tested,
and subjected to static analysis before being made reusable, this is the
best known form of defect prevention. Defect potentials in certified reus-
able code modules are only a fraction of the 15 per KLOC normally
encountered during custom development; sometimes only about 1/100th
as many defects are encountered.
However, and this is an important point, using uncertified reusable code
can be both hazardous and expensive. If the defect potentials in uncer-
tified reusable code are more than about 1 per KLOC, and the reused
code is plugged into more than ten different applications, the combined
debugging costs will be so high that this example of reuse would have a
negative return on investment.
Although certified reuse is the most effective form of defect prevention
and counts as a best practice, it is also the rarest. Uncertified sources of
reuse outnumber certified sources by at least 50 to 1. Reuse of certified
code and other materials would class as a best practice. But reuse of
materials that are uncertified must be classed as a hazardous practice.
It is much harder for software engineers to debug someone else's
unfamiliar code than it is to debug their own. Every single time a reused
code module is utilized for a new application, there is a good chance that
the same errors will be encountered. Thus, uncertified reuse is hazardous
and can be more expensive than custom development of the same
module, which is why uncertified reuse can have a significant
negative return on investment (ROI).
Code reuse comes from many sources, including commercial vendors,
legacy applications, object-oriented class libraries, corporate reuse
libraries, public-domain and open-source libraries, and a number of
others. While reusable code is fairly plentiful, something that is not
plentiful is data on the repair frequencies of reusable materials. (See
the section on certifying reusable materials earlier in this book for addi-
tional information.)
As mentioned elsewhere in the book, code reuse by itself is only part
of the reusability picture. Reusable designs, data structures, test cases,
tutorial information, work breakdown structures, and HELP text are also
reusable and should be packaged together with the code they support.
Patterns as defect prevention Programmers and software engineers who
have developed large numbers of software applications tend to be aware
that certain sequences of code occur many times in many applications.
Some of these sequences include validating inputs to ensure that error
conditions such as character data entered into a numeric field are
rejected, or that text and numeric strings do not contain more characters
than specified by the application's design.
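As an illustration of the kind of recurring sequence meant here, the following sketch validates a numeric input field. The function name, the error messages, and the ASCII-digits-only rule are assumptions made for the example rather than a pattern taken from any published catalog.

```python
def validate_numeric_field(raw: str, max_length: int) -> int:
    """Return the field as an integer, rejecting over-long or non-numeric input."""
    text = raw.strip()
    if len(text) > max_length:
        raise ValueError(f"input exceeds {max_length} characters")
    # isascii() guards against non-ASCII digit characters that int() may reject.
    if not (text.isascii() and text.isdigit()):
        raise ValueError("numeric field contains non-digit characters")
    return int(text)


for candidate in ["123", "12a", "123456"]:
    try:
        print(candidate, "->", validate_numeric_field(candidate, max_length=5))
    except ValueError as error:
        print(candidate, "-> rejected:", error)
```

Once written down in this form, the same check can be reused wherever a numeric field is read, instead of being rediscovered by each programmer.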
Patterns gained via personal experience are of course reusable even
if informal and personal. However, it has become clear that this kind
of knowledge occurs so often that it could be written down, illustrated
graphically, and then used to train new software engineers as they learn
trade craft.
Pattern-based development has the potential of lowering defect poten-
tials of young and inexperienced developers by more than 50 percent.
Once standard patterns are widely published and available, they can also
serve to facilitate career changes from one kind of software to another.
For example, there are very different kinds of patterns associated with
embedded applications than with information technology applications.
What is lacking for pattern-based development circa 2009 is an effec-
tive taxonomy that can be used to catalog the patterns and aid in select-
ing the appropriate set of patterns. Also, there is no exact knowledge of
how many patterns are likely to be useful and valuable. In the future,
pattern usage will no doubt be classed as a best practice, although doing
so in 2009 is probably a few years premature.
Individual software engineers working in a narrow range of applications
probably utilize from 25 to 50 common patterns centering on input
and output validation, error handling, and perhaps security-related
topics. But when all types and forms of software are included, such as
financial applications, embedded applications, web applications, operat-
ing systems, compilers, avionics, and so on, the total number of useful
patterns could easily top 1000. This is too large a number to be listed
randomly, so patterns need to be organized if they are to become useful
tools of the trade.
Inspections as defect prevention Participation in formal inspections
turns out to be equally effective as a defect-prevention method and a
defect-removal method. Participants in formal inspections spontane-
ously avoid making the kinds of mistakes that are found during the
inspection sessions. Therefore, after engineers have participated in a
number of inspections, their coding defects tend to be reduced by more than
80 percent compared with the volumes encountered prior to starting to use
inspections.
As a result, formal inspections get double counted as best practices: they
are highly effective for both defect prevention and defect removal.
Inspections turn out to be so effective in terms of defect prevention
that long-range usage of inspections has a tendency to become boring
for the participants due to a lack of interesting bugs or defects after
about a year of inspections. (Unfortunately, some companies stop using
inspections, so defect volumes begin to creep upwards again.)
One other useful aspect of inspections is that when novices inspect
the work of experts, they spontaneously learn improved programming
skills. Conversely, when experts inspect the work of novices, they can
provide a great deal of useful advice as well as find a great many bugs
or defects. Therefore, it is useful to have several experts or top software
engineers as participants in inspections.
Automated static analysis as defect prevention Static analysis is a fairly
new technology that is distinct from testing. Automated static analysis
tools have embedded rules and logic that are set up to discover
common forms of defects in source code. These tools are quite effective
and have defect removal efficiency levels that top 85 percent. A caveat
is that only about 50 languages out of 2500 are supported, and these
are primarily modern languages such as C, C#, C++, Java, and the
like. Older and obscure languages such as MUMPS, Coral, Chill, and
the like are not supported. However, with almost 100 static analysis
tools available, there are tools that can handle some older or special-
ized languages such as ABAP, Ada, COBOL, and PL/I. Some of the
tools have extensible rules, so in theory all of the 2500 languages in
existence might gain access to static analysis, although this is unlikely
to occur.
Because static analysis tools are effective at finding bugs in source
code, and the static analysis tools are usually run by programmers, they
have a double benefit of also acting as defect prevention agents. In other
words, programmers who carefully respond to the defects identified by
automated static analysis tools will spontaneously avoid making the
same defects in the future.
As of 2009, usage of static analysis counts as a best practice for sup-
ported programming languages. The evidence is already significant for
defect removal and is increasing for defect prevention.
Static-analysis tools are widely used by the open-source development
community with good results. Due to the power and utility of static
analysis, usage is expanding and this method should become a stan-
dard activity; in fact, static analysis should be included in every pro-
gramming development and maintenance environment and should be a
normal part of all development and maintenance methodologies.
Test-based development (TBD) as defect prevention The extreme programming
(XP) method includes developing test cases prior to developing
source code. Indeed, the test cases are used as an adjunct to the
requirements and design of software applications.
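A minimal sketch of the test-first sequence follows. The leap-year requirement and both function names are hypothetical and were chosen only because the rules are short; the point is the order in which the two pieces are written.

```python
# Step 1: the test cases are written first and serve as an executable
# statement of the requirement.
def test_leap_years():
    assert is_leap_year(2000)       # divisible by 400
    assert is_leap_year(2008)       # divisible by 4 but not by 100
    assert not is_leap_year(1900)   # divisible by 100 but not by 400
    assert not is_leap_year(2009)   # not divisible by 4


# Step 2: the code is then written to make the existing tests pass.
def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


test_leap_years()
print("all test cases pass")
```

Because the tests exist before the code, each boundary case has to be thought through during design rather than discovered later in unit test.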
This method of early test-case development focuses attention on qual-
ity, and therefore TBD gets double credit as a best practice for both defect
prevention and defect removal. Because TBD is fairly new, empirical
data based on large numbers of trials is not yet available. The rather
lax measurement practices of the Agile community add to the problem
of ascertaining the actual effectiveness of TBD.
However, from anecdotal evidence, it appears that TBD may reduce
defect potentials by perhaps 30 percent and raise unit test defect removal
efficiency from around 35 percent up to perhaps 50 percent. Both results
are steps in the right direction, but additional data on TBD is needed.
TBD is a candidate for a best practice and no doubt will be classed as
one when additional quantitative data becomes available.
High-level languages as defect prevention One of the claimed advantages
of high-level programming languages is that they reduce defect potentials.
A related claim is that if defects do occur, they are easier to find.
Both claims appear to be valid, but the situation is somewhat compli-
cated, and there are exceptions to general rules about the effectiveness
of high-level languages.
Any reduction in source code volumes will obviously reduce chances
for errors. If a specific function requires 1000 lines of code in assembly
language, but can be done with only 150 Java statements, the odds are
good that fewer defects will occur with Java. Even if both versions have
a constant ten bugs per KLOC, the larger assembly version might have
10 bugs, while the smaller Java version might have only 1 or 2.
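The size effect in this example is plain multiplication, sketched below with a hypothetical helper. The constant density of ten bugs per KLOC is the text's own simplifying assumption.

```python
def expected_defects(lines_of_code: int, defects_per_kloc: float) -> float:
    """Expected defect count at a constant defect density."""
    return lines_of_code * defects_per_kloc / 1000


assembly = expected_defects(1000, 10)
java = expected_defects(150, 10)

print(f"assembly version: about {assembly:g} bugs")
print(f"Java version:     about {java:g} bugs")
```

The Java figure of 1.5 corresponds to the "1 or 2" bugs cited above.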
However, some high-level programming languages have fairly com-
plex syntax and therefore make it easy to introduce errors by accident.
The APL programming language is an example of a language that is
very high level, but also difficult to read and understand, and therefore
difficult to debug, and especially so if the person attempting to debug
is not the original programmer.
Observations indicate that languages with regular syntax, mnemonic
labels, and commands that are amenable to human understanding will
have somewhat fewer coding defects than languages of the same level,
but with arcane commands and complicated syntax that include many
nested commands.
What would be useful and interesting would be controlled studies
by academic institutions that measured both defect densities and
debugging times for implementing standard problems in various
languages. It would be very interesting to see defect volumes and
debugging times compared for popular languages such as C, C#, C++,
Objective C, Java, JavaScript, Lua, Ruby, Visual Basic, and perhaps
50 more. However, as of 2009, this kind of controlled study does not
seem to exist.
As of 2009, the plethora of programming languages and their negative
impact on maintenance costs make best practice status for any specific
language somewhat questionable.
Prototypes as defect prevention For large and complex applications, it
may be necessary to try out a number of alternative code sequences
before selecting a best-case alternative for the final versions. Prototypes
are useful in reducing defects in the final version by allowing software
engineers to experiment with alternatives in a benign fashion.
As a general rule prototypes are created mainly for the most trouble-
some and complicated pieces of work. As a result, the size of typical
prototypes is only about 5 percent to perhaps 10 percent of the size of
the total application. This practice of concentrating on the toughest
problems makes prototypes useful, and their compact size keeps them
from getting to be expensive in their own right.
Prototypes come in two flavors: disposable and evolutionary. As the
name implies, disposable prototypes are used to try out algorithms and
code sequences and then discarded. Evolutionary prototypes grow into
the finished application.
Because prototypes are usually developed at high speed in an experi-
mental fashion, the disposable prototypes are somewhat safer than evo-
lutionary prototypes. Prototypes may contain more bugs or defects than
polished work, and attempting to convert them into a finished product
may lead to higher than expected bug counts.
Disposable prototypes used to try out alternative solutions or to
experiment with difficult programming problems would be defined as
best practices. However, evolutionary prototypes that are carelessly
developed in the interest of speed are not best practices, but instead
somewhat hazardous.
Code structure as defect prevention Professor Edsger Dijkstra published
one of the most famous letters in the history of software engineering,
entitled "Go To Statement Considered Harmful." The letter to the editor
was published in March 1968 in Communications of the ACM.
The thesis of this letter was that excessive use of branches or "go to"
statements made the structure of software applications so complex that
errors of incorrect branch sequences might occur that were very difficult
to identify and remove.
This letter triggered a revolution in programming style that came to
be known as structured programming. Under the principles of struc-
tured programming, branches were reduced and programmers began to
realize that complex loops and clever coding sequences introduced bugs
and made the code harder to test and validate.
As it happens another pioneering software engineer, Dr. Tom McCabe,
developed a way of measuring code structure that was published in
December 1976 in IEEE Transactions on Software Engineering. The measures
developed by Dr. McCabe were those of "cyclomatic complexity" and "essential
complexity."
Cyclomatic complexity is based on graph theory and is a formal way
of evaluating the complexity of a graph that describes the flow of control
through a software application. The formula for calculating cyclomatic
complexity is "edges - nodes + 2."
Essential complexity is also based on graph theory, only it eliminates
redundant or duplicate paths through code.
In terms of cyclomatic complexity, a code segment with no branches
has a complexity score of 1, which indicates that the code executes in a
linear fashion with no branches or go-to statements. From a psychologi-
cal standpoint, cyclomatic complexity levels of less than 10 are usually
perceived as being well structured. However, as cyclomatic complexity
levels rise to greater than 20, the code segments become increasingly
difficult to understand or to follow from end to end without errors.
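The formula can be applied directly to a control-flow graph. The sketch below assumes a single connected graph with one entry and one exit, represented as a dictionary from each node to its successors. The representation, the function names, and the sample graphs are illustrative assumptions, and the wording for scores between 10 and 20 is a placeholder because the text does not characterize that band.

```python
from typing import Dict, List


def cyclomatic_complexity(graph: Dict[str, List[str]]) -> int:
    """Compute edges - nodes + 2 for a single connected control-flow graph."""
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)
    edges = sum(len(successors) for successors in graph.values())
    return edges - len(nodes) + 2


def perceived_structure(score: int) -> str:
    """Map a score onto the thresholds mentioned in the text."""
    if score < 10:
        return "usually perceived as well structured"
    if score <= 20:
        return "between the two thresholds"
    return "increasingly difficult to follow"


# Straight-line code: 1 edge, 2 nodes.
straight_line = {"entry": ["exit"], "exit": []}

# One if/else: 4 edges, 4 nodes.
if_else = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}

# A loop whose body contains an if/else: 7 edges, 6 nodes.
loop_with_branch = {
    "entry": ["loop"],
    "loop": ["body", "exit"],
    "body": ["then", "else"],
    "then": ["loop"],
    "else": ["loop"],
    "exit": [],
}

for name, cfg in [("straight line", straight_line),
                  ("if/else", if_else),
                  ("loop with branch", loop_with_branch)]:
    score = cyclomatic_complexity(cfg)
    print(name, score, perceived_structure(score))
```

Each additional decision point adds one to the score, which is why branch-free code scores 1.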
There is some empirical evidence that code with cyclomatic complexity
levels of less than 10 has only about 40 percent as many errors as
code with cyclomatic complexity levels greater than 20. Code with a
cyclomatic complexity level of 1 seems to have the fewest errors, if other
factors are held constant, such as the programming languages and the
experience of the developer.
One interesting study in IBM found a surprising result: that code
defects were sometimes higher for the work of senior or experienced pro-
grammers compared with the same volume of code written by novices
or new programmers. However, the actual cause of this anomaly was
that the experts were working on very difficult and complex applica-
tions, while the novices were doing only simple routines that were easy
to understand. In any case, the study indicated that problem difficulty
has a significant impact on defect density levels.
The importance of cyclomatic and essential complexity on code defects
led to the development of a number of commercial tools. Many tools
available circa 2009 can calculate cyclomatic and essential complexity
of code in a variety of languages.
In the 1980s, several tools on the market were aimed primarily at
COBOL and not only evaluated code complexity, but also could auto-
matically restructure the code and reduce both cyclomatic and essential
complexity. These tools asserted, with some evidence to back up the
assertions, that the revised code with low complexity levels could be
modified and maintained with less effort than the original code.
Use of structured programming techniques and keeping cyclomatic
complexity levels low would both be viewed as best practices. Code with
low complexity levels and few branches tends to have fewer defects, and
the defects that are present tend to be easier to find. Therefore, struc-
tured programming counts as a best practice for defect prevention.
Segmentation as defect prevention More than 50 years of empirical data
has proven conclusively that defect potentials correlate almost perfectly
with application size measured using both lines of code and function
points. Because size and defects are closely coupled, it is reasonable
to ask, Why not decompose large systems into a number of smaller ones?
Unfortunately, this is not as easy as it sounds. To make an analogy,
since constructing an 80,000-ton cruise ship is known to be expensive,
why not decompose the ship into 80,000 small boats that are cheap to
build? Obviously, the features and user requirements of 80,000 small
boats are not the same as those of one large 80,000-ton cruise ship.
As of 2009, there are no proven and successful methods for segment-
ing or decomposing large systems into small independent components.
As it happens, the Agile method of dividing a system into segments or
sprints that can be developed sequentially has shown itself to be fairly
successful. But most of the Agile applications are below 10,000 function
points and are comparatively simple in architecture.
There have not yet been any Agile projects that tackle something of
the size of Microsoft Vista at about 150,000 function points or a large
ERP package at perhaps 300,000 function points. Indeed, if Agile sprints
were used for these applications and team sizes were in the range of
average Agile projects (less than ten people) then probably 150 sprints
would be needed for Vista and 300 would be needed for an ERP pack-
age. Assuming one month per sprint, the schedule would be perhaps 12
years for Vista and 25 years for the ERP package. Multiple teams would
speed things up, but interfaces between the code of each team would
add complexity and also add defects.
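The schedule estimate above can be reproduced with a short calculation. The figure of 1000 function points per sprint is not stated by the author; it is inferred here from the ratio of 150 sprints to 150,000 function points and should be read as an assumption, as should the function name.

```python
def sequential_schedule(size_fp: float, fp_per_sprint: float,
                        months_per_sprint: float = 1.0):
    """Return (number of sprints, schedule in years) for one team working sequentially."""
    sprints = size_fp / fp_per_sprint
    years = sprints * months_per_sprint / 12
    return sprints, years


# Sizes quoted in the text; 1000 function points per sprint is an inferred assumption.
for name, size in [("Vista", 150_000), ("ERP package", 300_000)]:
    sprints, years = sequential_schedule(size, fp_per_sprint=1000)
    print(f"{name}: {sprints:g} sprints, about {years:g} years")
```

The results of 12.5 and 25 years match the text's "perhaps 12 years" and "25 years."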
The bottom line is that segmentation into small independent pack-
ages or components is effective when it can be done well, but not
always possible given the feature sets and architecture of many large
systems. Thus best practice status cannot be assigned to segmentation
as of 2009, due to the lack of standard and effective methods for
decomposing large systems.
For large applications, segmentation is most common for major fea-
tures, but each of these features may themselves be in the range of
10,000 function points or more. There is not yet any proven way to
divide a massive system of 150,000 function points or 15 million lines
of code into perhaps 15,000 small independent pieces. About the best
that occurs circa 2009 is to divide these massive systems into perhaps
ten large segments.
Methodologies and measurements as defect prevention The Personal
Software Process (PSP) and Team Software Process (TSP) developed
by Watts Humphrey feature careful recording of all defects found during
development, including the normally invisible defects found privately
via desk checking and unit testing.
The act of recording specific defects tends to embed them in the minds
of software engineers and programmers. The result is that after several
projects in succession, coding defects decline by perhaps 40 percent since
they are spontaneously avoided.
Measurements and methodologies are therefore useful in terms of
defect prevention because they tend to focus attention on defects and so
trigger reductions over time. The methods that record defects and focus
on quality are classed as best practices.
One unusual aspect of TSP is that the results seem to improve with
application size. In other words, TSP operates successfully for large
systems in excess of 10,000 function points. This is a fairly rare occur-
rence among development methods.
Pair programming as defect prevention The idea of pair programming is
for two software engineers or programmers to share one workstation.
They take turns coding, and the other member of the pair observes the
code and makes comments and suggestions as the coding takes place.
The pair also has discussions on alternatives prior to actually doing the
code for any module or segment.
The method of pair programming has some experimental data that
suggests it may be effective in terms of both defect removal and defect
prevention. However, the pair programming method has so little usage
on actual software projects that it is not possible to evaluate these
claims as of 2009 on large-scale applications.
On the surface, pair programming would seem to come very close to
doubling the effort required to complete any given code segment. Indeed,
due to normal human tendencies to chat and discuss social topics, there
is some reason to suspect that pair programming would be more than
twice as expensive as individual programming.
Until additional information becomes available from actual projects
rather than from small experiments, there is not enough data to judge
the impact of pair programming in terms of defect removal or defect prevention.
Other methods as defect prevention The methods cited earlier in this
chapter have been used enough so that their effectiveness in terms of
code defect prevention can be hypothesized. Other methods seem to
have some benefits in terms of defect prevention, but they are harder to
judge. One of these methods is Six Sigma as it applies to software. The
Six Sigma approach does include measurements of defects and analysis
of causes. However, Six Sigma is usually a corporate approach that is
not applied to specific projects, so it is harder to evaluate. Other code
defect prevention techniques that may be beneficial but for which the
author has no solid data include quality function deployment (QFD),
root-cause analysis, the Rational Unified Process (RUP), and many of
the Agile development variations.
Combinations and synergies among defect prevention methods Although
the methods cited earlier may occur individually, they are often used in
combinations that sometimes appear synergistic. For example, structured
coding is often used with TSP, with inspections, and with static
analysis.
The most frequent combination is the pairing of high-level program-
ming languages with the concepts of structured programming. The
combination that tends to yield the highest overall levels of defect pre-
vention would be methodologies such as TSP teamed with high-level
programming languages, certified reusable code, patterns, prototypes,
static analysis, and inspections.
Overriding all other aspects of defect prevention and defect removal,
individual experience and skill levels of the software engineers continue
to be a dominant factor. However, as of 2009, the software engineering
field lacks standard methods for evaluating human performance; it has
no licensing or certification, no board specialties, and no methods of
judging professional malpractice. Therefore, expertise among software
engineers is important but difficult to evaluate.
Summary of Observations
on Defect Prevention
Because of the difficulty and uncertainty of measuring defect preven-
tion, the suite of defect prevention methods lacks the large volumes of
solid statistical data associated with defect removal.
Personal defect prevention is especially difficult to study because
most of the activities are private and therefore seldom have records or
statistical information available, other than data kept by volunteers.
Long-range measurements over time and involving hundreds of appli-
cations and software engineers give some strong indications of what
works in terms of defect prevention, but the results are still less than
precise and will probably stay that way.
Forms of Programming Defect Removal
There is very good data available on the public forms of defect removal
such as formal inspections, function test, regression test, independent
verification and validation, and many others. But private defect removal
is another story. The phrase private defect removal refers to activities
that software engineers or programmers perform by themselves without
witnesses and usually without keeping any written records.
The major forms of private defect removal include, but are not limited to:
1. Desk checking
2. Debugging using automated tools
3. Automated static analysis
4. Subroutine testing
5. Unit testing (manual)
6. Unit testing (automated)
Since most of these defect removal methods are used in private, data
to judge their effectiveness comes from either volunteers who keep
records of bugs found, or from practitioners of methods that include
complete records of all defects, such as PSP and TSP.
Automated static analysis is a method that happens to be used both
privately by individual programmers on their own code, and also pub-
licly by open-source developers who are working collaboratively on
large applications such as Firefox, Linux, and the like. Therefore, static
analysis has substantial data available for its public uses, and it can be
assumed that private use of static analysis will be equally effective.
Desk checking for defect removal
In the early days of programming and
computing, the time lag between writing source code and getting it
assembled or compiled was sometimes as much as 24 hours. When pro-
gram source code was punched into cards and the cards were then put
in a queue for assembly or compilation, many hours would go by before
the code could be executed or tested.
In these early days of programming between the late 1960s and the
1970s, desk checking or carefully reading the listing of a program to
look for errors was the most common method of personal defect removal.
Desk checking was also a technical necessity because errors in a deck
of punch cards could stop the assembly or compilation process and add
perhaps another 24 hours before testing could commence.
Today in 2009, code segments can be compiled or interpreted instantly,
and can be executed instantly as well. Indeed, they can be executed
using programming environments that include debugging tools and
automated static analysis. Therefore, desk checking has declined in
frequency of usage due to the availability of personal workstations and
personal development environments.
Although there is not much in the way of recent data on the effective-
ness of desk checking, historical data from 30 years ago indicates about
40 percent to just over 60 percent in terms of defect removal efficiency.
Today in 2009, desk checking is primarily reserved for a small subset
of very tricky bugs or defects that have not been successfully detected
and removed via other methods. These include security vulnerabilities,
performance problems, and sometimes toxic requirements that have
slipped into source code. These are hard to detect via static analysis or
normal testing because they may not involve overt code errors such as
branches to incorrect locations or boundary violations.
These special and unique bugs compose only about 5 percent of the total
number of bugs likely to be found in software applications. Desk checking
is actually close to 70 percent efficient in dealing with these very troublesome
bugs that have eluded other methods. (Desk checking is not more efficient
because software engineers sometimes don't realize that
a particular code practice is wrong. This is why proofreading of manuscripts
is needed. Authors cannot always see their own mistakes.)
While these subtle bugs can be detected using formal inspections,
formal inspections do not occur on more than about 10 percent of soft-
ware applications and require between three and eight participants.
Desk checking, on the other hand, is a one-person activity that can be
performed at any time with no formal preparation or training.
Desk checking in 2009 is a supplemental method that may not be
needed for every software project. It is effective for a number of subtle
bugs and might be viewed as a best practice on an as-needed basis.
Automated debugging for defect removal
Software engineers and programmers
circa 2009 have access to hundreds of debugging tools. These
tools normally support either specific programming languages such as
Java and Ruby or specific operating systems such as Linux, Leopard,
Windows Vista, and many others. In any case, a great many debugging
tools are available.
The features of debugging tools vary, but all of them allow the execu-
tion of code to be stopped at various places; they allow changes to code;
and they may include features to look for common problems such as
buffer overflows and branching errors. Beyond that, the specialized
debugging tools have a number of special features that are relevant to
specific languages or operating systems.
Debugging tools are so common that usage is a standard practice
and therefore would be classed as a best practice. That being said,
none are 100 percent effective, and quite a few bugs can escape. In fact,
given the numbers of bugs found later via inspections, static analysis,
and testing, the average efficiency of program debugging is only about
30 percent or less.
Automated static analysis for defect removal
Static analysis tools examine
source code and the paths through the code and look for common errors.
Some of these tools have built-in sets of rules, while others have exten-
sible rule sets.
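As a hedged illustration of what one rule in an extensible rule set might look like, the following Python sketch walks a syntax tree and reports two common coding errors. The rule names, the Finding type, and the analyze function are hypothetical and do not correspond to any of the commercial tools. Real static analysis tools also examine control flow and data flow along paths through the code, which this sketch does not attempt.

```python
import ast
from typing import Callable, Dict, List, NamedTuple


class Finding(NamedTuple):
    rule: str   # name of the rule that fired
    line: int   # source line of the offending construct


def rule_bare_except(node: ast.AST) -> bool:
    # A bare "except:" hides every error, including unexpected ones.
    return isinstance(node, ast.ExceptHandler) and node.type is None


def rule_mutable_default(node: ast.AST) -> bool:
    # A list, dict, or set literal used as a default is shared across calls.
    if not isinstance(node, ast.FunctionDef):
        return False
    defaults = list(node.args.defaults)
    defaults += [d for d in node.args.kw_defaults if d is not None]
    return any(isinstance(d, (ast.List, ast.Dict, ast.Set)) for d in defaults)


# The rule set is a plain dictionary, so new rules can be added without
# changing the analyzer itself.
RULES: Dict[str, Callable[[ast.AST], bool]] = {
    "bare-except": rule_bare_except,
    "mutable-default": rule_mutable_default,
}


def analyze(source: str, rules: Dict[str, Callable[[ast.AST], bool]] = RULES) -> List[Finding]:
    findings = []
    for node in ast.walk(ast.parse(source)):
        for name, check in rules.items():
            if check(node):
                findings.append(Finding(name, node.lineno))
    return sorted(findings, key=lambda f: f.line)


SAMPLE = '''
def load(path, cache=[]):
    try:
        return open(path).read()
    except:
        return None
'''

for finding in analyze(SAMPLE):
    print(finding.line, finding.rule)
```

Because each rule is an ordinary function, a team could register a project-specific rule by adding one more entry to the dictionary, which is the sense of "extensible" used here.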
A keyword search of the Web using "automated static analysis" turns
up more than 100 such tools including Axivion, CAST, Coverity, Fortify,
GrammaTech, Klocwork, Lattix, Ounce, Parasoft, ProjectAnalyzer,
ReSharper, SoArc, SofCheck, Viva64, Understand, Visual Studio Team
System, and XTRAN.
Individually, each static analysis tool supports up to 30 languages. For
common languages such as Java and C, dozens of static analysis tools
are available; for older languages such as Ada, Jovial, and PL/I, there
are only a few static analysis tools. For very specialized languages such
as ABAP used for writing code in SAP environments, there are only one
or two static analysis tools.
Without doing an exhaustive search, it appears that out of the current
total of 2500 programming languages developed to date, static analysis
tools are available for perhaps 50 programming languages. However,
some of these static analysis tools support extensible rules, so it is theo-
retically possible to create rules for examining all of the 2500 languages.
This is unlikely to occur, due to economic reasons for obscure languages
or those not used for business or scientific applications.
As a class, static analysis tools seem to be effective and can find perhaps
85 percent of common programming errors. Therefore, usage of
static analysis tools can be viewed as a best practice, and one that is
rapidly becoming a standard practice, too.
However, static analysis tools only find coding problems and do not
find toxic requirements, performance problems, user interface problems,
and some kinds of security vulnerabilities. Therefore, additional forms
of defect removal are needed.
Some static analysis tools provide additional features besides defect
detection. Some are able to assist in translating older languages into
newer languages, such as turning COBOL into Java if desired.
It is also possible to raise the level of static analysis and examine the
meta-languages underlying several forms of requirements and design
documentation such as those created via the unified modeling language
(UML). Indeed, it is theoretically possible to use a form of extended
static analysis to create test suites.
Because static analysis and formal code inspections usually find many
of the same kinds of bugs, normally either one form or the other is uti-
lized, but not both. Static analysis and inspections have roughly the
same levels of defect removal efficiency, but static analysis is cheaper
and quicker. However, code inspections can find more subtle problems
such as performance issues or security vulnerabilities. These are not
code "bugs" per se, but they do cause trouble.
If static analysis and code inspections are both utilized, which occurs
for mission-critical applications such as some medical instruments and
some kinds of security and military software, static analysis would nor-
mally come before code inspections.
A small number of issues identified by static analysis tools turn out
to be false positives, or code segments identified as bugs which turn out
to be correct. However, a few false positives is a small price to pay for
such a high level of defect removal efficiency.
Subroutine testing for defect removal
Testing comes in many flavors and
covers many different sizes of code volumes. The phrase subroutine
testing refers to a small collection of perhaps up to ten source code
instructions that produces an output or performs an action that needs
to be verified. Subroutine testing is usually the lowest level of testing
in terms of code volumes.
By contrast, unit testing would normally include perhaps 100 instruc-
tions or more, while the "public" forms of testing such as function testing
and regression testing may deal with thousands of instructions.
As the volume of source code increases, paths through the code
increase, and therefore more and more test cases are needed to actu-
ally cover 100 percent of the code. Indeed, for very large systems,
100 percent coverage appears to be impossible, or at least very rare.
Subroutine testing is a standard practice and also a best practice
because it eliminates a significant number of problems. However, the
defect removal efficiency of subroutine testing is only 30 percent to
perhaps 40 percent. This is because the code volumes are too small for
detecting many kinds of bugs such as branching errors.
Subroutine testing may or may not use actual formal test cases. The
usual mode is to execute the code and check the outputs for validity.
Subroutine test cases, if any, are normally disposable.
Manual unit testing for defect removal
Unit testing of complete modules
is the largest form of testing that is normally private or carried out by
individual programmers without the involvement of other personnel
such as test specialists or software quality assurance.
Manual unit testing is the first and oldest kind of formal testing.
Indeed, in the 1960s and early 1970s, when many applications only
contained 100 code statements or so, unit testing was often the only
form of testing performed.
The phrase unit testing refers to testing a complete module of perhaps
100 code statements that performs a discrete function with inputs, out-
puts, algorithms, and logic that need to be validated.
Unit testing can combine "black box" testing and "white box" test-
ing. The phrase black box means that the internal code of a module
is hidden, so only inputs and outputs are visible. Black box testing
therefore tests input and output validity. The phrase white box means
that internal code is revealed, so branches and control flow through
an application can be tested. Combining the two forms of testing should
in theory test everything. However, code coverage seldom hits 100 per-
cent, and for large applications that are high in cyclomatic complexity
it may drop below 50 percent.
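The distinction between the two forms of testing can be sketched in a few lines of Python. The shipping_fee function below is a hypothetical module invented for illustration. The black box cases use only its stated input and output behavior, while the white box cases were chosen by reading the code so that each branch and each boundary value is exercised.

```python
def shipping_fee(weight_kg: float, express: bool = False) -> float:
    """Hypothetical module under test: fee by weight band, doubled for express."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 1:
        fee = 5.0
    elif weight_kg <= 10:
        fee = 12.0
    else:
        fee = 30.0
    return fee * 2 if express else fee


def black_box_tests() -> None:
    # Internal code is treated as hidden: only inputs and outputs are checked.
    assert shipping_fee(0.5) == 5.0
    assert shipping_fee(20, express=True) == 60.0


def white_box_tests() -> None:
    # Internal code is visible: one case per branch, plus the band boundaries.
    assert shipping_fee(1) == 5.0
    assert shipping_fee(1.01) == 12.0
    assert shipping_fee(10) == 12.0
    assert shipping_fee(10.01) == 30.0
    assert shipping_fee(10, express=True) == 24.0
    try:
        shipping_fee(0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for non-positive weight")


black_box_tests()
white_box_tests()
```

Even for this tiny module, the white box cases outnumber the black box cases, which hints at why coverage becomes hard to sustain as branching grows.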
Unit testing tends to look at limits, ranges of values, error-handling,
and security-related issues. Unfortunately, unit testing is only in the
range of perhaps 30 percent to 50 percent efficient in finding bugs. For
example, unit testing is not able to find many performance-related issues
because they typically involve longer paths and multiple modules.
For modules that tend to include a number of branches or complex
flows, unit testing begins to encounter problems with test coverage. As
cyclomatic complexity levels go up, it takes more and more test cases
to cover every path. In fact, 100 percent coverage almost never occurs
when cyclomatic complexity levels get above 5, even for modules with
only 100 code statements.
Unit testing is a standard activity for software engineering and
therefore counts as a best practice in spite of the somewhat low defect
removal efficiency. Without unit testing, the later stages of testing such
as function testing, stress testing, component testing, and system test-
ing would not be possible.
The test cases created for unit testing are normally placed in a formal
test library so that they can be used later for regression testing. Since
the test cases are going to be long-lived and used repeatedly, they need
proper identification as to what applications and features they test, what
functions they test, when they were created, and by whom. There will
also be accompanying test scripts that deal with invoking and executing
the test cases. The specifics of formal test case design are outside the
scope of this book, but such topics are covered in many other books.
Unit testing can be used in conjunction with other forms of defect
removal such as formal code inspections and static analysis. Usually,
static analysis would be performed prior to unit testing, while code
inspections would be performed after unit testing. This is because static
analysis is quick and inexpensive and finds many bugs that might be
found via unit testing. Unit testing is done prior to code inspections for
the same reason; it is faster and cheaper. However, code inspections are
very effective at finding subtle issues that elude both static analysis and
unit testing, such as security vulnerabilities and performance issues.
Using code inspections, static analysis, and unit testing for the same
code is a fairly rare occurrence that most often occurs on mission-critical
applications such as weapons systems, medical instruments, and other
software applications where failure might cause death or destruction.
Manual unit testing was a normal and standard activity for more than
40 years and is still very widespread. However, performance of unit testing varies
from "poorly performed" to "extremely good." Because of the inconsistencies
in methods of carrying out unit testing and in testing results, the ranges
are too wide to say that unit testing per se is a best practice. Careful unit
testing with both black box and white box test cases and thoughtful consid-
eration to test coverage would be considered a best practice. Careless unit
testing with hasty test cases and partial coverage would rank no better
than marginally adequate and would not be a best practice.
Testing is a teachable skill, and there are many classes available from
both academia and commercial test companies. There are also several
forms of certification for test personnel. It would be useful to know if
formal test training and certification elevated test defect removal effi-
ciency by significant amounts. There is considerable anecdotal evidence
that certification is beneficial, but more large-scale surveys and studies
are needed on this topic.
Automated unit testing for defect removal
While manual unit testing has
been part of software engineering since the 1960s, automated unit testing is
newer and started to occur only in the 1980s in response to larger and more
complex applications plus the arrival of graphical user interfaces (GUI),
which greatly expanded the nature of software inputs and outputs.
The phrase "automated unit testing" is somewhat ambiguous circa
2009. The most common usage of the term implies manual creation of
unit test cases combined with a framework or scaffold that allows them
to be run automatically on a regular basis without explicit actions by
software engineers.
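A minimal sketch of this common usage, written with Python's standard unittest module, is shown below. The test cases are created by hand, and the framework collects and runs them without any per-test action. The clamp function and the run_suite helper are hypothetical names used only for this illustration. In practice the run would be triggered by a build or scheduling tool rather than called directly.

```python
import io
import unittest


def clamp(value: int, low: int, high: int) -> int:
    """Hypothetical unit under test: restrict value to the range [low, high]."""
    return max(low, min(value, high))


class ClampTests(unittest.TestCase):
    # Each test case is written manually by the software engineer.
    def test_inside_range(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_below_range(self):
        self.assertEqual(clamp(-3, 0, 10), 0)

    def test_above_range(self):
        self.assertEqual(clamp(42, 0, 10), 10)


def run_suite() -> unittest.TestResult:
    # The loader discovers every test_* method, so adding a test case
    # requires no change to the code that executes the suite.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampTests)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


result = run_suite()
print(result.testsRun, "tests run;", len(result.failures), "failures")
```

The returned result object records how many tests ran and which ones failed, which is the raw material for the defect records discussed in this section.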
Automated unit testing has been adopted by the Agile and extreme
programming (XP) communities together with the corollary idea of cre-
ating test cases before creating code. This combination seems to be fairly
effective in terms of defect removal and also pays off with improved
defect prevention by focusing the attention of software engineers on
quality topics.
The phrase automated unit testing deals mainly with test case execu-
tion and recording of defects that are encountered: most of the test cases
are still created by hand. However, it is theoretically possible to envision
automated test case creation as well.
Recall from Chapter 7 that during requirements gathering and analy-
sis, seven fundamental topics and 30 supplemental topics need to be
considered. As it happens, these same 37 issues also need to be tested.
A form of static analysis elevated to execute against requirements and
specification meta-languages should, in theory, be able to produce a
suite of test cases as a byproduct.
Some forms of test automation are aimed at web applications; others are
aimed at embedded applications; and still others are aimed at information
technology products. Automated testing is an emerging technology that as
of 2009 is still rapidly evolving.
There is a shortage of solid empirical data that compares automated
unit testing and manual unit testing in a side-by-side fashion for appli-
cations of similar size and complexity. Anecdotal information gives an
edge to automated testing for speed and convenience. However, the most
critical metric for testing is that of defect removal efficiency. As this
book is written, there is not enough solid data that compares automated
unit testing to the best forms of manual unit testing to judge whether
automated unit tests have higher levels of defect removal efficiency
than manual unit tests.
As additional data becomes available, there is a good chance that
automated unit testing will enter the best practice class. As of 2009, the
data shows some effort and cost benefits, but defect removal efficiency
benefits remain uncertain.
Defect removal for legacy applications
About 40 percent of the software
engineers in the world are faced with performing maintenance on aging
legacy applications that they did not create themselves. Although the
legacy applications may be old, they are far from trouble free, and they
still contain latent bugs or defects.
This situation brings up a number of questions about defect removal for
legacy code where the original developers are gone, the specifications may
be missing or out of date, comments may be sparse or incorrect, regression
tests are of unknown completeness, and the code itself may be in a dead
language or one the current maintenance team has not used.
Fortunately, a number of companies and tools have addressed the
issues of maintaining aging legacy code. Some of these companies have
developed "maintenance workbenches" that include features such as:
1. Automated static analysis
2. Automated test coverage analysis
3. Automated function point calculations
4. Automated cyclomatic and essential complexity calculations
5. Automated debugging support for many (but not all) languages
6. Automated data mining for business rules
7. Automated translation from dead languages to newer languages
With aging legacy applications being written in as many as 2500 dif-
ferent programming languages, no single tool can provide universal sup-
port. However, for legacy code written in the more common languages
such as Ada, COBOL, C, PL/I, and the like, a number of maintenance
tools are available.
Usage of maintenance workbenches as a class counts as a best prac-
tice, but there are too many tools and variations to identify specific
workbenches. Also, these tools are evolving fairly rapidly, and new fea-
tures occur frequently.
Synergies and combinations of personal defect removal
The methods
discussed in this section are used in combination rather than alone.
Debugging, automated static analysis, and unit testing form the most
common combination. The combined effectiveness of these three meth-
ods can top 97 percent in terms of defect removal efficiency when per-
formed by experienced software engineers. The combined results can
also drop below 85 percent when performed by novices.
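One simple way to reason about combined efficiency is to treat each method as a filter applied in series, where each stage removes a fixed fraction of the defects that reach it. This serial model, and the assumption that the stages act independently, are simplifications introduced here for illustration and are not stated in the text. The stage values are the approximate averages quoted earlier in this section.

```python
from typing import Iterable


def combined_efficiency(stage_efficiencies: Iterable[float]) -> float:
    """Cumulative defect removal efficiency of stages applied in series.

    Assumes each stage independently removes its stated fraction of the
    defects still present when it runs (a simplifying assumption).
    """
    remaining = 1.0
    for efficiency in stage_efficiencies:
        if not 0.0 <= efficiency <= 1.0:
            raise ValueError("efficiency must be between 0 and 1")
        remaining *= 1.0 - efficiency
    return 1.0 - remaining


# Approximate averages from this section: debugging 30 percent,
# static analysis 85 percent, unit testing 30 to 50 percent.
low = combined_efficiency([0.30, 0.85, 0.30])
high = combined_efficiency([0.30, 0.85, 0.50])
print(f"{low:.2%} to {high:.2%}")
```

With average values the model lands in the low to middle 90s. The higher and lower figures cited above reflect differences in individual skill that a fixed-percentage model does not capture.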
Summary and Conclusions on
Personal Defect Removal
Although personal defect removal activities are private and therefore
difficult to study, they have been the frontline of defense against soft-
ware defects for more than 50 years. That being said, the fact that soft-
ware defects emerge and are still present when software is delivered
indicates that none of the personal defect removal methods are 100
percent effective.
However, some of the newer defect removal tools such as automated
static analysis are improving the situation and adding rigor to the suite
of personal defect removal tools and methods.
Since individual software engineers can keep records of the bugs they
find, it would be useful and valuable if personal defect removal effi-
ciency levels could be elevated up to more than 90 percent before the
public forms of defect removal begin.
Personal defect removal will continue to have a significant role as
software engineering evolves from a craft to a true engineering dis-
cipline. Knowing the most effective and efficient ways for preventing
and removing defects is a sign of software engineering professionalism.
Lack of defect measures and unknown levels of defect removal efficiency
imply amateurishness, not professionalism.
Economic Problems of the
"Lines of Code" Metric
Any discussion of programming and code development would be incom-
plete without considering the famous lines of code (LOC) metric, which
has been used to measure both productivity and quality since the dawn
of the computer era.
The LOC metric was first introduced circa 1960 and was used for
economic, productivity, and quality studies. At first the LOC metric was
reasonably effective for all three purposes.
As additional higher-level programming languages were created, the
LOC metric began to encounter problems. LOC metrics were not able to
measure noncoding activities such as requirements and design, which
were becoming increasingly expensive.
These problems became so severe that a controlled study in 1994
that used both LOC metrics and function point metrics for ten versions
of the same application coded in ten languages reached an alarming
conclusion: LOC metrics violated the standard assumptions of economic
productivity so severely that using LOC metrics for studies involving
more than one programming language constituted professional malpractice.
Such a strong statement cannot be made without examples and case
studies to show the LOC problems. Following is a chronology of the use
of LOC metrics that shows when and why the metric began to cease
being useful and start being troublesome. The chronology runs from
1960 to the present day, and it projects some ideas forward to 2020.
Lines of Code Metrics Circa 1960
The lines of code (LOC) metric for software projects was first introduced
circa 1960 and was used for economic, productivity, and quality studies.
The economics of software applications were measured using "dollars
per LOC." Productivity was measured in terms of "lines of code per time
unit." Quality was measured in terms of "defects per KLOC" where "K"
was the symbol for 1000 lines of code. The LOC metric was reasonably
effective for all three purposes.
When the LOC metric was first introduced, there was only one pro-
gramming language, basic assembly language. Programs were small
and coding effort composed about 90 percent of the total work. Physical
lines and logical statements were the same thing for basic assembly language.
In this early environment, the LOC metric was useful for economic,
productivity, and quality analyses. The LOC metric worked fairly well
for a single language where there was little or no reused code and where
there were no significant differences between counts of physical lines
and counts of logical statements. But the golden age of the LOC metric,
where it was effective and had no rivals, only lasted about ten years.
However, this ten-year span was time enough so that the LOC metric
became firmly embedded in the psychology of software engineering. Once
an idea becomes firmly fixed, it tends to stay in place until new evidence
becomes overwhelming. Unfortunately, as the software industry changed
and evolved rapidly, the LOC metric did not change. As time passed,
the LOC metric became less and less useful until by about 1980 it had
become extremely harmful without very many people realizing it. Due to
cognitive dissonance, the LOC metric was used but not examined criti-
cally in the light of changes in other software engineering methods.
Lines of Code Metrics Circa 1970
By 1970, basic assembly had been supplanted by macro-assembly.
The first generation of higher-level programming languages such as
COBOL, FORTRAN, and PL/I was starting to be used. Basic
assembly language was beginning to drop out of use as better alternatives
became available. This was perhaps the first instance of a long
series of programming languages that died out, leaving a train of aging
legacy applications that would be difficult to maintain as compilers
and programmers familiar with the dead languages stopped being available.
The first known problem with LOC metrics was in 1970, when many
IBM publication groups exceeded their budgets for that year. It was
discovered (by the author) that technical publication group budgets
had been based on 10 percent of the budgets assigned to programming
or coding.
The publication projects based on code budgets for assembly language
did not overrun their budgets, but manuals for the projects coded in
PL/S (a derivative of PL/I) had major overruns. This was because PL/S
reduced coding effort by half, but the technical manuals were as big as
ever. Therefore, when publication budgets were set at 10 percent of code
budgets, and coding costs declined by 50 percent, all of the publication
budgets for PL/S projects were exceeded.
The initial solution to this problem at IBM was to give a formal math-
ematical definition to language levels. The level was defined as the
number of statements in basic assembly language needed to equal the
functionality of 1 statement in a higher-level language. Thus, COBOL
was a level 3 language because it took three basic assembly statements
to equal one COBOL statement. Using the same rule, SMALLTALK is
a level 18 language.
For several years before function points were invented, IBM used
"equivalent assembly statements" as the basis for estimating noncode
work such as user manuals. (Indeed, a few companies still use equiva-
lent assembly language even in 2009.)
Thus, instead of basing a publication budget on 10 percent of the
effort for writing a program in PL/S, the budget would be based on 10
percent of the effort if the code were basic assembly language. This
method was crude but reasonably effective. This method recognized that
not all languages required the same number of lines of code to deliver
specific functions.
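The budgeting problem and the equivalent assembly remedy can be worked through with a small Python sketch. The 10 percent share, the halving of coding effort under PL/S, and the level 3 rating for COBOL come from the text. The effort figure of 100 staff months and the function names are hypothetical values chosen only to make the arithmetic visible.

```python
PUBLICATION_SHARE = 0.10  # from the text: publications budgeted at 10 percent of coding


def publication_budget(coding_effort: float) -> float:
    """Budget for manuals, derived from a coding effort figure."""
    return PUBLICATION_SHARE * coding_effort


def equivalent_assembly_statements(statements: int, language_level: float) -> float:
    """Convert statements in a higher-level language to basic assembly statements."""
    return statements * language_level


# Hypothetical project: 100 staff months of coding if written in assembly.
assembly_effort = 100.0
pls_effort = assembly_effort / 2          # PL/S reduced coding effort by half
actual_manual_cost = publication_budget(assembly_effort)  # manuals were as big as ever

naive_budget = publication_budget(pls_effort)             # based on PL/S coding effort
adjusted_budget = publication_budget(assembly_effort)     # based on equivalent assembly

print("overrun with naive budget:", actual_manual_cost - naive_budget)
print("overrun with equivalent assembly budget:", actual_manual_cost - adjusted_budget)
print("1,000 COBOL statements =", equivalent_assembly_statements(1000, 3), "assembly statements")
```

The naive budget covers only half of the real publication cost, while the budget based on equivalent assembly effort covers all of it.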
However, neither IBM customers nor IBM executives were comfort-
able with the need to convert the sizes of modern languages into the
size of an antique language for cost-estimating purposes. Therefore, a
better form of metric was felt to be necessary.
The documentation problem plus dissatisfaction with the equivalent
assembler method were two of the reasons IBM assigned Allan Albrecht
and his colleagues to develop function point metrics. Additional very
powerful programming languages such as APL were starting to appear,
and IBM wanted both a metric and an estimating method that could
deal with noncoding work as well as coding in an accurate fashion.
The use of macro-assembly language had introduced code reuse, and
this caused measurement problems, too. It raised the issue of how to
count reused code in software applications, or how to count any other
reused material for economic purposes.
The solution here was to separate productivity into two discrete topics:
1. Development productivity
2. Delivery productivity
The former, development productivity, dealt with the code and materi-
als that had to be constructed from scratch in the traditional way.
The latter, delivery productivity, dealt with the final application as
delivered, including reused material. For example, using macro-assem-
bly language, a productivity rate for development productivity might
be 300 lines of code per month. But due to reusing code in the form of
macro expansions, delivery productivity might be as high as 750 lines
of code per month.
This is an important business distinction that is not well understood
even in 2009. The true goal of software engineering is to improve the
rate of delivery productivity. Indeed, it is possible for delivery productiv-
ity to rise while development productivity declines!
This might occur by carefully crafting a reusable code module and
certifying it to zero-defect quality levels. Assume a 500-line code module
is developed for widespread reuse. Assume the module was carefully
developed, fully inspected, examined via static analysis, and fully tested.
The module was certified to be of zero-defect status.
This kind of careful development and certification might yield a net
development productivity rate of only 100 lines of code per month, while
normal development for a single-use module would be closer to 500 lines
of code per month. Thus, a total of five months instead of a single month
of development effort went to creating the module. This is of course a
very low rate of development productivity.
However, once the module is certified and available for reuse, assume
that utilizing it in additional applications can be done in only one hour.
Therefore, every time the module is utilized, it saves about one month
of custom development!
If the module is utilized in only five applications, it will have paid for
its low development productivity. Every time this module is used, its
effective delivery productivity rate is equal to 500 lines of code per hour,
or about 66,000 lines of code per month (a figure that implies roughly
132 work hours per month)!
Thus, while the development productivity of the module dropped down
to only 100 lines of code per month, the delivery productivity rate is
equivalent to 66,000 lines of code per month. The true economic value
of this module does not reside in how fast it was developed, but rather
in how many times it can be delivered in other applications because it
is reusable.
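The arithmetic behind this example can be checked with a short Python sketch. The module size, the two development rates, and the one-hour reuse time come from the text. The figure of 132 work hours per month is an assumption inferred from the ratio of 66,000 lines per month to 500 lines per hour, and the function names are hypothetical.

```python
import math

MODULE_LOC = 500        # size of the reusable module
CERTIFIED_RATE = 100    # LOC per month when built and certified for reuse
SINGLE_USE_RATE = 500   # LOC per month for normal single-use development
REUSE_HOURS = 1         # hours needed to utilize the certified module
HOURS_PER_MONTH = 132   # assumption: implied by 500 LOC/hour ~ 66,000 LOC/month


def development_months(loc: int, rate_per_month: float) -> float:
    return loc / rate_per_month


def delivery_rate_per_month(loc: int, reuse_hours: float) -> float:
    # Lines delivered per hour of reuse effort, scaled to a work month.
    return loc / reuse_hours * HOURS_PER_MONTH


def break_even_uses() -> int:
    # Extra months spent on certification, divided by months saved per reuse.
    custom_months = development_months(MODULE_LOC, SINGLE_USE_RATE)
    extra_months = development_months(MODULE_LOC, CERTIFIED_RATE) - custom_months
    saved_per_use = custom_months - REUSE_HOURS / HOURS_PER_MONTH
    return math.ceil(extra_months / saved_per_use)


print("certified development:", development_months(MODULE_LOC, CERTIFIED_RATE), "months")
print("delivery rate:", delivery_rate_per_month(MODULE_LOC, REUSE_HOURS), "LOC per month")
print("uses needed to recover the extra effort:", break_even_uses())
```

Under these assumptions the extra four months of certification effort is recovered by the fifth reuse, which agrees with the figure of five applications given above.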
To be successful, reused code needs to approach or achieve zero-defect
status. It does not matter what the development speed is, if once com-
pleted the code can then be used in hundreds of applications.
As service-oriented architecture (SOA) and software as a service
(SaaS) approach, their goal is to make dramatic improvements in the
ability to deliver software features. Development speed is comparatively
unimportant so long as quality approaches zero-defect levels.
Returning to the historical chronology, another issue shared between
macro-assembly language and other new languages was the difference
between physical lines of code and logical statements. Some languages,
such as Basic, allowed multiple statements to be placed on a physical
line. Other languages, such as COBOL, divided some logical statements
into multiple physical lines. A count of physical
lines and a count of logical statements could differ by as much as 500
percent. For some languages, there would be more physical lines than
logical statements, but for other languages, the reverse was true. This
problem was never fully resolved by LOC users and remains trouble-
some even in 2009.
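A small sketch shows how the two counts diverge. It assumes a BASIC-style fragment in which a colon separates statements on one physical line; the counting rules are deliberately naive and hypothetical, and a real counter would also have to handle comments, string literals, and continuation lines.

```python
# Naive sketch: physical lines versus logical statements in a
# BASIC-style fragment where ":" separates statements on one line.
# Comments, string literals, and continuation lines are NOT handled.

def count_physical_lines(source: str) -> int:
    """Count non-blank physical lines."""
    return sum(1 for line in source.splitlines() if line.strip())


def count_logical_statements(source: str, separator: str = ":") -> int:
    """Count non-empty statements, splitting each line on the separator."""
    return sum(
        1
        for line in source.splitlines()
        for statement in line.split(separator)
        if statement.strip()
    )


BASIC_SAMPLE = """\
10 LET A = 1 : LET B = 2 : PRINT A + B
20 END
"""

physical = count_physical_lines(BASIC_SAMPLE)
logical = count_logical_statements(BASIC_SAMPLE)
print(f"physical lines: {physical}, logical statements: {logical}")
```

Even in this tiny fragment the logical count is twice the physical count, so any productivity rate quoted in "lines of code" depends on which rule was used.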
Due to the increasing power and sophistication of high-level program-
ming languages such as C++, Objective C, SMALLTALK, and the like,
the percentage of project effort devoted to coding was dropping from
90 percent down to about 50 percent. As coding effort declined, LOC metrics
were no longer effective for economic, productivity, or quality studies.
After function point metrics were developed circa 1975, the defini-
tion of language level was expanded to include the number of logical
code statements equivalent to 1 function point. COBOL, for example,
requires about 105 statements per function point in the procedure and
data divisions.
This expansion is the mathematical basis for backfiring, or direct
conversion from source code to function points. Of course, individual
programming styles make backfiring a method with poor accuracy even
though it remains widely used for legacy applications where code exists
but specifications may be missing.
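As a rough sketch, backfiring is a single division. The COBOL ratio of 105 statements per function point comes from the text; the 10,500-statement application and the function name are made up for illustration, and the result inherits the poor accuracy noted above.

```python
# Backfiring sketch: estimate function points from a count of logical
# source statements. Only the COBOL ratio comes from the text; the
# sample application size is hypothetical.
STATEMENTS_PER_FUNCTION_POINT = {"COBOL": 105}


def backfire(logical_statements: int, language: str) -> float:
    """Return the approximate function point total for a code count."""
    return logical_statements / STATEMENTS_PER_FUNCTION_POINT[language]


estimate = backfire(10_500, "COBOL")
print(f"Approximate size: {estimate:.0f} function points")
```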
There are tables available from several consulting companies such as
David Consulting, Gartner Group, and Software Productivity Research
(SPR) that provide values for source code statements per function point
for hundreds of programming languages.
In 1978, A.J. Albrecht gave a public lecture on function point metrics
at a joint IBM/SHARE/GUIDE conference in Monterey, California. Soon
after this, function points started to be published in the software litera-
ture. IBM customers soon began to use function points, and this led to
the formation of a function point user's group, originally in Canada.
Lines of Code Metrics Circa 1980
By about 1980, the number of programming languages had topped 50,
and object-oriented languages were rapidly evolving. As a result, soft-
ware reusability was increasing rapidly.
Another issue that surfaced circa 1980 was the fact that many appli-
cations were starting to use more than one programming language, such
as COBOL and SQL. The trend for using multiple languages in the same
application has become the norm rather than the exception. However,
the difficulty of counting lines of code with accuracy was increased when
multiple languages were used.
About the middle of this decade, function point users organized and
created the nonprofit International Function Point Users Group (IFPUG).
Originally based in Canada, IFPUG moved to the United States in the
mid-1980s. Affiliates in other countries soon were formed, so that by the
end of the decade, function point user groups were in a dozen countries.
In 1985, the first commercial software cost-estimating tool based on
function points reached the market, SPQR/20. This tool supported esti-
mates for 30 common programming languages and also could be used
for combinations of more than one programming language.
This tool included sizing and estimating of paper documents such as
requirements, design, and user manuals. It also estimated noncoding
tasks including testing and project management.
Because LOC metrics were still widely used, the SPQR/20 tool
expressed productivity and quality results using both function points
and LOC metrics. Because it was easy to switch from one language
to another, it was interesting to compare the results using both func-
tion point and LOC metrics when changing from macro-assembly to
FORTRAN or Ada or PL/I or Java.
As the level of a programming language goes up, economic productiv-
ity expressed in terms of function points per staff month also goes up,
which matches standard economics. But as language levels get higher,
productivity expressed in terms of lines of code per month drops down.
This reversal by LOC metrics violates all rules of standard economics
and is a key reason for asserting that LOC metrics constitute profes-
sional malpractice.
It is a well-known law of manufacturing economics that when a develop-
ment cycle includes a high percentage of fixed costs, and there is a decline
in the number of units manufactured, the cost per unit will go up.
If line of code is considered to be a manufacturing unit and there is a
switch from a low-level language to a high-level language, the number
of units will decline. But the paper documents in the form of require-
ments, specifications, and user documents do not decline. Instead they
stay almost constant and have the economic effect of fixed costs. This
of course will raise the cost per unit. Because this situation is poorly
understood, two examples will clarify the situation.
Case A Suppose we have an application that consists of 1000 lines of
code in basic assembly language. (We can also assume that the application
is 5 function points.) Assume the development personnel are paid
at a rate of $5000 per staff month.
Assume that coding took 1 staff month and production of paper docu-
ments in the form of requirements, specifications, and user manuals
also took 1 staff month. The total project took 2 staff months and cost
$10,000. Productivity expressed as LOC per staff month is 500. The cost
per LOC is $10.00. Productivity expressed in terms of function points
per staff month is 2.5. The cost per function point is $2000.
Case B Assume that we are doing the same application using the Java
programming language. Instead of 1000 lines of code, the Java version
only requires 200 lines of code. The function point total stays the same
at 5 function points. Development personnel are also paid at the same
rate of $5000 per staff month.
In Case B suppose that coding took only 1 staff week, but the produc-
tion of paper documents remained constant at 1 staff month.
Now the entire project took only 1.25 staff months instead of 2 staff
months. The cost was only $6250 instead of $10,000. Clearly economic
productivity has improved, since we did the same job as Case A with a
savings of $3750. We delivered exactly the same functions to users, but
with much less code and therefore much less effort, so true economic
productivity increased.
When we measure productivity for the entire project using LOC met-
rics, our rate has dropped down to only 160 LOC per month from the
500 LOC per month shown for Case A!
Our cost per LOC has soared up to $31.25 per LOC. Obviously, LOC
metrics cannot measure true economic productivity. Also obviously, LOC
metrics penalize high-level languages. In fact, many studies have proven
that the penalty exacted by LOC metrics is directly proportional to the
level of the programming language, with the highest-level languages
looking the worst!
Since the function point totals of both Case A and Case B versions are
the same at 5 function points, Case B has a productivity rate of 4 func-
tion points per staff month. The cost per function point is only $1250.
These improvements match the rules of standard economics, because
the faster and cheaper version has better results than the slower more
expensive version.
What has happened of course is that the paperwork portion of the
project did not decline even though the code portion declined substan-
tially. This is why LOC metrics are professional malpractice if applied
to compare projects that used different programming languages. They
move in the opposite direction from standard economic productivity
rates and penalize high-level languages. Table 8-7 summarizes both
Case A and Case B.
As can be seen by looking at Cases A and B when they are side by side,
LOC metrics actually reverse the terms of the economic equation and
make the large, slow, costly version look better than the small, quick,
cheap version.
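The figures for both cases can be reproduced with a short Python sketch. The class and attribute names are illustrative; the inputs are the ones given above, with one staff week taken as 0.25 staff month.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Effort and size figures for one version of the application."""
    name: str
    loc: int
    function_points: int
    coding_months: float
    paperwork_months: float
    monthly_rate: float = 5000.0  # dollars per staff month, from the text

    @property
    def total_months(self) -> float:
        return self.coding_months + self.paperwork_months

    @property
    def cost(self) -> float:
        return self.total_months * self.monthly_rate

    @property
    def loc_per_month(self) -> float:
        return self.loc / self.total_months

    @property
    def cost_per_loc(self) -> float:
        return self.cost / self.loc

    @property
    def fp_per_month(self) -> float:
        return self.function_points / self.total_months

    @property
    def cost_per_fp(self) -> float:
        return self.cost / self.function_points


case_a = Project("Case A (assembly)", 1000, 5, coding_months=1.0, paperwork_months=1.0)
case_b = Project("Case B (Java)", 200, 5, coding_months=0.25, paperwork_months=1.0)

for p in (case_a, case_b):
    print(f"{p.name}: ${p.cost:,.0f} total, {p.loc_per_month:.0f} LOC/month, "
          f"${p.cost_per_loc:.2f}/LOC, {p.fp_per_month:.1f} FP/month, "
          f"${p.cost_per_fp:,.0f}/FP")
```

The cheaper project (Case B) scores worse on both LOC measures and better on both function point measures, which is the reversal the text describes.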
It might be said that the reversal of productivity with LOC metrics
is because paperwork was aggregated with coding. But even when only
coding by itself is measured, LOC metrics still violate standard eco-
nomic assumptions.
TABLE 8-7 Comparing Low-Level and High-Level Languages
                               Case A      Case B
Lines of code (LOC)              1000         200
Function points                     5           5
Monthly compensation            $5000       $5000
Paperwork effort (months)           1           1
Coding effort (months)              1        0.25
Total effort (months)               2        1.25
Project cost                  $10,000       $6250
LOC per month                     500         160
Cost per LOC                   $10.00      $31.25
Function points per month         2.5           4
Cost per function point         $2000       $1250
The 1000 LOC of assembly code was done in 1 month at a rate of 1000
LOC per month. The pure coding cost was $5000 or $5.00 per LOC.
The 200 LOC of Java code was done in 1 week, or 0.25 month.
Converted into a monthly rate, that is only 800 LOC per month. The
coding cost for Java was $1250, so the cost per LOC was $6.25.
Thus, Java costs more per LOC than assembly, even though Java took
only one-fourth the time and one-fourth the cost! When you try to
measure the two different languages using LOC, assembly looks better
than Java, which is definitely a false conclusion. Table 8-8 shows the
comparison between assembly and Java for coding only.
In real economic terms, the Java code only cost $1250 while the assem-
bly code cost $5000. Obviously, Java has better economics because the
same job was done for a savings of $3750.
But the Java LOC production rate is lower than assembly, and the
cost per LOC has jumped from $5.00 to $6.25! From an economic stand-
point, variations in LOC per month and cost per LOC are unimportant
if there is a major difference in how much code is needed to complete
an application.
Unfortunately, LOC metrics end up as professional malpractice no
matter how you use them if you are trying to measure economic pro-
ductivity between unlike programming languages. By contrast, the Java
code's cost per function point was $250, while the assembly code's cost
per function point was $1000, and this matches the assumptions of
standard economics.
Function point production for Java was 20 function points per staff
month versus only 5 function points per staff month for assembly. Thus,
function points match the assumptions of standard economics while
LOC metrics violate standard economics.
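The coding-only comparison can be checked the same way. This is a minimal sketch using the figures above; the function name and dictionary keys are illustrative.

```python
def coding_metrics(loc: int, function_points: int, coding_months: float,
                   monthly_rate: float = 5000.0) -> dict:
    """Return coding-only cost and productivity figures."""
    cost = coding_months * monthly_rate
    return {
        "cost": cost,
        "loc_per_month": loc / coding_months,
        "cost_per_loc": cost / loc,
        "fp_per_month": function_points / coding_months,
        "cost_per_fp": cost / function_points,
    }


assembly = coding_metrics(loc=1000, function_points=5, coding_months=1.0)
java = coding_metrics(loc=200, function_points=5, coding_months=0.25)

# LOC metrics rank assembly ahead of Java...
print("LOC/month:", assembly["loc_per_month"], "vs", java["loc_per_month"])
print("Cost/LOC: ", assembly["cost_per_loc"], "vs", java["cost_per_loc"])
# ...while function point metrics rank Java ahead, matching the real costs.
print("FP/month: ", assembly["fp_per_month"], "vs", java["fp_per_month"])
print("Cost/FP:  ", assembly["cost_per_fp"], "vs", java["cost_per_fp"])
```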
TABLE 8-8 Comparing Coding for Low-Level and High-Level Languages
                               Case A      Case B
Lines of code (LOC)              1000         200
Function points                     5           5
Monthly compensation            $5000       $5000
Coding effort (months)              1        0.25
Coding cost                     $5000       $1250
LOC per month                    1000         800
Cost per LOC                    $5.00       $6.25
Function points per month           5          20
Cost per function point         $1000        $250
Returning to the main thread, within a few years, all other commercial
software estimating tools would also support function point metrics, so
that CHECKPOINT, COCOMO, KnowledgePlan, Price-S, SEER, SLIM,
SPQR/20, and others could express estimates in terms of both function
points and LOC metrics.
By the end of this decade, coding effort was below 35 percent of total
project effort, and LOC was no longer valid for either economic or qual-
ity studies. LOC metrics could not quantify requirements and design
defects, which now outnumbered coding defects. LOC metrics could not
be used to measure any of the noncoding activities such as require-
ments, design, documentation, or project management.
The response of the LOC users to these problems was unfortunate:
they merely stopped measuring anything but code production and
coding defects. The bulk of all published reports based on LOC metrics
cover less than 35 percent of development effort and less than 25 per-
cent of defects, with almost no data being published on requirements
and design defects, rates of requirements creep, design costs, and other
modern problems.
The history of the LOC metric provides an interesting example of
Dr. Leon Festinger's theory of cognitive dissonance. Once an idea
becomes entrenched, the human mind tends to reject all evidence to
the contrary. Only when the evidence becomes overwhelming will there
be changes of opinion, and such changes tend to occur rapidly.
Lines of Code Metrics Circa 1990
By about 1990, not only were there more than 500 programming lan-
guages in use, but some applications were written in 12 to 15 different
languages. There were no international standards for counting code, and
many variations were used sometimes without being defined.
In 1991, the first edition of the author's book Applied Software
Measurement included a proposed draft standard for counting lines
of code based on counting logical statements. One year later, Bob Park
from the Software Engineering Institute (SEI) also published a proposed
draft standard, but one based on counting physical lines.
A survey of software journals by the author in 1993 found that about
one-third of published articles used physical lines, one-third used logical
statements, and the remaining third used LOC metrics without even
bothering to say how they were counted. Since there is about a 500 per-
cent variance between physical LOC and logical statements for many
languages, this was not a good situation.
The technical journals that deal with medical practice and engineer-
ing often devote as much as 50 percent of the text to explaining and
defining the measurement methods used to derive the results. The soft-
ware engineering journals, on the other hand, often fail to define the
measurement methods at all.
The software journals seldom devote more than a few lines of text to
explaining the nature of the measurements used for the results. This is
one of several reasons why the term "software engineering" is something
of an oxymoron. In fact it is not even legal to use the term "software
engineering" in some states and countries, because software development
is neither a recognized nor a licensed engineering discipline.
But there was a worse problem approaching than ambiguity in count-
ing lines of code. The arrival of Visual Basic introduced a class of pro-
gramming languages where counting lines of code was not even possible.
This is because a lot of Visual Basic "programming" was not done with
procedural code, but rather with buttons and pull-down menus.
Of the approximately 2500 programming languages and dialects in
existence circa 2009, there are only effective published counting rules
for about 150. About another 2000 are similar to other languages and
could perhaps share the same counting rules. But for at least 50 lan-
guages that use graphics or visual means to augment procedural code,
there are no code counting rules at all. Unfortunately, some of the languages
without code counting rules tend to be the most recent languages
that are used for web site development.
In 1994, a controlled study was done that used both LOC metrics
and function points for ten versions of the same application written in
ten different programming languages, including four object-oriented languages.
The study was published in American Programmer in 1994. This
study found that LOC metrics violated the basic concepts of economic
productivity and penalized high-level and OO languages due to the fixed
costs of requirements, design, and other noncoding activities. This was
the first published study to state that LOC metrics constituted profes-
sional malpractice if used for economic studies where more than one
programming language was involved.
By the 1990s most consulting studies that collected benchmark and
baseline data used function points. There are no large-scale benchmarks
based on LOC metrics. The International Software Benchmarking
Standards Group (ISBSG) was formed in 1997 and only publishes data
in function point form. Consulting companies such as SPR and the
David Consulting Group also use function point metrics.
By the end of the decade, some projects were spending less than 20 per-
cent of the total effort on coding, so LOC metrics could not be used for the
80 percent of effort outside the coding domain. The LOC users remained
blindly indifferent to these problems and continued to measure only
coding, while ignoring the overall economics of complete development
cycles that include requirements, analysis, design, user documentation,
project management, and many other noncoding tasks.
By the end of the decade, noncoding defects in requirements and
design outnumbered coding defects almost 2 to 1. But since noncode
defects could not be measured with LOC metrics, the LOC literature
simply ignores them.
Indeed, still in 2009, debates occur about the usefulness of the LOC
metric, but the arguments unfortunately are not solidly grounded in
manufacturing economics. The LOC enthusiasts seem to ignore the
impact of fixed costs on software development.
The main argument of the LOC enthusiasts is that development effort
has a solid statistical correlation to size measured in terms of lines of
code. This is true, but irrelevant in terms of standard economics.
If it takes 1000 lines of C code to deliver ten function points to custom-
ers and the cost was $10,000, then the cost per LOC is $10.00. Assuming
one month of programming effort, the productivity rate using LOC is
1000 LOC per month.
If the same ten function points were delivered to customers in
Objective C, there might be only 250 lines of code and the cost might
be only $2500. The effort might take only one week instead of a whole
month. But the cost per LOC is unchanged at $10.00 and the LOC pro-
ductivity rate is also unchanged at 1000 LOC per month.
With LOC metrics, both versions appear to have identical productivity
rates of 1000 LOC per month, but these are development rates, not delivery
rates. Since the functionality is the same for both C and Objective C
versions, it is important that the cost per function point for C was $1000,
while for Objective C the cost per function point was only $250.
Measured in terms of function points per month, the rate for C was
10, while the rate for Objective C increased to 40. Thus, when measured
correctly, the economic value of high-level languages and delivery rates
are clearly revealed, while the LOC metric does not show either eco-
nomic or delivery productivity at all.
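The relationship can be sketched directly: when LOC per month and cost per LOC are held constant, the delivery figures depend only on how many lines are needed per function point. The ratios below are implied by this example alone (1000 and 250 lines for 10 function points) and are not published language-level values; the names are illustrative.

```python
LOC_PER_MONTH = 1000   # development rate, identical for both languages
COST_PER_LOC = 10.00   # dollars, identical for both languages

# Ratios implied by this example only, not published language levels.
LOC_PER_FUNCTION_POINT = {"C": 100, "Objective C": 25}


def delivery_metrics(language: str) -> dict:
    """Convert constant LOC rates into function point delivery rates."""
    loc_per_fp = LOC_PER_FUNCTION_POINT[language]
    return {
        "fp_per_month": LOC_PER_MONTH / loc_per_fp,
        "cost_per_fp": COST_PER_LOC * loc_per_fp,
    }


for language in LOC_PER_FUNCTION_POINT:
    m = delivery_metrics(language)
    print(f"{language}: {m['fp_per_month']:.0f} FP/month, ${m['cost_per_fp']:,.0f}/FP")
```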
Lines of Code Metrics Circa 2000
By the end of the century, the number of programming languages had
topped 2000 and continues to grow at more than one new program-
ming language per month. Current rates of new programming language
development may approach 100 new languages per year.
Web applications are mushrooming, and all of these are based on very
high-level programming languages and substantial reuse. The Agile
methods are also mushrooming and also tend to use high-level pro-
gramming languages. Software reuse in some applications now tops 80
percent. LOC metrics cannot be used for most web applications and are
certainly not useful for measuring Scrum sessions and other noncoding
activities that are part of Agile projects.
Function point metrics had become the dominant metric for serious
economic and quality studies. But two new problems appeared that
have kept function point metrics from actually becoming the industry
standard for both economic and quality studies.
The first problem is that some software applications are now so large
(greater than 300,000 function points) that normal function point analy-
sis is too slow and too expensive to be used.
There are gaps at both ends of normal function point analysis. Above
15,000 function points, the costs and schedule for counting function point
metrics become so high that large projects are almost never counted.
(Function point analysis proceeds at between 400 and 600 function points
per day per counter, at a cost of about $6.00 per function point
counted.)
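A back-of-the-envelope sketch makes the scale of the problem concrete. It uses the counting speed and cost quoted above; the `counters` parameter is a hypothetical simplification that assumes the work divides evenly among counters.

```python
COST_PER_FP_COUNTED = 6.00     # dollars, from the text
FP_PER_DAY_RANGE = (400, 600)  # function points per day per counter, from the text


def counting_estimate(function_points: int, counters: int = 1) -> dict:
    """Estimate the cost and duration of a manual function point count."""
    slow, fast = FP_PER_DAY_RANGE
    return {
        "cost": function_points * COST_PER_FP_COUNTED,
        "min_days": function_points / (fast * counters),
        "max_days": function_points / (slow * counters),
    }


for size in (15_000, 300_000):
    e = counting_estimate(size)
    print(f"{size:>7,} FP: ${e['cost']:,.0f}, "
          f"{e['min_days']:.1f} to {e['max_days']:.1f} counter-days")
```

At 15,000 function points the count already costs about $90,000 and takes 25 or more counter-days; at 300,000 function points it approaches $1.8 million.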
At the low end of the scale, the counting rules for function points do
not operate below a size of about 15 function points. Thus, small changes
and bug repairs cannot be counted. Individually, such changes may be as
small as 1/50 of a function point and are rarely larger than 10 function
points. But large companies can make 30,000 or more changes per year,
with a total size that can top 100,000 function points.
The second problem is that the success of the original function point
metric has triggered an explosion of function point clones. As of 2009,
there are at least 24 function point variations. This makes benchmark
and baseline studies difficult, because there are very few conversion
rules from one variation to another.
In addition to standard IFPUG function points, there are also Mark
II function points, COSMIC function points, Finnish function points,
Netherlands function points, story points, feature points, web-object
points, and many others.
Although LOC metrics continue to be used, they continue to have such
major errors that they constitute professional malpractice for economic
and quality studies where more than one language is involved, or where
noncoding issues are significant.
There is also a psychological problem. LOC usage tends to fixate atten-
tion on coding and make the other kinds of software work invisible. For
large software projects there may be many more noncode workers than
programmers. There will be architects, designers, database administra-
tors, quality assurance, technical writers, project managers, and many
other occupations. But since none of these can be measured using LOC
metrics, the LOC literature ignores them.
Lines of Code Metrics Circa 2010
It would be nice to predict an optimistic future, but the recession has
changed the nature of industry and the future is now uncertain.
If current trends continue, within a few more years the software
industry will have more than 3000 programming languages, of which
about 2900 will be obsolete or nearly dead languages. The industry
will have more than 20 variations for counting lines of code, more than
50 variations for counting function points, and probably another 20
unreliable metrics such as story points, use-case points, cost per defect,
or using percentages of unknown numbers. (The software industry loves
to make claims such as "improve productivity by 10 to 1" without defin-
ing either the starting or the ending point.)
Future generations of sociologists will no doubt be interested in why
the software industry spends so much energy on creating variations of
things, and so little energy on fundamental issues. No doubt large proj-
ects will still be cancelled, litigation for failures will still be common,
software quality will still be bad, software productivity will remain low,
security flaws will be alarming, and the software literature will con-
tinue to offer unsupported claims without actually presenting quanti-
fied data.
What the software industry needs is actually fairly straightforward:
1. Measures of defect potentials from all sources expressed in terms of
function points; that is, requirements defects, design defects, code
defects, document defects, and bad fixes.
2. Measures of defect removal efficiency levels for all forms of inspec-
tion, static analysis, and testing.
3. Activity-based productivity benchmarks from requirements through
delivery and then for maintenance and customer support from
delivery to retirement using function points.
4. Certified sources of reusable material near the zero-defect level.
5. Much improved security methods to guard against viruses, spyware,
and hacking.
6. Licenses and board-certification for software engineering specialties.
But until measurement becomes both accurate and cost-effective,
none of these are likely to occur. An occupation that will not measure
its own performance with accuracy is not a true profession.
Lines of Code Circa 2020
If we look forward to 2020, there are best-case and worst-case scenarios
to consider.
The best-case scenario for lines of code metrics is that usage dimin-
ishes even faster than it has been and that economic productivity based
on delivery becomes the industry focus rather than development and
lines of code. For this scenario to occur, the speed of function point analy-
sis needs to increase and the cost per function point counted needs to
decrease from about $6.00 per function point counted to less than $0.10
per function point counted, which is technically possible and indeed
occurs in 2009, although the high-speed methods are not yet widely
deployed since they are so new.
If these changes occur, then function point usage will increase at least
tenfold, and many new kinds of economic studies can be carried out.
Among these will be measurement of entire portfolios that might top
10 million function points. Corporate backlogs could be sized and pri-
oritized, and some of these exceed 1 million function points. Risk/value
analyses for major software applications could become both routine
and professionally competent. It will also be possible to do economic
analyses of interesting new technologies such as the Agile methods,
service-oriented architecture (SOA), software as a service (SaaS), and
of course total cost of ownership (TCO).
Under the best-case scenario, software engineering would evolve from
a craft or art form into a true engineering discipline. Reliable measures
of all activities and tasks will lead to greater success rates on large soft-
ware applications. The goal of software engineering should be to become
a true engineering discipline with recognized specialties, board certifica-
tion, and accurate information on productivity, quality, and costs. But
that cannot be accomplished when project failures outnumber successes
for large applications.
So long as quality and productivity are ambiguous and uncertain, it
is difficult to carry out multiple regression studies and to select really
effective tools and methods. LOC metrics have been a major barrier to
economic and quality studies for software.
The worst-case scenario is that LOC metrics continue at about the
same level as 2009. The software industry will continue to ignore eco-
nomic productivity and remain fixated on the illusory "lines of code per
month" metric. Under the worst-case scenario, "software engineering"
will remain an oxymoron. Trial-and-error methods will continue to dom-
inate, in part because effective tools and methodologies cannot even be
studied using LOC metrics. Under the worst-case scenario, failures and
project disasters will remain common for large software applications.
Function point analysis will continue to serve an important role for
economic studies, benchmarks, and baselines, but only for about 10
percent of software applications of medium size. The cost per function
point under the worst-case scenario will remain so high that usage
above 15,000 function points will continue to be very rare. There will
probably be even more function point variations, and the chronic lack
of conversion rules from one variation to another will make large-scale
international economic studies almost impossible.
Summary and Conclusions
The history of lines of code metrics is a cautionary tale for all people
who work in software. The LOC metric started out well and was fairly
effective when there was only one programming language and coding
was so difficult it constituted 90 percent of the total effort for putting
software on a computer.
But the software industry began to develop hundreds of program-
ming languages. Applications started to use multiple programming
languages, and that remains the norm today. Applications grew from
less than 1000 lines of code up to more than 10 million lines of code.
Coding is the major task for small applications, but for large systems,
the work shifts to defect removal and production of paper documents
in the forms of requirements, specifications, user manuals, test plans,
and many others.
The LOC metric was not able to keep pace with either change. It does
not work well when there is ambiguity in counting code, which always
occurs with high-level languages and multiple languages in the same
application. It does not work well for large systems where coding is only
a small fraction of the total effort.
As a result, LOC metrics became less and less useful until sometime
around 1985 they started to become actually harmful. Given the errors
and misunderstandings that LOC metrics bring to economic, productiv-
ity, and quality studies, it is fair to say that in many situations usage
of LOC metrics can be viewed as professional malpractice if more than
one programming language is part of the study or the study seeks to
measure real economic productivity.
The final point is that continued usage of LOC metrics is a significant
barrier that is delaying the progress of software engineering from a
craft to a true engineering discipline. An occupation that cannot even
measure its own work with accuracy is hardly qualified to be called
Readings and References
Barr, Michael and Anthony Massa. Programming Embedded Systems: With C and GNU
Development Tools. Sebastopol, CA: O'Reilly Media, 2006.
Beck, K. Extreme Programming Explained: Embrace Change. Boston, MA: Addison
Wesley, 1999.
Bott, Frank, A. Coleman, J. Eaton, and D. Rowland. Professional Issues in Software
Engineering, Third Edition. London and New York: Taylor & Francis, 2000.
Cockburn, Alistair. Agile Software Development. Boston, MA: Addison Wesley, 2001.
Cohen, D., M. Lindvall, and P. Costa. "An Introduction to Agile Methods." Advances in
Computers. New York: Elsevier Science (2004): 1-66.
Garmus, David and David Herron. Function Point Analysis. Boston: Addison Wesley,
Garmus, David and David Herron. Measuring the Software Process: A Practical Guide
to Functional Measurement. Englewood Cliffs, NJ: Prentice Hall, 1995.
Glass, Robert L. Facts and Fallacies of Software Engineering (Agile Software
Development). Boston: Addison Wesley, 2002.
van Vliet, Hans. Software Engineering Principles and Practices, Third
Edition. London, New York: John Wiley & Sons, 2008.
Highsmith, Jim. Agile Software Development Ecosystems. Boston, MA: Addison Wesley,
Humphrey, Watts. PSP: A Self-Improvement Process for Software Engineers. Upper
Saddle River, NJ: Addison Wesley, 2005.
Humphrey, Watts. TSP--Leading a Development Team. Boston, MA: Addison Wesley,
Hunt, Andrew and David Thomas. The Pragmatic Programmer. Boston, MA: Addison
Wesley, 1999.
Jeffries, R., et al. Extreme Programming Installed. Boston, MA: Addison Wesley, 2001.
Jones, Capers. Applied Software Measurement, Third Edition. New York, NY: McGraw-
Hill, 2008.
Jones, Capers. Conflict and Litigation Between Software Clients and Developers,
Version 6. Burlington, MA: Software Productivity Research, June 2006. 54 pages.
Jones, Capers. Estimating Software Costs, Second Edition. New York, NY: McGraw-Hill,
Jones, Capers. Software Assessments, Benchmarks, and Best Practices. Boston, MA:
Addison Wesley Longman, 2000.
Jones, Capers. "The Economics of Object-Oriented Software." American Programmer
Magazine, October 1994: 29-35.
Kan, Stephen H. Metrics and Models in Software Quality Engineering, Second Edition.
Boston, MA: Addison Wesley Longman, 2003.
Kruchten, Philippe. The Rational Unified Process--An Introduction. Boston, MA:
Addison Wesley, 2003.
Larman, Craig and Victor Basili. "Iterative and Incremental Development--A Brief
History." IEEE Computer Society, June 2003: 47-55.
Love, Tom. Object Lessons. New York, NY: SIGS Books, 1993.
Marciniak, John J. (Ed.) Encyclopedia of Software Engineering. (2 vols.) New York, NY:
John Wiley & Sons, 1994.
McConnell, Steve. Code Complete. Redmond, WA: Microsoft Press, 1993.
------ Software Estimation--Demystifying the Black Art. Redmond, WA: Microsoft
Press, 2006.
Mills, H., M. Dyer, & R. Linger. "Cleanroom Software Engineering." IEEE Software, 4, 5
(Sept. 1987): 19-25.
Morrison, J. Paul. Flow-Based Programming. A New Approach to Application
Development. New York, NY: Van Nostrand Reinhold, 1994.
Park, Robert E. SEI-92-TR-20: Software Size Measurement: A Framework for Counting
Software Source Statements. Pittsburgh, PA: Software Engineering Institute, 1992.
Pressman, Roger. Software Engineering--Practitioner's Approach, Sixth Edition. New
York, NY: McGraw-Hill, 2005.
Putnam, Lawrence and Ware Myers. Industrial Strength Software--Effective
Management Using Measurement. Los Alamitos, CA: IEEE Press, 1997.
------ Measures for Excellence--Reliable Software On-Time Within Budget. Englewood
Cliffs, NJ: Yourdon Press, Prentice Hall, 1992.
Sommerville, Ian. Software Engineering, Seventh Edition. Boston, MA: Addison Wesley,
Stapleton, J. DSDM--Dynamic System Development Method in Practice. Boston, MA:
Addison Wesley, 1997.
Stephens, M. and D. Rosenberg. Extreme Programming Refactored: The Case Against
XP. Berkeley, CA: Apress L.P., 2003.