Chapter 8
Programming and Code Development
Introduction
This chapter has an unusual slant compared with other books on software engineering. Among other topics, it deals with 12 important questions that are not well covered in the software engineering literature:
1. Why do we have more than 2500 programming languages?
2. Why does a new programming language appear more than once a month?
3. How many programming languages are really needed by software engineering?
4. Why do most modern applications use between 2 and 15 different languages?
5. How many applications being maintained are written in "dead" programming languages with few programmers?
6. How many programmers use major languages; how many use minor languages?
7. Should there be a national translation center that maintains compilers and tools for dead programming languages and that can convert antique languages into modern languages?
8. What are the major kinds of bugs found in source code?
9. How effective are debuggers and static analysis tools compared with inspections?
10. How effective are various kinds of testing in terms of bug removal?
11. How effective is reusable code in terms of quality, security, and costs?
12. Why has the "lines of code" metric stopped being effective for software economic studies?
These 12 topics are not the only topics that are important about programming, but they are not discussed often in software engineering journals or books. Following are discussions of the 12 topics.
A Short History of Programming and Language Development
It is interesting to consider the history of programming and the development of programming languages. The early history of mechanical computers driven by gears, cogs, and later punched cards is interesting, but not relevant. However, these devices did embody the essence of computer programming, which is to control the behavior of a mechanical device by means of discrete instructions that could be varied in order to change the behavior of the machine.
The pioneers of computer design include Charles Babbage, Ada Lovelace, Herman Hollerith, Alan Turing, John von Neumann, Konrad Zuse, J. Presper Eckert, John Mauchly, and a number of others. John Backus, Konrad Zuse, and others contributed to the foundations of programming languages. David Parnas and Edsger Dijkstra contributed to the development of structured programming, which minimized the tendency of code branching to form "spaghetti bowls" of so many branches that the code became nearly unreadable.
Ada Lovelace was an associate of Charles Babbage. In 1842 and 1843, she described a method of calculating Bernoulli numbers for use on the Babbage analytical engine. Her work is often cited as the world's first computer program, although there is some debate about this.

In the years during and prior to World War II, a number of companies in various countries built electro-mechanical computing devices, primarily for special purposes such as calculating trajectories or handling mathematical tasks.

The earliest models were "programmed" in part by changing wire connections or using plug boards. But during World War II, computing devices were developed with memory that could store both data and instructions. The ability to have language instructions stored in memory opened the gates to modern computer programming as we know it today.
Konrad Zuse of Germany built the Z3 computer in 1941 and later designed what seems to be the first high-level language, Plankalkül, in 1948, although no compiler was created and the language was not used.

The earliest "languages" that were stored in computers were binary codes or machine languages, which obviously were extremely difficult to understand, code, or modify. The difficulty of working with machine codes directly led to languages that were more amenable to human understanding but capable of being translated into machine instructions.
The earliest of these languages were termed assembly languages and usually had a one-to-one correspondence between the human-readable instructions (called source code) and the executable instructions (called object code).

The idea of developing languages that humans could use to describe various algorithms or data manipulation steps proved to be so useful that very shortly a number of more specialized languages were developed.

In these languages the human portions were optimized for certain kinds of problems, and the work of translating the languages into machine code was left to the compilers. Incidentally, the main difference between an assembler and a compiler is that assemblers tend to have a one-to-one ratio between source code and object code, while compilers have a one-to-many ratio. In other words, one statement in a compiled language might generate a dozen or more machine instructions.
The ability to translate a single source instruction into many object instructions led to the concept of high-level programming languages. In general, the higher the level of a programming language, the more object code can be created from a single source code statement.
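The one-to-many expansion is easy to observe with modern tools. The short sketch below is an added illustration rather than part of the original discussion: it uses Python's standard dis module, which prints the several lower-level bytecode instructions generated for a single source statement, the same principle a compiler applies when it emits machine code.

# Added illustration: one high-level statement expands into several
# lower-level instructions, the one-to-many ratio described above.
# Python's bytecode stands in here for the object code of a compiler.
import dis

def average(a, b):
    return (a + b) / 2  # a single source statement

dis.dis(average)  # prints several bytecode instructions for this one line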
Both assembly and compilation were handled by special translation programs as batch activities. The source code could not be run immediately. Sometimes translation might be delayed for hours if the computer was being used for other work and other computers were not available.

These delays led to another form of code translation. Programming language translators called interpreters were soon developed, which allowed source code to be converted into object code immediately.
In the early days of computing and programming, software was used primarily for a narrow range of mathematical calculations. But the speed of digital computers soon gave rise to wider ranges of applications. When computers started to be aimed at business problems and to manipulate text and data, it became obvious that if the source code included some of the language and vocabulary of the problem domain, then programming languages would be easier to learn and use. The use of computers to control physical devices opened up yet another need for languages optimized for dealing with physical objects.

As a result, scores of domain-specific programming languages were developed that were aimed at tasks such as list processing, business applications, astronomy, embedded applications, and a host of others.
Why Do We Have More than 2500 Programming Languages?
The concept of having source code optimized for specific kinds of business or technical problems is one of the factors that led to the enormous proliferation of programming languages.

There are some technical advantages for having programming languages match the vocabulary of various problem domains. For one thing, such languages are easy to learn for programmers who are familiar with the various domains.
It is actually fairly easy to develop a new programming language. As computers began to be used for more and more kinds of problems, the result was more and more programming languages. Developing a new programming language that attracted other programmers also had social and prestige value.

As a result of these technical and social reasons, the software industry developed new programming languages with astonishing frequency. Today, as of 2009, no one really knows the actual number of programming languages and dialects, but the largest published lists of programming languages now contain 2500 languages (The Language List by Bill Kinnersley, http://people.ku.edu).
The author's former company, Software Productivity Research, has been keeping a list of common programming languages since 1984, and the current version contains more than 700 programming languages. New programming languages continue to come out at a rate of two or three per calendar month; some months, more than 10 languages have arrived. There is no end in sight.

One reason for the plethora of languages is that a new language can be developed by a single software engineer in only a month or two. In fact, with compiler-compilers, a new programming language can evolve from a vague idea to compiled code in 60 days or less.
In 1984, the author's first commercial software estimating tool was put on the market. The first release of the tool could perform cost and quality estimates for 30 different programming languages, but the tool itself could handle other languages using the same logic and algorithms. Therefore, we made a statement to customers that our tool could support cost estimates for "all known programming languages."

Having made the claim, it was necessary to back it up by assembling a list of all known programming languages and their levels. At the time the claim was made in 1984, the author hypothesized that the list might include 50 languages. However, when the data was collected, it was discovered that the set of "all known programming languages" included about 250 languages and dialects circa 1984.
It was also discovered while compiling the list that new languages were popping up about once a month; sometimes quite a few more.
It became obvious that keeping track of languages was not going to be quick and easy, but would require continuous effort.

Today, as of 2009, the current list of languages maintained by Software Productivity Research has grown to more than 700 programming languages, and the frequency with which new languages come out seems to be increasing from about one new language per month up to perhaps two or even four and occasionally ten new languages per month.
An approximate chronology of significant programming languages is shown in Table 8-1.

Table 8-1 is only a tiny subset of the total number of programming languages. It is included just to give readers who may not be practicing programmers an idea of the wide variety of languages in existence.

Those familiar with programming concepts can see from the list that programming language design took two divergent paths:

■ Specialized languages that were optimal for narrow sets of problems, such as FORTRAN, Lisp, ASP, and SQL
■ General-purpose languages that could be used for a wide range of problems, such as Ada, Objective C, PL/I, and Ruby
It is of sociological interest that the vast majority of special-purpose languages were developed by individuals or perhaps two individuals. For example, Basic was developed by John Kemeny and Thomas Kurtz; C was developed by Dennis Ritchie; FORTRAN was developed by John Backus; Java was developed by James Gosling; and Objective C was developed by Brad Cox and Tom Love.

The general-purpose languages were usually developed by committees. For example, COBOL was developed by a famous committee with major inputs from Grace Hopper of the U.S. Navy. Other languages developed by committees include Ada and PL/I. However, some general-purpose languages were also developed by individuals or colleagues, such as Ruby and Objective C.
For reasons that are perhaps more sociological than technological, the attempts at building general-purpose languages such as PL/I and Ada have not been as popular with programmers as many of the special-purpose languages.

This is a topic that needs both sociological and technical research, because PL/I and Ada appear to be well designed, robust, and capable of tackling a wide variety of applications with good results.
Another major divergence in programming languages occurred during the late 1970s, although research had started earlier. This is the split between object-oriented languages such as SMALLTALK, C++, and Objective C and languages that did not adopt OO methods and terminology, such as Basic, Visual Basic, and XML.
TABLE 8-1  Chronology of Programming Language Development

1951  Assembly languages
1954  FORTRAN (Formula Translator)
1958  Lisp (List Processing)
1959  COBOL (Common Business-Oriented Language)
1959  JOVIAL (Jules Own Version of the International Algorithmic Language)
1959  RPG (formerly Report Program Generator)
1960  ALGOL (Algorithmic Language)
1962  APL (A Programming Language)
1962  SIMULA
1964  Basic (Beginner's all-purpose symbolic instruction code)
1964  PL/I
1964  CORAL
1967  MUMPS
1970  PASCAL
1970  Prolog
1970  Forth
1972  C
1978  SQL (Structured query language)
1980  CHILL
1980  dBASE II
1982  SMALLTALK
1983  Ada83
1985  Quick Basic
1985  Objective C
1986  C++
1986  Eiffel
1986  JavaScript
1987  Visual Basic
1987  PERL
1989  HTML (Hypertext Markup Language)
1993  AppleScript
1995  Java
1995  Ruby
1999  XML (Extensible Markup Language)
2000  C#
2000  ASP (Active Server Pages)
2002  ASP.NET
Today in 2009, more than 50 percent of active programming languages tend to be in the object-oriented camp, while the other languages are procedural languages, functional languages, or use some other method of operation.
Yet another dichotomy among programming languages is whether they are typed or un-typed. The term typed means that operations in a language are restricted to only specific data types. For example, a typed language would not allow mathematical operations against character data. Examples of typed languages include Ruby, SMALLTALK, and Lisp.

The opposite case, or un-typed languages, means that operations can be performed against any type of data. Examples of un-typed languages include assembly language and Forth.

The terms typed and un-typed are somewhat ambiguous, as are the related terms strongly typed and weakly typed. Over and above ambiguity, there is some debate as to the virtues and limits of typed versus un-typed languages.
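The practical difference is easy to see in a few lines of code. The sketch below is an added example (Python is used only for convenience; it enforces type restrictions at run time rather than at compile time): an operation that mixes character data with numeric data is rejected until the programmer states an explicit conversion, which is exactly the kind of restriction an un-typed language does not impose.

# Added example of a type restriction in action.
def demo_type_restriction():
    try:
        "123" + 4  # a mathematical operation against character data
    except TypeError as err:
        # The type system rejects the mixed-type operation.
        print("Rejected:", err)
    # The intended conversion must be stated explicitly.
    print(int("123") + 4)  # prints 127

demo_type_restriction()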
Exploring the Popularity of Programming Languages
There are a number of ways of studying the usage and popularity of programming languages. These include

1. Statistical analysis of web searches for specific languages
2. Statistical analysis of books and articles published about specific languages
3. Statistical analysis of citations in the literature about specific languages
4. Statistical analysis of job ads for programmers that cite language skills
5. Surveys and statistical analysis of languages in legacy applications
6. Surveys and statistical analysis of languages used for new applications
A company called Tiobe publishes a monthly analysis of programming language popularity that ranks 100 different programming languages. Since this section is being written in May 2009, the 20 most popular languages for this month from the Tiobe rankings are listed in Table 8-2.

Older readers may wonder where COBOL, FORTRAN, PL/I, and Ada reside. They are further down the Tiobe list in languages 21 through 40.

Since new languages pop up at a rate of more than one per month, language popularity actually fluctuates rather widely on a monthly basis. As interesting new programming languages appear, their popularity goes up rapidly. But based on their utility or lack of utility over longer periods, they may drop down again just as fast.
TABLE 8-2  Popularity Ranking of Programming Languages as of May 2009

1. Java
2. C
3. C++
4. PHP
5. Visual Basic
6. Python
7. C#
8. JavaScript
9. Perl
10. Ruby
11. Delphi
12. PL/SQL
13. SAS
14. PASCAL
15. RPG (OS/400)
16. ABAP
17. D
18. MATLAB
19. Logo
20. Lua
The popularity of programming languages bears a certain resemblance to the popularity of prime-time television shows. Some new shows such as Two and a Half Men surface, attract millions of viewers, and may last for a number of seasons. A few shows such as Seinfeld become so popular that they go into syndication and continue to be aired long after production stops. But many shows are dropped after a single season.

It is interesting that the life expectancy of programming languages and the life expectancy of television shows are about the same. Many programming languages have active lives that span only a few "seasons" and then disappear. Other languages become standards and may last for many years. However, when all 2500 languages are considered, the average active life of a programming language when it is being used for new development is less than five years. Very few programming languages attract development programmers after more than ten years.
Some of the languages that are in the class of Seinfeld or I Love Lucy and may last more than 25 years under syndication include

■ Ada
■ C
■ C++
■ COBOL
■ Java
■ Objective C
■ PL/I
■ SQL
■ Visual Basic
■ XML
In a programming language context, the term syndication means that the language is no longer under the direct control of its originator, but rather control has passed to a user group or to a commercial company, or that the language has been put in the public domain and is available via open-source compilers.

It would be interesting and valuable if there were benchmarks and statistics kept of the numbers of applications written in these long-lived programming languages. No doubt C and COBOL have each been used for more than 1 million applications on a global basis.
In fact, continuing with the analogy of the entertainment business, it might be interesting to have awards for languages that have been used for large numbers of applications. Perhaps "silver" might go for 100,000 applications, "gold" for 1 million applications, and "platinum" for 10 million applications.

If such an award were created, a good name for it might be the "Hopper," after Admiral Grace Hopper, who did so much to advance programming languages and especially COBOL. In fact, COBOL is probably the first programming language in history to achieve the 1-million-application plateau.
Although the idea of awards for various numbers of applications is interesting, it would require that statistics be available for ascertaining how many applications were created in specific languages or combinations of languages. As of 2009, the software industry does not keep such data.

The choice of which language should be used for specific kinds of applications is surprisingly subjective. A colleague at IBM was asked in a meeting if he programmed in the APL language. His response was, "No, I'm not of that faith."
It would be technically possible to develop a standard method of describing and cataloging the features of programming languages. Indeed, with more than 2500 languages in existence, such a catalog is urgently needed. Even if the catalog only started with 100 of the most widely used languages, it would provide valuable information.
The full set of topics included to create an effective taxonomy of programming languages is outside the scope of this book, but might contain factors such as these (a brief illustrative sketch follows the list):
1. Language name: Name of language
2. Architecture: Object-oriented, functional, procedural, etc.
3. Origin: Year of creation, names of inventors
4. Sources: URLs of distributors of language compilers
5. Current version: Version number of current release; 1, 2, or whatever
6. Support: URLs or addresses of maintenance organizations
7. User associations: Names, URLs, and locations of user groups
8. Tutorial materials: Books and learning sources about the language
9. Reviews or critiques: Published reviews of language in refereed journals
10. Legal status: Public domain, licensed, patents, etc.
11. Language definition: Whether it is formal, informal
12. Language syntax: Description of syntax
13. Language typing: Strongly typed, weakly typed, un-typed, etc.
14. Problem domains: Mathematics, web, embedded, graphics, etc.
15. Hardware platforms: Hardware the language was intended to support
16. OS platforms: Operating systems the language compilers work with
17. Intended uses: Targeted application types
18. Known limitations: Performance, security, problem domains, etc.
19. Dialects: Variations of the basic language
20. Companion languages: .NET, XML, etc. (languages used jointly)
21. Extensibility: Commands added by language users
22. Level: Logical statements relative to assembly language
23. Backfire level: Logical statements per function point
24. Reuse sources: Certified modules, uncertified, etc.
25. Security features: Intrinsic security features, such as in the E language
26. Debuggers available: Names of debugging tools
27. Static analysis available: Names of static analysis tools
28. Development tools available: Names of development tools
29. Maintenance tools available: Names of maintenance tools
30. Applications to date: Approximately 100, 1000, 10,000, 100,000, etc.
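As a rough indication of how one entry in such a catalog might be represented in machine-readable form, the sketch below is an added illustration: the field names are hypothetical, cover only a handful of the 30 factors, and the sample values are placeholders rather than data from any real catalog.

# Hypothetical sketch of a single catalog entry for a language taxonomy.
# Field names are illustrative and cover only a few of the 30 factors above.
from dataclasses import dataclass, field
from typing import List

@dataclass
class LanguageEntry:
    name: str                    # factor 1: language name
    architecture: str            # factor 2: object-oriented, procedural, etc.
    origin_year: int             # factor 3: year of creation
    typing: str                  # factor 13: strongly typed, weakly typed, etc.
    problem_domains: List[str] = field(default_factory=list)   # factor 14
    backfire_level: float = 0.0  # factor 23: logical statements per function point

example = LanguageEntry(
    name="ExampleLang",          # placeholder entry, not a real language
    architecture="procedural",
    origin_year=1983,
    typing="strongly typed",
    problem_domains=["business applications", "text and string data"],
    backfire_level=100.0,        # illustrative value only
)
print(example.name, example.backfire_level)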
Given the huge number of programming languages, it is surprising that no standard taxonomy exists. Web searches reveal more than a dozen topics when using search arguments such as "taxonomies of programming languages" or "categories of programming languages." However, these vary widely, and some contain more than 50 different descriptive forms, but seem to lack any fundamental organizing principle.
Returning now to the main theme, somewhat alarmingly, the life expectancy of many software applications is longer than the active life of the languages in which they were written. An example of this is the patient-record system maintained by the Veterans Administration. It is written in the MUMPS programming language and has far outlived MUMPS itself.
It is obvious to students of software engineering economics that if programming languages have an average life expectancy of only 5 years, but large applications last an average of 25 years, then software maintenance costs are being driven higher than they should be due to the very large number of aging applications that were coded in programming languages that are now dead or dying.
How Many Programming Languages Are Really Needed?
The plethora of programming languages raises a basic question that needs to be addressed by the software engineering literature: How many programming languages does software engineering really need? Having thousands of programming languages raises a corollary question: Is the existence of more than 2500 programming languages a good thing or a bad thing?
The argument that asserts having thousands of languages is a good thing centers around the fact that languages tend to be optimized for unique classes of problems. As new problems are encountered, they demand new programming languages, or at least that is a hypothesis.

The argument that asserts having thousands of languages is a bad thing centers around economics. Maintenance of legacy applications written in dead languages is an expensive nightmare. The constant need to train development programmers in the latest cult language is expensive. Many useful tools such as static analysis tools and automated test tools support only a small subset of programming languages, and therefore may require expensive modifications for new languages. Accumulating large volumes of certified reusable code is more difficult and expensive if thousands of languages have to be dealt with.
The existence of thousands of programming languages has created a new subindustry within software engineering. This new subindustry is concerned with translating dead or dying languages into new living languages. For example, it is now possible to translate the MUMPS language circa 1967 into the C or Java languages and to do so automatically.

A corollary subindustry is that of renovation, or periodically performing special maintenance activities on legacy applications to clean out dead code, remove error-prone modules, and reduce the inevitable increase in cyclomatic and essential complexity that occurs over time due to repeated small changes.
Linguists and those familiar with natural human languages are aware that translation from one language to another is not perfect. For example, some Eskimo dialects include more than 30 different words for various kinds of snow. It is hard to get an exact translation into a language such as English that developed in a temperate climate and has only a few variations on "snow."

Since many programming languages have specialized constructs for certain classes of problem, the translation into other languages may lead to awkward constructs that might be difficult for human programmers to understand or deal with during maintenance and enhancement work. Even so, if the translation opens up a dead language to a variety of static analysis and maintenance tools, the effort is probably worthwhile.
To deal with the question of how many programming languages are needed, it is useful to start by considering the universe of problem areas that need to be put onto computers. There seem to be ten discrete problem areas, divided into two different major kinds of processing, as shown in Table 8-3.

These two general categories reflect the major forms of software that actually exist today: (1) software that processes information, and (2) software that controls physical devices or deals with physical properties such as sound or light or music.
TABLE 8-3  Problem Domains of Software Applications

Logical and Mathematical Problem Areas
1. Mathematical calculations
2. Logic and algorithmic expressions
3. Numerical data
4. Text and string data
5. Time and dates

Physical Problem Areas
1. Sensor-based electronic signals
2. Audible signals and music
3. Static images
4. Dynamic or moving images
5. Colors

These two broad categories might lead to the conclusion that perhaps two programming languages would be the minimum number that would be able to address all problem areas. One language would be optimized for information systems, and another would be optimized for dealing with physical devices and electronic signals. However, the track records of general-purpose languages such as PL/I and Ada have not indicated much success for languages that attempt to do too many things at once.
Few problems are "pure" and deal with only one narrow topic. In fact, most applications deal with hybrid problem domains. This leads to a possible conclusion that programming languages may reflect the permutations of problem areas rather than the problem areas individually.

If the permutations of all ten problem areas were considered, then we might eventually end up with 3,628,800 programming languages. This is even more unlikely to occur than having one "superlanguage" that could tackle all problem areas.
From examining samples of both information processing applications and embedded and systems software applications, a provisional hypothesis is that about four different problem areas occur in typical software applications. The permutation of four topics out of a total of ten topics leads to the hypothesis that the software engineering domain will eventually end up with about 5,040 different programming languages.
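Both counts follow from elementary combinatorics: 3,628,800 is 10! (the permutations of all ten problem areas), and 5,040 is the number of ordered selections of four areas out of ten, that is, 10 × 9 × 8 × 7. (If unordered combinations were counted instead, the figure would be only 210.) A minimal check of the arithmetic:

# Checking the permutation counts quoted above.
import math

all_orderings = math.factorial(10)   # 10! = 3,628,800
four_of_ten = math.perm(10, 4)       # 10 * 9 * 8 * 7 = 5,040

print(all_orderings, four_of_ten)    # 3628800 5040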
Since we already have about 2500 programming languages and dialects in 2009, there may yet be another 2500 languages still to be developed in the future. At the current rate of roughly 100 new languages per year, it can be projected that new languages will continue to appear at about the same rate for another 25 years. From an economic standpoint, this does not seem to be a very cost-effective engineering solution.
Assuming that the software engineering community does reach 5040 languages, the probable distribution of those languages would be

■ 4800 languages would be dead or dying, with few programmers
■ 200 languages would be in legacy applications and therefore need maintenance
■ 40 languages would be new and gathering increasing numbers of programmers

A technical alternative to churning out another 2500 specialized languages for every new kind of problem that surfaces would be to consider building polymorphic compilers that would support any combination of problem areas.
Creating a National Programming Language Translation Center
When considering alternatives to churning out another 2500 programming languages, it might be of value to create a formal programming language translation center stocked with the language definitions of all known programming languages.
This center could provide guidance in the translation of dead or dying languages into modern active languages. Some companies already perform translation, but out of today's total of 2500 languages, only a few are handled with technical and linguistic accuracy. Automated translation as of 2009 probably only handles 50 languages out of 2500 total languages.

Given the huge number of existing programming languages and the rapid rate of creation of new programming languages, such a translation center would probably require a full-time staff of at least 50 personnel. This would mean that only very large companies such as IBM or Microsoft or large government agencies such as Homeland Security or the Department of Defense would be likely to attempt such an activity.
Over and above translation, the national programming language translation center could also perform careful linguistic analyses of all current languages in order to identify the main strengths and weaknesses of current languages. One obvious weakness of most languages is that they are not very secure.

Another function of the translation center would be to record demographic information about the numbers and kinds of applications that use various languages. For example, the languages used for financial systems, for weapons systems, for medical applications, for taxation systems, and for patient records have economic and even national importance. It would be useful to keep records of the programming languages used for such vital applications. Obviously, maintenance and restoration of these vital applications has major business and national importance.
Table 8-4 is a summary of 40 kinds of software applications that have critical importance to the United States. Table 8-4 also shows the various programming languages used in these 40 kinds of applications.

A major function of a code translation center would be to accumulate more precise data on critical applications and the languages used in them.

Both columns of Table 8-4 need additional research. There are no doubt more kinds of critical applications than the 40 listed here. Also, in order to fit on a printed page, the second column of the table is limited to about six or seven programming languages. For many of these critical applications, there may be 50 or more languages in use at national levels.
The North American Industry Classification (NAIC) codes of the Department of Commerce identify at least 250 industries that the author knows create software in substantial volumes. However, the 40 industries shown in Table 8-4 probably contain almost 50 percent of applications critical to U.S. business and government operations.
TABLE 8-4  Programming Languages Used for Critical Software Applications

Critical Software: Programming Languages
1. Air traffic control: Ada, Assembly, C, Jovial, PL/I
2. Antivirus & security: ActiveX, C, C++, Oberon7
3. Automotive engines: C, C++, Forth, Giotto
4. Banking applications: C, COBOL, E, HTML, Java, PL/I, SQL, XML
5. Broadband: C, C++, CESOF, Java
6. Cell phones: C, C++, C#, Objective C
7. Credit cards: ASP.NET, C, COBOL, Java, Perl, PHP, PL/I
8. Credit checking: ABAP, COBOL, FORTRAN, PL/I, XML
9. Credit unions: C, COBOL, HTML, PL/I, SQL
10. Criminal records: ABAP, C, COBOL, FORTRAN, Hancock
11. Defense applications: Ada, Assembly, C, CMS2, FORTRAN, Java, Jovial, SPL
12. Electric power: Assembly, C, DCOPEJ, Java, Matpower
13. FBI, CIA, NSA, etc.: Ada, APL, Assembly, C, C++, FORTRAN, Hancock
14. Federal taxation: C, COBOL, Delphi, FORTRAN, Java, SQL
15. Flight controls: Ada, Assembly, C, C++, C#, LabView
16. Insurance: ABAP, COBOL, FORTRAN, Java, PL/I
17. Mail and shipping: COBOL, dBase2, PL/I, Python, SQL
18. Manufacturing: AML, APT, C, Forth, Lua, RLL
19. Medical equipment: Assembly, Basic, C, CO, CMS2, Java
20. Medical records: ABAP, COBOL, MUMPS, SQL
21. Medicare: Assembly, COBOL, Java, PL/I, dBase2, SQL
22. Municipal taxation: C, COBOL, Delphi, Java
23. Navigation: Assembly, C, C++, C#, Lua, Logo, MatLab
24. Oil and Energy: AMPL, C, G, GAMS/MPSGE, SLP
25. Open-source software: C, C++, JavaScript, Python, Suneido, XUL
26. Operating systems, large: Assembly, C, C#, Objective C, PL/S, VB
27. Operating systems, small: C, C++, Objective C, OSL, SR
28. Pharmaceuticals: C, C++, Java, PASCAL, SAS, Visual Basic
29. Police records: C, COBOL, dBase2, Hancock, SQL
30. Satellites: C, C++, C#, Java, Jovial, PHP, Pluto
31. Securities trading: ABAP, C#, COBOL, dBase2, Java, SQL
32. Social Security: Assembly, COBOL, PL/I, dBase2, SQL
33. State taxation: C, COBOL, Delphi, FORTRAN, Java, SQL
34. Surface transportation: C, C++, COBOL, FORTRAN, HTML, SQL
35. Telephone switching: C, CHILL, CORAL, Erlang, ESPL1, ESTEREL
36. Television broadcasts: C, C++, C#, Java, Forth
37. Voting equipment: Ada, C, C++, Java
38. Weapons systems: Ada, Assembly, C, C++, Jovial
39. Web applications: AppleScript, ASP, CMM, Dylan, E, Perl, PHP, .NET
40. Welfare (State): ASP.NET, C, COBOL, dBase2, PL/I, SQL
As a result of the importance of these 40 software application areas to United States business and to government operations, they probably receive almost 75 percent of cyberattacks in the form of viruses, spyware, search-bots, and denial of service attacks. These 40 industries need to focus on security. Even a cursory examination of the programming languages used by these industries reveals that few of them are particularly resistant to viruses or malware attacks.

For all 40, maintenance is expensive, and for many, it is growing progressively more expensive due to the difficulty of simultaneously maintaining applications written in so many different programming languages.

As a technical byproduct of translation from older languages to new languages, one value-added function of a national programming language translation center would be to eliminate security vulnerabilities at the same time the older languages are being translated.
If the language translation center operated as a profit-making business, it might well grow into a good-sized company. Assuming the company billed at the same rate as Y2K companies (about $1.00 per logical statement), a national translation center might clear $75 million per year, assuming accurate and competent translation technology.

What the author suggests is that rather than continuing to develop programming languages at random but rapid intervals, there is a need to address programming languages at a fundamental linguistic level.
A study team that included linguists, software engineers, and domain specialists might be able to address the problems of the most effective ways of expressing the ten problem areas and their permutations. The goal would be to understand the minimum set of programming languages capable of handling any combination of problem areas.

If economists were added to the study team, they would also be able to address the financial impact of attempting to maintain and occasionally renovate applications written in hundreds of dead and dying programming languages.
Why Do Most Applications Use Between 2 and 15 Programming Languages?
A striking phenomenon of software engineering is the presence of multiple programming languages in the same applications. This is not a new trend, and many older applications used combinations such as COBOL and SQL. More recent combinations might include Java and HTML or XML.

A similar phenomenon is the fact that many programming languages are themselves combinations of two or more other programming languages. For example, the Objective C language combines features from SMALLTALK and C. The Ruby language combines features from Ada, Eiffel, Perl, and Python, among others.
Recall that a majority of programming languages are somewhat specialized, and these seem to be more popular than general-purpose languages. A hypothesis that explains why applications use several different programming languages is that the "problem space" of the application is broader than the "solution space" of individual programming languages.

It was mentioned earlier that many applications include at least four of the ten problem areas cited in Table 8-3. However, many programming languages seem to be optimized only for one to three of the problem areas. This creates a situation where multiple programming languages are needed to implement all of the problem areas in the application.
Of course, using any of the more general-purpose languages such as Ada or PL/I would reduce the numbers of languages, but for sociological reasons, these general-purpose languages have not been as popular as the more specialized languages.

The implications of having many different languages in the same application are that development is more difficult, debugging is more difficult, static analysis is more difficult, and code inspection is more difficult. After release, maintenance and enhancement tasks are more difficult.
Table 8-5 illustrates how both development and maintenance costs go up as the number of languages in an application increases. The costs show the rate of increase compared with a single language.

Both development and maintenance costs increase as the number of programming languages in the same application increases, but maintenance is more severely impacted (a short sketch applying these multipliers follows the table).
TABLE 8-5  Impact of Multiple Languages on Costs

Languages in Application    Development Costs    Maintenance Costs
1                           $1.00                $1.00
2                           $1.07                $1.14
3                           $1.12                $1.17
4                           $1.13                $1.20
5                           $1.18                $1.24
6                           $1.22                $1.30
7                           $1.23                $1.35
8                           $1.27                $1.40
9                           $1.30                $1.47
10                          $1.34                $1.55
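As a small illustration of how the multipliers in Table 8-5 might be applied, the sketch below is an added example; the function name and the baseline costs are hypothetical, and the multipliers are copied directly from the table.

# Hypothetical sketch: scaling single-language cost estimates by the
# Table 8-5 multipliers for the number of languages in an application.
DEV_MULTIPLIER = {1: 1.00, 2: 1.07, 3: 1.12, 4: 1.13, 5: 1.18,
                  6: 1.22, 7: 1.23, 8: 1.27, 9: 1.30, 10: 1.34}
MAINT_MULTIPLIER = {1: 1.00, 2: 1.14, 3: 1.17, 4: 1.20, 5: 1.24,
                    6: 1.30, 7: 1.35, 8: 1.40, 9: 1.47, 10: 1.55}

def adjusted_costs(dev_cost, maint_cost, languages):
    """Scale baseline single-language costs by the number of languages used."""
    return (dev_cost * DEV_MULTIPLIER[languages],
            maint_cost * MAINT_MULTIPLIER[languages])

# Example: a $500,000 development and $800,000 maintenance baseline, 4 languages.
dev, maint = adjusted_costs(500_000, 800_000, 4)
print(round(dev), round(maint))   # 565000 960000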
How Many Programmers Use Various Programming Languages?
There is no real census of either languages used in applications or the number of programmers. While the Department of Commerce and the Bureau of Labor Statistics do issue reports on such topics in the United States, their statistics are known to be inaccurate.

A survey done by the author and his colleagues a few years ago found that the human resources organizations in most large corporations did not know how many programmers or software engineers were actually employed. Since government statistics are based on reports from HR organizations, if the HR organizations themselves don't know, they can't provide good data to the government.
Among the reasons that government statistics probably understate the numbers of programmers and software engineers are ambiguous job titles. For example, some large companies use titles such as "member of the technical staff" as an umbrella title that might include software engineers, hardware engineers, systems analysts, and perhaps another dozen occupations.
Another problem with knowing how many software engineers there are is the fact that many personnel working on embedded applications are not software engineers or computer scientists by training, but rather electrical engineers, aeronautical engineers, telecommunications engineers, or some other type of engineer.

Because the status of these older forms of engineering is higher than the status of software engineering, many people working on embedded software refuse to be called software engineers and insist on being identified by their true academic credentials.
The study carried out by the author and his colleagues was to derive information on the number of software specialists (i.e., quality assurance, database administration, etc.) employed by large software-intensive companies such as IBM, AT&T, Hartford Insurance, and so forth.

The study included on-site visits and discussions with both HR organizations and also local software managers and executives. It was during the discussions with local software managers and executives that it was discovered that not a single HR organization actually had good statistics on software engineering populations.
Based on on-site interviews with client companies and then extrapolation from their data to national levels, the author assumes that the U.S. total of software engineers circa 2009 is about 2.5 million. Government statistics as of 2009 indicate around 600,000 programmers, but these statistics are low for reasons already discussed. Additionally, the government statistics also tend to omit one-person companies and individual programmers who develop applets or single applications.
About 60 percent of these software engineers work in maintenance and enhancement tasks, and 40 percent work as developers on new applications. There are of course variations. For example, many more developers than maintenance personnel work on web applications, because all of these applications are fairly new. But for traditional mainframe business applications and ordinary embedded and systems software applications, maintenance workers outnumber development workers by a substantial margin.
Table 8-6 shows the approximate numbers of software engineers by language for the United States. However, the data in Table 8-6 is hypothetical and not exact. Among the reasons that the data is not exact is that many software engineers know more than one programming language and work with more than one programming language.

However, Table 8-6 does illustrate a key point: The most common languages for software development are not the same as the most common languages for software maintenance. This situation leads to a great deal of trouble for the software industry.

The most obvious problem illustrated by Table 8-6 is that it is difficult to get development personnel to work on maintenance tasks because of the perceived view that older languages are not as glamorous as modern languages.
TABLE 8-6  Estimated Number of Software Engineers by Language

Development Languages    Software Engineers    Maintenance Languages    Software Engineers
Java                     175,000               COBOL                    575,000
C                        150,000               PL/I                     125,000
C++                      130,000               Ada                      100,000
Visual Basic             100,000               Visual Basic             75,000
C#                       90,000                RPG                      75,000
Ruby                     65,000                Basic                    75,000
JavaScript               50,000                Assembler                75,000
Perl                     30,000                C                        75,000
Python                   20,000                FORTRAN                  65,000
COBOL                    15,000                Java                     60,000
PHP                      15,000                JavaScript               40,000
Objective C              10,000                Jovial                   10,000
Others                   150,000               Others                   150,000
Total                    1,000,000             Total                    1,500,000

A second problem is that due to the differences in programming languages between maintenance and new development, two different sets of tools are likely to be needed. The developers are interested in using modern tools including static analysis, automated testing, and other fairly new innovations.
However, many of these new tools do not support older languages, so the software maintenance community needs to be equipped with maintenance workbenches that include tools with different capabilities.
For example, tools that analyze cyclomatic and essential complexity are used more often in maintenance work than in new development. Tools that can trace execution flow are also used more widely in maintenance work than in development. Another new kind of tool that supports maintenance more than development can "mine" legacy code and extract hidden business rules. Yet another kind of tool that supports maintenance work can parse the code and automatically generate function point totals.
It is fairly easy for programmers to learn new languages, but nobody can possibly learn 2500 programming languages. An average programmer in the U.S. is probably fairly expert in one language and fairly knowledgeable in three others. Some may know as many as ten languages. The plethora of languages obviously introduces major problems in academic training and in ways of keeping programmers current in their skill sets.

The bottom line is that development and maintenance tool suites are often very different, and this is due in large part to the differences in programming languages used for development and for maintenance.
Since the great majority of languages widely used for development today in 2009 will fall out of service in less than ten years, the software industry faces some severe maintenance challenges.

Languages used for new development are surfacing at rates of more than two per month. Most of these languages will be short-lived. However, some of the applications created in these ephemeral languages will last for many years. As a result, the set of programming languages associated with legacy applications that need maintenance is growing larger at rates that sometimes might top 50 languages per year!
A major economic problem associated with having thousands of programming languages is that the plethora of languages is driving up maintenance costs. Ironically, one of the major claims of new programming languages is that "they improve programming productivity." Assuming that such claims are true at all, they are only true for new development. Every single new language is eventually going to add to the U.S. software maintenance burden. This is because programming languages have shorter life expectancies than the applications created with them. One by one, today's "new" languages will drop out of use and leave behind hundreds of aging legacy applications with declining numbers of trained programmers, few effective tools, and sometimes not even working compilers.
What Kinds of Bugs or Defects Occur in Source Code?
In 2008 and 2009, a major new study was performed that identified the 25 most common and serious software bugs or defects. The study was sponsored by the SANS Institute, with the cooperation of MITRE and about 30 other organizations.
This new study is deservedly attracting a great deal of attention. In the history of software quality and security, it will no doubt be ranked as a landmark report. Indeed, all software engineering groups should get copies of the report and make it required reading for software engineers, quality assurance personnel, and also for software managers and executives.

Access to the report can be had via either the SANS Institute or MITRE web sites. The relevant URLs are

■ www.SANS.org
■ www.CWE-MITRE.org
In spite of the fact that software engineering is now a major occupation and millions of applications have been coded, only recently has there been a serious and concentrated effort to understand the nature of bugs and defects that exist in source code. The SANS report is significant because the list of 25 serious problems was developed by a group of some 40 experts from major software organizations. As a result, it is obvious that the problems cited are universal programming problems and not issues for a single company.

Over the years, many large companies such as IBM, AT&T, Microsoft, and Unisys have had very sophisticated defect tracking and monitoring systems. These same companies have also used root-cause analysis. Some of the results of these internal defect tracking systems have been published, but they usually were not perceived as having general applicability.
A number of common problems have long been well understood: buffer overflows, branches to incorrect locations, and omission of error handling are well known and avoided by experienced software engineers. But that is not the same as attempting a rigorous analysis and quantification of coding defects.

The SANS report is a very encouraging example of the kind of progress that can be made when top experts from many companies work together in a cooperative manner to explore common problems. The SANS study group included experts from academia, government, and commercial companies. It is also encouraging that these three kinds of organizations were able to cooperate successfully. The normal relationship among the three is often adversarial rather than cooperative, so having all three work together and produce a useful report is a fairly rare occurrence.
Hopefully, the current work will serve as a model of future collaboration that will deal with other important software issues. Some of the additional topics that might do well in a collaborative mode include:

1. Defect removal methods
2. Economic analysis of software development
3. Economic analysis of software maintenance
4. Software metrics and measurement
5. Software reusability
Some of the organizations that participated in the SANS study include, in alphabetical order:

■ Apple
■ Aspect Security
■ Breach Security
■ CERT
■ Homeland Security
■ Microsoft
■ MITRE
■ National Security Agency
■ Oracle
■ Purdue University
■ Red Hat
■ Tata
■ University of California
This is only a partial list, but it shows that the study included academia, commercial software organizations, and government agencies.

The overall list of 25 security problems was subdivided into three larger topical areas. Readers are urged to review the full report, so only a bare list of topics is included here:
Interactions
1. Poor input validation
2. Poor encoding of output
3. SQL query structures
4. Web page structures
5. Operating system command structures
6. Open transmission of sensitive data
7. Forgery of cross-site requests
8. Race conditions
9. Leaks from error messages

Resource Management
10. Unconstrained memory buffers
11. Control loss of state data
12. Control loss of paths and file names
13. Hazardous paths
14. Uncontrolled code generation
15. Reusing code without validation
16. Careless resource shutdown
17. Careless initialization
18. Calculation errors

Defense Leakages
19. Inadequate authorization and access control
20. Inadequate cryptographic algorithms
21. Hard coding and storing passwords
22. Unsafe permission assignments
23. Inadequate randomization
24. Excessive issuance of privileges
25. Client/server security lapses
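To make the flavor of these categories concrete, the short sketch below is an added illustration (it is not part of the SANS material) of categories 1 and 3: a query whose structure is changed by unvalidated input, followed by the parameterized form that treats the same input as pure data.

# Added illustration of poor input validation and SQL query structure
# defects, using Python's built-in sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance REAL)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100.0)")

user_input = "alice' OR '1'='1"   # hostile input that a validator should reject

# Defective pattern: the query text is assembled from raw user input,
# so the hostile string changes the structure of the SQL statement.
unsafe_query = "SELECT balance FROM accounts WHERE name = '%s'" % user_input
print(conn.execute(unsafe_query).fetchall())   # returns data it should not

# Safer pattern: a parameterized query keeps the input as pure data.
safe_query = "SELECT balance FROM accounts WHERE name = ?"
print(conn.execute(safe_query, (user_input,)).fetchall())   # returns []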
The complete SANS list contains detailed information about each of the 25 defects and also supplemental information on how the defects are likely to occur, methods of prevention, and other important issues. This is why readers are urged to examine the full SANS list.

As of 2009, these 25 problems may occur in more than 85 percent of all operational software applications. One or more of these 25 problems can be cited in more than 95 percent of all successful malware attacks. Needless to say, the SANS list is a very important document that needs widespread distribution and extensive study.
The SANS report is a valuable resource for companies involved in testing, static analysis, inspections, and quality assurance. It provides a very solid checklist of topics that need to be validated before code can be safely released to the outside world.
Logistics of Software Code Defects
While the SANS report does an excellent job of identifying serious software and code defects, once the defects are present in the code and the code is in the hands of users, some additional issues need discussion. Following is a list of topics that discuss logistical issues associated with software defects:
1. Defect: A problem caused by human beings that causes a software application to either stop running or to produce incorrect results. Defects can be errors of commission, where developers did something wrong, or errors of omission, where developers failed to anticipate a specific condition.

2. Defect severity level (IBM definition): Severity 1, software stops working; Severity 2, major features disabled or incorrect; Severity 3, minor problem; Severity 4, cosmetic error with no operational impact.
3. Invalid defect: A problem reported as a defect but which upon analysis turns out to be caused by something other than the software itself. Hardware problems, user errors, and operating system errors mistakenly reported as application errors are the most common invalid defects. These total as many as 15 percent of valid defect reports.

4. Abeyant defect (IBM term): A defect reported by a specific customer that cannot be replicated on any other version of the software except the one being used by the customer. Usually, abeyant defects are caused by some unique combination of hardware devices and other applications that run at the same time as the software against which the defect was reported. These are rare but very difficult to track down and repair.
5. False positive: A code segment initially identified by a static analysis tool or a test case as a potential defect. Upon further analysis, the code turns out to be correct.

6. Secondhand defects: A defect in an application that was not caused by any overt mistakes on the part of the development team itself, but instead was caused by errors in a compiler or tool used by the development team. Errors in code generators and automatic test tools are examples of secondhand defects. The developers used the tools in good faith, but as a result, bugs were created. An example of a secondhand defect was a compiler error that incorrectly handled an instruction. The code was compiled and executed, but the instruction did not operate as defined in the language specification. It was necessary to review the machine language listings to find this secondhand defect since it was not visible in the source code itself.
7. Undetected defects: These are similar to secondhand defects, but turn out to be due to either incomplete test coverage or to gaps in static analysis tools. It is widely known that test suites almost never touch 100 percent of the code in any application, and sometimes less than 60 percent of the code in large applications. To minimize the impact of undetected defects and partial test coverage, it is necessary to use test coverage analysis tools. Major gaps in coverage may need special testing or formal inspections.

8. Data defects: Defects that are not in source code or applications, but which reside in the data that passes through the application. A very common example of a data defect would be an incorrect mailing address. Data errors are numerous and may be severe, and they are also difficult to eliminate. Data defects probably outnumber code defects, and their status in terms of liability is ambiguous. More serious examples of data defects are errors in credit reports, which can lower credit ratings without any legitimate reason and also without any overt defects in software. Data defects are notoriously difficult to repair, in part because there are no effective quality assurance organizations involved with data defects. In fact, there may not even be any reporting channels.
9. Externally caused defects: A defect that was not originally a defect, but became one due to external changes such as new tax laws, changes in pension plans, and other government mandates that trigger code changes in software applications. An example would be a change in state sales taxes from 6 percent to 7.5 percent, which necessitates changes in many software applications. Any application that does not make the change will end up with a defect even though it may have run successfully for years prior to the external change. Such changes are frequent but unpredictable because they are based on government actions.

10. Bad fixes: About 7 percent of attempts to repair a software code defect accidentally contain a new defect. Sometimes there are secondary and even tertiary bad fixes. In one lawsuit against a software vendor, four consecutive attempts to fix a bug in a financial application added new defects and did not fix the original defect. The fifth attempt finally got it right.
11. Legacy defects: These are defects that surface today, but which may have been hidden in software applications for ten years or more. An example of a legacy defect was a payroll application that stopped calculating overtime payments correctly. What happened was that overtime began to exceed $10.00 per hour, and the field had been defined with $9.99 as the maximum amount. The problem was more than ten years old when it first occurred and was identified. (The original developers of the application were no longer even employed by the company at the time the problem surfaced.)

12. Reused defects: Between 15 percent and 50 percent of software applications are based on reused code either acquired commercially or picked up from other applications. Due to the lack of certification of reusable materials, many bugs or errors are in reused code. Whether liability should be assigned to the developer or to the user of reused material is ambiguous as of 2009.
13. Error-prone modules (IBM term): Studies of IBM software discovered that bugs or defects were not randomly distributed but tended to clump in a small number of places. For example, in the IMS database product, about 35 modules out of 425 were found to contain almost 60 percent of total customer-reported bugs. Error-prone modules are fairly common in large software applications. As a rule of thumb, about 3 percent of the modules in large systems are candidates for being classified as error-prone modules.

14. Incident: An incident is an abrupt stoppage of a software application for unknown reasons. However, when the software is restarted, it operates successfully. Incidents are not uncommon, but their origins are difficult to pin down. Some may be caused by momentary power surges or power outages; some may be caused by hardware problems or even cosmic rays; and some may be caused by software bugs. Because incidents are usually random in occurrence and cannot be replicated, it is difficult to study them.
15.
Security
vulnerabilities These
are code segments that
are
frequently
used by viruses, worms, and
hackers to gain access
to
software
applications. Error handling
routines and buffer
overflows
are
common examples of vulnerabilities. As of
2009, these are
not
usually
classified as defects because
they are only channels
for
malicious
attacks. However, given the
alarming increase in
such
attacks,
there may be a need to
reevaluate how to classify
security
vulnerabilities.
16.
Malicious
software engineers From
time to time software
engineers
become disgruntled with their
colleagues, their
manag-
ers,
or the companies that they
work for. When this
situation occurs,
some
software engineers deliberately
insert malicious code into
the
applications
that they are developing.
This situation is most
likely
to
occur in the time interval
between a software engineer
receiv-
ing
a layoff notice and the
actual day of departure.
While only a
few
software engineers cause
deliberate harm, the
situation may
become
more prevalent as the
recession deepens and
lengthens. In
any
case, the fact that
software engineers can
deliberately perform
harmful
acts is one of the reasons why
software engineers who
work
for
the Internal Revenue Service
have their tax returns
examined
manually.
Of course, not only
malicious code can occur,
but also
other
harmful kinds of coding
might be used by software
engineer-
ing
employees, such as diverting
funds to personal
accounts.
17.
Defect
potentials This
term originated in IBM circa
1973 and is
included
in all of my major books. The
term defect
potential refers
to
the sum total of possible
defects that are likely to
be encoun-
tered
during software development.
The total includes five
sources of
defects:
(1) requirements defects, (2) design
defects, (3) code
defects,
(4)
document defects, and (5)
bad fixes or secondary
defects. Current
U.S.
averages for defect
potentials are about 5.0
per function point. A
rule
of thumb for predicting
defect potentials is to raise
the size of the
application
in function points to the
1.25 power. This gives a
useful
approximation
of total defects that are
likely to occur for
applications
between
about 100 function points
and 10,000 function
points.
18.
Defect
removal efficiency This
term also originated in
IBM
circa
1973. It refers to the ratio
of defects detected to defects
pres-
ent.
If a unit test finds 30 bugs
out of a total of 100 bugs,
it is 30
percent
efficient. Most forms of
testing are less than 50
percent
efficient.
Static analysis and formal
inspections top 80 percent
in
defect
removal efficiency.
19.
Cumulative
defect removal efficiency
This
term also origi-
nated
in IBM circa 1973. It refers to
the aggregate total of
defects
removed
by all forms of inspection,
static analysis, and
testing. If a
series
of removal operations that
includes requirement, design,
and
code
inspections; static analysis;
and unit, new function,
regression,
performance,
and system tests finds 950
defects out of a
possible
1000,
the cumulative efficiency is 95
percent. Current U.S.
averages
are
only about 85 percent.
Cumulative defect removal
efficiency is
calculated
at a fixed point in time,
usually 90 days after
software
is
released to customers.
20.
Performance
issues Some
applications have stringent
perfor-
mance
criteria. An example might be
the target-seeking
guidance
system
in a Patriot missile; another
example would be the
embed-
ded
software inside antilock
brakes. If the software
fails to achieve
its
performance targets, it may be
unusable or even
hazardous.
However,
performance issues are not
usually classified as
defects
because
no incorrect code is involved.
What is involved are
execu-
tion
paths that are too
long or that include too
many calls and
branches.
Even though there may be no
overt errors, there are
sub-
stantial
liabilities associated with performance
problems.
21.
Cyclomatic
and essential complexity These
are mathemati-
cal
expressions that provide a
quantitative basis for
judging the
complexity
of source code segments. The
metrics were invented
by
Dr.
Tom McCabe and are sometimes
called McCabe
complexity
metrics.
Calculations are based on graph theory, and the general formula is "edges - nodes + 2." Practically
speaking, cyclomatic
com-
plexity
levels less than ten
indicate low complexity when
the code
is
reviewed by software engineers.
Cyclomatic complexity
levels
greater
than 20 indicate very
complex code. The metrics
are signifi-
cant
because of correlations between defect
densities and
cyclomatic
complexity
levels. Essential complexity is
similar, but uses
mathe-
matical
techniques to simplify the
graphs by removing
redundancy.
22.
Toxic
requirement This
is a new term introduced in
2009 and
derived
from the financial phrase
toxic
assets. A
toxic requirement
is
defined as an explicit user requirement
that is harmful and
will
cause
serious damages if not
removed. Unfortunately, toxic
require-
ments
cannot be removed by means of
regular testing because once
toxic
requirements are embedded in
requirements and design
docu-
ments,
any test cases created
from those documents will
confirm the
error
rather than identify it.
Toxic requirements can be
removed
by
formal inspections of requirements,
however. An example of a
toxic
requirement is the famous Y2K
problem, which
originated
as
a specific user requirement. A
more recent example of a
toxic
requirement
is the file handling of the
Quicken financial
software
application.
If a backup file is "opened"
instead of being
"restored,"
then
Quicken files can lose
integrity.
Summary
and Conclusions
on
Software Defects
As
discussed earlier in this
book, the current U.S.
average for software
defect
volumes is about 5.0 per
function point. (This total
includes
requirements
defects, design defects,
coding defects,
documentation
defects,
and bad fixes or secondary
defects.)
Cumulative
defect removal is only about
85 percent. As a result,
soft-
ware
applications are routinely
delivered with about 0.75
defect per
function
point. Note that at the
point of delivery, all of
the early defects
in
requirements and design have
found their way into
the code. In other
words,
while the famous Y2K problem
originated as a requirements
defect,
it eventually found its way
into source code. No
programming
language
was immune, and therefore
the Y2K problem was
endemic
across
thousands of applications written in
all known programming
languages.
For
a typical application of 1000
function points, 0.75
released defect
per
function point implies about
750 delivered defects. Of
these, about
20
percent will be high-severity defects:
150 high-severity defects
will
probably
be in the code when users
get the first
releases.
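
The arithmetic behind these figures is simple enough to show directly. The short Java sketch below applies the rules of thumb cited above (5.0 defects per function point, 85 percent cumulative removal, 20 percent of delivered defects being high severity, and the 1.25 power rule for defect potentials); the numbers are illustrative averages rather than measurements from any specific project.

    public class DeliveredDefectEstimate {
        public static void main(String[] args) {
            int functionPoints = 1000;

            // Rule of thumb: defect potential is size in function points
            // raised to the 1.25 power.
            double potentialByPowerRule = Math.pow(functionPoints, 1.25); // about 5,623

            // U.S. average defect potential of 5.0 per function point.
            double potentialByAverage = 5.0 * functionPoints;             // 5,000

            // 85 percent cumulative removal leaves 0.75 defect per function point.
            double delivered = potentialByAverage * (1.0 - 0.85);         // 750

            // About 20 percent of delivered defects are high severity.
            double highSeverity = delivered * 0.20;                       // 150

            System.out.printf("Potential (power rule): %.0f%n", potentialByPowerRule);
            System.out.printf("Delivered defects:      %.0f%n", delivered);
            System.out.printf("High-severity defects:  %.0f%n", highSeverity);
        }
    }
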
Five
important kinds of remedial
actions can improve this
situation:
1.
Measurement of defect volumes by
100 percent of software
organi-
zations.
2.
Measurement of defect removal
efficiency for every kind of
inspec-
tion,
static analysis, and test
stage used.
3.
Reducing defect potentials by
means of effective defect
prevention
methods
such as joint application
design (JAD) and quality
function
deployment
(QFD), and others.
4.
Raising defect removal
efficiency levels by means of
formal inspec-
tions,
static analysis, and
improved testing.
5.
Examining the results of
quality on defect removal
costs and also on
total
development costs and
schedules, plus maintenance
costs.
The
combination of these five
key activities can lower
defect poten-
tials
down to less than 3.0
defects per function point
and raise defect
removal
efficiency levels higher
than 95 percent on average, with
mis-
sion-critical
applications hitting 99
percent.
An
achievable goal for the
software industry would be to
achieve aver-
ages
of less than 3.0 defects
per function point, defect
removal efficiency
levels
of more than 95 percent, and
delivered defect volumes of
less than
0.15
defect per function
point.
The
combined results from better
measurement, better defect
pre-
vention,
and better defect removal
would reduce delivered
defects for
a
1000-function point application
from 750 down to only
150. Of these
150,
only about 10 percent would
be high-severity defects. Thus,
instead
of
150 high-severity defects
that normally occur today,
only 15 high-
severity
defects might occur. This is
an improvement of a full order
of
magnitude.
Even
better, empirical data
indicates that applications at
the high
end
of the quality spectrum have
shorter development schedules,
lower
development
costs, and much lower
maintenance costs.
Indeed,
the main reason for
both schedule slippages and
cost over-
runs
is excessive defect
volumes at the start of
testing.
Most
projects are on schedule and
within budget until testing
starts,
at
which time excessive defects
stretch out testing by
several hundred
percent
compared with plans and cost
estimates.
The
technologies to achieve better
quality results actually
exist today
in
2009, but are not
widely deployed. That means
that better awareness
of
quality and the economic
value of quality are
critical weaknesses of
the
software industry circa
2009.
Preventing
and Removing Defects
from
Application
Source Code
During
development of software applications,
the number of defects encountered averages
about 1.75 per function
point or
17.5
per KLOC for languages
where the ratio of lines of
code to function
points
is about 100. As pointed out
earlier in this book, defect
volumes
vary
by the level of the
programming languages, and
they also vary by
the
experience and skill of the
programming team.
The
minimum quantity of defects in source
code will be about 0.5
per
function
point or 5 per KLOC, while
the maximum quantity will
top
3.5
defects per function point
or 35 defects per KLOC,
assuming the
same
level of programming
language.
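
Because the discussion moves back and forth between defects per function point and defects per KLOC, it may help to show the conversion explicitly. The small Java sketch below assumes roughly 100 logical source statements per function point, the ratio used in the paragraph above; the sample densities are the minimum, average, and maximum values just cited.

    public class DefectDensity {

        // Convert a per-KLOC density into a per-function point density,
        // given the language's ratio of source statements per function point.
        public static double perFunctionPoint(double perKloc, double locPerFunctionPoint) {
            return perKloc * (locPerFunctionPoint / 1000.0);
        }

        public static void main(String[] args) {
            System.out.println(perFunctionPoint(17.5, 100)); // about 1.75 per FP, the average
            System.out.println(perFunctionPoint(5.0, 100));  // about 0.5 per FP, the minimum
            System.out.println(perFunctionPoint(35.0, 100)); // about 3.5 per FP, the maximum
        }
    }
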
However,
in spite of wide ranges of
potential defects, there are
still
more
coding defects than any
other kind of defect. Defect
removal effi-
ciency
against coding defects is in
the range of 80 percent to 99
per-
cent.
Some coding defects will
slip through even in the
best of cases,
although
it is certainly better to approach 99
percent than it is to
lag
at
80 percent.
For
coding defects as with all
other defect sources, two
channels need
to
be included in order to improve
code quality:
1.
Defect
prevention, or
methods that can lower
defect potentials.
2.
Defect
removal, or
methods that can seek
out, find, and
eliminate
coding
defects.
The
available forms of defect
prevention for coding
defects include
certified
reusable code modules, use of
patterns or standard
coding
approaches
for common situations, use
of structured programming
methods,
use of higher-level programming
languages, constructing
prototypes
prior to formal development,
dividing large
applications
into
small segments (as does
Agile development), participation
in
code
inspections, test-based development,
and usage of static
analysis
tools.
Pair programming is also
reported to have some
efficacy in terms
of
defect prevention, but this
method has very low usage
and very
little
data.
The
available forms of defect
removal for coding defects
include desk
checking,
pair programming, debugging
tools, code inspections,
static
analysis
tools, and 17 kinds of
conventional testing plus
automated unit
testing
and regression
testing.
Defect
removal by individual software
engineers is difficult to
study.
Desk
checking, debugging, and
unit testing are usually
private activi-
ties
with no observers and no detailed
records kept. Most
corporate
defect-tracking
systems do not start to
collect data until public
defect
removal
begins with formal inspections,
function tests, and
regression
tests.
What happens before these
public events is usually
invisible.
There
are some exceptions,
however.
At
one point, IBM asked for
volunteers who were willing
to record the
numbers
of bugs they found in their
own code by themselves. The
pur-
pose
of the study was to find
out what was the
actual defect removal
effi-
ciency
from these normally
invisible forms of defect
removal. Obviously,
the
data was not used in
any punitive fashion and
was kept
confidential,
other
than to produce some
statistical reports.
More
recently the Personal
Software Process (PSP) and
Team
Software
Process (TSP) methods
developed by Watts Humphrey
have
also
included defect recording
throughout the code
development cycle.
Unfortunately,
the Agile development method
has moved in the
other
direction
and usually does not
record private defect
removal. Indeed,
many
Agile projects do not record
defect data at all, which is
a mistake
because
it reduces the ability of
the Agile method to prove
its value in
terms
of quality.
The
public forms of defect
removal are discussed in
this book in
Chapter
9, which deals with quality.
The emphasis in this chapter
is
more
on the private forms of
defect removal, which are
seldom covered
in
the software engineering
literature.
Private
defect removal lacks the
large volumes of data
associated with
some
of the public forms such as
formal inspections, static
analysis, and
the
test stages that involve
other players such as test
specialists and soft-
ware
quality assurance. But for
the sake of completeness,
the topics of pri-
vate
defect prevention and
private defect removal need
to be included.
Before
discussing the effectiveness of
either defect prevention
or
defect
removal, it should be noted
that individual software
engineers
or
programmers vary widely in
experience and
skills.
In
one controlled study at IBM where a
number of programmers
were
asked
to implement the same trial
example, the quantity of
code pro-
duced
varied by about 6 to 1 between
the bulkiest solution and
the most
concise
solution for the same
specification.
Similar
studies showed about a 10 to 1
variation in the amount
of
time
a sample of programmers needed to
code and debug a
standard
problem
statement.
These
wide variations in individual
performance mean that
individ-
ual
human variations in a population of
software engineers
probably
account
for more divergence in
results than do methods,
tools, or factors
that
can be studied
objectively.
Forms
of Programming Defect
Prevention
It
is much more difficult to
measure or quantify defect
prevention than
it
is to measure defect removal.
With defect removal, it is
possible to
accumulate
statistics on numbers of defects
found and their
severity
levels.
Once
the project is released to
customers, defect counts
continue.
After
90 days of usage, it is possible to
combine the internally
discov-
ered
defects with the customer-reported
defects and to calculate
defect
removal
efficiency. If development personnel
found 85 defects and
cus-
tomers
reported 15 defects, the
removal efficiency is 85 percent.
Such
data
is easy to collect, valuable,
and fairly accurate, except
for some
invisible
defects found via private
removal actions such as desk
check-
ing
and unit test.
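
The calculation itself is trivial, as the short sketch below shows; the sample values are the ones used in the paragraph above and in the cumulative example given earlier in the chapter.

    public class RemovalEfficiency {

        // Defect removal efficiency: internally found defects divided by the
        // total found internally plus those reported by customers in the
        // first 90 days, expressed as a percentage.
        public static double dre(int foundInternally, int reportedByCustomers) {
            return 100.0 * foundInternally / (foundInternally + reportedByCustomers);
        }

        public static void main(String[] args) {
            System.out.println(dre(85, 15));   // 85.0 percent, as in the example above
            System.out.println(dre(950, 50));  // 95.0 percent cumulative efficiency
        }
    }
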
For
defect prevention, there is no
easy way to measure the
absence
of
defects.
The methods available for
exploring defect prevention
require
collecting
data from a fairly large
number of projects, where
some of
them
utilized a specific defect
prevention method and others
did not.
For
example, assume you measure
a sample of 50 projects that
used
structured
coding methods and another
50 projects that did not
use
structured
programming methods. Assume
the 50 projects that
used
structured
programming averaged 10 coding
defects per KLOC or 1
per
function point. Assume the
50 projects that did not
use structured
programming
averaged 20 coding defects
per KLOC or 2 per
function
point.
This kind of analysis allows
you to make a hypothesis
that the
structured
coding prevents about 50
percent of coding defects,
but it is
still
only a hypothesis and not
proof.
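
Expressed as code, the hypothesis amounts to nothing more than comparing two sample averages, as the minimal sketch below shows; the values mirror the assumed samples above and carry all of the caveats discussed next.

    public class PreventionComparison {

        // Hypothesized prevention effect: the percentage reduction in average
        // defect density for the projects that used the method.
        public static double hypothesizedPrevention(double withMethodPerKloc,
                                                    double withoutMethodPerKloc) {
            return 100.0 * (withoutMethodPerKloc - withMethodPerKloc)
                         / withoutMethodPerKloc;
        }

        public static void main(String[] args) {
            // 50 structured projects averaging 10 defects per KLOC versus
            // 50 unstructured projects averaging 20 defects per KLOC.
            System.out.println(hypothesizedPrevention(10.0, 20.0) + " percent"); // 50.0
        }
    }
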
Further,
real-life situations are
seldom simple and easy to
deal with.
There
may be numerous other
factors at play, such as
usage of static
analysis,
usage of higher-level languages, usage of
inspections, variations
in
programming experience, complexity of
the problems, and so
forth.
The
many different factors that
can influence defect
prevention mean
that
exact knowledge of the
effectiveness of any specific
factor is some-
what
subjective at best, and will
probably stay that
way.
Academic
institutions can perform
controlled experiments with
stu-
dents
where they measure the
effectiveness of a single variable,
but
such
studies are fairly rare
concerning defect
prevention.
However,
from long-range observations
involving hundreds of
soft-
ware
personnel and hundreds of
software projects over a
multi-year
time
span, some objective factors
about defect prevention have
reason-
ably
strong support:
Code
reuse as defect prevention
If
reusable code is available
that has
been
certified to zero-defect levels, or at
least carefully inspected,
tested,
and
subjected to static analysis
before being made reusable,
this is the
best
known form of defect
prevention. Defect potentials in
certified reus-
able
code modules are only a
fraction of the 15 per KLOC
normally
encountered
during custom development;
sometimes only about
1/100th
as
many defects are
encountered.
However,
and this is an important
point, using uncertified
reusable code
can
be both hazardous and
expensive. If the defect
potentials in uncer-
tified
reusable code are more
than about 1 per KLOC,
and the reused
code
is plugged into more than
ten different applications,
the combined
debugging
costs will be so high that this
example of reuse would have
a
negative
return on investment.
Although
certified reuse is the most
effective form of defect
prevention
and
counts as a best practice, it is
also the rarest. Uncertified
sources of
reuse
outnumber certified sources by at least
50 to 1. Reuse of certified
code
and other materials would
class as a best practice. But
reuse of
materials
that are uncertified must be
classed as a hazardous
practice.
It
is much harder for software
engineers to debug someone
else's
unfamiliar
code than it is to debug
their own. Every single
time a reused
code
module is utilized for a new
application, there is a good
chance that
the
same errors will be encountered.
Thus, uncertified reuse is
hazard-
ous
and can be more expensive
than custom development of
the same
module--hence,
the reason the uncertified
reuse can have a
significant
negative
return on investment
(ROI).
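
A rough cost sketch makes the hazard concrete. The figures below (module size, defect density, number of reusing applications, and repair cost) are illustrative assumptions chosen only to show how the combined debugging cost scales with every application that picks up an uncertified module.

    public class UncertifiedReuseCost {

        public static void main(String[] args) {
            int moduleKloc = 2;                    // size of the reused module
            double defectsPerKloc = 1.5;           // uncertified: above 1 per KLOC
            int applicationsReusingModule = 10;    // module plugged into 10 systems
            double costPerDefectRepair = 1500.0;   // assumed average repair cost

            double defectsPerCopy = moduleKloc * defectsPerKloc;

            // The same defects tend to be rediscovered and repaired in every
            // application that reuses the module, so the cost multiplies.
            double totalDebuggingCost = defectsPerCopy
                                        * applicationsReusingModule
                                        * costPerDefectRepair;

            System.out.printf("Combined debugging cost: $%.0f%n", totalDebuggingCost);
        }
    }
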
Code
reuse comes from many
sources, including commercial
vendors,
legacy
applications, object-oriented class
libraries, corporate
reuse
libraries,
public-domain and open-source
libraries, and a number
of
others.
While reusable code is
fairly plentiful, something
that is not
plentiful
is data on the repair
frequencies of reusable materials.
(See
the
section on certifying reusable
materials earlier in this
book for addi-
tional
information.)
As
mentioned elsewhere in the
book, code reuse by itself
is only part
of
the reusability picture.
Reusable designs, data
structures, test
cases,
tutorial
information, work breakdown
structures, and HELP text
are also
reusable
and should be packaged
together with the code they
support.
Patterns as defect prevention  Programmers and software engineers who have developed large numbers of software applications tend to be aware that certain sequences of code occur many times in many applications.
Some
of these sequences include validating
inputs to ensure that
error
conditions
such as having character
data entered into a numeric
field is
rejected,
or that text and numeric
strings do not contain more
characters
than
specified by the application's
design.
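
As a concrete illustration, the small Java sketch below shows the two validation patterns just mentioned; the method names are illustrative rather than taken from any particular application.

    public class InputValidation {

        // Pattern 1: a "numeric" field must contain only digits.
        public static boolean isNumericField(String value) {
            if (value == null || value.length() == 0) {
                return false;
            }
            for (int i = 0; i < value.length(); i++) {
                if (!Character.isDigit(value.charAt(i))) {
                    return false;   // character data in a numeric field is rejected
                }
            }
            return true;
        }

        // Pattern 2: a text field must not exceed the length given in the design.
        public static boolean fitsField(String value, int maxLength) {
            return value != null && value.length() <= maxLength;
        }

        public static void main(String[] args) {
            System.out.println(isNumericField("12345"));     // true
            System.out.println(isNumericField("12A45"));     // false
            System.out.println(fitsField("SHORT TEXT", 40)); // true
        }
    }
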
Patterns
gained via personal
experience are of course
reusable even
if
informal and personal.
However, it has become clear
that this kind
of
knowledge occurs so often
that it could be written
down, illustrated
graphically,
and then used to train
new software engineers as
they learn
trade
craft.
Pattern-based
development has the
potential of lowering defect
poten-
tials
of young and inexperienced
developers by more than 50
percent.
Once
standard patterns are widely
published and available,
they can also
serve
to facilitate career changes
from one kind of software to
another.
For
example, there are very
different kinds of patterns
associated with
embedded
applications than with information
technology applications.
What
is lacking for pattern-based
development circa 2009 is an
effec-
tive
taxonomy that can be used to
catalog the patterns and
aid in select-
ing
the appropriate set of
patterns. Also, there is no
exact knowledge of
how
many patterns are likely to
be useful and valuable. In
the future,
pattern
usage will no doubt be classed as a
best practice, although
doing
so
in 2009 is probably a few
years premature.
Individual
software engineers working in a
narrow range of
applica-
tions
probably utilize from 25 to 50
common patterns centering in
input
and
output validation, error
handling, and perhaps
security-related
topics.
But when all types and
forms of software are
included, such as
financial
applications, embedded applications,
web applications,
operat-
ing
systems, compilers, avionics,
and so on, the total
number of useful
patterns
could easily top 1000.
This is too large a number
to be listed
randomly,
so patterns need to be organized if
they are to become
useful
tools
of the trade.
Inspections as defect prevention  Participation in formal inspections turns
out to be equally effective as a
defect-prevention method and
a
defect-removal
method. Participants in formal
inspections spontane-
ously
avoid making the kinds of
mistakes that are found
during the
inspection
sessions. Therefore, after participating
in a number of inspec-
tions,
coding defects tend to be
reduced by more than 80
percent com-
pared
with the volumes encountered
prior to starting to use
inspections.
As
a result, formal inspections
get double counted as best
practices: they
are
highly effective for both
defect prevention and defect
removal.
Inspections
turn out to be so effective in
terms of defect
prevention
that
long-range usage of inspections
has a tendency to become
boring
for
the participants due to a
lack of interesting bugs or
defects after
about
a year of inspections. (Unfortunately,
some companies stop
using
inspections,
so defect volumes begin to
creep upwards again.)
One
other useful aspect of
inspections is that when
novices inspect
the
work of experts, they
spontaneously learn improved
programming
skills.
Conversely, when experts
inspect the work of novices,
they can
provide
a great deal of useful
advice as well as find a
great many bugs
or
defects. Therefore, it is useful to
have several experts or top
software
engineers
as participants in inspections.
Automated static analysis as defect prevention  Static analysis is a fairly new technology that is distinct from testing. Automated static analysis tools have embedded rules
and logic that are
set up to discover
common
forms of defects in source code.
These tools are quite
effective
and
have defect removal
efficiency levels that top
85 percent. A caveat
is
that only about 50 languages
out of 2500 are supported,
and these
are
primarily modern languages
such as C, C#, C++, Java,
and the
like.
Older and obscure languages
such as MUMPS, Coral, Chill,
and
the
like are not supported.
However, with almost 100
static analysis
tools
available, there are tools
that can handle some
older or special-
ized
languages such as ABAP, Ada,
COBOL, and PL/I. Some of
the
tools
have extensible rules, so in
theory all of the 2500
languages in
existence
might gain access to static
analysis, although this is
unlikely
to
occur.
Because
static analysis tools are
effective at finding bugs in
source
code,
and the static analysis
tools are usually run by
programmers, they
have
a double benefit of also
acting as defect prevention
agents. In other
words,
programmers who carefully
respond to the defects
identified by
automated
static analysis tools will
spontaneously avoid making
the
same
defects in the
future.
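
As a simple illustration of the kind of defect involved, the sketch below shows a lookup written in a form that most static analysis tools would flag as a possible null dereference, together with the guarded form they would typically recommend. The example is hypothetical and not drawn from any specific tool's rule set.

    import java.util.HashMap;
    import java.util.Map;

    public class LookupExample {
        private final Map<String, String> codes = new HashMap<String, String>();

        // The pattern a static analyzer would flag: Map.get() may return null,
        // so calling trim() on the result can throw NullPointerException.
        public String riskyLookup(String key) {
            return codes.get(key).trim();        // possible null dereference
        }

        // The guarded form: the null case is handled explicitly.
        public String safeLookup(String key) {
            String value = codes.get(key);
            return (value == null) ? "" : value.trim();
        }
    }
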
As
of 2009, usage of static
analysis counts as a best
practice for sup-
ported
programming languages. The
evidence is already significant
for
defect
removal and is increasing
for defect
prevention.
Static-analysis
tools are widely used by
the open-source
development
community
with good results. Due to the
power and utility of
static
analysis,
usage is expanding and this
method should become a
stan-
dard
activity; in fact, static
analysis should be included in
every pro-
gramming
development and maintenance
environment and should be
a
normal
part of all development and
maintenance methodologies.
Test-based development (TBD) as defect prevention  The extreme programming (XP) method includes developing test cases prior to developing
source code. Indeed, the
test cases are used as an
adjunct to the
requirements
and design of software
applications.
This
method of early test-case
development focuses attention on
qual-
ity,
and therefore TBD gets
double credit as a best
practice for both
defect
prevention
and defect removal. Because
TBD is fairly new,
empirical
data
based on large numbers of
trials is not yet available.
The rather
lax
measurement practices of the
Agile community add to the
problem
of
ascertaining the actual
effectiveness of TBD.
However,
from anecdotal evidence, it
appears that TBD may
reduce
defect
potentials by perhaps 30 percent
and raise unit test
defect removal
efficiency
from around 35 percent up to
perhaps 50 percent. Both
results
are
steps in the right
direction, but additional
data on TBD is needed.
TBD
is a candidate for a best
practice and no doubt will be
classed as
one
when additional quantitative
data becomes
available.
High-level languages as defect prevention  One of the claimed advantages of high-level programming languages is
that they reduce defect
poten-
tials.
A related claim is that if
defects do occur, they are
easier to find.
Both
claims appear to be valid,
but the situation is
somewhat compli-
cated,
and there are exceptions to
general rules about the
effectiveness
of
high-level languages.
Any
reduction in source code
volumes will obviously reduce
chances
for
errors. If a specific function
requires 1000 lines of code
in assembly
language,
but can be done with only
150 Java statements, the
odds are
good
that fewer defects will
occur with Java. Even if
both versions have
a
constant ten bugs per
KLOC, the larger assembly
version might have
10
bugs, while the smaller
Java version might have
only 1 or 2.
However,
some high-level programming
languages have fairly
com-
plex
syntax and therefore make it
easy to introduce errors by
accident.
The
APL programming language is an example of
a language that is
very
high level, but also
difficult to read and
understand, and
therefore
difficult
to debug, and especially so if
the person attempting to
debug
is
not the original
programmer.
Observations
indicate the languages with
regular syntax,
mnemonic
labels,
and commands that are
amenable to human understanding
will
have
somewhat fewer coding
defects than languages of
the same level,
but
with arcane commands and
complicated syntax that
include many
nested
commands.
What
would be useful and
interesting would be controlled
studies
by
academic institutions that
measured both defect
densities and
debugging
times for implementing
standard problems in
various
languages.
It would be very interesting to
see defect volumes
and
debugging
times compared for popular
languages such as C, C#,
C++,
Objective
C, Java, JavaScript, Lua,
Ruby, Visual Basic, and
perhaps
50
more. However, as of 2009,
this kind of controlled
study does not
seem
to exist.
As
of 2009, the plethora of
programming languages and
their negative
impact
on maintenance costs make
best practice status for
any specific
language
somewhat questionable.
Prototypes
as defect prevention For
large and complex
applications, it
may
be necessary to try out a number of
alternative code sequences
before
selecting a best-case alternative for
the final versions.
Prototypes
are
useful in reducing defects in
the final version by
allowing software
engineers
to experiment with alternatives in a
benign fashion.
As
a general rule prototypes
are created mainly for
the most trouble-
some
and complicated pieces of
work. As a result, the size
of typical
prototypes
is only about 5 percent to
perhaps 10 percent of the
size of
the
total application. This
practice of concentrating on the
toughest
problems
makes prototypes useful, and
their compact size keeps
them
from
getting to be expensive in their
own right.
Prototypes
come in two flavors:
disposable and evolutionary. As
the
name
implies, disposable prototypes
are used to try out
algorithms and
code
sequences and then discarded.
Evolutionary prototypes grow
into
the
finished application.
Because
prototypes are usually
developed at high speed in an
experi-
mental
fashion, the disposable
prototypes are somewhat
safer than evo-
lutionary
prototypes. Prototypes may
contain more bugs or defects
than
polished
work, and attempting to
convert them into a finished
product
may
lead to higher than expected
bug counts.
Disposable
prototypes used to try out
alternative solutions or to
experiment
with difficult programming problems
would be defined as
best
practices. However, evolutionary
prototypes that are
carelessly
developed
in the interest of speed are
not best practices, but
instead
somewhat
hazardous.
Code structure as defect prevention  Professor Edsger Dijkstra published one
of the most famous letters
in the history of software
engineering
entitled
"Go-to statements considered
harmful." The letter to the
editor
was
published in August 1968 in
The
Communications of the
ACM.
The
thesis of this letter was
that excessive use of
branches or "go to"
statements
made the structure of
software applications so complex
that
errors
of incorrect branch sequences might
occur that were very
difficult
to
identify and remove.
This
letter triggered a revolution in
programming style that came
to
be
known as structured
programming. Under
the principles of
struc-
tured
programming, branches were
reduced and programmers
began to
realize
that complex loops and
clever coding sequences introduced
bugs
and
made the code harder to
test and validate.
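
Java has no go-to statement, so the minimal sketch below illustrates the related point about tangled control flow: the two methods implement the same validation rules, first with the kind of deeply nested branching that structured programming discourages, and then in a structured form with one check per rule. The example is illustrative only.

    public class OrderValidator {

        // Harder to follow: nested conditions emulate the tangled control flow
        // that structured programming tries to avoid.
        public static boolean isValidNested(String id, int quantity, double price) {
            boolean valid = false;
            if (id != null) {
                if (id.length() > 0) {
                    if (quantity > 0) {
                        if (price >= 0.0) {
                            valid = true;
                        }
                    }
                }
            }
            return valid;
        }

        // Structured version: each rule is checked once, in a straight line,
        // and the method exits as soon as a rule fails.
        public static boolean isValidStructured(String id, int quantity, double price) {
            if (id == null || id.length() == 0) return false;
            if (quantity <= 0) return false;
            return price >= 0.0;
        }
    }
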
As
it happens another pioneering
software engineer, Dr. Tom
McCabe,
developed
a way of measuring code
structure that was published
in
December
1976 in IEEE
Transactions on Software Engineering. The
measures devel-
oped
by Dr. McCabe were those of
"cyclomatic complexity" and
"essential
complexity."
Cyclomatic
complexity
is based on graph theory and
is a formal way
of
evaluating the complexity of a
graph that describes the
flow of control
through
a software application. The
formula for calculating
cyclomatic
complexity
is "edges nodes + two."
Essential
complexity
is also based on graph
theory, only it
eliminates
redundant
or duplicate paths through
code.
In
terms of cyclomatic complexity, a
code segment with no
branches
has
a complexity score of 1, which indicates
that the code executes in
a
linear
fashion with no branches or go-to
statements. From a
psychologi-
cal
standpoint, cyclomatic complexity
levels of less than 10 are
usually
perceived
as being well structured.
However, as cyclomatic
complexity
levels
rise to greater than 20,
the code segments become
increasingly
difficult
to understand or to follow from
end to end without
errors.
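
The formula itself is easy to apply once the control-flow graph has been counted, as the minimal sketch below shows; the edge and node counts are illustrative rather than taken from a real analyzer.

    public class CyclomaticComplexity {

        // McCabe's formula for a single connected control-flow graph:
        // V(G) = edges - nodes + 2.
        public static int complexity(int edges, int nodes) {
            return edges - nodes + 2;
        }

        public static void main(String[] args) {
            // Straight-line code, e.g., 4 nodes joined by 3 edges: V(G) = 1.
            System.out.println(complexity(3, 4));   // prints 1

            // A method with several if/else branches might have 14 edges and
            // 10 nodes: V(G) = 6, still under the threshold of 10 noted above.
            System.out.println(complexity(14, 10)); // prints 6
        }
    }
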
There
is some empirical evidence
that code with cyclomatic
complex-
ity
levels of less than 10 has
only about 40 percent as
many errors as
code
with cyclomatic complexity levels
greater than 20. Code with
a
cyclomatic
complexity level of 1 seems to
have the fewest errors, if
other
factors
are held constant, such as
the programming languages
and the
experience
of the developer.
One
interesting study in IBM found a
surprising result: that
code
defects
were sometimes higher for
the work of senior or
experienced pro-
grammers
compared with the same
volume of code written by
novices
or
new programmers. However,
the actual cause of this
anomaly was
that
the experts were working on
very difficult and complex
applica-
tions,
while the novices were
doing only simple routines
that were easy
to
understand. In any case, the
study indicated that problem
difficulty
has
a significant impact on defect
density levels.
The
importance of cyclomatic and
essential complexity on code
defects
led
to the development of a number of
commercial tools. Many
tools
available
circa 2009 can calculate
cyclomatic and essential
complexity
of
code in a variety of
languages.
In
the 1980s, several tools on
the market were aimed
primarily at
COBOL
and not only evaluated
code complexity, but also
could auto-
matically
restructure the code and
reduce both cyclomatic and
essential
complexity.
These tools asserted, with
some evidence to back up
the
assertions,
that the revised code with
low complexity levels could
be
modified
and maintained with less
effort than the original
code.
Use
of structured programming techniques
and keeping
cyclomatic
complexity
levels low would both be
viewed as best practices.
Code with
low
complexity levels and few
branches tends to have fewer
defects, and
the
defects that are present
tend to be easier to find.
Therefore, struc-
tured
programming counts as a best
practice for defect
prevention.
Segmentation as defect prevention  More than 50 years of empirical data has proven conclusively that defect potentials correlate almost perfectly
with
application size measured
using both lines of code
and function
points.
Because size and defects
are closely coupled, it is
reasonable
to
ask, Why
not decompose large systems
into a number of
smaller
segments?
Unfortunately,
this is not as easy as it
sounds. To make an
analogy,
since
constructing an 80,000-ton cruise
ship is known to be
expensive,
why
not decompose the ship
into 80,000 small boats
that are cheap to
build?
Obviously, the features and
user requirements of 80,000
small
boats
are not the same as
those of one large 80,000-ton
cruise ship.
As
of 2009, there are no proven
and successful methods for
segment-
ing
or decomposing large systems
into small independent
components.
As
it happens, the Agile method
of dividing a system into
segments or
sprints
that
can be developed sequentially
has shown itself to be
fairly
successful.
But most of the Agile
applications are below
10,000 function
points
and are comparatively simple
in architecture.
There
have not yet been
any Agile projects that
tackle something of
the
size of Microsoft Vista at
about 150,000 function
points or a large
ERP
package at perhaps 300,000
function points. Indeed, if
Agile sprints
were
used for these applications
and team sizes were in
the range of
average
Agile projects (less than
ten people) then probably
150 sprints
would
be needed for Vista and
300 would be needed for an
ERP pack-
age.
Assuming one month per
sprint, the schedule would
be perhaps 12
years
for Vista and 25 years
for the ERP package.
Multiple teams would
speed
things up, but interfaces
between the code of each
team would
add
complexity and also add
defects.
The
bottom line is that
segmentation into small
independent pack-
ages
or components is effective when it
can be done well, but
not
always
possible given the feature
sets and architecture of
many large
systems.
Thus best practice status
cannot be assigned to
segmenta-
tion
as of 2009, due to the lack
of standard and effective
methods for
segmentation.
For
large applications, segmentation is
most common for major
fea-
tures,
but each of these features
may themselves be in the
range of
10,000
function points or more.
There is not yet any
proven way to
divide
a massive system of 150,000
function points or 15 million
lines
of
code into perhaps 15,000
small independent pieces.
About the best
that
occurs circa 2009 is to
divide these massive systems
into perhaps
ten
large segments.
Methodologies
and measurements as defect
prevention The
Personal
Software
Process (PSP) and Team
Software Process (TSP)
developed
by
Watts Humphrey feature
careful recording of all
defects found during
development,
including the normally
invisible defects found
privately
via
desk checking and unit
testing.
The
act of recording specific
defects tends to embed them
in the minds
of
software engineers and
programmers. The result is
that after several
projects
in succession, coding defects
decline by perhaps 40 percent
since
they
are spontaneously
avoided.
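
The recording itself need not be elaborate. The sketch below shows the kind of personal defect record that PSP/TSP-style logging implies; the field names are illustrative assumptions and not the official PSP defect-recording standard.

    import java.util.ArrayList;
    import java.util.List;

    public class PersonalDefectLog {

        // One logged defect found privately during development.
        public static class DefectRecord {
            final String phaseInjected;   // e.g., "design", "code"
            final String phaseRemoved;    // e.g., "desk check", "unit test"
            final String type;            // e.g., "logic", "interface"
            final int fixMinutes;

            DefectRecord(String phaseInjected, String phaseRemoved,
                         String type, int fixMinutes) {
                this.phaseInjected = phaseInjected;
                this.phaseRemoved = phaseRemoved;
                this.type = type;
                this.fixMinutes = fixMinutes;
            }
        }

        private final List<DefectRecord> records = new ArrayList<DefectRecord>();

        public void log(DefectRecord record) {
            records.add(record);
        }

        // A simple summary of the kind that supports trend analysis over
        // several projects in succession.
        public int countRemovedDuring(String phase) {
            int count = 0;
            for (DefectRecord r : records) {
                if (r.phaseRemoved.equals(phase)) {
                    count++;
                }
            }
            return count;
        }
    }
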
Measurements
and methodologies are
therefore useful in terms
of
defect
prevention because they tend to
focus attention on defects
and so
trigger
reductions over time. The
methods that record defects
and focus
on
quality are classed as best
practices.
One
unusual aspect of TSP is
that the results seem to
improve with
application
size. In other words, TSP
operates successfully for
large
systems
in excess of 10,000 function
points. This is a fairly
rare occur-
rence
among development
methods.
Pair
programming as defect prevention
The
idea of pair programming is
for
two software engineers or
programmers to share one
workstation.
They
take turns coding, and
the other member of the
pair observes the
code
and makes comments and
suggestions as the coding
takes place.
The
pair also has discussions on
alternatives prior to actually
doing the
code
for any module or
segment.
The
method of pair programming
has some experimental data
that
suggests
it may be effective in terms of
both defect removal and
defect
prevention.
However, the pair
programming method has so
little usage
on
actual software projects
that it is not possible to
evaluate these
claims
as of 2009 on large-scale
applications.
On
the surface, pair
programming would seem to
come very close to
doubling
the effort required to
complete any given code
segment. Indeed,
due
to normal human tendencies to
chat and discuss social
topics, there
is
some reason to suspect that
pair programming would be
more than
twice
as expensive as individual
programming.
Until
additional information becomes
available from actual
projects
rather
than from small experiments,
there is not enough data to
judge
the
impact of pair programming in
terms of defect removal or
defect
prevention.
Other methods as defect prevention  The methods cited earlier in this chapter
have been used enough so
that their effectiveness in
terms of
code
defect prevention can be
hypothesized. Other methods
seem to
have
some benefits in terms of
defect prevention, but they
are harder to
judge.
One of these methods is Six
Sigma as it applies to software.
The
Six
Sigma approach does include
measurements of defects and
analysis
of
causes. However, Six Sigma is usually a
corporate approach that
is
not
applied to specific projects, so it is
harder to evaluate. Other
code
defect
prevention techniques that
may be beneficial but for
which the
author
has no solid data include
quality function deployment
(QFD),
root-cause
analysis, the Rational
Unified Process (RUP), and
many of
the
Agile development
variations.
Combinations and synergies among defect prevention methods  Although the
methods cited earlier may
occur individually, they are
often used in
combinations
that sometimes appear
synergistic. For example,
struc-
tured
coding is often used with
TSP, with inspections, and with
static
analysis.
The
most frequent combination is
the pairing of high-level
program-
ming
languages with the concepts of
structured programming.
The
combination
that tends to yield the
highest overall levels of
defect pre-
vention
would be methodologies such as
TSP teamed with
high-level
programming
languages, certified reusable
code, patterns,
prototypes,
static
analysis, and
inspections.
Overriding
all other aspects of defect
prevention and defect
removal,
individual
experience and skill levels
of the software engineers
continue
to
be a dominant factor. However, as of
2009, the software
engineering
field
lacks standard methods for
evaluating human performance; it
has
no
licensing or certification, no board
specialties, and no methods
of
judging
professional malpractice. Therefore,
expertise among
software
engineers
is important but difficult to
evaluate.
Summary
of Observations
on
Defect Prevention
Because
of the difficulty and
uncertainty of measuring defect
preven-
tion,
the suite of defect
prevention methods lacks the
large volumes of
solid
statistical data associated with
defect removal.
Personal
defect prevention is especially
difficult to study
because
most
of the activities are
private and therefore seldom
have records or
statistical
information available, other
than data kept by
volunteers.
Long-range
measurements over time and
involving hundreds of
appli-
cations
and software engineers give
some strong indications of
what
works
in terms of defect prevention,
but the results are
still less than
precise
and will probably stay that
way.
Forms
of Programming Defect
Removal
There
is very good data available
on the public forms of
defect removal
such
as formal inspections, function
test, regression test,
independent
verification
and validation, and many
others. But private defect
removal
is
another story. The phrase
private
defect removal refers
to activities
that
software engineers or programmers
perform by themselves
without
witnesses
and usually without keeping
any written records.
The
major forms of private
defect removal include, but
are not limited
to:
1.
Desk checking
2.
Debugging using automated
tools
3.
Automated static
analysis
4.
Subroutine testing
5.
Unit testing (manual)
6.
Unit testing (automated)
Since
most of these defect removal
methods are used in private,
data
to
judge their effectiveness
comes from either volunteers
who keep
records
of bugs found, or from
practitioners of methods that
include
complete
records of all defects, such
as PSP and TSP.
Automated
static analysis is a method
that happens to be used
both
privately
by individual programmers on their
own code, and also
pub-
licly
by open-source developers who
are working collaboratively
on
large
applications such as Firefox,
Linux, and the like.
Therefore, static
analysis
has substantial data
available for its public
uses, and it can be
assumed
that private use of static
analysis will be equally
effective.
Desk checking for defect removal  In the early days of programming and computing,
the time lag between
writing source code and
getting it
assembled
or compiled was sometimes as
much as 24 hours. When
pro-
gram
source code was punched
into cards and the
cards were then
put
in
a queue for assembly or
compilation, many hours
would go by before
the
code could be executed or
tested.
In
these early days of
programming between the late
1960s and the
1970s,
desk checking or carefully
reading the listing of a
program to
look
for errors was the
most common method of
personal defect
removal.
Desk
checking was also a
technical necessity because errors in a
deck
of
punch cards could stop
the assembly or compilation
process and add
perhaps
another 24 hours before
testing could
commence.
Today
in 2009, code segments can
be compiled or interpreted
instantly,
and
can be executed instantly as
well. Indeed, they can be
executed
using
programming environments that
include debugging tools
and
automated
static analysis. Therefore,
desk checking has declined
in
frequency
of usage due to the
availability of personal workstations
and
personal
development environments.
Although
there is not much in the
way of recent data on the
effective-
ness
of desk checking, historical
data from 30 years ago
indicates about
40
percent to just over 60
percent in terms of defect
removal efficiency
levels.
Today
in 2009, desk checking is
primarily reserved for a
small subset
of
very tricky bugs or defects
that have not been
successfully detected
and
removed via other methods.
These include security
vulnerabilities,
performance
problems, and sometimes
toxic requirements that
have
slipped
into source code. These
are hard to detect via
static analysis or
normal
testing because they may not
involve overt code errors
such as
branches
to incorrect locations or boundary
violations.
These
special and unique bugs
compose only about 5 percent
of total
numbers
of bugs likely to be found in
software applications. Desk checking is actually close to 70 percent efficient in
dealing with these very
troublesome
bugs
that have eluded other
methods. (The reason that
desk checking is
not
higher is that sometimes software
engineers don't realize
that
a
particular code practice is
wrong. This is why proofreading of
manu-
scripts
is needed. Authors cannot
always see their own
mistakes.)
While
these subtle bugs can be
detected using formal
inspections,
formal
inspections do not occur on
more than about 10 percent
of soft-
ware
applications and require
between three and eight
participants.
Desk
checking, on the other hand,
is a one-person activity that
can be
performed
at any time with no formal
preparation or training.
Desk
checking in 2009 is a supplemental
method that may not
be
needed
for every software project.
It is effective for a number of
subtle
bugs
and might be viewed as a
best practice on an as-needed
basis.
Automated debugging for defect removal  Software engineers and programmers
circa 2009 have access to
hundreds of debugging tools.
These
tools
normally support either
specific programming languages
such as
Java
and Ruby or specific
operating systems such as
Linux, Leopard,
Windows
Vista, and many others. In
any case, a great many
debugging
tools
are available.
The
features of debugging tools
vary, but all of them
allow the execu-
tion
of code to be stopped at various
places; they allow changes
to code;
and
they may include features to
look for common problems
such as
buffer
overflows and branching
errors. Beyond that, the
specialized
debugging
tools have a number of
special features that are
relevant to
specific
languages or operating
systems.
Debugging
tools are so common that
usage is a standard
practice
and
therefore would be classed as a
best practice. That being
said,
none
are 100 percent effective,
and quite a few bugs
can escape. In fact,
given
the numbers of bugs found
later via inspections,
static analysis,
and
testing, the average
efficiency of program debugging is
only about
30
percent or less.
Automated
static analysis for defect
removal Static
analysis tools examine
source
code and the paths
through the code and
look for common
errors.
Some
of these tools have built-in
sets of rules, while others
have exten-
sible
rule sets.
A
keyword search of the Web
using "automated static
analysis" turns
up
more than 100 such
tools including Axivion,
CAST, Coverity,
Fortify,
GrammaTech,
Klocwork, Lattix, Ounce,
Parasoft, ProjectAnalyzer,
ReSharper,
SoArc, SofCheck, Viva64,
Understand, Visual Studio
Team
System,
and XTRAN.
Individually,
each static analysis tool
supports up to 30 languages.
For
common
languages such as Java and
C, dozens of static analysis
tools
are
available; for older
languages such as Ada,
Jovial, and PL/I,
there
are
only a few static analysis
tools. For very specialized
languages such
as
ABAP used for writing
code in SAP environments,
there are only one
or
two static analysis
tools.
Without
doing an exhaustive search, it
appears that out of the
current
total
of 2500 programming languages
developed to date, static
analysis
tools
are available for perhaps 50
programming languages.
However,
some
of these static analysis
tools support extensible
rules, so it is theo-
retically
possible to create rules for
examining all of the 2500
languages.
This
is unlikely to occur, due to
economic reasons for obscure
languages
or
those not used for
business or scientific
applications.
As
a class, static analysis
tools seem to be effective
and can find
per-
haps
85 percent of common programming
errors. Therefore, usage
of
static
analysis tools can be viewed
as a best practice, and one that is rapidly becoming a standard practice as well.
However,
static analysis tools only
find coding problems and do
not
find
toxic requirements, performance
problems, user interface
problems,
and
some kinds of security
vulnerabilities. Therefore, additional
forms
of
defect removal are
needed.
Some
static analysis tools
provide additional features
besides defect
detection.
Some are able to assist in
translating older languages
into
newer
languages, such as turning
COBOL into Java if
desired.
It
is also possible to raise
the level of static analysis
and examine the
meta-languages
underlying several forms of
requirements and
design
documentation
such as those created via
the unified modeling
language
(UML).
Indeed, it is theoretically possible to
use a form of
extended
static
analysis to create test
suites.
Because
static analysis and formal
code inspections usually
find many
of
the same kinds of bugs,
normally either one form or
the other is uti-
lized,
but not both. Static
analysis and inspections
have roughly the
same
levels of defect removal
efficiency, but static
analysis is cheaper
and
quicker. However, code
inspections can find more
subtle problems
such
as performance issues or security
vulnerabilities. These are
not
code
"bugs" per se, but
they do cause
trouble.
If
static analysis and code
inspections are both
utilized, which
occurs
for
mission-critical applications such as
some medical instruments
and
some
kinds of security and
military software, static
analysis would nor-
mally
come before code
inspections.
A
small number of issues
identified by static analysis
tools turn out
to
be false positives, or code
segments identified as bugs
which turn out
to
be correct. However, a few
false positives is a small
price to pay for
such
a high level of defect
removal efficiency.
Subroutine testing for defect removal  Testing comes in many flavors and covers
many different sizes of code
volumes. The phrase
subroutine
testing
refers
to a small collection of perhaps up to
ten source code
instructions
that produces an output or
performs an action that
needs
to
be verified. Subroutine testing is
usually the lowest level of
testing
in
terms of code
volumes.
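
A minimal example of what subroutine testing looks like in practice: a routine of a few statements is executed directly and its outputs are checked for validity. The subroutine and its checks below are illustrative and disposable, as subroutine test cases usually are.

    public class SalesTaxSubroutine {

        // The subroutine under test: a few statements producing one output.
        public static long salesTaxCents(long amountCents, double ratePercent) {
            return Math.round(amountCents * ratePercent / 100.0);
        }

        public static void main(String[] args) {
            // Ad hoc checks of the outputs, the usual mode of subroutine
            // testing; run with "java -ea" to enable the assertions.
            assert salesTaxCents(10000, 7.5) == 750 : "7.5 percent tax on $100.00";
            assert salesTaxCents(999, 6.0) == 60 : "6 percent tax on $9.99";
            System.out.println("subroutine checks passed");
        }
    }
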
By
contrast, unit testing would
normally include perhaps 100
instruc-
tions
or more, while the "public"
forms of testing such as
function testing
and
regression testing may deal
with thousands of instructions.
As
the volume of source code
increases, paths through the
code
increase,
and therefore more and
more test cases are
needed to actu-
ally
cover 100 percent of the
code. Indeed, for very large
systems,
100
percent coverage appears to be
impossible, or at least very
rare.
Subroutine
testing is a standard practice
and also a best
practice
because
it eliminates a significant number of
problems. However,
the
defect
removal efficiency of subroutine
testing is only 30 percent
to
perhaps
40 percent. This is because the
code volumes are too
small for
detecting
many kinds of bugs such as
branching errors.
Subroutine
testing may or may not
use actual formal test
cases. The
usual
mode is to execute the code
and check the outputs
for validity.
Subroutine
test cases, if any, are
normally disposable.
Manual unit testing for defect removal  Unit testing of complete modules is
the largest form of testing
that is normally private or
carried out by
individual
programmers without the
involvement of other
personnel
such
as test specialists or software
quality assurance.
Manual
unit testing is the first
and oldest kind of formal
testing.
Indeed,
in the 1960s and early
1970s, when many
applications only
contained
100 code statements or so,
unit testing was often
the only
form
of testing performed.
The
phrase unit
testing refers
to testing a complete module of
perhaps
100
code statements that
performs a discrete function with
inputs, out-
puts,
algorithms, and logic that
need to be validated.
Unit
testing can combine "black
box" testing and "white
box" test-
ing.
The phrase black
box means
that the internal code of a
module
is
hidden, so only inputs and
outputs are visible. Black
box testing
therefore
tests input and output
validity. The phrase
white
box means
that
internal code is revealed, so
branches and control flow
through
an
application can be tested.
Combining the two forms of
testing should
in
theory test everything.
However, code coverage
seldom hits 100
per-
cent,
and for large applications
that are high in cyclomatic
complexity
it
may drop below 50
percent.
Unit
testing tends to look at
limits, ranges of values,
error-handling,
and
security-related issues. Unfortunately,
unit testing is only in
the
range
of perhaps 30 percent to 50 percent
efficient in finding bugs.
For
example,
unit testing is not able to
find many performance-related
issues
because
they typically involve
longer paths and multiple
modules.
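
The sketch below shows what a small set of unit test cases of this kind might look like, written in the JUnit 4 style (the JUnit library is assumed to be available; the module under test is inlined and purely illustrative). It exercises a normal value and a limit and error-handling case of the sort unit testing concentrates on.

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class DiscountModuleTest {

        // The module under test, inlined so the example is self-contained.
        static double discountedPrice(double price, double percentOff) {
            if (percentOff < 0.0 || percentOff > 100.0) {
                throw new IllegalArgumentException("percentOff out of range");
            }
            return price * (100.0 - percentOff) / 100.0;
        }

        // Black box case: only an input and its expected output are checked.
        @Test
        public void tenPercentOffOneHundred() {
            assertEquals(90.0, discountedPrice(100.0, 10.0), 0.0001);
        }

        // Limit and error-handling case: an out-of-range discount is rejected.
        @Test(expected = IllegalArgumentException.class)
        public void rejectsOutOfRangeDiscount() {
            discountedPrice(100.0, 150.0);
        }
    }
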
For
modules that tend to include
a number of branches or
complex
flows,
unit testing begins to
encounter problems with test
coverage. As
cyclomatic
complexity levels go up, it
takes more and more
test cases
to
cover every path. In fact,
100 percent coverage almost
never occurs
when
cyclomatic complexity levels
get above 5, even for
modules with
only
100 code statements.
Unit
testing is a standard activity
for software engineering
and
therefore
counts as a best practice in
spite of the somewhat low
defect
removal
efficiency. Without unit
testing, the later stages of
testing such
as
function testing, stress
testing, component testing,
and system test-
ing
would not be
possible.
The
test cases created for
unit testing are normally
placed in a formal
test
library so that they can be
used later for regression
testing. Since
the
test cases are going to be
long-lived and used
repeatedly, they need
proper
identification as to what applications
and features they test,
what
functions
they test, when they
were created, and by whom.
There will
also
be accompanying test scripts
that deal with invoking and
executing
the
test cases. The specifics of
formal test case design
are outside the
scope
of this book, but such
topics are covered in many
other books.
Unit
testing can be used in
conjunction with other forms of
defect
removal
such as formal code
inspections and static
analysis. Usually,
static
analysis would be performed
prior to unit testing, while
code
inspections
would be performed after
unit testing. This is because
static
analysis
is quick and inexpensive and
finds many bugs that
might be
found
via unit testing. Unit
testing is done prior to
code inspections for
the
same reason; it is faster
and cheaper. However, code
inspections are
very
effective at finding subtle
issues that elude both
static analysis and
unit
testing, such as security
vulnerabilities and performance
issues.
Using
code inspections, static
analysis, and unit testing
for the same
code
is a fairly rare occurrence
that most often occurs on
mission-critical
applications
such as weapons systems,
medical instruments, and
other
software
applications where failure
might cause death or
destruction.
Manual
unit testing was a normal
and standard activity for
more than
40
years and is still very
widespread. However, performance of unit testing varies
from
"poorly performed" to "extremely
good." Because of the
inconsistencies
in
methods of carrying out unit
testing and in testing
results, the ranges
are
too wide to say that
unit testing per se is a
best practice. Careful
unit
testing
with both black box and
white box test cases
and thoughtful consid-
eration
to test coverage would be considered a
best practice. Careless
unit
testing
with hasty test cases and
partial coverage would rank no
better
than
marginally adequate and
would not be a best
practice.
Testing
is a teachable skill, and
there are many classes
available by
both
academia and commercial test
companies. There are also
several
forms
of certification for test
personnel. It would be useful to know whether formal test training and certification elevate test defect removal efficiency by significant amounts. There is
considerable anecdotal
evidence
that
certification is beneficial, but
more large-scale surveys and
studies
are
needed on this topic.
Automated unit testing for defect removal
While manual unit testing has been part of software engineering since the 1960s, automated unit testing is
newer
and started to occur only in
the 1980s in response to larger and
more
complex
applications plus the
arrival of graphical user
interfaces (GUI),
which
greatly expanded the nature
of software inputs and
outputs.
The
phrase "automated unit
testing" is somewhat ambiguous
circa
2009.
The most common usage of
the term implies manual
creation of
unit
test cases combined with a
framework or scaffold that
allows them
to
be run automatically on a regular
basis without explicit
actions by
software
engineers.
Automated
unit testing has been
adopted by the Agile and
extreme
programming
(XP) communities together with
the corollary idea of
cre-
ating
test cases before creating
code. This combination seems
to be fairly
effective
in terms of defect removal
and also pays off with
improved
defect
prevention by focusing the
attention of software engineers
on
quality
topics.
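The sketch below shows the test-before-code style in miniature, using Python's standard unittest module as the scaffold; the late_fee function and its business rules are hypothetical examples, not taken from the text. In practice a continuous-integration scaffold runs such a suite automatically on every build, which is what is usually meant by automated unit testing.

# Minimal test-first sketch using Python's standard unittest module.
# The late_fee() function and its business rules are hypothetical.
import unittest

def late_fee(balance: float, days_overdue: int) -> float:
    """Charge 1.5 percent of the balance once an invoice is over 30 days late."""
    if balance <= 0 or days_overdue <= 30:
        return 0.0
    return round(balance * 0.015, 2)

class LateFeeTests(unittest.TestCase):
    # In the XP style, these tests are written before late_fee() itself.
    def test_no_fee_within_grace_period(self):
        self.assertEqual(late_fee(1000.0, 30), 0.0)

    def test_fee_after_grace_period(self):
        self.assertEqual(late_fee(1000.0, 31), 15.0)

    def test_no_fee_for_zero_balance(self):
        self.assertEqual(late_fee(0.0, 90), 0.0)

if __name__ == "__main__":
    unittest.main()  # a build server would execute this on every check-in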
The
phrase automated
unit testing deals
mainly with test case
execu-
tion
and recording of defects
that are encountered: most
of the test cases
are
still created by hand.
However, it is theoretically possible to
envision
automated
test case creation as
well.
Recall
from Chapter 7 that during
requirements gathering and
analy-
sis,
seven fundamental topics and
30 supplemental topics need to
be
considered.
As it happens, these same 37
issues also need to be
tested.
A
form of static analysis
elevated to execute against
requirements and
specification
meta-languages should, in theory, be
able to produce a
suite
of test cases as a
byproduct.
Some
forms of test automation are
aimed at web applications;
others are
aimed
at embedded applications; and
still others are aimed at
information
technology
products. Automated testing is an
emerging technology that
as
of
2009 is still rapidly
evolving.
There
is a shortage of solid empirical
data that compares
automated
unit
testing and manual unit
testing in a side-by-side fashion
for appli-
cations
of similar size and
complexity. Anecdotal information
gives an
edge
to automated testing for speed
and convenience. However,
the most
critical
metric for testing is that
of defect removal efficiency. As
this
book
is written, there is not
enough solid data that
compares automated
unit
testing to the best forms of
manual unit testing to judge
whether
automated
unit tests have higher
levels of defect removal
efficiency
than
manual unit tests.
As
additional data becomes
available, there is a good chance
that
automatic
unit testing will enter the
best practice class. As of
2009, the
data
shows some effort and
cost benefits, but defect
removal efficiency
benefits
remain uncertain.
Defect removal for legacy applications
About 40 percent of the software engineers in the world are faced with
performing maintenance on
aging
legacy
applications that they did
not create themselves.
Although the
legacy
applications may be old,
they are far from
trouble free, and
they
still
contain latent bugs or
defects.
This
situation brings up a number of
questions about defect
removal for
legacy
code where the original
developers are gone, the
specifications may
be
missing or out of date,
comments may be sparse or
incorrect, regression
tests
are of unknown completeness,
and the code itself
may be in a dead
language
or one the current maintenance
team has not
used.
Fortunately,
a number of companies and
tools have addressed
the
issues
of maintaining aging legacy
code. Some of these
companies have
developed
"maintenance workbenches" that
include features such
as:
1.
Automated static
analysis
2.
Automated test coverage
analysis
3.
Automated function point
calculations
4.
Automated cyclomatic and
essential complexity
calculations
5.
Automated debugging support
for many (but not
all) languages
6.
Automated data mining for
business rules
7.
Automated translation from
dead languages to newer
languages
With
aging legacy applications
being written in as many as
2500 dif-
ferent
programming languages, no single
tool can provide universal
sup-
port.
However, for legacy code
written in the more common
languages
such
as Ada, COBOL, C, PL/I, and
the like, a number of
maintenance
tools
are available.
Usage
of maintenance workbenches as a class
counts as a best
prac-
tice,
but there are too
many tools and variations to
identify specific
workbenches.
Also, these tools are
evolving fairly rapidly, and
new fea-
tures
occur frequently.
Synergies and combinations of personal defect removal
The methods discussed in this section are used in combination rather than alone.
Debugging,
automated static analysis,
and unit testing form
the most
common
combination. The combined
effectiveness of these three
meth-
ods
can top 97 percent in terms
of defect removal efficiency
when per-
formed
by experienced software engineers.
The combined results
can
also
drop below 85 percent when
performed by novices.
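If the three methods were statistically independent, their combined efficiency could be estimated by multiplying the fractions of defects that survive each step, as in the brief sketch below; the individual percentages shown are placeholders, and in reality the methods overlap considerably, which is one reason measured combined results fall in the 85 to 97 percent range rather than higher.

# Illustrative sketch (not from the text): combined defect removal
# efficiency under an independence assumption. Individual efficiencies
# are hypothetical placeholders.
def combined_efficiency(step_efficiencies):
    surviving = 1.0
    for e in step_efficiencies:
        surviving *= (1.0 - e)   # fraction of defects that slip past this step
    return 1.0 - surviving

# e.g., static analysis 0.85, unit testing 0.40, debugging 0.35 (placeholders)
print(round(combined_efficiency([0.85, 0.40, 0.35]), 3))  # about 0.94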
Summary
and Conclusions on
Personal
Defect Removal
Although
personal defect removal
activities are private and
therefore
difficult
to study, they have been
the frontline of defense
against soft-
ware
defects for more than 50
years. That being said,
the fact that
soft-
ware
defects emerge and are
still present when software
is delivered
indicates
that none of the personal
defect removal methods are
100
percent
effective.
However,
some of the newer defect
removal tools such as
automated
static
analysis are improving the
situation and adding rigor
to the suite
of
personal defect removal
tools and methods.
Since
individual software engineers
can keep records of the
bugs they
find,
it would be useful and
valuable if personal defect
removal effi-
ciency
levels could be elevated up to
more than 90 percent before
the
public
forms of defect removal
begin.
Personal
defect removal will continue to
have a significant role
as
software
engineering evolves from a
craft to a true engineering
dis-
cipline.
Knowing the most effective
and efficient ways for
preventing
and
removing defects is a sign of
software engineering
professionalism.
Lack
of defect measures and
unknown levels of defect
removal efficiency
imply
amateurishness, not
professionalism.
Economic
Problems of the
"Lines
of Code" Metric
Introduction
Any
discussion of programming and
code development would be
incom-
plete
without considering the
famous lines of code (LOC)
metric, which
has
been used to measure both
productivity and quality
since the dawn
of
the computer era.
The
LOC metric was first
introduced circa 1960 and
was used for
economic,
productivity, and quality
studies. At first the LOC
metric was
reasonably
effective for all three
purposes.
As
additional higher-level programming
languages were created,
the
LOC
metric began to encounter
problems. LOC metrics were
not able to
measure
noncoding activities such as
requirements and design,
which
were
becoming increasingly
expensive.
These
problems became so severe
that a controlled study in
1994
that
used both LOC metrics
and function point metrics
for ten versions
of
the same application coded in
ten languages reached an
alarming
conclusion:
LOC metrics violated the
standard assumptions of
economic
productivity
so severely that using LOC
metrics for studies
involving
more
than one programming
language constituted professional
mal-
practice!
Such
a strong statement cannot be
made without examples and
case
studies
to show the LOC problems.
Following is a chronology of the
use
of
LOC metrics that shows
when and why the metric
began to cease
being
useful and start being
troublesome. The chronology
runs from
1960
to the present day, and it
projects some ideas forward
to 2020.
Lines
of Code Metrics Circa
1960
The
lines of code (LOC) metric
for software projects was
first introduced
circa
1960 and was used
for economic, productivity,
and quality studies.
The
economics of software applications
were measured using
"dollars
per
LOC." Productivity was
measured in terms of "lines of
code per time
unit."
Quality was measured in
terms of "defects per KLOC"
where "K"
was
the symbol for 1000
lines of code. The LOC
metric was reasonably
effective
for all three
purposes.
When
the LOC metric was
first introduced, there was
only one pro-
gramming
language, basic assembly
language. Programs were
small
and
coding effort composed about
90 percent of the total
work. Physical
lines
and logical statements were
the same thing for
basic assembly
language.
In
this early environment, the
LOC metric was useful
for economic,
productivity,
and quality analyses. The
LOC metric worked fairly
well
for
a single language where
there was little or no
reused code and
where
there
were no significant differences
between counts of physical
lines
and
counts of logical statements. But
the golden age of the LOC
metric,
where
it was effective and had no
rivals, only lasted about
ten years.
However,
this ten-year span was
time enough so that the
LOC metric
became
firmly embedded in the
psychology of software engineering.
Once
an
idea becomes firmly fixed, it
tends to stay in place until
new evidence
becomes
overwhelming. Unfortunately, as the
software industry
changed
and
evolved rapidly, the LOC
metric did not change. As
time passed,
the
LOC metric became less
and less useful until by
about 1980 it had
become
extremely harmful without
very many people realizing
it. Due to
cognitive
dissonance, the LOC metric
was used but not
examined criti-
cally
in the light of changes in
other software engineering
methods.
Lines
of Code Metrics Circa
1970
By
1970, basic assembly had
been supplanted by
macro-assembly.
The
first generation of higher-level
programming languages such
as
COBOL,
FORTRAN, and PL/I was
starting to be used. Basic assembly language was beginning to drop out of use as better
alterna-
tives
became available. This was
perhaps the first instance
of a long
series
of programming languages that
died out, leaving a train of
aging
legacy
applications that would be
difficult to maintain as programmers familiar with the dead languages, and compilers for them, stopped being available.
The
first known problem with LOC
metrics was in 1970, when
many
IBM
publication groups exceeded
their budgets for that
year. It was
discovered
(by the author) that
technical publication group
budgets
had
been based on 10 percent of
the budgets assigned to
programming
or
coding.
The
publication projects based on
code budgets for assembly
language
did
not overrun their budgets,
but manuals for the
projects coded in
PL/S
(a derivative of PL/I) had
major overruns. This was
because PL/S
reduced
coding effort by half, but
the technical manuals were
as big as
ever.
Therefore, when publication
budgets were set at 10
percent of code
budgets,
and coding costs declined by
50 percent, all of the
publication
budgets
for PL/S projects were
exceeded.
The
initial solution to this
problem at IBM was to give a
formal math-
ematical
definition to language levels.
The level
was
defined as the
number
of statements in basic assembly
language needed to equal
the
functionality
of 1 statement in a higher-level
language. Thus, COBOL
was
a level 3 language because it took
three basic assembly
statements
to
equal one COBOL statement.
Using the same rule,
SMALLTALK is
a
level 18 language.
For
several years before
function points were
invented, IBM used
"equivalent
assembly statements" as the
basis for estimating
noncode
work
such as user manuals.
(Indeed, a few companies
still use equiva-
lent
assembly language even in
2009.)
Thus,
instead of basing a publication
budget on 10 percent of
the
effort
for writing a program in
PL/S, the budget would be
based on 10
percent
of the effort if the code
were basic assembly
language. This
method
was crude but reasonably
effective. This method
recognized that
not
all languages required the
same number of lines of code
to deliver
specific
functions.
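The arithmetic of the equivalent-assembly method is simple, as the sketch below shows; the COBOL and SMALLTALK levels come from the text, the PL/S level of roughly 2 is inferred from the statement that PL/S cut coding effort in half, and the 10 percent publication rule is the one that caused the original overruns.

# Illustrative sketch of IBM's "equivalent assembly statements" method.
# Language level = basic assembly statements per statement in the
# higher-level language (COBOL = 3, SMALLTALK = 18 per the text;
# PL/S = 2 is an inference).
LANGUAGE_LEVEL = {"ASSEMBLY": 1, "PL/S": 2, "COBOL": 3, "SMALLTALK": 18}

def equivalent_assembly_statements(statements: int, language: str) -> int:
    return statements * LANGUAGE_LEVEL[language]

# A 5,000-statement PL/S program is budgeted for documentation as if it
# were a 10,000-statement assembly program, so the manuals' budget no
# longer shrinks just because the code did.
pl_s_program = 5000
doc_budget_base = equivalent_assembly_statements(pl_s_program, "PL/S")
print(doc_budget_base)  # 10000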
However,
neither IBM customers nor IBM
executives were
comfort-
able
with the need to convert the
sizes of modern languages
into the
size
of an antique language for
cost-estimating purposes. Therefore,
a
better
form of metric was felt to
be necessary.
The
documentation problem plus
dissatisfaction with the
equivalent
assembler
method were two of the
reasons IBM assigned Allan
Albrecht
and
his colleagues to develop
function point metrics.
Additional very
powerful
programming languages such as APL
were starting to
appear,
and
IBM wanted both a metric and
an estimating method that
could
deal
with noncoding work as well as
coding in an accurate
fashion.
The
use of macro-assembly language
had introduced code reuse,
and
this
caused measurement problems,
too. It raised the issue of
how to
count
reused code in software
applications, or how to count
any other
reused
material for economic
purposes.
The
solution here was to
separate productivity into
two discrete topics:
1.
Development
productivity
2.
Delivery
productivity
The
former, development productivity,
dealt with the code and
materi-
als
that had to be constructed
from scratch in the
traditional way.
The
latter, delivery productivity,
dealt with the final
application as
delivered,
including reused material.
For example, using
macro-assem-
bly
language, a productivity rate
for development
productivity might
be
300 lines of code per
month. But due to reusing
code in the form of
macro
expansions, delivery
productivity might
be as high as 750
lines
of
code per month.
This
is an important business distinction
that is not well
understood
even
in 2009. The true goal of
software engineering is to improve
the
rate
of delivery productivity. Indeed, it is
possible for delivery
productiv-
ity
to rise while development
productivity declines!
This
might occur by carefully
crafting a reusable code
module and
certifying
it to zero-defect quality levels.
Assume a 500-line code
module
is
developed for widespread
reuse. Assume the module
was carefully
developed,
fully inspected, examined
via static analysis, and
fully tested.
The
module was certified to be of
zero-defect status.
This
kind of careful development
and certification might
yield a net
development
productivity rate of only
100 lines of code per
month, while
normal
development for a single-use
module would be closer to
500 lines
of
code per month. Thus, a
total of five months instead
of a single month
of
development effort went to
creating the module. This is
of course a
very
low rate of development
productivity.
However,
once the module is certified
and available for reuse,
assume
that
utilizing it in additional applications
can be done in only one
hour.
Therefore,
every time the module is
utilized, it saves about one
month
of
custom development!
If
the module is utilized in
only five applications, it will
have paid for
its
low development productivity.
Every time this module is
used, its
effective
delivery productivity rate is
equal to 500 lines of code
per hour,
or
about 66,000 lines of code
per month!
Thus,
while the development
productivity
of the module dropped
down
to
only 100 lines of code
per month, the delivery
productivity
rate is
equivalent
to 66,000 lines of code per
month. The true economic
value
of
this module does not
reside in how fast it was
developed, but rather
in
how many times it can be
delivered in other applications because
it
is
reusable.
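The arithmetic of this example can be restated in a few lines; the sketch below simply reproduces the figures from the text, with about 132 working hours per month assumed in order to arrive at the text's figure of roughly 66,000 LOC per month.

# Reproduces the reusable-module arithmetic from the text.
MODULE_LOC = 500            # size of the certified reusable module
DEVELOPMENT_MONTHS = 5      # careful development, inspection, static analysis, testing
REUSE_HOURS_PER_USE = 1     # effort to place the module in a new application
HOURS_PER_MONTH = 132       # assumption used to match the text's ~66,000 figure

development_rate = MODULE_LOC / DEVELOPMENT_MONTHS           # 100 LOC per month
delivery_rate_per_hour = MODULE_LOC / REUSE_HOURS_PER_USE    # 500 LOC per hour
delivery_rate_per_month = delivery_rate_per_hour * HOURS_PER_MONTH

print(development_rate)         # 100.0 LOC per month while building the module
print(delivery_rate_per_month)  # 66000.0 effective LOC per month on each reuse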
To
be successful, reused code
needs to approach or achieve
zero-defect
status.
It does not matter what
the development speed is, if once
com-
pleted
the code can then be
used in hundreds of
applications.
As service-oriented architecture (SOA) and software as a service (SaaS) gain ground, their goal is to make dramatic improvements in the ability to deliver software features.
Development speed is comparatively
unimportant
so long as quality approaches
zero-defect levels.
Returning
to the historical chronology,
another issue shared
between
macro-assembly
language and other new
languages was the
difference
between
physical lines of code and
logical statements. Some
languages,
such
as Basic, allowed multiple
statements to be placed on a
physical
line.
Other languages, such as
COBOL, divided some logical
statements
into
multiple physical lines. The
difference between a count of
physical
lines
and a count of logical
statements could differ by as
much as 500
percent.
For some languages, there
would be more physical lines
than
logical
statements, but for other
languages, the reverse was
true. This
problem
was never fully resolved by
LOC users and remains
trouble-
some
even in 2009.
Due
to the increasing power and
sophistication of high-level
program-
ming
languages such as C++,
Objective C, SMALLTALK, and
the like,
the
percentage of project effort
devoted to coding was
dropping from
90
percent down to about 50
percent. As coding effort
declined, LOC metrics
were
no longer effective for economic,
productivity, or quality
studies.
After
function point metrics were
developed circa 1975, the
defini-
tion
of language level was
expanded to include the
number of logical
code
statements equivalent to 1 function
point. COBOL, for
example,
requires
about 105 statements per
function point in the
procedure and
data
divisions.
This
expansion is the mathematical
basis for backfiring,
or
direct
conversion
from source code to function
points. Of course,
individual
programming
styles make backfiring a
method with poor accuracy
even
though
it remains widely used for
legacy applications where
code exists
but
specifications may be
missing.
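A minimal sketch of backfiring as just described appears below; only the COBOL ratio of about 105 logical statements per function point comes from the text, the other ratios are illustrative placeholders of the kind published in the consulting tables mentioned next, and the caveat about poor accuracy applies to any such conversion.

# Illustrative backfiring sketch: counted logical statements divided by a
# published statements-per-function-point ratio yield approximate size in
# function points. Only the COBOL ratio (105) is from the text; the other
# values are placeholders.
STATEMENTS_PER_FUNCTION_POINT = {
    "COBOL": 105,      # from the text
    "ASSEMBLY": 320,   # placeholder
    "JAVA": 53,        # placeholder
}

def backfired_function_points(logical_statements: int, language: str) -> float:
    return logical_statements / STATEMENTS_PER_FUNCTION_POINT[language]

print(round(backfired_function_points(52500, "COBOL")))  # roughly 500 function points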
There
are tables available from
several consulting companies
such as
David
Consulting, Gartner Group,
and Software Productivity
Research
(SPR)
that provide values for
source code statements per
function point
for
hundreds of programming
languages.
In
1978, A.J. Albrecht gave a
public lecture on function
point metrics
at
a joint IBM/SHARE/GUIDE conference in
Monterey, California.
Soon
after
this, function points
started to be published in the
software litera-
ture.
IBM customers soon began to use
function points, and this
led to
the
formation of a function point
user's group, originally in
Canada.
Lines
of Code Metrics Circa
1980
By
about 1980, the number of
programming languages had
topped 50,
and
object-oriented languages were
rapidly evolving. As a result,
soft-
ware
reusability was increasing
rapidly.
Another
issue that surfaced circa
1980 was the fact
that many appli-
cations
were starting to use more
than one programming language,
such
as
COBOL and SQL. The
trend for using multiple
languages in the same
application
has become the norm
rather than the exception.
However,
the
difficulty of counting lines of
code with accuracy was
increased when
multiple
languages were used.
About
the middle of this decade,
function point users
organized and
created
the nonprofit International
Function Point Users Group
(IFPUG).
Originally
based in Canada, IFPUG moved to the
United States in the
mid-1980s.
Affiliates in other countries
soon were formed, so that by
the
end
of the decade, function
point user groups were in a
dozen countries.
In
1985, the first commercial
software cost-estimating tool
based on
function
points reached the market,
SPQR/20. This tool supported
esti-
mates
for 30 common programming
languages and also could be
used
for
combinations of more than one
programming language.
This
tool included sizing and
estimating of paper documents
such as
requirements,
design, and user manuals. It
also estimated
noncoding
tasks
including testing and
project management.
Because
LOC metrics were still
widely used, the SPQR/20
tool
expressed
productivity and quality
results using both function
points
and
LOC metrics. Because it was
easy to switch from one
language
to
another, it was interesting to
compare the results using
both func-
tion
point and LOC metrics
when changing from
macro-assembly to
FORTRAN
or Ada or PL/I or
Java.
As
the level of a programming
language goes up, economic
productiv-
ity
expressed in terms of function
points per staff month
also goes up,
which
matches standard economics. But as
language levels get
higher,
productivity
expressed in terms of lines of
code per month drops
down.
This
reversal by LOC metrics
violates all rules of
standard economics
and
is a key reason for
asserting that LOC metrics
constitute profes-
sional
malpractice.
It
is a well-known law of manufacturing
economics that when
a develop-
ment
cycle includes a high percentage of fixed
costs, and there is a
decline
in
the number of units
manufactured, the cost per
unit will go up.
If
a line of code is considered to be a
manufacturing unit and there
is a
switch
from a low-level language to a
high-level language, the
number
of
units will decline. But the
paper documents in the form
of require-
ments,
specifications, and user
documents do not decline.
Instead they
stay
almost constant and have
the economic effect of fixed
costs. This
of
course will raise the cost
per unit. Because this
situation is poorly
understood,
two examples will clarify
the situation.
Case A   Suppose we have an application that consists of 1000 lines of code in basic assembly language. (We can also assume that the application is 5 function points.) Assume
the development personnel
are paid
at
a rate of $5000 per staff
month.
Assume
that coding took 1 staff
month and production of
paper docu-
ments
in the form of requirements,
specifications, and user
manuals
also
took 1 staff month. The
total project took 2 staff
months and cost
$10,000.
Productivity expressed as LOC
per staff month is 500.
The cost
per
LOC is $10.00. Productivity
expressed in terms of function
points
per
staff month is 2.5. The
cost per function point is
$2000.
Case
B Assume
that we are doing the
same application using the
Java
programming
language. Instead of 1000
lines of code, the Java
version
only
requires 200 lines of code.
The function point total
stays the same
at
5 function points. Development
personnel are also paid at
the same
rate
of $5000 per staff
month.
In
Case B suppose that coding
took only 1 staff week,
but the produc-
tion
of paper documents remained
constant at 1 staff
month.
Now
the entire project took
only 1.25 staff months
instead of 2 staff
months.
The cost was only
$6250 instead of $10,000.
Clearly economic
productivity
has improved, since we did
the same job as Case A with
a
savings
of $3750. We delivered exactly
the same functions to users,
but
with
much less code and
therefore much less effort,
so true economic
productivity
increased.
When
we measure productivity for
the entire project using
LOC met-
rics,
our rate has dropped
down to only 160 LOC
per month from
the
500
LOC per month shown
for Case A!
Our
cost per LOC has
soared up to $31.25 per LOC.
Obviously, LOC
metrics
cannot measure true economic
productivity. Also obviously,
LOC
metrics
penalize high-level languages. In
fact, many studies have
proven
that
the penalty exacted by LOC
metrics is directly proportional to
the
level
of the programming language, with
the highest-level
languages
looking
the worst!
Since
the function point totals of
both Case A and Case B
versions are
the
same at 5 function points,
Case B has a productivity
rate of 4 func-
tion
points per staff month.
The cost per function
point is only $1250.
These
improvements match the rules
of standard economics, because
the
faster and cheaper version
has better results than
the slower more
expensive
version.
What
has happened of course is
that the paperwork portion
of the
project
did not decline even
though the code portion
declined substan-
tially.
This is why LOC metrics are
professional malpractice if
applied
to
compare projects that used
different programming languages.
They
move
in the opposite direction
from standard economic
productivity
rates
and penalize high-level
languages. Table 8-7
summarizes both
Case
A and Case B.
As
can be seen by looking at Cases A and B
when they are side by
side,
LOC
metrics actually reverse the
terms of the economic
equation and
make
the large, slow, costly
version look better than
the small, quick,
cheap
version.
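The reversal is easy to verify by recomputing the figures of Table 8-7 from the raw inputs of Case A and Case B; the sketch below is purely illustrative and uses only numbers stated in the text.

# Recomputes the Case A / Case B figures (coding plus paperwork).
MONTHLY_RATE = 5000.0
FUNCTION_POINTS = 5.0

cases = {
    "Case A (assembly)": {"loc": 1000, "coding_months": 1.00, "paper_months": 1.00},
    "Case B (Java)":     {"loc": 200,  "coding_months": 0.25, "paper_months": 1.00},
}

for name, c in cases.items():
    months = c["coding_months"] + c["paper_months"]
    cost = months * MONTHLY_RATE
    print(name,
          f"effort={months} months",
          f"cost=${cost:,.2f}",
          f"LOC/month={c['loc'] / months:.0f}",
          f"$/LOC={cost / c['loc']:.2f}",
          f"FP/month={FUNCTION_POINTS / months:.2f}",
          f"$/FP=${cost / FUNCTION_POINTS:,.2f}")

The LOC-based columns move in the wrong direction while the function point columns match ordinary cost accounting, which is the whole point of the comparison.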
It
might be said that the
reversal of productivity with LOC
metrics
is
because paperwork was aggregated with
coding. But even when
only
coding
by itself is measured, LOC
metrics still violate
standard eco-
nomic
assumptions.
TABLE 8-7   Comparing Low-Level and High-Level Languages

                               Case A         Case B         Difference
Language                       Assembly       Java
Lines of code (LOC)            1000           200            800
Function points                5.00           5.00           0
Monthly compensation           $5,000.00      $5,000.00      $0.00
Paperwork effort (months)      1.00           1.00           0
Coding effort (months)         1.00           0.25           0.75
Total effort (months)          2.00           1.25           0.75
Project cost                   $10,000.00     $6,250.00      $3,750.00
LOC per month                  500            160            340
Cost per LOC                   $10.00         $31.25         $21.25
Function points per month      2.50           4.00           1.5
Cost per function point        $2,000.00      $1,250.00      $750.00
The
1000 LOC of assembly code
was done in 1 month at a
rate of 1000
LOC
per month. The pure
coding cost was $5000 or
$5.00 per LOC.
The
200 LOC of Java code was
done in 1 week, or 0.25
month.
Converted
into a monthly rate, that is
only 800 LOC per
month. The
coding
cost for Java was
$1250, so the cost per
LOC was $6.25.
Thus,
Java costs more per
LOC than assembly, even
though Java took
only
one-fourth the time and
one-fourth the cost! When
you try to
measure
the two different languages
using LOC, assembly looks
better
than
Java, which is definitely a
false conclusion. Table 8-8
shows the
comparison
between assembly and Java
for coding only.
In
real economic terms, the
Java code only cost
$1250 while the
assem-
bly
code cost $5000. Obviously,
Java has better economics
because the
same
job was done for a
savings of $3750.
But
the Java LOC production
rate is lower than assembly,
and the
cost
per LOC has jumped
from $5.00 to $6.25! From an
economic stand-
point,
variations in LOC per month
and cost per LOC
are unimportant
if
there is a major difference in
how much code is needed to
complete
an
application.
Unfortunately,
LOC metrics end up as
professional malpractice no
matter
how you use them if
you are trying to measure
economic pro-
ductivity
between unlike programming
languages. By contrast, the
Java
code's
cost per function point
was $250, while the
assembly code's cost
per
function point was $1000,
and this matches the
assumptions of
standard
economics.
Function
point production for Java
was 20 function points per
staff
month
versus only 5 function
points per staff month
for assembly. Thus,
function
points match the assumptions
of standard economics
while
LOC
metrics violate standard
economics.
TABLE 8-8   Comparing Coding for Low-Level and High-Level Languages

                               Case A         Case B         Difference
Language                       Assembly       Java
Lines of code (LOC)            1000           200            800
Function points                5.00           5.00           0
Monthly compensation           $5,000.00      $5,000.00      $0.00
Coding effort (months)         1.00           0.25           0.75
Coding cost                    $5,000.00      $1,250.00      $3,750.00
LOC per month                  1000           800            200
Cost per LOC                   $5.00          $6.25          $1.25
Function points per month      5              20             15
Cost per function point        $1,000.00      $250.00        $750.00

Returning to the main thread, within a few years, all other commercial software estimating tools would also support function point metrics, so that CHECKPOINT, COCOMO, KnowledgePlan, Price-S, SEER, SLIM, SPQR/20, and others could express estimates in terms of both function points and LOC metrics.
By
the end of this decade,
coding effort was below 35
percent of total
project
effort, and LOC was no
longer valid for either
economic or qual-
ity
studies. LOC metrics could
not quantify requirements
and design
defects,
which now outnumbered coding
defects. LOC metrics could
not
be
used to measure any of the
noncoding activities such as
require-
ments,
design, documentation, or project
management.
The
response of the LOC users to
these problems was
unfortunate:
they
merely stopped measuring
anything but code production
and
coding
defects. The bulk of all
published reports based on
LOC metrics
cover
less than 35 percent of
development effort and less
than 25 per-
cent
of defects, with almost no data
being published on
requirements
and
design defects, rates of
requirements creep, design
costs, and other
modern
problems.
The
history of the LOC metric
provides an interesting example
of
Dr.
Leon Festinger's theory of
cognitive dissonance. Once an
idea
becomes
entrenched, the human mind
tends to reject all evidence
to
the
contrary. Only when the
evidence becomes overwhelming will
there
be
changes of opinion, and such
changes tend to occur
rapidly.
Lines
of Code Metrics Circa
1990
By
about 1990, not only
were there more than
500 programming lan-
guages
in use, but some
applications were written in 12 to 15
different
languages.
There were no international
standards for counting code,
and
many
variations were used
sometimes without being
defined.
In
1991, the first edition of
the author's book Applied
Software
Measurement
included
a proposed draft standard
for counting lines
of
code based on counting
logical statements. One year
later, Bob Park
from
the Software Engineering
Institute (SEI), also
published a pro-
posed
draft standard, based instead on counting physical lines.
A
survey of software journals by
the author in 1993 found
that about
one-third
of published articles used
physical lines, one-third
used logical
statements,
and the remaining third
used LOC metrics without
even
bothering
to say how they were
counted. Since there is
about a 500 per-
cent
variance between physical
LOC and logical statements
for many
languages,
this was not a good
situation.
The
technical journals that deal
with medical practice and
engineer-
ing
often devote as much as 50
percent of the text to
explaining and
defining
the measurement methods used
to derive the results. The
soft-
ware
engineering journals, on the
other hand, often fail to
define the
measurement
methods at all.
The
software journals seldom
devote more than a few
lines of text to
explaining
the nature of the
measurements used for the
results. This is
one
of several reasons why the
term "software engineering" is
something
of
an oxymoron. In fact it is not
even legal to use the
term "software
engineering"
in some states and
countries, because software
develop-
ment
is not a recognized or licensed engineering discipline.
But
there was a worse problem
approaching than ambiguity in
count-
ing
lines of code. The arrival
of Visual Basic introduced a
class of pro-
gramming
languages where counting
lines of code was not
even possible.
This
is because a lot of Visual Basic
"programming" was not done
with
procedural
code, but rather with
buttons and pull-down
menus.
Of
the approximately 2500
programming languages and
dialects in
existence
circa 2009, there are
only effective published
counting rules
for
about 150. About another
2000 are similar to other
languages and
could
perhaps share the same
counting rules. But for at
least 50 lan-
guages
that use graphics or visual
means to augment procedural
code,
there
are no code counting rules
at all. Unfortunately, some of
the lan-
guages
without code counting rules
tend to be most recent
languages
that
are used for web
site development.
In
1994, a controlled study was
done that used both
LOC metrics
and
function points for ten
versions of the same
application written in
ten
different programming languages,
including four
object-oriented
languages.
The
study was published in
American
Programmer in
1994. This
study
found that LOC metrics
violated the basic concepts
of economic
productivity
and penalized high-level and
OO languages due to the
fixed
costs
of requirements, design, and
other noncoding activities.
This was
the
first published study to
state that LOC metrics
constituted profes-
sional
malpractice if used for
economic studies where more
than one
programming
language was
involved.
By
the 1990s most consulting
studies that collected
benchmark and
baseline
data used function points.
There are no large-scale
benchmarks
based
on LOC metrics. The
International Software
Benchmarking
Standards
Group (ISBSG) was formed in
1997 and only publishes
data
in
function point form.
Consulting companies such as
SPR and the
David
Consulting Group also use
function point
metrics.
By
the end of the decade,
some projects were spending
less than 20 per-
cent
of the total effort on
coding, so LOC metrics could
not be used for
the
80
percent of effort outside
the coding domain. The
LOC users remained
blindly
indifferent to these problems
and continued to measure
only
coding,
while ignoring the overall
economics of complete
development
cycles
that include requirements,
analysis, design, user
documentation,
project
management, and many other
noncoding tasks.
By
the end of the decade,
noncoding defects in requirements
and
design
outnumbered coding defects
almost 2 to 1. But since
noncode
defects
could not be measured with
LOC metrics, the LOC
literature
simply
ignores them.
Indeed,
still in 2009, debates occur
about the usefulness of the
LOC
metric,
but the arguments
unfortunately are not
solidly grounded in
manufacturing
economics. The LOC
enthusiasts seem to ignore
the
impact
of fixed costs on software
development.
The
main argument of the LOC
enthusiasts is that development
effort
has
a solid statistical correlation to
size measured in terms of
lines of
code.
This is true, but irrelevant
in terms of standard
economics.
If
it takes 1000 lines of C
code to deliver ten function
points to custom-
ers
and the cost was
$10,000, then the cost
per LOC is $10.00.
Assuming
one
month of programming effort,
the productivity rate using
LOC is
1000
LOC per month.
If
the same ten function
points were delivered to
customers in
Objective
C, there might be only 250
lines of code and the
cost might
be
only $2500. The effort
might take only one week
instead of a whole
month.
But the cost per LOC is
unchanged at $10.00 and the
LOC pro-
ductivity
rate is also unchanged at
1000 LOC per
month.
With
LOC metrics, both versions
appear to have identical
productivity
rates
of 1000 LOC per month,
but these are development
rates,
not deliv-
ery
rates.
Since the functionality is
the same for both C
and Objective C
versions,
it is important that the
cost per function point
for C was $1000,
while
for Objective C the cost
per function point was
only $250.
Measured
in terms of function points
per month, the rate
for C was
10,
while the rate for
Objective C increased to 40.
Thus, when measured
correctly,
the economic value of
high-level languages and
delivery rates
are
clearly revealed, while the
LOC metric does not
show either eco-
nomic
or delivery productivity at
all.
Lines
of Code Metrics Circa
2000
By
the end of the century,
the number of programming
languages had
topped
2000 and continues to grow
at more than one new
program-
ming
language per month. Current
rates of new programming
language
development
may approach 100 new
languages per year.
Web
applications are mushrooming,
and all of these are
based on very
high-level
programming languages and
substantial reuse. The
Agile
methods
are also mushrooming and
also tend to use high-level
pro-
gramming
languages. Software reuse in
some applications now tops
80
percent.
LOC metrics cannot be used
for most web applications
and are
certainly
not useful for measuring
Scrum sessions and other
noncoding
activities
that are part of Agile
projects.
Function
point metrics had become
the dominant metric for
serious
economic
and quality studies. But two
new problems appeared
that
have
kept function point metrics
from actually becoming the
industry
standard
for both economic and
quality studies.
The
first problem is that some
software applications are
now so large
(greater
than 300,000 function
points) that normal function
point analy-
sis
is too slow and too
expensive to be used.
There
are gaps at both ends of
normal function point
analysis. Above
15,000
function points, the costs
and schedule for counting
function point
metrics
become so high that large
projects are almost never
counted.
(Function
point analysis operates
between 400 and 600
function points
per
day per counter. The
approximate cost is about
$6.00 per function
point
counted.)
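Applying the counting speed and cost figures just quoted makes the scale problem obvious; the sketch below uses the text's 400 to 600 function points per day and roughly $6.00 per function point against a hypothetical 300,000-function point system.

# Applies the text's counting-speed and cost figures to a very large system.
APP_SIZE_FP = 300_000                         # hypothetical very large application
COUNT_RATE_LOW, COUNT_RATE_HIGH = 400, 600    # function points counted per day
COST_PER_FP = 6.00                            # approximate counting cost from the text

slowest_days = APP_SIZE_FP / COUNT_RATE_LOW
fastest_days = APP_SIZE_FP / COUNT_RATE_HIGH
total_cost = APP_SIZE_FP * COST_PER_FP

print(f"Counting effort: {fastest_days:.0f} to {slowest_days:.0f} counter-days")
print(f"Counting cost:   ${total_cost:,.0f}")

Several counter-years of effort and nearly two million dollars merely to size the application explain why systems of this magnitude are almost never counted with normal function point analysis.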
At
the low end of the
scale, the counting rules
for function points
do
not
operate below a size of
about 15 function points.
Thus, small changes
and
bug repairs cannot be
counted. Individually, such
changes may be as
small as 1/50th of a function point
and are rarely larger
than 10 function
points.
But large companies can make
30,000 or more changes per
year,
with
a total size that can
top 100,000 function
points.
The
second problem is that the success of
the original function
point
metric
has triggered an explosion of
function point clones. As of
2009,
there
are at least 24 function
point variations. This makes
benchmark
and
baseline studies difficult, because
there are very few
conversion
rules
from one variation to
another.
In
addition to standard IFPUG function
points, there are also
Mark
II
function points, COSMIC
function points, Finnish
function points,
Netherlands
function points, story
points, feature points,
web-object
points,
and many others.
Although
LOC metrics continue to be
used, they continue to have
such
major
errors that they constitute
professional malpractice for
economic
and
quality studies where more
than one language is involved, or
where
non-coding
issues are
significant.
There
is also a psychological problem.
LOC usage tends to fixate
atten-
tion
on coding and make the
other kinds of software work
invisible. For
large
software projects there may
be many more noncode workers
than
programmers.
There will be architects, designers,
database administra-
tors,
quality assurance, technical
writers, project managers,
and many
other
occupations. But since none of
these can be measured using
LOC
metrics,
the LOC literature ignores
them.
Lines
of Code Metrics Circa
2010
It
would be nice to predict an
optimistic future, but the
recession has
changed
the nature of industry and
the future is now
uncertain.
If
current trends continue, within a
few more years the
software
industry
will have more than 3000
programming languages, of
which
about
2900 will be obsolete or nearly
dead languages. The
industry
will
have more than 20 variations
for counting lines of code,
more than
50
variations for counting
function points, and
probably another 20
unreliable
metrics such as story
points, use-case points, cost
per defect,
or
using percentages of unknown
numbers. (The software
industry loves
to
make claims such as "improve
productivity by 10 to 1" without
defin-
ing
either the starting or the
ending point.)
Future
generations of sociologists will no doubt
be interested in why
the
software industry spends so
much energy on creating
variations of
things,
and so little energy on
fundamental issues. No doubt
large proj-
ects
will still be cancelled, litigation
for failures will still be
common,
software
quality will still be bad,
software productivity will remain
low,
security
flaws will be alarming, and
the software literature will
con-
tinue
to offer unsupported claims
without actually presenting
quanti-
fied
data.
What
the software industry needs
is actually fairly
straightforward:
1.
Measures of defect potentials
from all sources expressed
in terms of
function
points; that is,
requirements defects, design
defects, code
defects,
document defects, and bad
fixes.
2.
Measures of defect removal
efficiency levels for all
forms of inspec-
tion,
static analysis, and
testing.
3.
Activity-based productivity benchmarks
from requirements
through
delivery
and then for maintenance
and customer support
from
delivery
to retirement using function
points.
4.
Certified sources of reusable
material near the
zero-defect level.
5.
Much improved security
methods to guard against
viruses, spyware,
and
hacking.
6.
Licenses and board-certification
for software engineering
specialties.
But
until measurement becomes
both accurate and
cost-effective,
none
of these are likely to
occur. An occupation that will
not measure
its
own performance with accuracy is
not a true
profession.
Lines
of Code Circa
2020
If
we look forward to 2020,
there are best-case and
worst-case scenarios
to
consider.
The
best-case scenario for lines
of code metrics is that
usage dimin-
ishes
even faster than it has
been and that economic
productivity based
on
delivery becomes the
industry focus rather than
development and
lines
of code. For this scenario
to occur, the speed of function
point analy-
sis
needs to increase and the
cost per function point
counted needs to
decrease
from about $6.00 per
function point counted to
less than $0.10
per
function point counted,
which is technically possible
and indeed
occurs
in 2009, although the
high-speed methods are not
yet widely
deployed
since they are so
new.
If
these changes occur, then
function point usage will
increase at least
tenfold,
and many new kinds of
economic studies can be
carried out.
Among
these will be measurement of entire
portfolios that might
top
10
million function points.
Corporate backlogs could be
sized and pri-
oritized,
and some of these exceed 1
million function points.
Risk/value
analyses
for major software
applications could become
both routine
and
professionally competent. It will also be
possible to do economic
analyses
of interesting new technologies
such as the Agile
methods,
service-oriented
architecture (SOA), software as a
service (SaaS), and
of
course total cost of
ownership (TCO).
Under
the best-case scenario, software
engineering would evolve
from
a
craft or art form into a
true engineering discipline.
Reliable measures
of
all activities and tasks
will lead to greater success rates on
large soft-
ware
applications. The goal of
software engineering should be to
become
a
true engineering discipline with
recognized specialties, board
certifica-
tion,
and accurate information on
productivity, quality, and
costs. But
that
cannot be accomplished when
project failures outnumber
successes
for
large applications.
So
long as quality and
productivity are ambiguous
and uncertain, it
is
difficult to carry out
multiple regression studies
and to select really
effective
tools and methods. LOC
metrics have been a major
barrier to
economic
and quality studies for
software.
The
worst-case scenario is that
LOC metrics continue at
about the
same
level as 2009. The software
industry will continue to ignore
eco-
nomic
productivity and remain
fixated on the illusory
"lines of code per
month"
metric. Under the worst-case
scenario, "software
engineering"
will
remain an oxymoron. Trial-and-error
methods will continue to
dom-
inate,
in part because effective tools
and methodologies cannot
even be
studied
using LOC metrics. Under
the worst-case scenario,
failures and
project
disasters will remain common
for large software
applications.
Function
point analysis will continue to
serve an important role
for
economic
studies, benchmarks, and
baselines, but only for
about 10
percent
of software applications of medium
size. The cost per
function
point
under the worst-case
scenario will remain so high
that usage
above
15,000 function points will
continue to be very rare.
There will
probably
be even more function point
variations, and the chronic
lack
of
conversion rules from one
variation to another will make
large-scale
international
economic studies almost
impossible.
Summary
and Conclusions
The
history of lines of code
metrics is a cautionary tale
for all people
who
work in software. The LOC
metric started out well
and was fairly
effective
when there was only one
programming language and
coding
was
so difficult it constituted 90 percent of
the total effort for
putting
software
on a computer.
But
the software industry began
to develop hundreds of
program-
ming
languages. Applications started to
use multiple
programming
languages,
and that remains the
norm today. Applications
grew from
less
than 1000 lines of code up
to more than 10 million
lines of code.
Coding
is the major task for
small applications, but for
large systems,
the
work shifts to defect
removal and production of
paper documents
in
the forms of requirements,
specifications, user manuals,
test plans,
and
many others.
The
LOC metric was not
able to keep pace with either
change. It does
not
work well when there is
ambiguity in counting code,
which always
occurs
with high-level languages and
multiple languages in the
same
application.
It does not work well
for large systems where
coding is only
a
small fraction of the total
effort.
As
a result, LOC metrics became
less and less useful
until sometime
around
1985 they started to become
actually harmful. Given the
errors
and
misunderstandings that LOC
metrics bring to economic,
productiv-
ity,
and quality studies, it is
fair to say that in many
situations usage
of
LOC metrics can be viewed as
professional malpractice if more
than
one
programming language is part of
the study or the study
seeks to
measure
real economic
productivity.
The
final point is that
continued usage of LOC
metrics is a significant
barrier
that is delaying the
progress of software engineering
from a
craft
to a true engineering discipline. An
occupation that cannot
even
measure
its own work with accuracy
is hardly qualified to be
called
engineering.
Readings
and References
Barr,
Michael and Anthony Massa.
Programming
Embedded Systems: With C and
GNU
Development
Tools. Sebastopol,
CA: O'Reilly Media,
2006.
Beck,
K. Extreme
Programming Explained: Embrace
Change. Boston,
MA: Addison
Wesley,
1999.
Bott,
Frank, A. Coleman, J. Eaton,
and D. Rowland. Professional
Issues in Software
Engineering,
Third
Edition.
London
and New York: Taylor &
Francis, 2000.
Cockburn,
Alistair. Agile
Software Development. Boston,
MA: Addison Wesley,
2001.
Cohen,
D., M. Lindvall, & P. Costa,
"An Introduction to agile
methods." Advances
in
Computers.
New
York: Elsevier Science
(2004): 166.
Garmus,
David and David Herron.
Function
Point Analysis. Boston:
Addison Wesley,
2001.
Garmus,
David and David Herron.
Measuring
the Software Process: A Practical
Guide
to
Functional Measurement. Englewood
Cliffs, NJ: Prentice Hall,
1995.
Glass,
Robert L. Facts
and Fallacies of Software
Engineering (Agile
Software
Development).
Boston:
Addison Wesley, 2002.
van Vliet, Hans.
Software
Engineering Principles and
Practices,
Third
Edition.
London, New York: John
Wiley & Sons,
2008.
Highsmith,
Jim. Agile
Software Development Ecosystems.
Boston,
MA: Addison Wesley,
2002.
Humphrey,
Watts. PSP:
A Self-Improvement Process for Software
Engineers. Upper
Saddle
River, NJ: Addison Wesley,
2005.
Humphrey,
Watts. TSP--Leading
a Development Team. Boston,
MA: Addison Wesley,
2006.
Hunt,
Andrew and David Thomas.
The
Pragmatic Programmer. Boston,
MA: Addison
Wesley,
1999.
Jeffries,
R., et al. Extreme
Programming Installed. Boston,
MA: Addison Wesley,
2001.
Jones,
Capers. Applied
Software Measurement, Third
Edition. New York, NY:
McGraw-
Hill,
2008.
Jones,
Capers. Conflict
and Litigation Between
Software Clients and
Developers,
Version
6. Burlington, MA: Software Productivity
Research, June 2006. 54
pages.
Jones,
Capers. Estimating
Software Costs, Second
Edition. New York, NY:
McGraw-Hill,
2007.
Jones,
Capers. Software
Assessments, Benchmarks, and Best
Practices. Boston,
MA:
Addison
Wesley Longman, 2000.
Jones,
Capers. "The Economics of
Object-Oriented Software." American
Programmer
Magazine,
October
1994: 29-35.
Kan,
Stephen H. Metrics
and Models in Software
Quality Engineering, Second
Edition.
Boston,
MA: Addison Wesley Longman,
2003.
Krutchen,
Phillippe. The
Rational Unified Process--An
Introduction. Boston,
MA:
Addison
Wesley, 2003.
Larman,
Craig and Victor Basili.
"Iterative and Incremental
Development--A
Brief
History."
IEEE
Computer Society, June
2003: 47-55.
Love,
Tom. Object
Lessons. New
York, NY: SIGS Books,
1993.
Marciniak,
John J. (Ed.) Encyclopedia
of Software Engineering. (2
vols.) New York,
NY:
John
Wiley & Sons,
1994.
McConnell,
Steve. Code
Complete. Redmond,
WA: Microsoft Press,
1993.
------
Software Estimation--Demystifying the
Black Art. Redmond,
WA: Microsoft
Press,
2006.
Mills,
H., M. Dyer, & R. Linger.
"Cleanroom Software Engineering."
IEEE
Software, 4,
5
(Sept.
1987): 19-25.
Morrison,
J. Paul. Flow-Based
Programming. A New Approach to
Application
Development.
New
York, NY: Van Nostrand
Reinhold, 1994.
Park,
Robert E. SEI-92-TR-20:
Software Size Measurement: A
Framework for
Counting
Software
Source Statements. Pittsburgh,
PA: Software Engineering
Institute, 1992.
Pressman,
Roger. Software
Engineering--Practitioner's Approach, Sixth
Edition. New
York,
NY: McGraw-Hill,
2005.
Putnam,
Lawrence and Ware Myers.
Industrial
Strength Software--Effective
Management
Using Measurement. Los
Alamitos, CA: IEEE Press,
1997.
------
Measures for Excellence--Reliable
Software On-Time Within
Budget. Englewood
Cliffs,
NJ: Yourdon Press, Prentice
Hall, 1992.
Sommerville,
Ian. Software
Engineering, Seventh
Edition. Boston, MA: Addison
Wesley,
2004.
Stapleton,
J. DSDM--Dynamic
System Development Method in
Practice. Boston,
MA :
Addison
Wesley, 1997.
Stephens
M. and D. Rosenberg. Extreme
Programming Refactored: The Case
Against
XP.
Berkeley,
CA: Apress L.P.,
2003.