Chapter 6

Project Management and Software Engineering

Introduction
Project management is a weak link in the software engineering chain. It is also a weak link in the academic curricula of many universities. More software problems, including cost and schedule overruns, can be attributed to poor project management than to poor programming or to poor software engineering practices. Because poor project management is so harmful to good software engineering, it is relevant to a book on best practices.

Working as an expert witness in a number of cases involving cancelled projects, quality problems, and other significant failures, the author observed bad project management practices in almost every case. Not only were project management problems common, but in some lawsuits the project managers and higher executives actually interfered with good software engineering practices by canceling inspections and truncating testing in the mistaken belief that such actions would shorten development schedules.

For example, in a majority of breach of contract lawsuits, project management issues such as inadequate estimating, inadequate quality control, inadequate change control, and misleading or even false status tracking occur repeatedly.

As the recession continues, it is becoming critical to analyze every aspect of software engineering in order to lower costs without degrading operational efficiency. Improving project management is on the critical path to successful cost reduction.
Project management needs to be defined in a software context. The term project management has been artificially narrowed by tool vendors so that it has become restricted to the activities of critical path analysis and the production of various scheduling aids such as PERT and Gantt charts. For successful software project management, many other activities must be supported.

Table 6-1 illustrates 20 key project management functions and how well they are performed circa 2009, based on observations within about 150 companies. The scoring range is from -10 for very poor performance to +10 for excellent performance.

Using this scoring method, which runs from +10 to -10, the midpoint or average is 0. Observations made over the past few years indicate that project management is far below average in far too many critical activities.

The top item in Table 6-1, reporting "red flag" items, refers to notifying clients and higher managers that a project is in trouble. In almost every software breach of contract lawsuit, problems tend to be concealed or ignored, which delays trying to solve problems until they grow too serious to be cured.
TABLE 6-1  Software Project Management Performance Circa 2009

Project Management Functions                    Score    Definition
 1. Reporting "red flag" problems                -9.5    Very poor
 2. Defect removal efficiency measurements       -9.0    Very poor
 3. Benchmarks at project completion             -8.5    Very poor
 4. Requirements change estimating               -8.0    Very poor
 5. Postmortems at project completion            -8.0    Very poor
 6. Quality estimating                           -7.0    Very poor
 7. Productivity measurements                    -6.0    Poor
 8. Risk estimating                              -3.0    Poor
 9. Process improvement tracking                 -2.0    Poor
10. Schedule estimating                          -1.0    Marginal
11. Initial application sizing                    2.0    Marginal
12. Status and progress tracking                  2.0    Marginal
13. Cost estimating                               3.0    Fair
14. Value estimating                              4.0    Fair
15. Quality measurements                          4.0    Fair
16. Process improvement planning                  4.0    Fair
17. Quality and defect tracking                   5.0    Good
18. Software assessments                          6.0    Good
19. Cost tracking                                 7.0    Very good
20. Earned-value tracking                         8.0    Very good
    Average                                      -0.8    Poor
The main reason for such mediocre performance by software project managers is probably lack of effective curricula at the university and graduate school level. Few software engineers and even fewer MBA students are taught anything about the economic value of software quality or how to measure defect removal efficiency levels, which is actually the most important single measurement in software engineering.
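Defect removal efficiency is simple to compute once defect data is being collected: it is the percentage of defects found and removed before delivery, compared with the total of those defects plus the defects reported by users after release (commonly the first 90 days of production use). The short Python sketch below illustrates that arithmetic; it is an editorial illustration with hypothetical numbers, not a tool or formula reproduced from the author's measurement practice.

    # Sketch: defect removal efficiency (DRE). The 90-day post-release
    # window is a common measurement convention; the sample counts are
    # hypothetical.
    def defect_removal_efficiency(removed_before_release: int,
                                  found_after_release: int) -> float:
        """Return DRE as a percentage."""
        total = removed_before_release + found_after_release
        return 100.0 if total == 0 else 100.0 * removed_before_release / total

    # Example: 900 defects removed during development and testing,
    # 100 more reported by users in the first 90 days of production.
    print(f"DRE = {defect_removal_efficiency(900, 100):.1f}%")  # prints: DRE = 90.0%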
Given the tools and methods for effective software project management that are available in 2009, a much different profile would be possible if software managers were trained at state-of-the-art levels. Table 6-2 makes the assumption that a much improved curriculum for project managers could be made available within ten years, coupled with the assumption that project managers would then be equipped with modern sizing, cost estimating, quality estimating, and measurement tools. Table 6-2 shows what software project managers could do if they were well trained and well equipped.
Instead of jumping blindly into projects with poor estimates and inadequate quality plans, Table 6-2 shows that it is at least theoretically possible for software project managers to plan and estimate with high precision, measure with even higher precision, and create benchmarks for every major application when it is finished. Unfortunately, the technology of software project management is much better than actual day-to-day performance.

TABLE 6-2  Potential Software Project Management Performance by 2019

Project Management Functions                    Score    Definition
 1. Reporting "red flag" problems                10.0    Excellent
 2. Benchmarks at project completion             10.0    Excellent
 3. Postmortems at project completion            10.0    Excellent
 4. Status and progress tracking                 10.0    Excellent
 5. Quality measurements                         10.0    Excellent
 6. Quality and defect tracking                  10.0    Excellent
 7. Cost tracking                                10.0    Excellent
 8. Defect removal efficiency measurements        9.0    Excellent
 9. Productivity measurements                     9.0    Very good
10. Software assessments                          9.0    Very good
11. Earned-value tracking                         9.0    Very good
12. Quality estimating                            8.0    Very good
13. Initial application sizing                    8.0    Very good
14. Cost estimating                               8.0    Very good
15. Risk estimating                               7.0    Good
16. Schedule estimating                           7.0    Good
17. Process improvement tracking                  6.0    Good
18. Value estimating                              6.0    Good
19. Process improvement planning                  6.0    Good
20. Requirements change estimating                5.0    Good
    Average                                       8.4    Very good
As of 2009, the author estimates that only about 5 percent of U.S. software projects create benchmarks of productivity and quality data at completion. Less than one-half of 1 percent submit benchmark data to a formal benchmark repository such as that maintained by the International Software Benchmarking Standards Group (ISBSG), Software Productivity Research (SPR), the David Consulting Group, the Quality and Productivity Management Group (QPMG), or similar organizations.

Every significant software project should prepare formal benchmarks at the completion of the project. There should also be a postmortem review of development methods to ascertain whether improvements might be useful for future projects.

As of 2009, many independent software project management tools are available, but each supports only a portion of overall software project management responsibilities. A new generation of integrated software project management tools is approaching, which promises to eliminate the gaps in current project management tools and to improve the ability to share information from tool to tool. New classes of project management tools, such as methodology management tools, have also joined the set available to the software management community.

Software project management is one of the most demanding jobs of the 21st century. Software project managers are responsible for the construction of some of the most expensive assets that corporations have ever attempted to build. For example, large software systems cost far more to build and take much longer to construct than the office buildings occupied by the companies that have commissioned the software. Really large software systems in the 100,000-function point range can cost more than building a domed football stadium, a 50-story skyscraper, or a 70,000-ton cruise ship.

Not only are large software systems expensive, but they also have one of the highest failure rates of any manufactured object in human history. The term failure refers to projects that are cancelled without completion due to cost or schedule overruns, or that run later than planned by more than 25 percent.

For software failures and disasters, the great majority of blame can be assigned to the management community rather than to the technical community. Table 6-3 is derived from one of the author's older books, Patterns of Software System Failure and Success, published by International Thomson Press. Note the performance of software managers on successful projects as opposed to their performance associated with cancellations and severe overruns.
TABLE 6-3  Software Management Performance on Successful and Unsuccessful Projects

Activity                  Successful Projects    Unsuccessful Projects
Sizing                    Good                   Poor
Planning                  Very good              Fair
Estimating                Very good              Very poor
Tracking                  Good                   Poor
Measurement               Good                   Very poor
Quality control           Excellent              Poor
Change control            Excellent              Poor
Problem resolutions       Good                   Poor
Risk analysis             Good                   Very poor
Personnel management      Good                   Poor
Supplier management       Good                   Poor
Overall performance       Very good              Poor
As mentioned in Chapter 5 of this book, the author's study of project failures and analysis of software lawsuits for breach of contract reached the conclusion that project failures correlate more closely with the number of managers involved in software projects than with the number of software engineers.

Software projects with more than about six first-line managers tend to run late and over budget. Software projects with more than about 12 first-line managers tend to run very late and are often cancelled.

As can easily be seen, deficiencies of the software project management function are a fundamental root cause of software disasters. Conversely, excellence in project management can do more to raise the probability of success than almost any other factor, such as buying better tools or changing programming languages. (This is true for larger applications above 1000 function points. For small applications in the range of 100 function points, software engineering skills still dominate results.)

On the whole, improving software project management performance can do more to optimize software success probabilities and to minimize failure probabilities than any other known activity. However, improving software project management performance is also one of the more difficult improvement strategies. If it were easy to do, the software industry would have many more successes and far fewer failures than in fact occur.

A majority of the failures of software projects can be attributed to failures of project management rather than to failures of the technical staff. For example, underestimating schedules and resource requirements is associated with more than 70 percent of all projects that are cancelled due to overruns. Another common problem of project management is ignoring or underestimating the work associated with quality control and defect removal. Yet another management problem is failure to deal with requirements changes in an effective manner.

Given the high costs and significant difficulty associated with software system construction, you might think that software project managers would be highly trained and well equipped with state-of-the-art planning and estimating tools, with substantial analyses of historical software cost structures, and with very thorough risk analysis methodologies. These are natural assumptions to make, but they are false. Table 6-4 illustrates patterns of project management tool usage on leading, average, and lagging software projects.
Table 6-4 shows that managers on leading projects not only use a wider variety of project management tools, but they also use more of the features of those tools.

In part due to the lack of academic preparation for software project managers, most software project managers are either totally untrained or at best partly trained for the work at hand. Even worse, software project managers are often severely under-equipped with state-of-the-art tools.
TABLE 6-4  Numbers and Size Ranges of Project Management Tools
(Size data expressed in terms of function point metrics)

Project Management Tools       Lagging    Average    Leading
Project planning                 1,000      1,250      3,000
Project cost estimating              -          -      3,000
Statistical analysis                 -          -      3,000
Methodology management               -        750      3,000
Benchmarks                           -          -      2,000
Quality estimation                   -          -      2,000
Assessment support                   -        500      2,000
Project measurement                  -          -      1,750
Portfolio analysis                   -          -      1,500
Risk analysis                        -          -      1,500
Resource tracking                  300        750      1,500
Value analysis                       -        350      1,250
Cost variance reporting              -        500      1,000
Personnel support                  500        500        750
Milestone tracking                   -        250        750
Budget support                       -        250        750
Function point analysis              -        250        750
Backfiring: LOC to FP                -          -        750
Function point subtotal          1,800      5,350     30,250
Number of tools                      3         10         18
From data collected during consulting studies performed by the author, less than 25 percent of U.S. software project managers have received any formal training in software cost estimating, planning, or risk analysis; less than 20 percent of U.S. software project managers have access to modern software cost-estimating tools; and less than 10 percent have access to any significant quantity of validated historical data from projects similar to the ones they are responsible for.

The comparatively poor training and equipment of project managers is troubling. There are at least a dozen commonly used software cost-estimating tools such as COCOMO, KnowledgePlan, Price-S, SEER, SLIM, and the like. Of the various sources of benchmark data, the International Software Benchmarking Standards Group (ISBSG) has the most accessible data collection.

By comparison, the software technical personnel who design and build software are often fairly well trained in the activities of analysis, design, and software development, although there are certainly gaps in topics such as software quality control and software security.

The phrase "project management" has unfortunately been narrowed and misdefined in recent years by vendors of automated tools for supporting real project managers. The original broad concept of project management included all of the activities needed to control the outcome of a project: sizing deliverables, estimating costs, planning schedules and milestones, risk analysis, tracking, technology selection, assessment of alternatives, and measurement of results.

The narrower concept used today by project management tool vendors is restricted to a fairly limited set of functions associated with the mechanics of critical path analysis, work breakdown structuring, and the creation of PERT charts, Gantt charts, and other visual scheduling aids. These functions are of course part of the work that project managers perform, but they are neither the only activities nor even the most important ones for software projects.
The gaps and narrow focus of conventional project management tools are particularly troublesome when the projects in question are software related. Consider a very common project management question associated with software projects: what will be the effects on software schedules and costs of adopting a new development method such as Agile development or the Team Software Process (TSP)?

Several commercial software estimating tools can predict the results of both Agile and TSP development methods, but not even one standard project management tool such as Microsoft Project has any built-in capability for automatically adjusting its assumptions when dealing with alternative software development approaches.

The same is also true for other software-related technologies, such as the project management considerations of dealing with formal inspections in addition to testing, static analysis, the ISO 9000-9004 standards, the SEI maturity model, reusable components, ITIL, and so forth.
The focus of this chapter is primarily on the activities and tasks associated with software project management. Project managers also spend quite a bit of time dealing with personnel issues such as hiring, appraisals, pay raises, and staff specialization. Due to the recession, project managers will probably also face tough decisions involving layoffs and downsizing.

Most software project managers are also involved with departmental and corporate issues such as creating budgets, handling travel requests, education planning, and office space planning. These are important activities, but they are outside the scope of what managers do when they are involved specifically with project management.

The primary focus of this chapter is on the tools and methods that are the day-to-day concerns of software project managers; that is, sizing, estimating, planning, measurement and metrics, quality control, process assessments, technology selection, and process improvement.

There are 15 basic topics that project managers need to know about, and each topic is a theme of some importance to professional software project managers:

1. Software sizing
2. Software project estimating
3. Software project planning
4. Software methodology selection
5. Software technology and tool selection
6. Software quality control
7. Software security control
8. Software supplier management
9. Software progress and problem tracking
10. Software measurements and metrics
11. Software benchmarking
12. Software risk analysis
13. Software value analysis
14. Software process assessments
15. Software process improvements
These 15 activities are not the only topics of concern to software project managers, but they are critical topics in terms of the ability to control major software projects. Unless at least 10 of these 15 are performed in a capable and competent manner, the probability of the project running out of control or being cancelled will be alarmingly high.
Because the author's previous books Estimating Software Costs (McGraw-Hill, 2007) and Applied Software Measurement (McGraw-Hill, 2008) dealt with many managerial topics, this book will cover only 3 of the 15 management topics:

1. Software sizing
2. Software progress and problem tracking
3. Software benchmarking

Sizing is the precursor to estimating. Sizing has many different approaches, and several new approaches have been developed within the past year.

Software progress tracking is among the most critical of all software project management activities. Unfortunately, based on depositions and documents discovered during litigation, software progress tracking is seldom performed competently. Even worse, when projects are in trouble, tracking tends to conceal problems until it is too late to solve them.

Software benchmarking is underreported in the literature. As this book is in production, the ISO standards organization is preparing a new ISO standard on benchmarking. It therefore seems appropriate to discuss how to collect benchmark data and what kinds of reports constitute effective benchmarks.
Software Sizing

The term sizing refers to methods for predicting the volume of various deliverable items such as source code, specifications, and user manuals. Software bugs or defects should also be included in sizing, because they cost more money and take more time than any other software "deliverable." Bugs are an accidental deliverable, but they are always delivered, like it or not, so they need to be included in sizing. Because requirements are unstable and grow during development, changes and growth in application requirements should be sized, too.

Sizing is the precursor to cost estimating and is one of the most critical software project management tasks. Sizing is concerned with predicting the volumes of major kinds of software deliverables, including but not limited to those shown in Table 6-5.

As can be seen from the list of deliverables, the term sizing includes quite a few deliverables. Many more things than source code need to be predicted to have complete size and cost estimates.
TABLE 6-5  Software Deliverables Whose Sizes Should Be Quantified

Paper documents
    Requirements
        Text requirements
        Function requirements (features of the application)
        Nonfunctional requirements (quality and constraints)
        Use-cases
        User stories
        Requirements change (new features)
        Requirements churn (changes that don't affect size)
    Architecture
        External architecture (SOA, client-server, etc.)
        Internal architecture (data structure, platforms, etc.)
    Specifications and design
        External
        Internal
    Planning documents
        Development plans
        Quality plans
        Test plans
        Security plans
        Marketing plans
        Maintenance and support plans
    User manuals
        Reference manuals
        Maintenance manuals
        Translations into foreign languages
        Tutorial materials
        Translations of tutorial materials
        Online HELP screens
        Translations of HELP screens
Source code
    New source code
    Reusable source code from certified sources
    Reusable source code from uncertified sources
    Inherited or legacy source code
    Code added to support requirements change and churn
Test cases
    New test cases
    Reusable test cases
Bugs or defects
    Requirements defects (original)
    Requirements defects (in changed requirements)
    Architectural defects
    Design defects
    Code defects
    User documentation defects
    "Bad fixes" or secondary defects
    Test case defects
Note that while bugs or defects are accidental deliverables, there are always latent bugs in large software applications, and they have serious consequences. Therefore, estimating defect potentials and defect removal efficiency levels is a critical part of software application sizing.

This section discusses several methods of sizing software applications, which include but are not limited to:

1. Traditional sizing by analogy with similar projects
2. Traditional sizing using "lines of code" metrics
3. Sizing using story point metrics
4. Sizing using use-case metrics
5. Sizing using IFPUG function point metrics
6. Sizing using other varieties of function point metrics
7. High-speed sizing using function point approximations
8. High-speed sizing of legacy applications using backfiring
9. High-speed sizing using pattern matching
10. Sizing application requirements changes
Accurate estimation and accurate schedule planning depend on having accurate size information, so sizing is a critical topic for successful software projects. Size and size changes are so important that a new management position called "scope manager" has come into existence over the past few years.
New methods for formal size or scope control have been created. Interestingly, the two most common methods were developed in very distant locations from each other. A method called Southern Scope was developed in Australia, while a method called Northern Scope was developed in Finland. Both of these scope-control methods focus on change control and include formal sizing, reviews of changes, and other techniques for quantifying the impact of growth and change. While other size-control methods exist, the Southern Scope and Northern Scope methods both appear to be more effective than leaving changes to ordinary practices.

Because thousands of software applications exist circa 2009, careful forensic analysis of existing software should be a good approach for predicting the sizes of future applications. As of 2009, many "new" applications are replacements of existing legacy applications. Therefore, historical data would be useful, if it were reliable and accurate.

Size is a useful precursor for estimating staffing, schedules, effort, costs, and quality. However, size is not the only factor that needs to be known. Consider an analogy with home construction. You need to know the number of square feet or square meters in a house to perform a cost estimate. But you also need to know the specifics of the site, the construction materials to be used, and any local building codes that might require costly additions such as hurricane-proof windows or special septic systems.

For example, a 3000-square-foot home to be constructed on a flat suburban lot with ordinary building materials might be constructed for $100 per square foot, or $300,000. But a luxury 3000-square-foot home built on a steep mountain slope that requires special support and uses exotic hardwoods might cost $250 per square foot, or $750,000.

Similar logic applies to software. An embedded application in a medical device may cost twice as much as the same size application that handles business data. This is because the liabilities associated with software in medical devices require extensive verification and validation compared with ordinary business applications.

(Author's note: Prior to the recession, one luxury home was built on a remote lake so far from civilization that it needed a private airport and its own electric plant. The same home featured handcrafted windows and wall panels created on site by artists and craftspeople. The budgeted cost was about $40 million, or more than $6,000 per square foot. Needless to say, this home was built before the Wall Street crash, since the owner was a financier.)

Three serious problems have long been associated with software sizing: (1) most of the facts needed to create accurate sizing of software deliverables are not known until after the first cost estimates are required; (2) some sizing methods such as function point analysis are time-consuming and expensive, which limits their utility for large applications; and (3) software deliverables are not static in size and tend to grow during development, and estimating growth and change is often omitted from sizing techniques.

Let us now consider a number of current software sizing approaches.
Traditional Sizing by Analogy

The traditional method of sizing software projects has been that of analogy with older projects that are already completed, so that the sizes of their deliverables are known. However, newer methods are available circa 2009 and will be discussed later in this chapter.

The traditional sizing-by-analogy method has not been very successful, for a variety of reasons. It can only be used for common kinds of software projects where similar projects exist. For example, sizing by analogy works fairly well for compilers, since there are hundreds of compilers to choose from. The analogy method can also work for other familiar kinds of applications such as accounting systems, payroll systems, and other common application types. However, if an application is unique and no similar applications have been constructed, then sizing by analogy is not useful.

Because older legacy applications predate the use of story points or use-case points, or sometimes even function points, not every legacy application is helpful in terms of providing size guidance for new applications. For more than 90 percent of legacy applications, their size is not known with precision, and even code volumes are not known, due to "dead code" and calls to external routines. Also, many of their deliverables (i.e., requirements, specifications, plans, etc.) have long since disappeared or were not updated, so their sizes may not be available.
Since legacy applications tend to grow at an annual rate of about 8 percent, their current size is not representative of their initial size at first release. Very seldom is data recorded about requirements growth, so this can throw off sizing by analogy.
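If the current size and age of a legacy application are known, the 8 percent annual growth rate cited above can at least be backed out to approximate the size at first release, or run forward to project future growth. The sketch below is only an illustration of that compound-growth arithmetic; the growth rate is the figure from the text, and the sample size and age are assumptions.

    # Sketch: adjusting a legacy application's size for the roughly 8%
    # annual growth mentioned in the text. Sample values are hypothetical.
    ANNUAL_GROWTH = 0.08

    def size_at_first_release(current_size_fp: float, age_years: float) -> float:
        """Back out compound growth to approximate the original size."""
        return current_size_fp / ((1.0 + ANNUAL_GROWTH) ** age_years)

    def projected_size(size_fp: float, years_ahead: float) -> float:
        """Project the size forward, e.g., to a replacement's delivery date."""
        return size_fp * ((1.0 + ANNUAL_GROWTH) ** years_ahead)

    legacy_today = 12_000   # measured function points today (hypothetical)
    legacy_age = 10         # years since first release (hypothetical)
    print(f"Approximate size at first release: {size_at_first_release(legacy_today, legacy_age):,.0f} FP")
    print(f"Projected size in 5 more years:    {projected_size(legacy_today, 5):,.0f} FP")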
Even worse, a lot of what is called "historical data" for legacy applications is very inaccurate and can't be relied upon to predict future applications.

Even if legacy size is known, legacy effort and costs are usually incomplete. The gaps and missing elements in historical data include unpaid overtime (which is almost never measured), project management effort, and the work of part-time specialists who are not regular members of the development team (database administration, technical writers, quality assurance, etc.). The missing data on legacy application effort, staffing, and costs is called leakage in the author's books. For small applications with one or two developers, this leakage from historical data is minor. But for large applications with dozens of team members, leakage of missing effort and cost data can top 50 percent of total effort.
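One practical response is to apply explicit correction factors for the omitted categories before using such data for sizing or estimating. The sketch below illustrates the idea; the percentage adjustments are invented placeholders for illustration only, not leakage figures from the author's benchmarks, and any real correction would need local calibration.

    # Sketch: correcting recorded effort for "leakage" -- categories that
    # historical data commonly omits. The percentages are hypothetical
    # placeholders and would need local calibration.
    recorded_effort_months = 100.0

    assumed_leakage = {            # omitted effort as a fraction of recorded effort
        "unpaid overtime": 0.15,
        "project management": 0.12,
        "part-time specialists (DBA, tech writers, QA)": 0.18,
    }

    missing = {k: recorded_effort_months * v for k, v in assumed_leakage.items()}
    adjusted_total = recorded_effort_months + sum(missing.values())

    for category, months in missing.items():
        print(f"{category:48s} {months:6.1f} staff months")
    print(f"{'recorded effort':48s} {recorded_effort_months:6.1f} staff months")
    print(f"{'adjusted total effort':48s} {adjusted_total:6.1f} staff months")
    # Productivity computed from recorded effort alone would appear
    # adjusted_total / recorded_effort_months times higher than it really is.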
Leakage of effort and cost data is worse for internal applications developed by organizations that operate as cost centers and that therefore have no overwhelming business need for precision in recording effort and cost data. Outsourced applications and software built under contract are more accurate in accumulating effort and cost data, but even here unpaid overtime is often omitted.
It is an interesting point to think about, but one of the reasons why IT projects seem to have higher productivity rates than systems or embedded software is that IT project historical data "leaks" a great deal more than systems and embedded software data. This leakage is enough by itself to make IT projects look at least 15 percent more productive than systems or embedded applications of the same size in terms of function points. The reason is that most IT projects are created in a cost-center environment, while systems and embedded applications are created in a profit-center environment.

The emergence of the International Software Benchmarking Standards Group (ISBSG) has improved the situation somewhat, since ISBSG now has about 5000 applications of various kinds that are available to the software engineering community. All readers who are involved with software are urged to consider collecting and providing benchmark data. Even if the data cannot be submitted to ISBSG for proprietary or business reasons, keeping such data internally will be valuable.

The ISBSG questionnaires assist by collecting the same kinds of information for hundreds of applications, which facilitates using the data for estimating purposes. Also, companies that submit data to the ISBSG organization usually have better-than-average effort and cost tracking methods, so their data is probably more accurate than average.

Other benchmark organizations such as Software Productivity Research (SPR), the Quality and Productivity Management Group (QPMG), the David Consulting Group, and a number of others have perhaps 60,000 projects among them, but this data has limited distribution to specific clients. This private data is also more expensive than ISBSG data. A privately commissioned set of benchmarks with a comparison to similar relevant projects may cost between $25,000 and $100,000, based on the number of projects examined. Of course, the on-site private benchmarks are fairly detailed and also correct common errors and omissions, so the data is fairly reliable.

What would be useful for the industry is a major expansion in software productivity and quality benchmark data collection. Ideally, all development projects and all major maintenance and enhancement projects would collect enough data that benchmarks would become standard practice rather than an exceptional activity.
For the immediate project under development, benchmark data is valuable for showing defects discovered to date and effort expended to date, and for ensuring that schedules are on track. In fact, similar but less formal data is necessary just for status meetings, so a case can be made that formal benchmark data collection is close to being free, since the information is needed whether or not it will be kept for benchmark purposes after completion of the project.

Unfortunately, while sizing by analogy should be useful, flaws and gaps in software measurement practices have made both sizing by analogy and historical data of questionable value in many cases.

Timing of sizing by analogy  If there are benchmarks or historical size data from similar projects, this form of sizing can be done early, even before the requirements for the new application are fully known. This is one of the earliest methods of sizing. However, if historical data is missing, then sizing by analogy can't be done at all.

Usage of sizing by analogy  There are at least 3 million existing software applications that might, in theory, be utilized for sizing by analogy. However, from visits to many large companies and government agencies, the author hypothesizes that fewer than 100,000 existing legacy applications have enough historical data for sizing by analogy to be useful and accurate. About another 100,000 have partial data but so many errors that sizing by analogy would be hazardous. About 2.8 million legacy applications either have little or no historical data, or the data is so inaccurate that it should not be used. For many legacy applications, no reliable size data is available in any metric.

Schedules and costs  This form of sizing is quick and inexpensive, assuming that benchmarks or historical data are available. If neither size nor historical data is available, the method of sizing by analogy cannot be used. In general, benchmark data from an external source such as ISBSG, the David Consulting Group, QPMG, or SPR will be more accurate than internal data. The reason is that the external benchmark organizations attempt to correct common errors, such as omitting unpaid overtime.

Cautions and counter indications  The main counter indication is that sizing by analogy does not work at all if there is neither historical data nor accurate benchmarks. A caution about this method is that historical data is usually incomplete and leaves out critical information such as unpaid overtime. Formal benchmarks collected for ISBSG or one of the other benchmark companies will usually be more accurate than most internal historical data, which is of very poor reliability.
Traditional Sizing Based on Lines of Code (LOC) Metrics

When the "lines of code" or LOC metric originated in the early 1960s, software applications were small and coding made up about 90 percent of the effort. Today in 2009, applications are large, and coding makes up less than 40 percent of the effort. Between the 1960s and today, the usefulness of LOC metrics degraded until the metric became actually harmful. Today in 2009, using LOC metrics for sizing is close to professional malpractice. Following are the reasons why LOC metrics are now harmful.
The first reason that LOC metrics are harmful is that after more than four decades of usage, there are still no standard counting rules for source code. LOC metrics can be counted using either physical lines or logical statements. There can be more than a 500 percent difference in the apparent size of the same code segment when the counting method switches between physical lines and logical statements.
In the first edition of the author's book Applied Software Measurement in 1991, formal rules for counting source code based on logical statements were included. These rules were used by Software Productivity Research (SPR) for backfiring when collecting benchmark data. But in 1992, the Software Engineering Institute (SEI) issued its rules for counting source code, and the SEI rules were based on counts of physical lines. Since both the SPR counting rules and the SEI counting rules are widely used, but totally different, the effect is essentially that of having no counting rules at all.

(The author did a study of the code-counting methods used in major software journals such as IEEE Software, IBM Systems Journal, CrossTalk, the Cutter Journal, and so on. About one-third of the articles used physical lines, one-third used logical statements, and the remaining third used LOC metrics but failed to mention whether physical lines or logical statements (or both) were used in the article. This is a serious lapse on the part of both the authors and the referees of software engineering journals. You would hardly expect a journal such as Science or Scientific American to publish quantified data without carefully explaining the metrics used to collect and analyze the results. However, for software engineering journals, poor measurements are the norm rather than the exception.)

The second reason that LOC metrics are hazardous is that they penalize high-level programming languages in direct proportion to the power of the language. In other words, productivity and quality data expressed using LOC metrics look better for assembly language than for Java or C++.

The penalty is due to a well-known law of manufacturing economics that is not well understood by the software community: when a manufacturing process has a large number of fixed costs and there is a decline in the number of units manufactured, the cost per unit must go up.
A third reason is that LOC metrics can't be used to size or measure noncoding activities such as requirements, architecture, design, and user documentation. An application written in the C programming language might have twice as much source code as the same application written in C++, but the requirements and specifications would be the same size.

It is not possible to size paper documents from source code without adjusting for the level of the programming language. For languages such as Visual Basic that do not even have source code counting rules available, it is barely possible to predict source code size, much less the sizes of any other deliverables.

The fourth reason that LOC metrics are harmful is that circa 2009, more than 700 programming languages exist, and they vary from very low-level languages such as assembly to very high-level languages such as ASP.NET. More than 50 of these languages have no known counting rules.

The fifth reason is that most modern applications use more than a single programming language, and some applications use as many as 15 different languages, each of which may have unique code counting rules. Even a simple mix of Java and HTML makes code counting difficult.

Historically, the development of Visual Basic and its many competitors and descendants changed the way many modern programs are developed. Although "visual" languages do have a procedural source code portion, much of the more complex programming uses button controls, pull-down menus, visual worksheets, and reusable components. In other words, programming is being done without anything that can be identified as a "line of code" for sizing, measurement, or estimation purposes. Today in 2009, perhaps 60 percent of new software applications are developed using either object-oriented languages or visual languages (or both). Indeed, sometimes as many as 12 to 15 different languages are used in the same application.
For large systems, programming itself is only the fourth most expensive activity. The three higher-cost activities cannot be measured or estimated effectively using the lines of code metric. Also, the fifth major cost element, project management, cannot easily be estimated or measured using the LOC metric either. Table 6-6 shows the ranking, in descending order, of software cost elements for large applications.

TABLE 6-6  Rank Order of Large System Software Cost Elements
1. Defect removal (inspections, static analysis, testing, finding and fixing bugs)
2. Producing paper documents (plans, architecture, specifications, user manuals)
3. Meetings and communication (clients, team members, managers)
4. Programming
5. Project management

The usefulness of a metric such as lines of code, which can only measure and estimate one out of the five major cost elements of software projects, is a significant barrier to economic understanding. Following is an excerpt from the third edition of the author's book Applied Software Measurement (McGraw-Hill, 2008), which illustrates the economic fallacy of KLOC metrics. Here are two case studies showing both the LOC results and the function point results for the same application written in two languages: basic assembly and C++. In Case 1, we will assume that the application is written in assembly. In Case 2, we will assume that the same application is written in C++.
Case 1: Application written in the assembly language  Assume that the assembly language program required 10,000 lines of code, and the various paper documents (specifications, user documents, etc.) totaled 100 pages. Assume that coding and testing required ten months of effort, and writing the paper documents took five months of effort. The entire project totaled 15 months of effort, and so has a productivity rate of 666 LOC per month. At a cost of $10,000 per staff month, the application cost $150,000. Expressed in terms of cost per source line, the cost is $15 per line of source code.

Case 2: The same application written in the C++ language  Assume that the C++ version of the same application required only 1000 lines of code. The design documents probably were smaller as a result of using an object-oriented (OO) language, but the user documents are the same size as in the previous case: assume a total of 75 pages were produced. Assume that coding and testing required one month, and document production took four months. Now we have a project where the total effort was only five months, but productivity expressed using LOC has dropped to only 200 LOC per month. At a cost of $10,000 per staff month, the application cost $50,000, or only one-third as much as the assembly language version. The C++ version is a full $100,000 cheaper than the assembly version, so clearly the C++ version has much better economics. But the cost per source line for this version has jumped to $50.
Even if we measure only coding, we still can't see the value of high-level languages by means of the LOC metric: the coding rates for the assembly language and C++ versions were identical at 1000 LOC per month, even though the C++ version took only one month as opposed to ten months for the assembly version.

Since both the assembly and C++ versions were identical in terms of features and functions, let us assume that both versions were 50 function points in size. When we express productivity in terms of function points per staff month, the assembly version had a productivity rate of 3.33 function points per staff month, while the C++ version had a productivity rate of 10 function points per staff month. When we turn to costs, the assembly version had a cost of $3000 per function point, while the C++ version had a cost of $1000 per function point. Thus, function point metrics clearly match the assumptions of standard economics, which define productivity as goods or services produced per unit of labor or expense.
Lines of code metrics, on the other hand, do not match the assumptions of standard economics and in fact show a reversal. Lines of code metrics distort the true economic picture by so much that their use for economic studies involving more than one programming language might be classified as professional malpractice.
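The case-study arithmetic above is compact enough to restate directly in code. The sketch below simply recomputes the published figures for both versions, using the numbers given in the excerpt; nothing in it goes beyond the text.

    # Sketch: recomputing the Case 1 / Case 2 figures from the excerpt above.
    COST_PER_STAFF_MONTH = 10_000
    FUNCTION_POINTS = 50          # both versions deliver the same functionality

    cases = {
        "Assembly": {"loc": 10_000, "months": 10 + 5},  # coding/testing + documents
        "C++":      {"loc": 1_000,  "months": 1 + 4},
    }

    for name, c in cases.items():
        cost = c["months"] * COST_PER_STAFF_MONTH
        print(f"{name:8s}  effort {c['months']:2d} months   cost ${cost:,}")
        print(f"          {int(c['loc'] / c['months']):6d} LOC/month     ${cost / c['loc']:.0f} per LOC")
        print(f"          {FUNCTION_POINTS / c['months']:6.2f} FP/month      ${cost / FUNCTION_POINTS:,.0f} per FP")
    # The cheaper C++ version looks worse on every LOC-based measure
    # (200 vs. 666 LOC per month, $50 vs. $15 per LOC) but better on the
    # function point measures (10 vs. 3.33 FP per month, $1,000 vs. $3,000 per FP).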
Timing of sizing by lines of code  Unless the application being sized is going to replace an existing legacy application, this method is pure guesswork until the code is written. If code benchmarks or historical code size data from similar projects exist, this form of sizing can be done early, assuming the new language is the same as the former language. However, if there is no history, or if the old language is not the same as the new one, sizing using lines of code can't be done with accuracy until the code is written, which is far too late. When either the new application or the old application (or both) uses multiple languages, code counting becomes very complicated and difficult.
Usage of lines of code sizing  As of 2009, at least 3 million legacy applications are still in use, and another 1.5 million applications are under development. However, of this total of about 4.5 million applications, the author estimates that more than 4 million use multiple programming languages or use languages for which no effective counting rules exist. Among the approximately 500,000 applications that primarily use a single language for which counting rules do exist, no fewer than 500 programming languages have been utilized. Essentially, code sizing is inaccurate and hazardous except for applications that use a single language such as assembler, C, dialects of C, COBOL, Fortran, Java, and about 100 others.

In today's world circa 2009, sizing using LOC metrics still occurs in spite of the flaws and problems with this metric. The Department of Defense and military software are the most frequent users of LOC metrics. The LOC metric is also still widely used for systems and embedded applications. The older waterfall method often employed LOC sizing, as does the modern Team Software Process (TSP) development method.
Schedules and costs  This form of sizing is quick and inexpensive, assuming that automated code counting tools are available. However, if the application has more than two programming languages, automated code counting may not be possible. If the application uses one of the modern languages that are "programmed" via buttons and pull-down menus, code counting is impossible because there are no counting rules for such controls.

Cautions and counter indications  The main counter indication is that lines of code metrics penalize high-level languages. Another counter indication is that this method is hazardous for sizing requirements, specifications, and other paper documents. Also, counts of physical lines of code may differ from counts of logical statements by more than 500 percent. Since the software literature and published productivity data are ambiguous as to whether logical or physical lines are used, this method has a huge margin of error.
Sizing Using Story Point Metrics

The Agile development method was created in part as a reaction against the traditional software cost drivers shown in Table 6-6. The Agile pioneers felt that software had become burdened by excessive volumes of paper requirements and specifications, many of which seemed to have little value in actually creating a working application.
The Agile approach tries to simplify and minimize the production of paper documents and to accelerate the ability to create working code. The Agile philosophy is that the goal of software engineering is the creation of working applications in a cost-effective fashion. In fact, the goal of the Agile method is to transform the traditional software cost drivers into a more cost-effective sequence, as shown in Table 6-7.

TABLE 6-7  Rank Order of Agile Software Cost Elements
1. Programming
2. Meetings and communication (clients, team members, managers)
3. Defect removal (inspections, static analysis, testing, finding and fixing bugs)
4. Project management
5. Producing paper documents (plans, architecture, specifications, user manuals)

As part of simplifying the paper deliverables of software applications, a method for gathering the requirements of Agile projects is that of user stories. These are very concise statements of specific requirements that consist of only one or two sentences, which are written on 3"×5" cards to ensure compactness.

An example of a basic user story for a software cost-estimating tool might be, "The estimating tool should include currency conversion between dollars, euros, and yen."

Once created, user stories are assigned relative weights called story points, which reflect their approximate difficulty and complexity compared with other stories for the same application. The currency conversion example just shown is quite simple and straightforward (except for the fact that currencies fluctuate on a daily basis), so it might be assigned a weight of 1 story point. Currency conversion is a straightforward mathematical calculation and is also readily available from online sources, so this is not a difficult story or feature to implement.
The same cost-estimating application will of course perform other functions that are much harder and more complex than currency conversion. An example of a more difficult user story might be, "The estimating tool will show the effects of CMMI levels on software quality and productivity."

This story is much harder to implement than currency conversion, because the effects of CMMI levels vary with the size and nature of the application being developed. For small and simple applications, CMMI levels have hardly any impact, but for large and complex applications, the higher CMMI levels have a significant impact. Obviously, this story would have a larger number of story points than currency conversion, and it might be assigned a weight of 5, meaning that it is at least five times as difficult as the previous example.

The assignment of story point weights for a specific application is jointly worked out between the developers and the user representative. Thus, for a specific application, there is probably a high degree of mathematical consistency between story point levels; that is, levels 1, 2, 3, and so on probably come close to capturing similar levels of difficulty.
The Agile literature tends to emphasize that story points are units of size, not units of time or effort. That being said, story points are in fact often used for estimating team velocity and even for estimating the overall schedules of both sprints and entire applications.
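When teams do use story points this way, the usual calculation divides the remaining backlog by the points actually completed per sprint to project a completion date. The sketch below is a generic illustration of that common practice rather than a method endorsed here; the sprint length and all figures are assumptions.

    # Sketch: schedule projection from story-point velocity, as commonly
    # practiced on Agile projects. All figures are hypothetical.
    import math

    completed_per_sprint = [21, 18, 24, 19]   # points finished in recent sprints
    remaining_backlog = 180                   # story points not yet implemented
    sprint_length_weeks = 2

    velocity = sum(completed_per_sprint) / len(completed_per_sprint)
    sprints_left = math.ceil(remaining_backlog / velocity)

    print(f"Average velocity:   {velocity:.1f} story points per sprint")
    print(f"Sprints remaining:  {sprints_left}")
    print(f"Schedule remaining: about {sprints_left * sprint_length_weeks} weeks")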
However, user stories and therefore story points are very flexible, and there is no guarantee that Agile teams on two different applications will use exactly the same basis for assigning story point weights.

It may be that as the Agile approach gains more and more adherents and wider usage, general rules for determining story point weights will be created and utilized, but this is not the case circa 2009.

It would be theoretically possible to develop mathematical conversion rules between story points and other metrics such as IFPUG function points, COSMIC function points, use-case points, lines of code, and so forth. However, for this to work, story points would need guidelines for consistency between applications. In other words, quantities such as 1 story point, 2 story points, and so on, would have to have the same values wherever they were applied.
From looking at samples of story points, there does not seem to be a strict linear relation between user stories and story points in terms of effort. What might be a useful approximation is to assume that for each increase of 1 in terms of story points, the IFPUG function points needed for the story would double. For example:

Story Points    IFPUG Function Points
     1                    2
     2                    4
     3                    8
     4                   16
     5                   32
This method is of course hypothetical, but it would be interesting to carry out trials and experiments and create a reliable conversion table between story points and function points.
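Expressed in code, the doubling approximation above is a one-line rule. The sketch below implements it exactly as described, purely as an illustration of the hypothetical conversion; it has not been validated against real project data.

    # Sketch: the hypothetical story-point-to-function-point approximation
    # described above (each additional story point doubles the IFPUG
    # function points). Not validated against real project data.
    def story_points_to_function_points(story_points: int) -> int:
        """Doubling rule: 1 -> 2, 2 -> 4, 3 -> 8, 4 -> 16, 5 -> 32."""
        return 2 ** story_points

    backlog = [1, 2, 3, 5, 5]   # story point weights for a small hypothetical backlog
    for sp in backlog:
        print(f"{sp} story point(s) -> {story_points_to_function_points(sp)} function points")
    print("Approximate total:", sum(story_points_to_function_points(sp) for sp in backlog), "function points")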
It would be useful if the Agile community collected valid historical data on effort, schedules, defects, and other deliverables and submitted it to benchmarking organizations such as ISBSG. Larger volumes of historical data would facilitate the use of story points for estimating purposes and would also speed up the inclusion of story points in commercial estimating tools such as COCOMO, KnowledgePlan, Price-S, SEER, SLIM, and the like.

A few Agile projects have used function point metrics in addition to story points. But as this book is written in 2009, no Agile projects have submitted formal benchmarks to ISBSG or to other public benchmark sources. Some Agile projects have been analyzed by private benchmark organizations, but the results are proprietary and confidential.

As a result, there is no reliable quantitative data circa 2009 that shows either Agile productivity or Agile quality levels. This is not a sign of professional engineering, but it is a sign of how far "software engineering" lags behind more mature engineering fields.
Timing of sizing with story points  While Agile projects attempt an overview of an entire application at the start, user stories occur continuously with every sprint throughout development. Therefore, user stories are intended primarily for the current sprint and don't have much to say about future sprints that will occur downstream. As a result, story points are hard to use for early sizing of entire applications, although they are useful for the current sprint.

Usage of story point metrics  Agile is a very popular method, but it is far from being the only software development method. The author estimates that circa 2009, about 1.5 million new applications are being developed. Of these, perhaps 200,000 use the Agile method and also use story points. Story points are used primarily for small to mid-sized IT applications between about 250 and 5000 function points. Story points are not often used for large applications greater than 10,000 function points, nor are they often used for embedded, systems, and military software.
Schedules and costs  Since story points are assigned informally by team consensus, this form of sizing is quick and inexpensive. It is possible to use collections of story cards for function point analysis, too; user stories could be used as a basis for function point counts. But Agile projects tend to stay away from function points. It would also be possible to use some of the high-speed function point methods with Agile projects, but as this book is written, there is no data that shows this being done.
Cautions and counter indications  The main counter indication for story points is that they tend to be unique to specific applications. Thus, it is not easy to compare benchmarks between two or more different Agile applications using story points, because there is no guarantee that the applications used the same weights for their story points.

Another counter indication is that story points are useless for comparisons with applications that were sized using function points, use-case points, or any other software metric. Story points can only be used for benchmark comparisons against other story points, and even here the results are ambiguous.

A third counter indication is that there are no large-scale collections of benchmark data based on story points. For some reason, the Agile community has been lax about benchmarks and collecting historical data. This is why it is so hard to ascertain whether Agile has better or worse productivity and quality levels than methods such as TSP, iterative development, or even waterfall development. The shortage of quantitative data about Agile productivity and quality is a visible weakness of the Agile approach.
Sizing Using Use-Case Metrics

Use-cases have been in existence since the 1980s. They were originally discussed by Ivar Jacobson and then became part of the unified modeling language (UML). Use-cases are also an integral part of the Rational Unified Process (RUP), and Rational itself was acquired by IBM. Use-cases have both textual and several forms of graphical representation. Outside of RUP, use-cases are often used for object-oriented (OO) applications. They are sometimes used for non-OO applications as well.

Use-cases describe software application functions from the point of view of a user or actor. Use-cases can occur in several levels of detail, including "brief," "casual," and "fully dressed," which is the most detailed. Fully dressed use-cases are of sufficient detail that they can be used for function point analysis and also can be used to create use-case points.
Use-cases include other topics besides actors, such as preconditions, postconditions, and several others. However, these are well defined and fairly consistent from application to application.

Use-cases and user stories have similar viewpoints, but use-cases are more formal and often much larger than user stories. Because of the age and extensive literature about use-cases, they tend to be more consistent from application to application than user stories are.

Some criticisms are aimed at use-cases for not dealing with nonfunctional requirements such as security and quality. But this same criticism could be aimed with equal validity at any design method. In any case, it is not difficult to append quality, security, and other nonfunctional design issues to use-cases.
Use-case
points are based on
calculations and logic
somewhat simi-
lar
to function point metrics in
concept but not in specific
details. The
factors
that go into use-case points
include technical and
environmen-
tal
complexity factors. Once
calculated, use-case points can be
used to
predict
effort and costs for
software development. About 20
hours of
development
work per use-case has been
reported, but the
activities
that
go into this work can
vary.
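Because the calculation itself is only summarized here, a minimal sketch in Python may make it concrete. The weights and factor formulas below follow the commonly published Karner-style use-case point scheme rather than anything defined in this chapter, and the counts are hypothetical; only the 20-hours-per-point figure comes from the text.

# A minimal sketch of a use-case point (UCP) calculation, following the
# commonly published Karner-style scheme. The actor and use-case weights and
# the constants in the technical and environmental factor formulas come from
# that general literature rather than from this chapter, and the example
# counts are hypothetical.

ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}
USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}

def use_case_points(actors, use_cases, tech_factor_sum, env_factor_sum):
    """actors/use_cases map complexity class to counts; the factor sums are
    the weighted totals of the published technical and environmental ratings."""
    uaw = sum(ACTOR_WEIGHTS[k] * n for k, n in actors.items())        # unadjusted actor weight
    uucw = sum(USE_CASE_WEIGHTS[k] * n for k, n in use_cases.items()) # unadjusted use-case weight
    tcf = 0.6 + 0.01 * tech_factor_sum    # technical complexity factor
    ecf = 1.4 - 0.03 * env_factor_sum     # environmental complexity factor
    return (uaw + uucw) * tcf * ecf

ucp = use_case_points(actors={"simple": 2, "average": 2, "complex": 1},
                      use_cases={"simple": 4, "average": 8, "complex": 3},
                      tech_factor_sum=30, env_factor_sum=17)
effort_hours = ucp * 20   # about 20 hours per use-case point, as reported above
print(round(ucp, 1), round(effort_hours))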
Use-case
diagrams and supporting text
can be used to calculate
func-
tion
point metrics as well as use-case
metrics. In fact, the rigor
and con-
sistency
of use-cases should allow automatic
derivation of both use-case
points
and function points.
The
use-case community tends to be resistant
to function points
and
asserts
that use-cases and function
points look at different
aspects,
which
is only partly true.
However, since both can
yield information on
work
hours per point, it is
obvious that there are
more similarities
than
the
use-case community wants to admit
to.
If
you assume that a work
month consists of 22 days at 8
hours per
day,
there are about 176
hours in a work month.
Function point
produc-
tivity
averages about 10 function
points per staff month, or
17.6 work
hours
per function point.
Assuming
that use-case productivity averages
about 8.8 use-cases
per
month, which is equivalent to 20
hours per use-case, it can be
seen
that
use-case points and IFPUG function
points yield results that
are
fairly
close together.
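The arithmetic behind this comparison is easy to verify directly:

# Worked arithmetic for the productivity comparison above (illustrative only).
work_hours_per_month = 22 * 8                            # 176 hours in a work month
hours_per_function_point = work_hours_per_month / 10     # 10 function points per month = 17.6 hours each
hours_per_use_case_point = work_hours_per_month / 8.8    # 8.8 use-case points per month = 20 hours each
print(hours_per_function_point, hours_per_use_case_point)  # 17.6 versus 20.0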
Other
authors and benchmark
organizations such as the
David
Consulting
Group and ISBSG have
published data on conversion
ratios
between
IFPUG function point metrics
and use-case points. While
the
other
conversion ratios are not
exactly the same as the ones
in this
chapter,
they are quite close, and
the differences are probably
due to
using
different samples.
There
may be conversion ratios
between use-case points and
COSMIC
function
points, Finnish function
points, or other function
point variants,
but
the author does not
use any of the variants
and has not
searched
their
literature.
Of
course, productivity rates
using both IFPUG function
points and
use-case
points have wide ranges,
but overall they are
not far apart.
Timing
of sizing with use-case
points Use-cases
are used to define
require-
ments
and specifications, so use-case points
can be calculated when
use-
cases
are fairly complete; that
is, toward the end of
the requirements
phase.
Unfortunately, formal estimates
are often needed before
this
time.
Usage of use-case points RUP is a very popular method, but it is far
from
being the only software
development method. The
author esti-
mates
that circa 2009, about
1.5 million new applications
are being
developed.
Of these, perhaps 75,000 use
the RUP method and also
use-
case
points. Perhaps another
90,000 projects are
object-oriented and
utilize
use-cases, but not RUP.
Use-case points are used
for both small
and
large software projects.
However, the sheer volume of
use-cases
becomes
cumbersome for large
applications.
Schedules and costs Since use-case points have simpler calculations than
function points, this form
of sizing is somewhat quicker
than func-
tion
point analysis. Use-case
points can be calculated at a
range of
perhaps
750 per day, as opposed to
about 400 per day
for function point
analysis.
Even so, the cost
for calculating use-case points
can top $3
per
point if manual sizing is
used. Obviously, automatic
sizing would
be
a great deal cheaper and
also faster. In theory,
automatic sizing of
use-case
points could occur at rates
in excess of 5000 use-case
points
per
day.
Cautions and counter indications The main counter indication for use-
case
points is that there are no
large collections of benchmark
data
that
use them. In other words,
use-case points cannot yet be
used for
comparisons
with industry databases such as
ISBSG, because function
point
metrics are the primary
metric for benchmark
analysis.
Another
counter indication is that use-case
points are useless for
com-
parisons
with applications that were
sized using function points,
story
points,
lines of code, or any other
software metric. Use-case
points can
only
be used for benchmark
comparisons against other use-case
points,
and
even here the results
are sparse and difficult to
find.
A
third counter indication is
that supplemental data such
as pro-
ductivity
and quality is not widely
collected for projects that
utilize
use-cases.
For some reason, both
the OO and RUP communities
have
been
lax on benchmarks and
collecting historical data.
This is why it
is
so hard to ascertain if RUP or OO
applications have better or
worse
productivity
and quality levels than
other methods. The shortage
of
quantitative
data about RUP productivity
and quality compared
with
other
methods such as Agile and
TSP productivity and quality
is a vis-
ible
weakness of the use-case point
approach.
Sizing
Based on IFPUG
Function
Point
Analysis
Function
point metrics were developed
by A.J. Albrecht and his
col-
leagues
at IBM in response to a directive by IBM
executives to find a
metric
that did not distort
economic productivity, as did
the older lines
of
code metric. After research
and experimentation, Albrecht
and his
colleagues
developed a metric called
"function point" that was
indepen-
dent
of code volumes.
Function
point metrics were announced
at a conference in 1978
and
put
into the public domain. In
1984, responsibility for the
counting rules
of
function point metrics was
transferred from IBM to a nonprofit
organi-
zation
called the International
Function Point Users Group
(IFPUG).
Sizing
technologies based on function
point metrics have been
pos-
sible
since this metric was
introduced in 1978. Function
point sizing is
more
reliable than sizing based
on lines of code because function
point
metrics
support all deliverable
items: paper documents,
source code,
test
cases, and even bugs or
defects. Thus, function
point sizing has
transformed
the task of sizing from a
very difficult kind of work
with a
high
error rate to one of acceptable
accuracy.
Although
the counting rules for
function points are complex
today,
the
essence of function point
analysis is derived by a weighted
formula
that
includes five
elements:
1. Inputs
2. Outputs
3. Logical files
4. Inquiries
5. Interfaces
There
are also adjustments for
complexity. The actual rules
for count-
ing
function points are
published by the International
Function Point
Users
Group (IFPUG) and are
outside the scope of this
section.
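To show the shape of the weighted formula without reproducing the counting rules, the following minimal Python sketch uses the commonly published IFPUG low/average/high weights and the standard 0.65 + 0.01 × GSC value adjustment; the element counts are hypothetical, and the sketch is not a substitute for certified counting.

# A minimal sketch of the weighted formula behind IFPUG function points.
# The low/average/high weights and the value adjustment shown here are the
# commonly published IFPUG values, included only to illustrate the shape of
# the calculation.

WEIGHTS = {                        # (low, average, high)
    "inputs":        (3, 4, 6),    # external inputs
    "outputs":       (4, 5, 7),    # external outputs
    "logical_files": (7, 10, 15),  # internal logical files
    "inquiries":     (3, 4, 6),    # external inquiries
    "interfaces":    (5, 7, 10),   # external interface files
}

def unadjusted_fp(counts):
    """counts: {element: (n_low, n_average, n_high)}."""
    return sum(n * w for elem, ns in counts.items()
               for n, w in zip(ns, WEIGHTS[elem]))

def adjusted_fp(ufp, general_system_characteristics):
    """14 general system characteristics, each rated 0 to 5."""
    value_adjustment_factor = 0.65 + 0.01 * sum(general_system_characteristics)
    return ufp * value_adjustment_factor

ufp = unadjusted_fp({
    "inputs": (5, 10, 3), "outputs": (4, 8, 2), "logical_files": (2, 4, 1),
    "inquiries": (3, 6, 1), "interfaces": (1, 2, 0),
})
print(ufp, round(adjusted_fp(ufp, [3] * 14), 1))   # hypothetical example counts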
The
function point counting
items can be quantified by
reviewing
software
requirements and specifications.
Note that conventional
paper
specifications,
use-cases, and user stories
can all be used for
function
point
analysis. The counting rules
also include complexity
adjustments.
The
exact rules for counting
function points are outside
the scope of this
book
and are not
discussed.
Now
that function points are
the most widely used
software size
metric
in the world, thousands of
projects have been measured
well
enough
to extract useful sizing
data for all major
deliverables: paper
documents
such as plans and manuals,
source code, and test
cases. Here
are
a few examples from all
three sizing domains. Table
6-8 illustrates
typical
document volumes created for
various kinds of
software.
Table
6-8 illustrates only a small
sample of the paperwork and
docu-
ment
sizing capabilities that are
starting to become commercially
avail-
able.
In fact, as of 2009, more
than 90 kinds of document
can be sized
using
function points, including
translations into other
national lan-
guages
such as Japanese, Russian,
Chinese, and so on.
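As a small illustration of how such document-sizing ratios are applied, the following sketch multiplies an application's function point total by the MIS pages-per-function-point values from Table 6-8; the 2,500-function point example is hypothetical.

# Predicting document volumes from function point size using the MIS column
# of Table 6-8 (illustrative only).
PAGES_PER_FP = {
    "User requirements": 0.45,
    "Functional specifications": 0.80,
    "Logic specifications": 0.85,
    "Test plans": 0.25,
    "User tutorial documents": 0.30,
    "User reference documents": 0.45,
}

def document_volumes(size_fp):
    return {doc: round(size_fp * ratio) for doc, ratio in PAGES_PER_FP.items()}

for doc, pages in document_volumes(2500).items():   # a hypothetical 2,500-FP MIS project
    print(f"{doc:28s} {pages:6d} pages")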
Not
only can function points be
used to size paper
deliverables, but
they
can also be used to size
source code, test cases,
and even software
bugs
or defects. In fact, function
point metrics can size
the widest range
of
software deliverables of any
known metric.
For
sizing source code volumes,
data now is available on
roughly 700
languages
and dialects. There is also
embedded logic in several
com-
mercial
software estimating tools
for dealing with multiple
languages
in
the same application.
Since
the function point total of
an application is known at
least
roughly
by the end of requirements,
and in some detail by the
middle of
the
specification phase, it is now
possible to produce fairly
accurate size
estimates
for any application where
function points are
utilized. This
form
of sizing is now a standard
function for many commercial
software
estimating
tools such as COCOMO II,
KnowledgePlan, Price-S,
SEER,
SLIM,
and others.
The
usefulness of IFPUG function point
metrics has made them
the
metric
of choice for software
benchmarks. As of 2009, benchmarks
based
on
function points outnumber
all other metrics combined.
The ISBSG
TABLE 6-8 Number of Pages Created Per Function Point for Software Projects

                               MIS        Systems    Military   Commercial
                               Software   Software   Software   Software
User requirements              0.45       0.50       0.85       0.30
Functional specifications      0.80       0.55       1.75       0.60
Logic specifications           0.85       0.50       1.65       0.55
Test plans                     0.25       0.10       0.55       0.25
User tutorial documents        0.30       0.15       0.50       0.85
User reference documents       0.45       0.20       0.85       0.90
Total document set             3.10       2.00       6.15       3.45
benchmark
data currently has about
5000 projects and is growing
at a
rate
of perhaps 500 projects per
year.
The
proprietary benchmarks by companies
such as QPMG, the
David
Consulting
Group, Software Productivity
Research, Galorath
Associates,
and
several others total perhaps
60,000 software projects
using function
points
and grow at a collective
rate of perhaps 1000
projects per year.
There
are no other known metrics
that even top 1000
projects.
Over
the past few years,
concerns have been raised
that software
applications
also contain "nonfunctional
requirements" such as
per-
formance,
quality, and so on. This is
true, but the significance
of these
tends
to be exaggerated.
Consider
the example of home
construction. A major factor in
the
cost
of home construction is the
size of the home, measured
in terms of
square
feet or square meters. The
square footage, the
amenities, and
the
grade of construction materials
are user requirements. But in
the
author's
state (Rhode Island), local
building codes add
significant costs
due
to nonfunctional requirements. Homes
built near a lake, river,
or
aquifer
require special hi-tech
septic systems, which cost
about $30,000
more
than standard septic
systems. Homes built within
a mile of the
Atlantic
Ocean require hurricane-proof
windows, which cost about
three
times
more than standard
windows.
These
government mandates are not
user requirements. But
they
would
not occur without a home
being constructed, so they
can be dealt
with
as subordinate cost elements.
Therefore, estimates and
measures
such
as "cost per square foot"
are derived from the
combination of func-
tional
user requirements and
government building codes that
force
mandated
nonfunctional requirements on
homeowners.
Timing of IFPUG function point sizing IFPUG function points are derived
from
requirements and specifications,
and can be quantified by the
time
initial
requirements are complete.
However, the first formal
cost esti-
mates
usually are needed before
requirements are
complete.
Usage of IFPUG function points While the IFPUG method is the most
widely
used form of function point
analysis, none of the
function point
methods
are used widely. Out of an
approximate total of perhaps
1.5
million
new software applications
under development circa
2009, the
author
estimates that IFPUG function
point metrics are
currently
being
used on about 5000
applications. Function point
variants, back-
firing,
and function point
approximation methods are
probably in use
on
another 2500 applications.
Due to limitations in the
function point
method
itself, IFPUG function points
are seldom used for
applications
greater
than 10,000 function points
and can't be used at all
for small
updates
less than 15 function points
in size.
Schedules
and costs This
form of sizing is neither
quick nor inexpen-
sive.
Function point analysis is so
slow and expensive that
applications
larger
than about 10,000 function
points are almost never
analyzed.
Normal function point analysis requires a certified function point counter if it is to be performed with accuracy
(uncertified counts are
highly inaccu-
rate).
Normal function point
analysis proceeds at a rate of between
400
and
600 function points per
day. At a daily average
consulting fee of $3000,
the
cost is between $5.00 and
$7.50 for every function
point counted.
Assuming
an average cost of $6.00 per
function point
counted,
counting
a 10,000function point application
would cost $60,000.
This
explains
why normal function point
analysis is usually only
performed
for
applications in the 1000-function
point size range.
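The cost figures quoted above follow directly from the daily fee and the counting speed:

# Illustrative arithmetic for the counting costs quoted above.
daily_consulting_fee = 3000
for fp_per_day in (400, 600):
    print(round(daily_consulting_fee / fp_per_day, 2))   # 7.5 and 5.0 dollars per function point
print(10_000 * 6.00)                                     # about $60,000 to count 10,000 function points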
Later
in this section, various
forms of high-speed function
point
approximation
are discussed. It should be
noted that automatic
func-
tion
point counting is possible
when formal specification
methods such
as
use-cases are utilized.
Cautions
and counter indications
The
main counter indication with
func-
tion
point analysis is that it is
expensive and fairly
time-consuming.
While
small applications less than
1000 function points can be
sized
in
a few days, large systems
greater than 10,000 function
points would
require
weeks. No really large
systems greater than 100,000
function
points
have ever been sized with
function points due to the
high costs
and
the fact that the
schedule for the analysis
would take months.
Another
counter indication is that
from time to time, the
counting
rules
change. When this occurs,
historical data based on
older versions
of
the counting rules may
change or become incompatible with
newer
data.
This situation requires
conversion rules from older
to newer count-
ing
rules. If nonfunctional requirements
are indeed counted
separately
from
functional requirements, such a
change in rules would cause
sig-
nificant
discontinuities in historical benchmark
data.
Another
counter indication is that
there is a lower limit for
function point
analysis.
Small changes less than 15
function points can't be
sized due to
the
lower limits of the
adjustment factors. Individually,
these changes are
trivial,
but within large companies,
there may be thousands of
them every
year,
so their total cost can exceed
several million
dollars.
A
caution is that accurate
function point analysis
requires certified
function
point counters who have
successfully passed the
certification
examination
offered by IFPUG. Uncertified
counters should not be
used
because
the counting rules are
too complex. As with tax
regulations, the
rules
change fairly often.
Function
point analysis is accurate
and useful, but slow
and expen-
sive.
As a result, a number of high-speed
function point methods
have
been
developed and will be discussed
later in this
section.
Sizing
Using Function Point
Variations
The
success of IFPUG function point metrics
led to a curious
situation.
The
inventor of function point
metrics, A.J. Albrecht, was
an electri-
cal
engineer by training and
envisioned function points as a
general-
purpose
metric that could be used
for information technology
projects,
embedded
software, systems software,
and military software, and
even
games
and entertainment software.
However, the first published
results
that
used function point metrics
happened to be information
technology
applications
such as accounting and
financial software.
The
historical accident that
function point metrics were
first used for
IT
applications led some
researchers to conclude that
function points
only
worked
for IT applications. As a result, a
number of function
point
variations
have come into being, with
many of them being aimed at
sys-
tems
and embedded software. These
function point variations
include
but
are not limited
to:
1. COSMIC function points
2. Engineering function points
3. 3-D function points
4. Full function points
5. Feature points
6. Finnish function points
7. Mark II function points
8. Netherlands function points
9. Object-oriented function points
10. Web-object function points
When
IFPUG function points were
initially used for systems
and
embedded
software, it was noted that
productivity rates were
lower
for
these applications. This is because
systems and embedded
software
tend
to be somewhat more complex
than IT applications and
really are
harder
to build, so productivity will be about
15 percent lower than
for
IT
applications of the same
size.
However,
rather than accepting the
fact that some embedded
and
systems
applications are tougher
than IT applications and will
there-
fore
have lower productivity
rates, many function point
variants were
developed
that increased the apparent
size of embedded and
systems
applications
so that they appear to be
about 15 percent larger
than
when
measured with IFPUG function
points.
As
mentioned earlier, it is an interesting
point to think about, but
one
of
the reasons why IT projects
seem to have higher
productivity rates
than
systems or embedded software is
that IT project historical
data
leaks
a great deal more than the historical data for systems and embedded software.
This is because IT applications are
usually developed by a
cost
center, but systems and
embedded software are
usually developed
by
a profit center. This
leakage is enough by itself to
make IT projects
look
at least 15 percent more
productive than systems or
embedded
applications
of the same size in terms of
function points. It is perhaps
a
coincidence
that the size increases
for systems and embedded
software
predicted
by function point variants
such as COSMIC are almost
exactly
the
same as the leakage rates
from IT application historical
data.
Not
all of the function point
variants are due to a desire
to puff up the
sizes
of certain kinds of software,
but many had that
origin. As a result
now,
in 2009, the term function
point is
extremely ambiguous and
includes
many
variations. It is not possible to mix
these variants and have a
single
unified
set of benchmarks. Although
some of the results may be
similar,
mixing
the variants into the
same benchmark data
collection would be
like
mixing
yards and meters or statute
miles and nautical
miles.
The
function point variations
all claim greater accuracy
for certain
kinds
of software than IFPUG function
points, but what this
means is
that
the variations produce
larger counts than IFPUG for
systems and
embedded
software and for some
other types of software.
This is not the
same
thing as "accuracy" in an objective
sense.
In
fact, there is no totally
objective way of ascertaining
the accuracy of
either
IFPUG function points or the
variations. It is possible to
ascertain
the
differences in results between
certified and uncertified
counters, and
between
groups of counters who
calculate function points
for the same
test
case. But this is not true
accuracy: it's only the
spread of human
variation.
With
so many variations, it is now
very difficult to use any of
them for
serious
estimating and planning
work. If you happen to use
one of the vari-
ant
forms of function points,
then it is necessary to seek guidance
from the
association
or group that controls the
specific counting rules
used.
As
a matter of policy, inventors of
function point variants
should be
responsible
for creating conversion
rules between these variants
and
IFPUG
function points, which are
the oldest and original
form of func-
tional
measurement. However, with few
exceptions, there are no
really
effective
conversion rules. There are
some conversion rules
between
IFPUG
and COSMIC and also
between several other
variations such
as
the Finnish and Netherlands
functional metrics.
The
older feature point metric
was jointly developed by
A.J. Albrecht
and
the author, so it was
calibrated to produce results
that matched
IFPUG
function points in over 90
percent of cases; for the
other
10
percent, the counting rules
created more feature points
than function
points,
but the two could be
converted by mathematical
means.
There
are other metrics with
multiple variations such as
statute miles
and
nautical miles, Imperial
gallons and U.S. gallons, or
temperature
measured
using Fahrenheit or Celsius.
Unfortunately, the
software
industry
has managed to create more
metric variations than any
other
form
of "engineering." This is yet
another sign that software
engineering
is
not yet a true engineering
discipline, since it does
not yet know
how
to
measure results with high
precision.
Timing
of function point variant
sizing Both
IFPUG function points and
the
variations such as COSMIC
are derived from
requirements and
specifications,
and can be quantified by the
time initial requirements
are
complete.
However, the first formal
cost estimates usually are
needed
before
requirements are
complete.
Usage of function point variations The four function point variations
that
are certified by the ISO
standards organization include
the IFPUG,
COSMIC,
Netherlands, and Finnish
methods. Because IFPUG is
much
older,
it has more users. The
COSMIC, Netherlands, and
Finnish meth-
ods
probably have between 200
and 1000 applications
currently using
them.
The older Mark II method
probably had about 2000
projects
mainly
in the United Kingdom. The
other function point
variations
have
perhaps 50 applications
each.
Schedules and costs IFPUG, COSMIC, and most variations require
about
the same amount of time.
These forms of sizing are
neither quick
nor
inexpensive. Function point
analysis of any flavor is so
slow and
expensive
that applications larger
than about 10,000 function
points
are
almost never
analyzed.
Normal
function point analysis for
all of the variations
requires a certified function point counter if it is to be performed with accuracy
(uncertified
counts
are highly inaccurate).
Normal function point
analysis proceeds
at
a rate of between 400 and
600 function points per
day. At a daily
average
consulting fee of $3000, the
cost is between $5.00 and
$7.50 for
every
function point
counted.
Assuming
an average cost of $6 per
function point counted for
the
major
variants, counting a 10,000-function
point application would
cost
$60,000.
This explains why normal
function point analysis is
usually
only
performed for applications in
the 1000-function point size
range.
Cautions and counter indications The main counter indication with
function
point analysis for all
variations is that it is expensive
and
fairly
time-consuming. While small
applications less than 1000
function
points
can be sized in a few days,
large systems greater than
10,000
function
points would require weeks.
No really large systems
greater
than
100,000 function points have
ever been sized using
either IFPUG
or
the variations such as
COSMIC due to the high
costs and the
fact
that
the schedule for the
analysis would take
months.
Another
counter indication is that
there is a lower limit for
function
point
analysis. Small changes less
than 15 function points
can't be sized
due
to the lower limits of the
adjustment factors. This is
true for all of
the
variations such as COSMIC,
Finnish, and so on.
Individually, these
changes
are trivial, but large
companies could have
thousands of them
every
year at a total cost
exceeding several million
dollars.
A
caution is that accurate
function point analysis
requires a certified
function
point counter who has
successfully passed the certification
exam-
ination
offered by the function
point association that
controls the metric.
Uncertified
counters should not be used,
because the counting rules
are
too
complex. As with tax regulations,
the rules change fairly
often.
Function
point analysis is accurate
and useful, but slow
and expen-
sive.
As a result, a number of high-speed
function point methods
have
been
developed and will be discussed
later in this
section.
High-Speed
Sizing Using
Function
Point
Approximations
The
slow speed and high costs of
normal function point
analysis were
noted
within a few years of the
initial development of function
point
metrics.
Indeed, the very first
commercial software cost-estimating
tool
that
supported function point
metrics, SPQR/20 in 1985,
supported a
method
of high-speed function point
analysis based on
approximation
rather
than actual counting.
The
term approximation
refers
to developing a count of function
points
without
having access to, or
knowledge of, every factor
that determines
function
point size when using
normal function point
analysis.
The
business goal of the
approximation methods is to achieve
func-
tion
point totals that would
come within about 15 percent
of an actual
count
by a certified counter, but
achieve that result in less
than one
day
of effort. Indeed, some of
the approximation methods
operate in
only
a minute or two. The
approximation methods are
not intended
as
a full substitute for
function point analysis, but
rather to provide
quick
estimates early in development.
This is because the initial
cost
estimate
for most projects is
demanded even before
requirements are
complete,
so there is no way to carry
out formal function point
analysis
at
that time.
There
are a number of function
point approximation methods
circa
2009,
but the ones that are
most often used
include
1. Unadjusted function points
2. Function points derived from simplified complexity adjustments
3. Function points "light"
4. Function points derived from data mining of legacy applications
5. Function points derived from questions about the application
6. Function points derived from pattern matching (discussed later in this section)
The
goal of these methods is to
improve on the average
counting speed
of
about 400 function points
per day found with normal
function point
analysis.
That being said, the
"unadjusted" function point
method seems
to
achieve rates of about 700
function points per day.
The method using
simplified
complexity factors achieves
rates of about 600 function
points
per
day. The function point
"light" method achieves
rates of perhaps 800
function
points per day.
The
function point light method
was developed by David
Herron of
the
David Consulting Group, who
is a certified function point
counter.
His
light method is based on
simplifying the standard
counting rules
and
especially the complexity
adjustments.
The
method based on data mining
of legacy applications is
technically
interesting.
It was developed by a company
called Relativity
Technologies
(now
part of Micro Focus). For
COBOL and other selected
languages, the
Relativity
function point tool extracts
hidden business rules from
source
code
and uses them as the
basis for function point
analysis.
The
technique was developed in
conjunction with certified
function
point
analysts, and the results
come within a few percentage
points of
matching
standard function point
analysis. The nominal speed of
this
approach
is perhaps 1000 function
points per minute (as
opposed to 400
per
day for normal counts).
For legacy applications,
this method can be
very
valuable for retrofitting
function points and using
them to quantify
maintenance
and enhancement work.
There
are several methods of
approximation based on
questions
about
the application. Software
Productivity Research (SPR)
and Total
Metrics
both have such tools
available. The SPR
approximation methods
are
embedded in the KnowledgePlan
estimation tool. The Total
Metrics
approximation
method is called Function
Point Outline and
deals with
some
interesting external attributes of
software applications, such
as
the
size of the requirements or
functional specifications.
As
noted earlier in this
chapter, function points
have long been
used
to
measure and predict the
size of requirements and
specifications. The
FP Outline approach merely reverses the mathematics and uses known
document
sizes to predict function
points, which is essentially
another
form
of backfiring. Of course, document
size is only one of the
questions
asked,
but the idea is to create
function point approximations
based on
easily
available information.
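A minimal sketch of this idea, assuming only the requirements-page ratio from Table 6-8 and a hypothetical 450-page requirements document; a real tool would combine several such questions rather than relying on a single ratio.

# Approximating function points from document size by reversing the
# pages-per-function-point ratios (illustrative values only).
pages_per_function_point_requirements = 0.45      # MIS requirements ratio from Table 6-8

def approximate_fp_from_requirements(requirement_pages):
    return requirement_pages / pages_per_function_point_requirements

print(round(approximate_fp_from_requirements(450)))   # roughly 1,000 function points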
The
speed of the FP Outline tool
and the other question-based
func-
tion
point approximation methods
seems to be in the range of
perhaps
4000
function points per day, as
opposed to the 400 function
points per
day
of normal function point
analysis.
Timing of function point approximation sizing The methods based on ques-
tions
about applications can be
used earlier than standard
function
points.
Function points "light" can
be used at the same time as
stan-
dard
function points; that is,
when the requirements are
known. The
data
mining approach requires
existing source code and
hence is used
primarily
for legacy applications.
However, the approximation
methods
that
use questions about software
applications can be used
very early
in
requirements: several months
prior to when standard
function point
analysis
might be carried out.
Usage of function point approximations The function point approxima-
tion
methods vary in usage. The
Relativity method and the
Total Metrics
method
were only introduced in
2008, so usage is still
growing: perhaps
250
projects each. The older
approximation methods may
have as many
as
750 projects each.
Schedules and costs The main purpose of the approximation methods
is
to achieve faster function
point counts and lower
costs than IFPUG,
COSMIC,
or any other standard method
of function point analysis.
Their
speed
of operation ranges between
about twice that of standard
function
points
up to perhaps 20 times standard
function point analysis. The
cost
per
function point counted runs
from less than 1 cent up to
perhaps $3,
but
all are cheaper than
standard function point
analysis.
Cautions and counter indications The main counter indication with func-
tion
point approximation is accuracy.
The Relativity method
matches
standard
IFPUG function points almost
exactly. The other
approximation
methods
only come within about 15
percent of manual counts by
certified
counters.
Of course, coming within 15 percent of a normal count three months earlier than function points could normally be counted, and doing so at perhaps one-tenth the cost of normal function point analysis, are both significant business advantages.
Sizing
Legacy Applications
Based
on
"Backfiring" or LOC to
Function
Point
Conversion
The
concept of backfiring
is
nothing more than reversing
the direction
of
the equations used when
predicting source code size
from function
points.
The technology of backfiring or
direct conversion of LOC
data
into
the equivalent number of
function points was
pioneered by Allan
Albrecht,
the inventor of the function
point metric. The first
backfire
data
was collected within IBM
circa 1975 as a part of the
original devel-
opment
of function point
metrics.
The
first commercial software
estimating tool to support
backfiring
was
SPQR/20, which came out in
1985 and supported
bi-directional
sizing
for 30 languages. Today,
backfiring is a standard function
for
many
commercial software estimating
tools such as the ones
already
mentioned
earlier in this
section.
From
30 languages in 1985, the
number of languages that can
be
sized
or backfired has now grown
to more than 450 circa
2009, when
all
dialects are counted. Of
course, for the languages
where no counting
rules
exist, backfiring is not
possible. Software Productivity
Research
publishes
an annual table of conversion
ratios between logical lines
of
code
and function points, and
the current edition circa
2009 contains
almost
700 programming languages
and dialects. Similar tables
are
published
by other consulting organizations
such as Gartner Group
and
the
David Consulting
Group.
There
are far too many
programming languages to show
more than a
few
examples in this short
subsection. Note also that
the margin of error
when
backfiring is rather large.
Even so, the results
are interesting and
now
widely utilized. Following
are examples taken from
the author's
Table
of Programming Languages and
Levels, which
is updated sev-
eral
times a year by Software
Productivity Research (Jones,
1996). This
data
indicates the ranges and
median values in the number
of source
code
statements required to encode one
function point for selected
lan-
guages.
The counting rules for
source code are based on
logical state-
ments
and are defined in an
appendix of the author's
book Applied
Software
Measurement (McGraw-Hill,
2008). Table 6-9 shows
samples
of
the ratios of logical source
code statements to function
points. A full
table
for all 2,500 or so
programming languages would
not fit within
the
book.
Although
backfiring is usually not as
accurate as actually
counting
function
points, there is one special
case where backfiring is
more accu-
rate:
very small modifications to
software applications that
have fewer
than
15 function points. For
changes less than 1 function
point, backfir-
ing
is one of only two current
approaches for deriving
function points.
(The
second approach is pattern matching,
which will be discussed
later
in
this section.)
While
backfiring is widely used
and also supported by many
com-
mercial
software cost-estimating tools,
the method is something of
an
"orphan,"
because none of the function
point user groups such as
IFPUG,
COSMIC,
and the like have
ever established committees to
evaluate back-
firing
or produced definitive tables of
backfire data.
TABLE 6-9 Ratios of Logical Source Code Statements to Function Points for Selected Programming Languages

                                           Source Statements Per Function Point
Language                  Nominal Level     Low      Mean     High
1st Generation                 1.00         220       320      500
Basic assembly                 1.00         200       320      450
Macro assembly                 1.50         130       213      300
C                              2.50          60       128      170
BASIC (interpreted)            2.50          70       128      165
2nd Generation                 3.00          55       107      165
FORTRAN                        3.00          75       107      160
ALGOL                          3.00          68       107      165
COBOL                          3.00          65       107      150
CMS2                           3.00          70       107      135
JOVIAL                         3.00          70       107      165
PASCAL                         3.50          50        91      125
3rd Generation                 4.00          45        80      125
PL/I                           4.00          65        80       95
MODULA 2                       4.00          70        80       90
ADA 83                         4.50          60        71       80
LISP                           5.00          25        64       80
FORTH                          5.00          27        64       85
QUICK BASIC                    5.50          38        58       90
C++                            6.00          30        53      125
Ada 9X                         6.50          28        49      110
Data base                      8.00          25        40       75
Visual Basic (Windows)        10.00          20        32       37
APL (default value)           10.00          10        32       45
SMALLTALK                     15.00          15        21       40
Generators                    20.00          10        16       20
Screen painters               20.00           8        16       30
SQL                           27.00           7        12       15
Spreadsheets                  50.00           3         6        9
One
potential use of backfiring
would be to convert historical
data
for
applications that used story
points or use-case points into
function
point
form. This would only
require deriving logical
code size and
then
using
published backfire
ratios.
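A minimal sketch of such a backfire conversion, using the mean statements-per-function-point values from Table 6-9; the 250,000-statement COBOL application is hypothetical, and the wide low/high ranges in the table are the reason backfired totals carry a large margin of error.

# Backfiring: converting counts of logical source statements into approximate
# IFPUG function points using mean ratios from Table 6-9 (illustrative only).
STATEMENTS_PER_FUNCTION_POINT = {
    "Macro assembly": 213,
    "C": 128,
    "COBOL": 107,
    "PL/I": 80,
    "C++": 53,
    "Visual Basic (Windows)": 32,
}

def backfire_function_points(logical_statements, language):
    return logical_statements / STATEMENTS_PER_FUNCTION_POINT[language]

print(round(backfire_function_points(250_000, "COBOL")))   # a hypothetical legacy system: roughly 2,300 FP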
It
would also be fairly trivial
for various kinds of code
analyzers such
as
complexity analysis tools or
static analysis tools to
include backfire
algorithms,
as could compilers for that
matter.
Even
though the function point
associations ignore backfiring,
many
benchmark
organizations such as Software
Productivity Research
(SPR),
the
David Consulting Group,
QPMG, Gartner Group, and so
on, do pub-
lish
tables of backfire conversion
ratios.
While
many languages in these
various tables have the
same level
from
company to company, other
languages vary widely in the
apparent
number
of source code statements
per function point based on
which
company's
table is used. This is an
awkward problem, and
coopera-
tion
among metrics consulting
groups would be useful to
the industry,
although
it will probably not
occur.
Somewhat
surprisingly, as of 2009, all of
the published data on
back-
firing
relates to standard IFPUG function
point metrics. It would
be
readily
possible to generate backfiring
rules for COSMIC
function
points,
story point, use-case points, or
any other metric, but
this does
not
seem to have happened, for
unknown reasons.
Timing
of backfire function point
sizing Since
backfiring is based on source
code,
its primary usage is for
sizing legacy applications so
that historical
maintenance
data can be expressed in
terms of function points. A
sec-
ondary
usage for backfiring is to
convert historical data
based on lines
of
code metrics into function
point data so it can be
compared against
industry
benchmarks such as those
maintained by ISBSG.
Usage
of backfire function points
The
backfire method was created
in
part
by A.J. Albrecht as a byproduct of
creating function point
met-
rics.
Therefore, backfiring has
been in continuous use since
about 1975.
Because
of the speed and ease of
backfiring, more applications
have
been
sized with this method than
almost any other. Perhaps as
many
as
100,000 software applications
have been sized via
backfiring.
Schedules
and costs If
source code size is known,
the backfiring form of
sizing
is both quick and
inexpensive. Assuming automated
code count-
ing,
rates of more than 10,000
LOC per minute can be
converted into
function
point form. This brings
the cost down to less
than 1 cent per
function
point, as opposed to about $6 per
function point for
normal
manual
function point analysis.
Backfiring does not require
a certified
counter.
Of course, the accuracy is
not very high.
Cautions
and counter indications
The
main counter indication for
back-
firing
is that it is not very
accurate. Due to variations in
program-
ming
styles, individual programmers
can vary by as much as
6-to-1 in
the
number of lines of code used
to implement the same
functionality.
Therefore,
backfiring also varies
widely. When backfiring is
used for
hundreds
of applications in the same
language, such as COBOL,
the
average
value of about 106 code
statements in the procedure
and data
division
yields reasonably accurate
function point totals. But
for lan-
guages
with few samples, the ranges
are very wide.
A
second caution is that there
are no standard methods for
counting
lines
of code. The backfire
approach was originally
developed based on
counts
of logical statements. If backfiring is
used on counts of
physical
lines,
the results might vary by
more than 500 percent
from backfiring
the
same samples using logical
statements.
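The following crude sketch for a C-like language shows why the two conventions diverge: it skips blank and comment-only lines and uses statement terminators as a rough proxy for logical statements, which is far simpler than the real counting rules and is meant only as an illustration.

# Crude illustration of physical lines versus logical statements (not a real code counter).
def physical_lines(source):
    return len(source.splitlines())

def logical_statements(source):
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue                      # skip blank and comment-only lines
        count += stripped.count(";")      # rough proxy for logical statements
    return count

sample = """\
// compute the total price
int total = 0;
for (int i = 0; i < n; i++) {
    total += prices[i];
}
"""
print(physical_lines(sample), logical_statements(sample))   # 5 physical lines, 4 counted statements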
Another
counter indication is that
backfiring becomes very
compli-
cated
for applications coded in two or
more languages. There are
auto-
mated
tools that can handle
backfire conversions for any
number of
languages,
but it is necessary to know
the proportions of code in
each
language
for the tools to
work.
A
final caution is that the
published rules that show
conversion ratios
between
lines of code and function
points vary based on the
source. The
published
rules by the David
Consulting Group, Gartner
Group, the
Quality
and Productivity Management
Group (QPMG), and
Software
Productivity
Research (SPR) do not show
the same ratios for
many
languages.
Since none of the function
point associations such as
IFPUG
have
ever studied backfiring, nor
have any universities, there
is no over-
all
authoritative source for
validating backfire
assumptions.
Backfiring
remains popular and widely
used, even though it is of questionable accuracy. The reason for its popularity is the high costs
and
long schedules associated with
normal function point
analysis.
Sizing
Based on Pattern
Matching
The
other sizing methods in this
section are in the public
domain and are
available
for use as needed. But sizing based on
pattern matching has
had
a
patent application filed, so
the method is not yet
generally available.
The
pattern-matching method was
not originally created as a
sizing
method.
It was first developed to provide an
unambiguous way of
identify-
ing
applications for benchmark
purposes. After several
hundred applica-
tions
had been measured using
the taxonomy, it was noted
that applications
with
the same patterns on the
taxonomy were of the same
size.
Pattern
matching is based on the fact
that thousands of legacy
applica-
tions
have been created, and
for a significant number,
size data already
exists.
By means of a taxonomy that
captures the nature, scope,
class,
and
type of existing software
applications, a pattern
is
created that can
be
used to size new software
applications.
What
makes pattern-matching work is a
taxonomy that
captures
key
elements of software applications.
The taxonomy consists of
seven
topics:
(1) nature, (2) scope,
(3) class, (4) type,
(5) problem
complexity,
(6)
code complexity, and (7)
data complexity. Each topic
uses numeric
values
for identification.
In
comparing one software
project against another, it is
important to
know
exactly what kinds of
software applications are
being compared. This
is
not as easy as it sounds. The industry
lacks a standard taxonomy of
soft-
ware
projects that can be used to
identify projects in a clear
and unambigu-
ous
fashion other than the
taxonomy that is used with
this invention.
The
author has developed a
multipart taxonomy for
classifying proj-
ects
in an unambiguous fashion. The
taxonomy is copyrighted
and
explained
in several of the author's
previous books including
Estimating
Software
Costs (McGraw-Hill,
2007) and Applied
Software Measurement
(McGraw-Hill,
2008). Following is the
taxonomy:
When
the taxonomy is used for
benchmarks, four additional
factors
from
public sources are part of
the taxonomy:
Country code        =  1     (United States)
Region code         =  06    (California)
City code           =  408   (San Jose)
NAIC industry code  =  1569  (Telecommunications)
These
codes are from telephone
area codes, ISO codes, and
the North
American
Industry Classification (NAIC)
codes of the Department
of
Commerce.
These four codes do not
affect the size of
applications, but
provide
valuable information for
benchmarks and international
eco-
nomic
studies. This is because software
costs vary widely by
country,
geographic
region, and industry. For
historical data to be meaningful,
it
is
desirable to record all of
the factors that influence
costs.
The
portions of the taxonomy
that are used for
estimating application
size
include the following
factors:
PROJECT NATURE: __
1. New program development
2. Enhancement (new functions added to existing software)
3. Maintenance (defect repair to existing software)
4. Conversion or adaptation (migration to new platform)
5. Reengineering (re-implementing a legacy application)
6. Package modification (revising purchased software)
PROJECT SCOPE: __
1. Algorithm
2. Subroutine
3. Module
4. Reusable module
5. Disposable prototype
6. Evolutionary prototype
7. Subprogram
8. Stand-alone program
9. Component of a system
10. Release of a system (other than the initial release)
11. New departmental system (initial release)
12. New corporate system (initial release)
13. New enterprise system (initial release)
14. New national system (initial release)
15. New global system (initial release)
PROJECT CLASS: __
1. Personal program, for private use
2. Personal program, to be used by others
3. Academic program, developed in an academic environment
4. Internal program, for use at a single location
5. Internal program, for use at multiple locations
6. Internal program, for use on an intranet
7. Internal program, developed by external contractor
8. Internal program, with functions used via time sharing
9. Internal program, using military specifications
10. External program, to be put in public domain
11. External program, to be placed on the Internet
12. External program, leased to users
13. External program, bundled with hardware
14. External program, unbundled and marketed commercially
15. External program, developed under commercial contract
16. External program, developed under government contract
17. External program, developed under military contract
PROJECT TYPE: __
1. Nonprocedural (generated, query, spreadsheet)
2. Batch application
3. Web application
4. Interactive application
5. Interactive GUI applications program
6. Batch database applications program
7. Interactive database applications program
8. Client/server applications program
9. Computer game
10. Scientific or mathematical program
11. Expert system
12. Systems or support program including "middleware"
13. Service-oriented architecture (SOA)
14. Communications or telecommunications program
15. Process-control program
16. Trusted system
17. Embedded or real-time program
18. Graphics, animation, or image-processing program
19. Multimedia program
20. Robotics, or mechanical automation program
21. Artificial intelligence program
22. Neural net program
23. Hybrid project (multiple types)
PROBLEM COMPLEXITY: ________
1. No calculations or only simple algorithms
2. Majority of simple algorithms and simple calculations
3. Majority of simple algorithms plus a few of average complexity
4. Algorithms and calculations of both simple and average complexity
5. Algorithms and calculations of average complexity
6. A few difficult algorithms mixed with average and simple
7. More difficult algorithms than average or simple
8. A large majority of difficult and complex algorithms
9. Difficult algorithms and some that are extremely complex
10. All algorithms and calculations are extremely complex
CODE COMPLEXITY: _________
1. Most "programming" done with buttons or pull-down controls
2. Simple nonprocedural code (generated, database, spreadsheet)
3. Simple plus average nonprocedural code
4. Built with program skeletons and reusable modules
5. Average structure with small modules and simple paths
6. Well structured, but some complex paths or modules
7. Some complex modules, paths, and links between segments
8. Above average complexity, paths, and links between segments
9. Majority of paths and modules are large and complex
10. Extremely complex structure with difficult links and large modules
DATA COMPLEXITY: _________
1. No permanent data or files required by application
2. Only one simple file required, with few data interactions
3. One or two files, simple data, and little complexity
4. Several data elements, but simple data relationships
5. Multiple files and data interactions of normal complexity
6. Multiple files with some complex data elements and interactions
7. Multiple files, complex data elements and data interactions
8. Multiple files, majority of complex data elements and interactions
9. Multiple files, complex data elements, many data interactions
10. Numerous complex files, data elements, and complex interactions
As
most commonly used for
either measurement or sizing,
users will
provide
a series of integer values to
the factors of the taxonomy,
as
follows:
PROJECT NATURE          1
PROJECT SCOPE           8
PROJECT CLASS           11
PROJECT TYPE            15
PROBLEM COMPLEXITY      5
DATA COMPLEXITY         6
CODE COMPLEXITY         2
Although
integer values are used
for nature, scope, class,
and type, up
to
two decimal places can be
used for the three
complexity factors.
The
algorithms
will interpolate between integer
values. Thus,
permissible
values
might also be
PROJECT NATURE          1
PROJECT SCOPE           8
PROJECT CLASS           11
PROJECT TYPE            15
PROBLEM COMPLEXITY      5.25
DATA COMPLEXITY         6.50
CODE COMPLEXITY         2.45
The
combination of numeric responses to
the taxonomy provides
a
unique
"pattern" that facilitates
both measurement and sizing.
The funda-
mental
basis for sizing based on
pattern matching rests on
two points:
1. Observations have demonstrated that software applications that have identical patterns in terms of the taxonomy are also close to being identical in size expressed in function points.
2. The seven topics of the taxonomy are not equal in their impacts.
The
second key to pattern matching is
the derivation of the
relative
weights
that each factor provides in
determining application
size.
To
use the pattern-matching
approach, mathematical weights
are
applied
to each parameter. The
specific weights are defined
in the
patent
application for the method
and are therefore
proprietary and
not
included here. However, the
starting point for the
pattern-matching
approach
is the average sizes of the
software applications covered by
the
"scope"
parameter. Table 6-10
illustrates the unadjusted
average values
prior
to applying mathematical
adjustments.
As
shown in Table 6-10, an
initial starting size for a
software applica-
tion
is based on user responses to
the scope
parameter.
Each answer is
assigned
an initial starting size
value in terms of IFPUG function
points.
These
size values have been
determined by examination of
applications
already
sized using standard IFPUG
function point analysis. The
initial
size
values represent the mode of
applications or subcomponents
that
have
been measured using function
points.
The
scope parameter by itself
only provides an approximate
initial
value.
It is then necessary to adjust
this value based on the
other param-
eters
of class, type, problem
complexity, code complexity,
and data com-
plexity.
These adjustments are part
of the patent application
for sizing
based
on pattern matching.
From
time to time, new forms of
software will be developed. When
this
occurs,
the taxonomy can be expanded
to include the new
forms.
TABLE 6-10 Initial Starting Values for Sizing by Pattern Matching

APPLICATION SCOPE PARAMETER
Value   Definition                    Size in Function Points
1.      Algorithm                                   1
2.      Subroutine                                  5
3.      Module                                     10
4.      Reusable module                            20
5.      Disposable prototype                       50
6.      Evolutionary prototype                    100
7.      Subprogram                                500
8.      Stand-alone program                     1,000
9.      Component of a system                   2,500
10.     Release of a system                     5,000
11.     New departmental system                10,000
12.     New corporate system                   50,000
13.     New enterprise system                 100,000
14.     New national system                   250,000
15.     New global system                     500,000
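The overall flow can be sketched as follows. The scope lookup uses the values in Table 6-10; because the actual adjustment weights are proprietary, the adjustment step uses an invented multiplier and is only a placeholder for the general idea.

# A minimal sketch of sizing by pattern matching: a taxonomy pattern selects a
# starting size from Table 6-10, and the other parameters then adjust it. The
# real adjustment weights are proprietary, so approximate_size() below uses an
# invented placeholder adjustment, not the patented method.
from dataclasses import dataclass

SCOPE_BASE_SIZE_FP = {1: 1, 2: 5, 3: 10, 4: 20, 5: 50, 6: 100, 7: 500,
                      8: 1_000, 9: 2_500, 10: 5_000, 11: 10_000, 12: 50_000,
                      13: 100_000, 14: 250_000, 15: 500_000}

@dataclass
class TaxonomyPattern:
    nature: int
    scope: int
    project_class: int
    project_type: int
    problem_complexity: float   # the three complexity factors may carry two decimals
    code_complexity: float
    data_complexity: float

def approximate_size(pattern):
    base = SCOPE_BASE_SIZE_FP[pattern.scope]
    # Placeholder adjustment around the midpoint complexity rating of 5;
    # the true weights are defined in the patent application.
    mean_complexity = (pattern.problem_complexity + pattern.code_complexity +
                       pattern.data_complexity) / 3
    return base * (1.0 + 0.05 * (mean_complexity - 5))

example = TaxonomyPattern(nature=1, scope=8, project_class=11, project_type=15,
                          problem_complexity=5.25, code_complexity=2.45,
                          data_complexity=6.50)
print(round(approximate_size(example)))   # starts from the 1,000-FP base for scope value 8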
The
taxonomy can be used well
before an application has
started its
requirements.
Since the taxonomy contains
information that should
be
among
the very first topics
known about a future
application, it is pos-
sible
to use the taxonomy months
before requirements are
finished and
even
some time before they
begin.
It
is also possible to use the
taxonomy on legacy applications
that
have
been in existence for many
years. It is often useful to
know the
function
point totals of such
applications, but normal
counting of func-
tion
points may not be feasible
since the requirements and
specifications
are
seldom updated and may
not be available.
The
taxonomy can also be used
with commercial software, and
indeed
with
any form of software
including classified military
applications
where
there is sufficient public or
private knowledge of the
application
to
assign values to the
taxonomy tables.
The
taxonomy was originally
developed to produce size in
terms of
IFPUG
function points and also
logical source code
statements. However,
the
taxonomy could also be used
to produce size in terms of
COSMIC
function
points, use-case points, or story
points. To use the
taxonomy
with
other metrics, historical
data would need to be
analyzed.
The
sizing method based on
pattern matching can be used
for any
size
application ranging from
small updates that are
only a fraction
of
a function point up to massive
defense applications that
might top
300,000
function points. Table 6-11
illustrates the
pattern-matching
TABLE 6-11 Sample of 150 Applications Sized Using Pattern Matching
Note 1: IFPUG rules version 4.2 are assumed.
Note 2: Code counts are based on logical statements, not physical lines.

Application                                 Size in Function Points   Language   Total         Lines per
                                            (IFPUG 4.2)               Level      Source Code   Function Point
1.
Star Wars missile
defense
352,330
3.50
32,212,992
91
2.
Oracle
310,346
4.00
24,827,712
80
3.
WWMCCS
307,328
3.50
28,098,560
91
4.
U.S. Air Traffic
control
306,324
1.50
65,349,222
213
5.
Israeli air defense
system
300,655
4.00
24,052,367
80
6.
SAP
296,764
4.00
23,741,088
80
7.
NSA Echelon
293,388
4.50
20,863,147
71
8.
North Korean border
defenses
273,961
3.50
25,047,859
91
9.
Iran's air defense
system
260,100
3.50
23,780,557
91
10.
Aegis destroyer C&C
253,088
4.00
20,247,020
80
11.
Microsoft VISTA
157,658
5.00
10,090,080
64
12.
Microsoft XP
126,788
5.00
8,114,400
64
13.
IBM MVS
104,738
3.00
11,172,000
107
14.
Microsoft Office
Professional
93,498
5.00
5,983,891
64
15.
Airline reservation
system
38,392
2.00
6,142,689
160
16.
NSA code decryption
35,897
3.00
3,829,056
107
17.
FBI Carnivore
31,111
3.00
3,318,515
107
18.
Brain/Computer interface
25,327
6.00
1,350,757
53
19.
FBI fingerprint analysis
25,075
3.00
2,674,637
107
20.
NASA space shuttle
23,153
3.50
2,116,878
91
21.
VA patient monitoring
23,109
1.50
4,929,910
213
22.
F115 avionics package
22,481
3.50
2,055,438
91
23.
Lexis-Nexis legal
analysis
22,434
3.50
2,051,113
91
24.
Russian weather
satellite
22,278
3.50
2,036,869
91
25.
Data warehouse
21,895
6.50
1,077,896
49
26.
Animated film
graphics
21,813
8.00
872,533
40
27.
NASA Hubble controls
21,632
3.50
1,977,754
91
28.
Skype
21,202
6.00
1,130,759
53
29.
Shipboard gun
controls
21,199
3.50
1,938,227
91
30.
Natural language
translation
20,350
4.50
1,447,135
71
31.
American Express
billing
20,141
4.50
1,432,238
71
32.
M1 Abrams battle tank
19,569
3.50
1,789,133
91
33.
Boeing 747 avionics
package
19,446
3.50
1,777,951
91
34.
NASA Mars rover
19,394
3.50
1,773,158
91
35.
Travelocity
19,383
8.00
775,306
40
36.
Apple iPhone
19,366
12.00
516,432
27
37.
Nuclear reactor
controls
19,084
2.50
2,442,747
128
38.
IRS income tax
analysis
19,013
4.50
1,352,068
71
39.
Cruise ship
navigation
18,896
4.50
1,343,713
71
40.
MRI medical imaging
18,785
4.50
1,335,837
71
41.
Google search engine
18,640
5.00
1,192,958
64
42.
Amazon web site
18,080
12.00
482,126
27
43.
Order entry system
18,052
3.50
1,650,505
91
44.
Apple Leopard
17,884
12.00
476,898
27
45.
Linux
17,505
8.00
700,205
40
46.
Oil refinery process control
17,471
3.50
1,597,378
91
47.
Corporate cost
accounting
17,378
3.50
1,588,804
91
48.
FedEx shipping
controls
17,378
6.00
926,802
53
49.
Tomahawk cruise
missile
17,311
3.50
1,582,694
91
50.
Oil refinery process control
17,203
3.00
1,834,936
107
51.
ITT System 12 telecom
17,002
3.50
1,554,497
91
52.
Ask search engine
16,895
6.00
901,060
53
53.
Denver Airport
luggage
16,661
4.00
1,332,869
80
54.
ADP payroll application
16,390
3.50
1,498,554
91
55.
Inventory management
16,239
3.50
1,484,683
91
56.
eBay transaction
controls
16,233
7.00
742,072
46
57.
Patriot missile
controls
15,392
3.50
1,407,279
91
58.
Second Life web site
14,956
12.00
398,828
27
59.
IBM IMS database
14,912
1.50
3,181,283
213
60.
America Online (AOL)
14,761
5.00
944,713
64
61.
Toyota robotic mfg.
14,019
6.50
690,152
49
62.
Statewide child
support
13,823
6.00
737,226
53
63.
Vonage VOIP
13,811
6.50
679,939
49
64.
Quicken 2006
11,339
6.00
604,761
53
65.
ITMPI web site
11,033
14.00
252,191
23
66.
Motor vehicle
registrations
10,927
3.50
999,065
91
67.
Insurance claims
handling
10,491
4.50
745,995
71
68.
SAS statistical
package
10,380
6.50
511,017
49
69.
Oracle CRM features
6,386
4.00
510,878
80
70.
DNA analysis
6,213
9.00
220,918
36
71.
Enterprise JavaBeans
5,877
6.00
313,434
53
72.
Software renovation
tool
suite
5,170
6.00
275,750
53
73.
Patent data mining
4,751
6.00
253,400
53
74.
EZ Pass vehicle
controls
4,571
4.50
325,065
71
75.
U.S. patent
applications
4,429
3.50
404,914
91
76.
Chinese submarine
sonar
4,017
3.50
367,224
91
77.
Microsoft Excel 2007
3,969
5.00
254,006
64
78.
Citizens bank online
3,917
6.00
208,927
53
79.
MapQuest
3,793
8.00
151,709
40
80.
Bank ATM controls
3,625
6.50
178,484
49
81.
NVIDIA graphics card
3,573
2.00
571,637
160
82.
Lasik surgery (wave
guide)
3,505
3.00
373,832
107
83.
Sun D-Trace utility
3,309
6.00
176,501
53
84.
Microsoft Outlook
3,200
5.00
204,792
64
85.
Microsoft Word 2007
2,987
5.00
191,152
64
86.
Artemis Views
2,507
4.50
178,250
71
87.
ChessMaster 2007 game
2,227
6.50
109,647
49
88.
Adobe Illustrator
2,151
4.50
152,942
71
89.
SpySweeper antispyware
2,108
3.50
192,757
91
90.
Norton antivirus
software
2,068
6.00
110,300
53
91.
Microsoft Project
2007
1,963
5.00
125,631
64
92.
Microsoft Visual
Basic
1,900
5.00
121,631
64
93.
Windows Mobile
1,858
5.00
118,900
64
94.
SPR KnowledgePlan
1,785
4.50
126,963
71
95.
All-in-one printer
1,780
2.50
227,893
128
96.
AutoCAD
1,768
4.00
141,405
80
97.
Software code
restructuring
1,658
4.00
132,670
80
98.
Intel Math function
library
1,627
9.00
57,842
36
99.
Sony PlayStation game
controls
1,622
6.00
86,502
53
100.
PBX switching system
1,592
3.50
145,517
91
101.
SPR Checkpoint
1,579
3.50
144,403
91
102.
Microsoft Links golf
game
1,564
6.00
83,393
53
103.
GPS navigation system
1,518
8.00
60,730
40
Project
Management and Software
Engineering
399
TABLE
6-11
Sample
of 150 Applications Sized
Using Pattern Matching
(continued)
Note
1: IFPUG rules version 4.2
are assumed.
Note
2: Code counts are based on
logical statements; not
physical lines
Lines
per
Size
in
Function
Function
Points Language
Total
Point
(IFPUG
4.2)
Level
Source Code
Application
104.
Motorola cell phone
1,507
6.00
80,347
53
105.
Seismic analysis
1,492
3.50
136,438
91
106.
PRICE-S
1,486
4.50
105,642
71
107.
Sidewinder missile
controls
1,450
3.50
132,564
91
108.
Apple iPod
1,408
10.00
45,054
32
109.
Property tax
assessments
1,379
4.50
98,037
71
110.
SLIM
1,355
4.50
96,342
71
111.
Microsoft DOS
1,344
1.50
286,709
213
112.
Mozilla Firefox
1,340
6.00
71,463
53
113.
CAI APO (original
estimate)
1,332
8.00
53,288
40
114.
Palm OS
1,310
3.50
119,772
91
115.
Google Gmail
1,306
8.00
52,232
40
116.
Digital camera
controls
1,285
5.00
82,243
64
117.
IRA account management
1,281
4.50
91,096
71
118.
Consumer credit
report
1,267
6.00
67,595
53
119.
Laser printer driver
1,248
2.50
159,695
128
120.
Software complexity
analyzer
1,202
4.50
85,505
71
121.
JAVA compiler
1,185
6.00
63,186
53
122.
COCOMO II
1,178
4.50
83,776
71
123.
Smart bomb targeting
1,154
5.00
73,864
64
124.
Wikipedia
1,142
12.00
30,448
27
125.
Music synthesizer
1,134
4.00
90,736
80
126.
Configuration control
1,093
4.50
77,705
71
127.
Toyota Prius engine
1,092
3.50
99,867
91
128.
Cochlear implant
(internal)
1,041
3.50
95,146
91
129.
Nintendo Game Boy DS
1,002
6.00
53,455
53
130.
Casio atomic watch
993
5.00
63,551
64
131.
Football bowl
selection
992
6.00
52,904
53
132.
COCOMO I
883
4.50
62,794
71
133.
APAR analysis and
routing
866
3.50
79,197
91
134.
Computer BIOS
857
1.00
274,243
320
135.
Automobile fuel
injection
842
2.00
134,661
160
136.
Antilock brake
controls
826
2.00
132,144
160
137.
Quick Sizer
Commercial
794
6.00
42,326
53
(Continued)
400
Chapter
Six
TABLE
6-11
Sample
of 150 Applications Sized
Using Pattern Matching
(continued)
Note
1: IFPUG rules version 4.2
are assumed.
Note
2: Code counts are based on
logical statements; not
physical lines
Lines
per
Size
in
Function
Function
Points Language
Total
Point
(IFPUG
4.2)
Level
Source Code
Application
138.
CAI APO (revised
estimate)
761
8.00
30,450
40
139.
LogiTech cordless
mouse
736
6.00
39,267
53
140.
Function point
workbench
714
4.50
50,800
71
141.
SPR SPQR/20
699
4.50
49,735
71
142.
Instant messaging
687
5.00
43,944
64
143.
Golf handicap
analyzer
662
8.00
26,470
40
144.
Denial of service
virus
138
2.50
17,612
128
145.
Quick Sizer prototype
30
20.00
480
16
146.
ILOVEYOU computer
worm
22
2.50
2,838
128
147.
Keystroke logger
virus
15
2.50
1,886
128
148.
MYDOOM computer virus
8
2.50
1,045
128
149.
APAR bug report
3.85
3.50
352
91
150.
Screen format change
0.87
4.50
62
71
AVERAGE
33,269
4.95
2,152,766
65
sizing
method for a sample of 150
software applications. Each
applica-
tion
was sized in less than one
minute.
Because
the pattern-matching approach is
experimental and being
cali-
brated,
the information shown in
Table 6-11 is provisional
and subject to
change.
The data should not be
used for any serious
business purpose.
Note
that the column labeled
"language level" refers to a
mathemati-
cal
rule that was developed in
the 1970s in IBM. The
original definition
of
"level" was the number of
statements in a basic assembly
language
that
would be needed to provide
the same function as one
statement in
a
higher-level language. Using
this rule, COBOL is a "level
3" language
because
three assembly statements
would be needed to provide the
func-
tions
of 1 COBOL statement. Using
the same rule, Smalltalk
would be
a
level 18 language, while
Java would be a level 6
language.
When
function point metrics were
developed in IBM circa 1975,
the
existing
rules for language level
were extended to include the
number
of
logical source code
statements per function
point.
For both backfiring and predicting source code size using pattern matching, language levels are a required parameter. However, there is published data with language levels for about 700 programming languages and dialects.
Timing
of pattern-matching sizing Because
the taxonomy used for
pat-
tern
matching is generic, it can be
used even before
requirements are
fully
known. In fact, pattern
matching is the sizing
method that can be
applied
the earliest in software
development: long before
normal func-
tion
point analysis, story
points, use-case points, or any
other known
metric.
It is the only method that
can be used before
requirements
analysis
begins, and hence provide a
useful size approximation
before
any
money is committed to a software
project.
Usage of pattern matching   Because the pattern-matching approach is covered by a patent application and still experimental, usage as of 2009 has been limited to about 250 trial software applications.
It
should be noted that because
pattern matching is based on an
exter-
nal
taxonomy rather than on
specific requirements, the
pattern-match-
ing
approach can be used to size
applications that are
impossible to size
using
any other method. For
example, it is possible to size
classified mili-
tary
software being developed by
other countries such as Iran
and North
Korea,
neither of whom would
provide such information
knowingly.
Schedules and costs   The pattern-matching approach is embodied in a prototype sizing tool that can predict application size at rates in excess of 300,000 function points per minute. This makes pattern matching the fastest and cheapest sizing method yet developed. The method is so fast and so easy to perform that several size estimates can easily be performed using best-case, expected-case, and worst-case assumptions.
Even
without the automated
prototype, the pattern-matching
calcu-
lations
can be performed using a
pocket calculator or even by
hand in
perhaps
2 minutes per
application.
Cautions
and counter indications
The
main counter indication for
pattern
matching
is that it is still experimental
and being calibrated.
Therefore,
results
may change
unexpectedly.
Another
caution is that the accuracy
of pattern matching needs to
be
examined
with a large sample of historical
projects that have
standard
function
point counts.
Sizing Software Requirements Changes
Thus
far, all of the sizing
methods discussed have
produced size esti-
mates
that are valid only
for a single moment.
Observations of software
projects
indicate that requirements
grow and change at rates of
between
1
percent and more than 2
percent every calendar month
during the
design
and coding phases.
Therefore,
if the initial size estimate
at the end of the
requirements
phase
is 1000 function points,
then this total might
grow by 6 percent or
60
function points during the
design phase and by 8
percent or 80 func-
tion
points during the coding
phase. When finally
released, the
original
1000
function points will have
grown to 1140.
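The growth arithmetic in this example is easy to automate. The sketch below applies phase-by-phase growth percentages to the original function point total, reproducing the 1,000-to-1,140 example above; the percentages are inputs, not predictions.

# Requirements growth applied as a percentage of the original size,
# matching the worked example in the text (1,000 FP + 6% + 8% = 1,140 FP).

def grown_size(initial_fp: float, phase_growth_percents: list[float]) -> float:
    """Add each phase's growth, expressed as a percent of the original size."""
    growth = sum(initial_fp * (pct / 100.0) for pct in phase_growth_percents)
    return initial_fp + growth

# 6 percent growth during design and 8 percent during coding
print(grown_size(1000, [6.0, 8.0]))  # 1140.0 function points at release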
Because
growth in requirements is related to
calendar schedules,
really
large applications in the
10,000-function point range or
higher can
top
35 percent or even 50 percent in
total growth. Obviously,
this much
growth
will have a significant impact on
both schedules and
costs.
Some
software cost-estimating tools
such as KnowledgePlan
include
algorithms
that predict growth rates in
requirements and allow
users
to
either accept or reject the
predictions. Users can also
include their
own
growth predictions.
There are two flavors of requirements change:

Requirements creep   These are changes to requirements that cause function point totals to increase and that also cause more source code to be written. Such changes should be sized, and of course if they are significant, they should be included in revised cost and schedule estimates.

Requirements churn   These are changes that do not add to the function point size total of the application, but which may cause code to be written. An example of churn might be changing the format or appearance of an input screen, but not adding any new queries or data elements. An analogy from home construction might be replacing existing windows with hurricane-proof windows that fit the same openings. There is no increase in the square footage or size of the house, but there will be effort and costs.
Software
application size is never
stable and continues to change
during
development
and also after release.
Therefore, sizing methods
need to be
able
to deal with changes and growth in
requirements, and these
require-
ments
changes will also cause growth in
source code volumes.
Requirements creep has a more significant impact than just the growth itself. Because changing requirements tend to be rushed, they have higher defect potentials than the original requirements. The bugs they introduce also tend to be harder to find and eliminate, because if the changes are late, inspections may be skipped and testing will be less thorough. As a result, creeping requirements on large software projects tend to be the source of many more delivered defects than the original requirements. For large systems in the 10,000-function point range, almost 50 percent of delivered defects can be attributed to requirements changes during development.
Software Progress and Problem Tracking
From
working as an expert witness in a
number of software
lawsuits,
the
author noted a chronic
software project management
problem. Many
projects
that failed or had serious
delays in schedules or quality
prob-
lems
did not identify any
problems during development by
means of
normal
progress reports.
From
depositions and discovery,
both software engineers and
first-line
project
managers knew about the
problems, but the
information was
not
included in status reports to
clients and senior
management when
the
problems were first noticed.
Not until very late, usually
too late
to
recover, did higher
management or clients become
aware of serious
delays,
quality problems, or other
significant issues.
When
asked why the information
was concealed, the primary
reason
was
that the lower managers
did not want to look
bad to executives. Of
course,
when the problems finally
surfaced, the lower managers
looked
very
bad, indeed.
By
contrast, projects that are
successful always deal with
problems
in
a more rational fashion.
They identify the problems
early, assemble
task
groups to solve them, and
usually bring them under
control before
they
become so serious that they
cannot be fixed. One of the
interesting
features
of the Agile method is that
problems are discussed on a
daily
basis.
The same is true for
the Team Software Process
(TSP).
Software
problems are somewhat like
serious medical problems.
They
usually
don't go away by themselves,
and many require treatment
by
professionals
in order to eliminate
them.
Once
a software project is under
way, there are no fixed
and reli-
able
guidelines for judging its
rate of progress. The
civilian software
industry
has long utilized ad hoc
milestones such as completion
of
design
or completion of coding. However,
these milestones are
notori-
ously
unreliable.
Tracking
software projects requires
dealing with two separate
issues:
(1) achieving specific and
tangible milestones, and (2)
expend-
ing
resources and funds within
specific budgeted
amounts.
Because
software milestones and
costs are affected by
requirements
changes
and "scope creep," it is important to
measure the increase
in
size
of requirements changes, when
they affect function point
totals.
However,
as mentioned in a previous section in
this chapter, some
requirements
changes do not affect
function point totals, which
are
termed
requirements
churn. Both
creep and churn occur at
random
intervals.
Churn is harder to measure
than creep and is often
measured
via
"backfiring" or mathematical conversion
between source code
state-
ments
and function point
metrics.
As
of 2009, automated tools are
available that can assist
project man-
agers
in recording the kinds of
vital information needed for
milestone
reports.
These tools can record
schedules, resources, size
changes, and
also
issues or problems.
For
an industry now more than 60
years of age, it is somewhat
sur-
prising
that there is no general or
universal set of project
milestones for
indicating
tangible progress. From the
author's assessment and
bench-
mark
studies, following are some
representative milestones that
have
shown
practical value.
Note
that these milestones assume
an explicit and formal
review
connected
with the construction of every
major software
deliverable.
Table
6-12 shows representative
tracking milestones for
large soft-
ware
projects. Formal reviews and
inspections have the highest
defect
removal
efficiency levels of any
known kind of quality
control activity,
and
are characteristic of "best in
class" organizations.
The
most important aspect of
Table 6-12 is that every
milestone is
based
on completing a review, inspection, or
test. Just finishing up
a
document
or writing code should not
be considered a milestone
unless
the
deliverables have been
reviewed, inspected, or
tested.
TABLE 6-12   Representative Tracking Milestones for Large Software Projects

 1. Requirements document completed
 2. Requirements document review completed
 3. Initial cost estimate completed
 4. Initial cost estimate review completed
 5. Development plan completed
 6. Development plan review completed
 7. Cost tracking system initialized
 8. Defect tracking system initialized
 9. Prototype completed
10. Prototype review completed
11. Complexity analysis of base system (for enhancement projects)
12. Code restructuring of base system (for enhancement projects)
13. Functional specification completed
14. Functional specification review completed
15. Data specification completed
16. Data specification review completed
17. Logic specification completed
18. Logic specification review completed
19. Quality control plan completed
20. Quality control plan review completed
21. Change control plan completed
22. Change control plan review completed
23. Security plan completed
24. Security plan review completed
25. User information plan completed
26. User information plan review completed
27. Code for specific modules completed
28. Code inspection for specific modules completed
29. Code for specific modules unit tested
30. Test plan completed
31. Test plan review completed
32. Test cases for specific test stage completed
33. Test case inspection for specific test stage completed
34. Test stage completed
35. Test stage review completed
36. Integration for specific build completed
37. Integration review for specific build completed
38. User information completed
39. User information review completed
40. Quality assurance sign off completed
41. Delivery to beta test clients completed
42. Delivery to clients completed
In
the litigation where the
author worked as an expert
witness, these
criteria
were not met. Milestones
were very informal and
consisted
primarily
of calendar dates, without
any validation of the
materials
themselves.
Also,
the format and structure of
the milestone reports were
inad-
equate.
At the top of every
milestone report, problems
and issues or "red
flag"
items should be highlighted
and discussed first.
During
depositions and reviews of
court documents, it was
noted that
software
engineering personnel and
many managers were aware of
the
problems
that later triggered the
delays, cost overruns,
quality prob-
lems,
and litigation. At the
lowest levels, these
problems were often
included
in weekly status reports or
discussed at daily team
meetings.
But
for the higher-level
milestone and tracking
reports that reached
clients
and executives, the
hazardous issues were either
omitted or
glossed
over.
A suggested format for monthly progress tracking reports delivered to clients and higher management would include these sections:

Suggested Format for Monthly Status Reports for Software Projects

 1. New "red flag" problems noted this month
 2. Status of last month's "red flag" problems
 3. Discussion of "red flag" items more than one month in duration
 4. Change requests processed this month versus change requests predicted
 5. Change requests predicted for next month
 6. Size in function points for this month's change requests
 7. Size in function points predicted for next month's change requests
 8. Change requests that do not affect size in function points
 9. Schedule impacts of this month's change requests
10. Cost impacts of this month's change requests
11. Quality impacts of this month's change requests
12. Defects found this month versus defects predicted
13. Defect severity levels of defects found this month
14. Defect origins (requirements, design, code, etc.)
15. Defects predicted for next month
16. Costs expended this month versus costs predicted
17. Costs predicted for next month
18. Earned value for this month's deliverables (if earned value is used)
19. Deliverables completed this month versus deliverables predicted
20. Deliverables predicted for next month
Although
the suggested format
somewhat resembles the items
calcu-
lated
using the earned value
method, this format deals
explicitly with
the
impact of change requests
and also uses function
point metrics for
expressing
costs and quality
data.
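As a rough illustration of how this format could be captured by a tracking tool, the sketch below models one month's report as a simple record with the "red flag" items first. The field names are hypothetical and abbreviated; they merely mirror some of the 20 sections listed above.

# Hypothetical, abbreviated record mirroring the suggested monthly report.
# "Red flag" items come first so that problems are reported before
# routine progress data.
from dataclasses import dataclass, field

@dataclass
class MonthlyStatusReport:
    new_red_flag_problems: list[str] = field(default_factory=list)
    prior_red_flag_status: list[str] = field(default_factory=list)
    change_requests_processed: int = 0
    change_request_size_fp: float = 0.0   # size of this month's changes
    churn_requests: int = 0               # changes with no function point impact
    defects_found: int = 0
    defects_predicted: int = 0
    costs_expended: float = 0.0
    costs_predicted: float = 0.0
    deliverables_completed: int = 0
    deliverables_predicted: int = 0

report = MonthlyStatusReport(
    new_red_flag_problems=["Integration test environment delayed three weeks"],
    change_requests_processed=12,
    change_request_size_fp=85.0,
)
print(report.new_red_flag_problems[0])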
An
interesting question is the
frequency with which milestone
prog-
ress
should be reported. The most
common reporting frequency
is
monthly,
although an exception report
can be filed at any time it
is
suspected
that something has occurred
that can cause
perturbations.
For
example, serious illness of
key project personnel or
resignation of
key
personnel might very well
affect project milestone
completions, and
this
kind of situation cannot be
anticipated.
It
might be thought that
monthly reports are too
far apart for
small
projects
that only last six or
fewer months in total. For
small projects,
weekly
reports might be preferred.
However, small projects
usually do
not
get into serious trouble
with cost and schedule
overruns, whereas
large
projects almost always get
in trouble with cost and
schedule
overruns.
This chapter concentrates on
the issues associated with
large
projects.
In the litigation where the
author has been an expert
wit-
ness,
every project under
litigation except one was
larger than 10,000
function
points.
The
simultaneous deployment of software
sizing tools,
estimating
tools,
planning tools, and
methodology management tools
can pro-
vide
fairly unambiguous points in
the development cycle that
allow
progress
to be judged more or less
effectively. For example,
software
sizing
technology can now predict
the sizes of both
specifications and
the
volume of source code
needed. Defect estimating
tools can predict
the
numbers of bugs or errors
that might be encountered
and discov-
ered.
Although such milestones are
not perfect, they are
better than the
former
approaches.
Project
management is responsible for
establishing milestones,
moni-
toring
their completion, and
reporting truthfully on whether
the mile-
stones
were successfully completed or
encountered problems.
When
serious
problems are encountered, it is
necessary to correct the
problems
before
reporting that the milestone
has been completed.
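The completion rule described here is simple to enforce mechanically. The sketch below is a hypothetical milestone record, not a real tracking tool; it only reports a milestone as complete when the deliverable has passed its review, inspection, or test.

# Hypothetical milestone record: "complete" requires a validated deliverable.
from dataclasses import dataclass

@dataclass
class Milestone:
    name: str
    deliverable_finished: bool = False
    validation_passed: bool = False   # review, inspection, or test completed

    def is_complete(self) -> bool:
        # Finishing the deliverable is not enough; validation is required.
        return self.deliverable_finished and self.validation_passed

m = Milestone("Functional specification completed", deliverable_finished=True)
print(m.is_complete())   # False until the specification review has passed
m.validation_passed = True
print(m.is_complete())   # True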
Failing
or delayed projects usually
lack serious milestone
tracking.
Activities
are often reported as
finished while work was
still ongoing.
Milestones
on failing projects are
usually dates on a calendar
rather
than
completion and review of
actual deliverables.
Delivering
documents or code segments
that are incomplete,
contain
errors,
and cannot support
downstream development work is
not the
way
milestones are used by
industry leaders.
Another
aspect of milestone tracking
among industry leaders is
what
happens
when problems are reported
or delays occur. The
reaction
is
strong and immediate:
corrective actions are
planned, task forces
assigned,
and correction begins. Among
laggards, on the other
hand,
problem
reports may be ignored, and
very seldom do corrective
actions
occur.
In
more than a dozen legal
cases involving projects
that failed or were
never
able to operate successfully,
project tracking was
inadequate in
every
case. Problems were either
ignored or brushed aside,
rather than
being
addressed and solved.
Because
milestone tracking occurs
throughout software
development,
it
is the last line of defense
against project failures and
delays. Milestones
should
be established formally and
should be based on reviews,
inspec-
tions,
and tests of deliverables.
Milestones should not be the
dates that
deliverables
more or less were finished.
Milestones should reflect
the
dates
that finished deliverables
were validated by means of
inspections,
testing,
and quality assurance
review.
An
interesting form of project
tracking has been developed
by the
Shoulders
Corp for keeping track of
object-oriented projects. This
method
uses
a 3-D model of software
objects and classes using
Styrofoam balls
of
various sizes that are
connected by dowels to create a
kind of mobile.
The
overall structure is kept in a
location viewable by as many
team
members
as possible. The mobile
makes the status instantly
visible to
all
viewers. Color-coded ribbons
indicate status of each
component, with
different
colors indicated design
complete, code complete,
documenta-
tion
complete, and testing
complete (gold). There are
also ribbons for
possible
problems or delays. This
method provides almost
instantaneous
visibility
of overall project status.
The same method has been
automated
using
a 3-D modeling package, but
the physical structures are
easier
to
see and have proven
more useful on actual
projects. The
Shoulders
Corporation
method condenses a great
deal of important
information
into
a single visual representation
that nontechnical staff can
readily
understand.
A
combination of daily status
meetings that center on
problems and
possible
delays are very useful.
When formal written reports
are submit-
ted
to higher managers or clients,
the data should be
quantified. In addi-
tion,
possible problems that might
cause delays or quality
issues should
be
the very first topics in
the report because they are
more important
than
any other topics that
are included.
Software Benchmarking
As
this book is being written
in early 2009, a new draft
standard on per-
formance
benchmarks is being circulated
for review by the
International
Standards
Organization (ISO). The
current draft is not yet
approved.
The
current draft deals with
concepts and definitions,
and will be fol-
lowed
by additional standards later.
Readers should check with
the ISO
organization
for additional
information.
One
of the main business uses of
software measurement and
metric
data
is that of benchmarking,
or
comparing the performance of a
com-
pany
against similar companies
within the same industry, or
related
industries.
(The same kind of data
can also be used as a
"baseline" for
measuring
process improvements.)
The
term benchmark
is
far older than the
computing and
software
professions.
It seemed to have its origin
in carpentry as a mark of
stan-
dard
length on workbenches. The
term soon spread to other
domains.
Another
early definition of benchmark
was in surveying, where it
indi-
cated
a metal plate inscribed with
the exact longitude,
latitude, and
altitude
of a particular point. Also
from the surveying domain
comes
the
term baseline
which
originally defined a horizontal
line measured
with
high precision to allow it to be
used for triangulation of
heights
and
distances.
When
the computing industry
began, the term benchmark
was origi-
nally
used to define various
performance criteria for
processor speeds,
disk
and tape drive speeds,
printing speeds, and the
like. This definition
is
still in use, and indeed a
host of new and specialized
benchmarks has
been
created in recent years for
new kinds of devices such as
CD-ROM
drives,
multisynch monitors, graphics
accelerators, solid-state
flash
disks,
high-speed modems, and the
like.
As
a term for measuring the
relative performance of
organizations
in
the computing and software
domains, the term benchmark
was first
applied
to data centers in the
1960s. This was a time
when computers
were
entering the mainstream of
business operations, and
data centers
were
proliferating in number and
growing in size and
complexity. This
usage
is still common for judging
the relative efficiencies of
data center
operations.
Benchmark
data has a number of uses
and a number of ways of
being
gathered
and analyzed. The most
common and significant ways
of gath-
ering
benchmark data are these
five:
1.
Internal
collection for internal
benchmarks This
form is data
gathered
for internal use within a
company or government unit
by
its own employees. In the
United States, the author
estimates
that
about 15,000 software
projects have been gathered
using this
method,
primarily by large and
sophisticated corporations such
as
AT&T,
IBM, EDS, Microsoft, and the
like. This internal
benchmark
data
is proprietary and is seldom
made available to other
organiza-
tions.
The accuracy of internal
benchmark data varies
widely. For
some
sophisticated companies such as IBM,
internal data is very
accurate.
For other companies, the
accuracy may be
marginal.
2.
Consultant
collection for internal
benchmarks The
second
form
is that of data gathered for
internal use within a
company
or
government unit by outside
benchmark consultants. The
author
estimates
that about 20,000 software
projects have been
gathered
using
this method, since benchmark
consultants are fairly
numer-
ous.
This data is proprietary, with
the exception that results
may
be
included in statistical studies
without identifying the
sources
of
the data. Outside
consultants are used because
benchmarks are
technically
complicated to do well, and
specialists generally
outper-
form
untrained managers and
software engineers. Also,
the extensive
experience
of benchmark consultants helps in
eliminating leakage
and
in finding other
problems.
3.
Internal
collection for public or
ISBSG benchmarks This
form
is data gathered for
submission to an external nonprofit
bench-
mark
organization such as the
International Software
Benchmarking
Standards
Group (ISBSG) by a company's
own employees. The
author
estimates
that in the United States
perhaps 3000 such projects
have
been
submitted to the ISBSG. This
data is readily available
and
can
be commercially purchased by companies
and individuals. The
data
submitted to ISBSG is also
made available via
monographs
and
reports on topics such as
estimating, the effectiveness of
vari-
ous
development methods, and
similar topics. The
questionnaires
for
such benchmarks are provided
to clients by the ISBSG,
together
with
instructions on how to collect
the data. This method of
gather-
ing
data is inexpensive, but may
have variability from
company to
company
since answers may not be
consistent from one company
to
another.
4.
Consultant
collection for proprietary
benchmarks This
form
consists of data gathered
for submission to an external
for-
profit
benchmark organization such as
Gartner Group, the
David
Consulting
Group, Galorath Associates,
the Quality and
Productivity
Management
Group, Software Productivity
Research (SPR), and
others
by consultants who work for
the benchmark
organizations.
Such
benchmark data is gathered
via on-site interviews. The
author
estimates
that perhaps 60,000 projects
have been gathered by
the
for-profit
consulting organizations. This
data is proprietary, with
the
exception
of statistical studies that
don't identify data sources.
For
example,
this book and the
author's previous book,
Applied
Software
Measurement,
utilize
corporate benchmarks gathered by
the author
and
his colleagues under
contract. However, the names
of the clients
and
projects are not mentioned
due to nondisclosure
agreements.
5.
Academic
benchmarks This
form is data gathered for
academic
purposes
by students or faculty of a university.
The author estimates
that
perhaps 2000 projects have
been gathered using this
method.
Academic
data may be used in PhD or
other theses, or it may be
used
for
various university research
projects. Some of the
academic data
will
probably be published in journals or
book form.
Occasionally,
such
data may be made available
commercially. Academic data
is
usually
gathered via questionnaires
distributed by e-mail,
together
with
instructions for filling
them out.
When
all of these benchmark
sources are summed, the
total is about
100,000
projects. Considering that at
least 3 million legacy
applications
exist
and another 1.5 million
new projects are probably in
development,
the
sum total of all software
benchmarks is only about 2
percent of
software
projects.
When
the focus narrows to
benchmark data that is
available to the
general
public through nonprofit or
commercial sources, the U.S.
total is
only
about 3000 projects, which
is only about 0.07 percent.
This is far too
small
a sample to be statistically valid
for the huge variety of
software
classes,
types, and sizes created in
the United States. The
author sug-
gests
that public benchmarks from
nonprofit sources such as
the ISBSG
should
expand up to at least 2 percent or
about 30,000 new projects
out
of
1.5 million or so in development. It
would also be useful to have
at
least
a 1 percent sample of legacy
applications available to the
public,
or
another 30,000
projects.
A
significant issue with current
benchmark data to date is
the unequal
distribution
of project sizes. The bulk
of all software benchmarks
are for
projects
between about 250 and
2500 function points. There
is very little
benchmark
data for applications larger
than 10,000 function
points,
even
though these are the
most expensive and
troublesome kinds of
applications.
There is almost no benchmark
data available for
small
maintenance
projects below 15 function
points in size, even though
such
projects
outnumber all other sizes
put together.
Another
issue with benchmark data is
the unequal distribution
by
project
types. Benchmarks for IT
projects comprise about 65
percent
of
all benchmarks to date.
Systems and embedded
software comprise
about
15 percent, commercial software
about 10 percent, and
military
software
comprises about 5 percent.
(Since the Department of
Defense
and
the military services own
more software than any
other organiza-
tions
on the planet, the lack of
military benchmarks is probably
due to
the
fact that many military
projects are classified.)
The remaining 5
percent
includes games, entertainment,
iPhone and iPod
applications,
and
miscellaneous applications such as
tools.
Categories of Software Benchmarks
There
are a surprisingly large
number of kinds of software
benchmarks,
and
they use different metrics,
different methods, and are
aimed at dif-
ferent
aspects of software as a business
endeavor.
Benchmarks
are primarily collections of
quantitative data that
show
application,
phase, or activity productivity
rates. Some
benchmarks
also
include application quality
data in the form of defects
and defect
removal
efficiency. In addition, benchmarks
should also gather
informa-
tion
about the programming
languages, tools, and
methods used for
the
application.
Over
and above benchmarks, the
software industry also
performs soft-
ware
process
assessments. Software
process assessments gather
detailed
data
on software best practices
and on specific topics such
as project
management
methods, quality control
methods, development
methods,
maintenance
methods, and the like.
The process assessment
method
developed
by the Software Engineering
Institute (SEI) that
evaluates
an
organization's "capability maturity
level" is probably the
best-known
form
of assessment, but there are
several others as
well.
Since
it is obvious that assessment
data and benchmark data
are
synergistic,
there are also hybrid
methods that collect
assessment and
benchmark
data simultaneously. These
hybrid methods tend to
use
large
and complicated questionnaires
and are usually performed
via
on-site
consultants and face-to-face
interviews. However, it is
possible
to
use e-mail or web-based
questionnaires and communicate with
soft-
ware
engineers and managers via
Skype or some other method
rather
than
actual travel.
The major forms of software benchmarks included in this book circa 2009 are

 1. International software benchmarks
 2. Industry software benchmarks
 3. Overall software cost and resource benchmarks
 4. Corporate software portfolio benchmarks
 5. Project-level software productivity and quality benchmarks
 6. Phase-level software productivity and quality benchmarks
 7. Activity-level software productivity and quality benchmarks
 8. Software outsource versus internal performance benchmarks
 9. Software maintenance and customer support benchmarks
10. Methodology benchmarks
11. Assessment benchmarks
12. Hybrid assessment and benchmark studies
13. Earned-value benchmarks
14. Quality and test coverage benchmarks
15. Cost of quality (COQ) benchmarks
16. Six Sigma benchmarks
17. ISO quality standard benchmarks
18. Security benchmarks
19. Software personnel and skill benchmarks
20. Software compensation benchmarks
21. Software turnover or attrition benchmarks
22. Software performance benchmarks
23. Software data center benchmarks
24. Software customer satisfaction benchmarks
25. Software usage benchmarks
26. Software litigation and failure benchmarks
27. Award benchmarks
As
can be seen from this rather
long list of software-related
bench-
marks,
the topic is much more
complicated than might be
thought.
International software benchmarks   Between the recession and global software competition, it is becoming very important to be able to compare software development practices around the world. International software benchmarking is a fairly new domain, but it has already begun to establish a substantial literature, with useful books by Michael Cusumano, Watts Humphrey, Howard Rubin, and Edward Yourdon, as well as by the author of this book. One weakness with the ISBSG data is that country of origin is deliberately concealed. This policy should be reconsidered in light of the continuing recession.
When
performing international benchmarks,
many local factors
need
to
be recorded. For example,
Japan has at least 12 hours
of unpaid over-
time
per week, while other
countries such as Canada and
Germany have
hardly
any. In Japan the workweek
is about 44 hours, while in
Canada
it
is only 36 hours. Vacation
days also vary from
country to country,
as
do the number of public
holidays. France and the EU
countries, for
example,
have more than twice as
many vacation days as the
United
States.
Of
course, the most important
international topics for the
purposes of
outsourcing
are compensation levels and
inflation rates.
International
benchmarks
are a great deal more
complex than domestic
benchmarks.
Industry
benchmarks As
the recession continues,
more and more
atten-
tion
is paid to severe imbalances
among industries in terms of
costs
and
salaries. For example, the
large salaries and larger
bonuses paid to
bankers
and financial executives
have shocked the world
business com-
munity.
Although not as well-known because
the amounts are
smaller,
financial
software executives and
financial software engineering
per-
sonnel
earn more than similar
personnel in other industries,
too. As
the
recession continues, many
companies are facing the
difficult ques-
tion
of whether to invest significant
amounts of money and effort
into
improving
their own software
development practices, or to turn
over all
software
operations to an outsourcing vendor
who may already be
quite
sophisticated.
Benchmarks of industry schedules,
effort, and costs
will
become
increasingly important.
As
of 2009, enough industry
data exists to show
interesting variations
between
finance, insurance, health
care, several forms of
manufactur-
ing,
defense, medicine, and
commercial software
vendors.
Overall
software cost and resource
benchmarks Cost
and resources at
the
corporate level are
essentially similar to the
classic data center
benchmarking
studies, only transferred to a
software development
organization.
These studies collect data
on the annual
expenditures
for
personnel and equipment,
number of software personnel
employed,
number
of clients served, sizes of
software portfolios, and
other tangible
aspects
associated with software development
and maintenance. The
results
are then compared against
norms or averages from
companies
of
similar sizes, companies
within the same industry, or
companies that
have
enough in common to make the
comparisons interesting.
These
high-level
benchmarks are often
produced by "strategic"
consulting
organization
such as McKinsey, Gartner
Group, and the like.
This form
of
benchmark does not deal with
individual projects, but
rather with
corporate
or business-group expense
patterns.
In
very large enterprises with
multiple locations, similar
benchmarks
are
sometimes used for internal
comparisons between sites or
divisions.
The
large accounting companies
and a number of management
consult-
ing
companies can perform
general cost and resource
benchmarks.
Corporate software portfolio benchmarks   A corporate portfolio can be as large as 10 million function points and contain more than 5000 applications. The applications can include IT projects, systems software, embedded software, commercial software, tools, outsourced applications, and open-source applications. Very few companies know how much software is in their portfolios. Considering that the total portfolio is perhaps the most valuable asset that the company owns, the lack of portfolio-level benchmarks is troubling.
There
are so few portfolio
benchmarks because of the huge
size of
portfolios
and the high costs of
collecting data on the
entire mass of
software
owned by large
corporations.
A
portfolio benchmark study in
which the author
participated for
a
large manufacturing conglomerate
took about 12 calendar
months
and
involved 10 consultants who
visited at least 24 countries
and 60
companies
owned by the conglomerate.
Just collecting data for
this one
portfolio
benchmark cost more than $2
million. However, the value
of
the
portfolio itself was about
$15 billion. That is a very
significant asset
and
therefore deserves to be studied
and understood.
Of
course, for a smaller
company whose portfolio was
concentrated
in
a single data center, such a
study might have been
completed in a
month
by only a few consultants. But
unfortunately, large
corporations
are
usually geographically dispersed,
and their portfolios are
highly
fragmented
across many cities and
countries.
Project-level
productivity and quality
benchmarks Project-level
produc-
tivity
and quality benchmarks drop
down below the level of
entire
organizations
and gather data on specific
projects. These
project-level
benchmark
studies accumulate effort,
schedule, staffing, cost,
and qual-
ity
data from a sample of
software projects developed
and/or maintained
by
the organization that
commissioned the benchmark.
Sometimes the
sample
is as large as 100 percent,
but more often the
sample is more
limited.
For example, some companies
don't bother with projects
below
a
certain minimum size, such as 50
function points, or exclude
projects
that
are being developed for
internal use as opposed to projects
that are
going
to be released to external
clients.
Project-level
productivity and quality
benchmarks are sometimes
per-
formed
using questionnaires or survey
instruments that are
e-mailed or
distributed
to participants. This appears to be
the level discussed in
the
new
ISO draft benchmark standard.
Data at the project level
includes
schedules,
effort in hours or months,
and costs. Supplemental data
on
programming
languages and methodologies
may be included.
Quality
data
should be included, but
seldom is.
To
avoid "apples to oranges"
comparisons, companies that
perform
project-level
benchmark studies normally
segment the data so that
sys-
tems
software, information systems,
military software, scientific
soft-
ware,
and other kinds of software
are compared against
projects of the
same
type. Data is also segmented
by application size, to ensure
that
very
small projects are not
compared against huge
systems. New proj-
ects
and enhancement and
maintenance projects are
also segmented.
Although
collecting data at the
project level is fairly easy
to do, there
is
no convenient way to validate
the data or to ensure that
"leakage"
has
not omitted a significant
quantity of work and
therefore costs. The
accuracy
of project level data is
always suspect.
Phase-level productivity and quality benchmarks   Unfortunately, project-level data is essentially impossible to validate and therefore tends to be unreliable. Dropping down to the level of phases provides increased granularity and therefore increased value. There are no standard definitions of phases that are universally agreed to circa 2009. However, a common phase pattern includes requirements, design, development, and testing.
When
a benchmark study is carried
out as a prelude to software
process
improvement
activities, the similar term
baseline
is
often used. In this
context,
the baseline reflects the
productivity, schedule, staffing,
and/or
quality
levels that exist when
the study takes place.
These results can
then
be used to measure progress or
improvements at future
intervals.
Benchmarks
and baselines collect
identical information and
are essen-
tially
the same. Project-level data
is not useful for baselines,
so phase-
level
data is the minimum level of
granularity that can show
process
improvement
results.
Phase-level
benchmarks are used by the
ISBSG and also
frequently
used
in academic studies. In fact,
the bulk of the literature
on software
benchmarks
tends to deal with phase-level
data. Enough
phase-level
data
is now available to have
established fairly accurate
averages and
ranges
for the United States,
and preliminary averages for
many other
countries.
Activity-level
productivity and quality
benchmarks Unfortunately,
mea-
surement
that collects only project
data is impossible to validate.
Phase-
level
data is hard to validate because
many activities such as
technical
documentation
and project management cross
phase boundaries.
Activity-based
benchmarks are even more
detailed than the
project-
level
benchmarks already discussed.
Activity-based benchmarks
drop
down
to the level of the specific
kinds of work that must be
performed in
order
to build a software application.
For example, the 25
activities used
by
the author since the 1980s
include specific sub-benchmarks
for require-
ments,
prototyping, architecture, planning,
initial design, detail
design,
design
reviews, coding, reusable
code acquisition, package
acquisition,
code
inspections, independent verification
and validation,
configuration
control,
integration, user documentation,
unit testing, function
testing,
integration
testing, system testing,
field testing, acceptance testing,
inde-
pendent
testing, quality assurance,
installation, and
management.
Activity-based
benchmarks are more
difficult to perform than
other
kinds
of benchmark studies, but
the results are far
more useful for
process
improvement, cost reduction,
quality improvement,
schedule
improvement,
or other kinds of improvement
programs. The great
advantage
of activity-based benchmarks is that
they reveal very
impor-
tant
kinds of information that
the less granular studies
can't provide.
For
example, for many kinds of
software projects, the major
cost drivers
are
associated with the production of
paper documents (plans,
speci-
fications,
user manuals) and with
quality control (inspections,
static
analysis,
testing). Both paperwork
costs and defect removal
costs are
often
more expensive than coding.
Findings such as this are
helpful in
planning
improvement programs and
calculating returns on
invest-
ments.
But to know the major cost
drivers within a specific
company
or
enterprise, it is necessary to get
down to the level of
activity-based
benchmark
studies.
Activity-based
benchmarks are normally
collected via on-site
interviews,
although
today Skype or a conference call
might be used. The
benchmark
interview
typically takes about two
hours and involves the
project man-
ager
and perhaps three team
members. Therefore the hours
are about
eight
staff hours plus consulting
time for collecting the
benchmark itself.
If
function points are counted
by the consultant, they
would take addi-
tional
time.
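To show why activity-level granularity matters, the sketch below aggregates effort for a handful of the activities named above and reports which ones dominate. The hour figures are invented for illustration; only the method of summing and ranking by activity reflects the benchmark approach described here.

# Illustrative aggregation of activity-level effort data (figures invented).
# Collecting data at this level is what reveals that paperwork and defect
# removal often cost more than coding.

activity_hours = {
    "requirements": 1200,
    "initial and detail design": 1800,
    "design reviews": 600,
    "coding": 4200,
    "code inspections": 900,
    "user documentation": 1500,
    "testing (all stages)": 3800,
    "project management": 1400,
}

total = sum(activity_hours.values())
for activity, hours in sorted(activity_hours.items(), key=lambda kv: -kv[1]):
    print(f"{activity:<26} {hours:>5} hours  {100 * hours / total:5.1f}%")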
Software
outsource versus internal
performance benchmarks One
of the
most
frequent reasons that the
author has been commissioned
to carry
out
productivity and quality
benchmark studies is that a
company
is
considering outsourcing some or
all of their software
development
work.
Usually
the outsource decision is
being carried out high in
the com-
pany
at the CEO or CIO levels.
The lower managers are
alarmed that
they
might lose their jobs,
and so they commission
productivity and
quality
studies to compare in-house
performance against both
industry
data
and also data from
major outsource vendors in
the United States
and
abroad.
Until
recently, U.S. performance
measured in terms of function
points
per
month was quite good
compared with the outsource
countries of
China,
Russia, India, and others.
However, when costs were
measured,
the
lower labor costs overseas
gave offshore outsourcers a
competitive
edge.
Within the past few
years, inflation rates have
risen faster over-
seas
than in the United States,
so the cost differential has
narrowed.
IBM,
for example, recently
decided to build a large
outsource center in
Iowa
due to the low
cost-of-living compared with other
locations.
The
continuing recession has
resulted in a surplus of U.S.
software
professionals
and also lowered U.S.
compensation levels. As a result,
cost
data
is beginning to average out
across a large number of
countries. The
recession
is affecting other countries
too, but since travel
costs continue
to
go up, it is becoming harder or at
least less convenient to do
business
overseas.
Software maintenance and customer support benchmarks   As of 2009, there are more maintenance and enhancement software engineers than development software engineers. Yet benchmarks for maintenance and enhancement work are not often performed. There are several reasons for this. One reason is that maintenance work includes no fewer than 23 different kinds of updates to legacy applications, ranging from minor changes through complete renovation. Another reason is that a great deal of maintenance work involves changes of less than 15 function points in size, which is below the boundary level of normal function point analysis. Although individually these small changes may be fast and inexpensive, there are thousands of them, and their cumulative costs in large companies total millions of dollars per year.
One
of the key maintenance
metrics that has value is
that of main-
tenance
assignment scope or
the amount of software one
person can
keep
up and running. Other
maintenance metrics include
number of
users
supported, rates at which
bugs are fixed, and
normal productivity
rates
expressed in terms of function
points per month or work
hours
per
function point. Defect
potentials and defect
removal efficiency
level
are
also important.
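Maintenance assignment scope reduces to simple division, as the sketch below shows: portfolio size divided by maintenance staff gives the observed scope, and portfolio size divided by an assumed scope gives a staffing estimate. The 1,500-function point figure used in the example is an assumption for illustration, not a benchmark result.

# Minimal sketch of the maintenance assignment scope metric.
# Assumption for the example only: one maintenance programmer can keep
# roughly 1,500 function points of installed software up and running.

def assignment_scope(portfolio_fp: float, maintenance_staff: int) -> float:
    """Function points of installed software maintained per person."""
    return portfolio_fp / maintenance_staff

def staff_needed(portfolio_fp: float, scope_fp_per_person: float = 1500.0) -> float:
    """Estimated maintenance staffing for a portfolio of a given size."""
    return portfolio_fp / scope_fp_per_person

print(assignment_scope(300_000, 250))   # 1200.0 function points per person
print(staff_needed(300_000))            # 200.0 people at the assumed scope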
One
strong caution for
maintenance benchmarks: the
traditional "cost
per
defect" metric is seriously
flawed and tends to penalize
quality. Cost
per
defect achieves the lowest
costs for the buggiest
software. It also
seems
to be cheaper early rather
than late, but this is
really a false
conclusion
based on overhead rather
than actual time and
motion.
The
new requirements for service
and customer support
included in
the
Information Technology Infrastructure
Library (ITIL) are
giving
a
new impetus to maintenance
and support benchmarks. In
fact, ITIL
benchmarks
should become a major
subfield of software
benchmarks.
Methodology
benchmarks There
are many different forms of
software
development
methodology such as Agile
development, extreme
program-
ming
(XP), Crystal development,
waterfall development, the
Rational
Unified
Process (RUP), iterative
development, object-oriented
develop-
ment
(OO), rapid application
development (RAD), the Team
Software
Process
(TSP), and dozens more.
There are also scores of
hybrid develop-
ment
methods and probably
hundreds of customized or local
methods
used
only by a single
company.
In
addition to development methods, a
number of other
approaches
can
have an impact on software
productivity, quality, or both.
Some
of
these include Six Sigma,
quality function deployment
(QFD), joint
application
design (JAD), and software
reuse.
Benchmark
data should be granular and
complete enough to
dem-
onstrate
the productivity and quality
levels associated with
various
development
methods. The ISBSG benchmark
data is complete
enough
to
do this. Also, the data
gathered by for-profit benchmark
organizations
such
as QPMG and SPR can do
this, but there are
logistical problems.
The
logistical problems include
the following: Some of the
popular
development
methods such as Agile and
TSP use nonstandard
metrics
such
as story points, use-case points,
ideal time, and task
hours. The
data
gathered using such metrics
is incompatible with major
industry
benchmarks,
all of which are based on
function point metrics and
stan-
dard
work periods.
Another
logistical problem is that
very few organizations that
use some
of
these newer methods have
commissioned benchmarks by outside
con-
sultants
or used the ISBSG data
questionnaires. Therefore, the
effective-
ness
of many software development
methods is ambiguous and
uncertain.
Conversion
of data to function points
and standard work periods is
techni-
cally
possible, but has not
yet been performed by the
Agile community or
most
of the other methods that
use nonstandard
metrics.
Assessment benchmarks   Software assessment has been available in large companies such as IBM since the 1970s. IBM-style assessments became popular when Watts Humphrey left IBM and created the assessment method for the Software Engineering Institute (SEI) circa 1986. By coincidence, the author also left IBM and created the Software Productivity Research (SPR) assessment method circa 1984.
Software
process assessments received a
burst of publicity from
the
publication
of two books. One of these
was Watts Humphrey's
book
Managing
the Software Process (Addison
Wesley, 1989), which
describes
the
assessment method used by
the Software Engineering
Institute (SEI).
A
second book on software assessments
was the author's Assessment
and
Control of Software Risks
(Prentice
Hall, 1994), which
describes
the
results of the assessment
method used by Software
Productivity
Research
(SPR). Because both authors
had been involved with
software
assessments
at IBM, the SEI and SPR
assessments had some
attributes
in
common, such as a heavy
emphasis on software
quality.
Both
the SEI and SPR assessments
are similar in concept to
medical
examinations.
That is, both assessment
approaches try to find
every-
thing
that is right and everything
that may be wrong with the
way
companies
build and maintain software.
Hopefully, not too much will
be
wrong,
but it is necessary to know
what is wrong before truly
effective
therapy
programs can be
developed.
By
coincidence, both SPR and
SEI utilize 5-point scales in
evaluating
software
performance. Unfortunately, the
two scales run in
opposite
directions.
The SPR scale is based on a
Richter scale, with the
larger
numbers
indicating progressively more
significant hazards. The
SEI
scale
uses "1" as the most
primitive score, and moves
toward "5" as
processes
become more rigorous.
Following is the SEI scoring
system,
and
the approximate percentages of
enterprises that have been
noted
at
each of the five
levels.
SEI Scoring System for the Capability Maturity Model (CMM)

Definition          Frequency
1 = Initial           75.0%
2 = Repeatable        15.0%
3 = Defined            7.0%
4 = Managed            2.5%
5 = Optimizing         0.5%
As
can be seen, about 75 percent of
all enterprises assessed using
the
SEI
approach are at the bottom
level, or "initial." Note
also that the SEI
scoring
system lacks a midpoint or
average.
A
complete discussion of the SEI
scoring system is outside
the scope
of
this book. The SEI scoring
is based on patterns of responses to a
set
of
about 150 binary questions.
The higher SEI maturity
levels require
"Yes"
answers to specific patterns of
questions.
Following
is the SPR scoring system,
and the approximate
percent-
ages
of results noted within
three industry groups:
military software,
systems
software, and management
information systems
software.
SPR Assessment Scoring System

                    Frequency    Military     Systems      MIS
Definition          (Overall)    Frequency    Frequency    Frequency
1 = Excellent         2.0%         1.0%         3.0%        1.0%
2 = Good             18.0%        13.0%        26.0%       12.0%
3 = Average          56.0%        57.0%        50.0%       65.0%
4 = Poor             20.0%        24.0%        20.0%       19.0%
5 = Very Poor         4.0%         5.0%         2.0%        3.0%
The
SPR scoring system is easier
to describe and understand. It
is
based
on the average responses to
the 300 or so SPR questions
on the
complete
set of SPR assessment
questionnaires.
By
inversion and mathematical
compression of the SPR scores, it
is
possible
to establish a rough equivalence
between the SPR and
SEI
scales,
as follows:
SPR Scoring Range    Equivalent SEI Score    Approximate Frequency
5.99 to 3.00         1 = Initial             80.0%
2.99 to 2.51         2 = Repeatable          10.0%
2.01 to 2.50         3 = Defined              5.0%
1.01 to 2.00         4 = Managed              3.0%
0.01 to 1.00         5 = Optimizing           2.0%
The
conversion between SPR and
SEI assessment results is not
per-
fect,
of course, but it does allow
users of either assessment
methodology
to
have an approximate indication of
how they might have
appeared
using
the other assessment
technique.
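The conversion is easy to mechanize. The sketch below maps an SPR score onto its approximate SEI CMM equivalent using the ranges in the table above; it inherits all of the imprecision just noted.

# Approximate SPR-to-SEI conversion using the published ranges above.
# Note the scales run in opposite directions: low SPR scores are good,
# while high SEI levels are good.

def spr_to_sei(spr_score: float) -> tuple[int, str]:
    if 3.00 <= spr_score <= 5.99:
        return 1, "Initial"
    if 2.51 <= spr_score < 3.00:
        return 2, "Repeatable"
    if 2.01 <= spr_score <= 2.50:
        return 3, "Defined"
    if 1.01 <= spr_score <= 2.00:
        return 4, "Managed"
    if 0.01 <= spr_score <= 1.00:
        return 5, "Optimizing"
    raise ValueError("SPR score outside the 0.01 to 5.99 range")

print(spr_to_sei(3.2))   # (1, 'Initial')
print(spr_to_sei(1.8))   # (4, 'Managed')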
There
are other forms of
assessment too. For example,
ISO quality
certification
uses a form of software
assessment, as do the SPICE
and
TickIT
approaches in Europe.
In
general, software assessments
are performed by outside
consul-
tants,
although a few organizations do
have internal assessment
experts.
For
SEI-style assessments, a number of
consulting groups are
licensed
to
carry out the assessment
studies and gather
data.
Hybrid assessment and benchmark studies   Benchmark data shows productivity and quality levels, but does not explain what caused them. Assessment data shows the sophistication of software development practices, or the lack of same. But assessments usually collect no quantitative data.
Obviously,
assessment data and
benchmark data are
synergistic, and
both
need to be gathered. The
author recommends that a
merger of
assessment
and benchmark data would be
very useful to the
industry.
In
fact the author's own
benchmarks are always hybrid
and gather
assessment
and benchmark data
concurrently.
One
of the key advantages of
hybrid benchmarks is that
the quantita-
tive
data can demonstrate the
economic value of the higher
CMM and
CMMI
levels. Without empirical
benchmark data, the value of
ascending
the
CMMI from level 1 to level 5 is
uncertain. But benchmarks do
dem-
onstrate
substantial productivity and
quality levels for CMMI
levels 3,
4,
and 5 compared with levels 1
and 2.
The
software industry would
benefit from a wider
consolidation of
assessment
and benchmark data
collection methods. The
advantage of
the
hybrid approach is that it
minimizes the number of
times managers
and
technical personnel are
interviewed or asked to provide
informa-
tion.
This keeps the assessment
and benchmark data
collection activi-
ties
from being intrusive or
interfering with actual day-to-day
work.
Some
of the kinds of data that
need to be consolidated to get an
over-
all
picture of software within a
large company or government
group
include
1. Demographic data on team sizes
2. Demographic data on specialists
3. Demographic data on colocation or geographic dispersion of teams
4. Application size using several metrics (function points, story points, LOC, etc.)
5. Volumes of reusable code and other deliverables
6. Rates of requirements change during development
7. Data on project management methods
8. Data on software development methods
9. Data on software maintenance methods
10. Data on specific programming languages
11. Data on specific tool suites used
12. Data on quality-control and testing methods
13. Data on defect potentials and defect removal efficiency levels
14. Data on security-control methods
15. Activity-level schedule, effort, and cost data
Hybrid assessment and benchmark data collection could gather all of this kind of information in a fairly cost-effective and nonintrusive fashion.
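To make the 15 kinds of data above concrete, the following sketch shows one way a single project's hybrid record might be organized. It is a hypothetical schema written in Python for illustration only; the field names are assumptions of this text and are not a format used by SPR, ISBSG, or any other benchmark organization.

# Hypothetical schema for one project's hybrid assessment-plus-benchmark
# record, covering the kinds of data listed above. Field names illustrative.
from dataclasses import dataclass, field

@dataclass
class HybridBenchmarkRecord:
    team_size: int                                                 # 1. team demographics
    specialists: dict[str, int] = field(default_factory=dict)      # 2. specialist counts
    colocated: bool = True                                         # 3. colocation vs. dispersion
    size_function_points: float = 0.0                              # 4. size in several metrics
    size_loc: int = 0
    reuse_percent: float = 0.0                                     # 5. reusable deliverables
    requirements_growth_percent: float = 0.0                       # 6. requirements change
    pm_methods: list[str] = field(default_factory=list)            # 7. project management
    dev_methods: list[str] = field(default_factory=list)           # 8. development methods
    maintenance_methods: list[str] = field(default_factory=list)   # 9. maintenance methods
    languages: list[str] = field(default_factory=list)             # 10. programming languages
    tool_suites: list[str] = field(default_factory=list)           # 11. tool suites
    quality_methods: list[str] = field(default_factory=list)       # 12. quality control and testing
    defect_potential_per_fp: float = 0.0                           # 13. defect potential
    defect_removal_efficiency: float = 0.0                         # 13. removal efficiency (%)
    security_methods: list[str] = field(default_factory=list)      # 14. security controls
    activity_effort_hours: dict[str, float] = field(default_factory=dict)  # 15. effort by activity
    activity_cost: dict[str, float] = field(default_factory=dict)          # 15. cost by activity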
Earned-value benchmarks
The earned-value method of comparing accumulated effort and costs against predicted milestones and deliverables is widely used on military software applications; indeed, it is a requirement for military contracts. Outside of the defense community, earned-value calculations are also used on some outsource contracts and occasionally on internal applications.
Earned-value calculations are performed at frequent intervals, usually monthly, and show progress versus expense levels. The method is somewhat specialized and the calculations are complicated, although dozens of tools are available that can carry them out.
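For readers unfamiliar with the arithmetic, the core earned-value indices can be computed from three numbers per reporting period: planned value, earned value, and actual cost. The sketch below applies the standard earned-value formulas; it is a simplified illustration in Python, not one of the commercial earned-value tools mentioned above.

# Minimal sketch of the standard earned-value indices for one reporting
# period (typically one month). All inputs are in the same currency units.

def earned_value_indices(planned_value: float,
                         earned_value: float,
                         actual_cost: float) -> dict[str, float]:
    """Return the usual earned-value variances and performance indices."""
    return {
        "schedule_variance": earned_value - planned_value,   # SV = EV - PV
        "cost_variance": earned_value - actual_cost,         # CV = EV - AC
        "spi": earned_value / planned_value,                 # SPI = EV / PV
        "cpi": earned_value / actual_cost,                   # CPI = EV / AC
    }

# Example: $500K of work planned, $450K of work actually completed, at a
# cost of $520K -- behind schedule (SPI < 1) and over budget (CPI < 1).
print(earned_value_indices(500_000, 450_000, 520_000))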
The earned-value approach by itself is not a true benchmark because it has a narrow focus and does not deal with topics such as quality, requirements changes, and other issues. However, the data that is collected for the earned-value approach is quite useful for benchmark studies, and could also show correlations with assessment results such as the levels of the Capability Maturity Model Integration (CMMI).
Quality and test coverage benchmarks
Software quality is poorly represented in the public benchmark data offered by nonprofit organizations such as the International Software Benchmarking Standards Group (ISBSG). In fact, software quality is not handled very well across the software industry, including by some major players such as Microsoft. Companies such as IBM that do take quality seriously measure all defects from requirements through development and out into the field.
The data is used to create benchmarks of two very important metrics: defect potentials and defect removal efficiency. The term defect potential refers to the sum total of defects that are likely to be found in software. The term defect removal efficiency refers to the percentage of defects found and removed by every single review, inspection, static analysis run, and test stage.
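Expressed as arithmetic, cumulative defect removal efficiency is simply the defects removed before release divided by that number plus the defects reported by users afterward (later in this chapter, delivered defects are counted for the first 90 days of use). The short Python sketch below illustrates the calculation; it is a simplified example, not a measurement tool, and the numbers in the usage comment are invented.

# Minimal sketch: cumulative defect removal efficiency (DRE) for a project.
# DRE = defects removed before release / total defects, as a percentage.

def defect_removal_efficiency(removed_before_release: int,
                              found_after_release: int) -> float:
    """Percentage of total defects removed prior to delivery."""
    total = removed_before_release + found_after_release
    return 100.0 * removed_before_release / total if total else 100.0

# Example: 900 defects removed via reviews, inspections, static analysis,
# and testing; 100 more reported by users -> DRE of 90 percent.
print(defect_removal_efficiency(900, 100))   # 90.0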
In addition, quality benchmarks may also include topics such as complexity, measured using cyclomatic and essential complexity; test coverage (the percentage of code actually touched by test cases); and defect severity levels. There is a shortage of industry data on many quality topics, such as bugs or errors in test cases themselves.
In general, the software industry needs more and better quality and test coverage benchmarks. The testing literature is very sparse on information such as numbers of test cases, numbers of test runs, and defect removal efficiency levels.
A strong caution about quality benchmarks is that "cost per defect" is not a safe metric to use because it penalizes quality. The author regards this metric as approaching professional malpractice. A better metric for quality economics is defect removal cost per function point.
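To see why "cost per defect" penalizes quality, consider two hypothetical releases of the same 1,000-function point application with identical fixed testing costs but very different defect volumes. The figures in the Python sketch below are invented solely to illustrate the arithmetic behind this caution: as quality improves, the fixed costs are spread over fewer defects, so cost per defect rises even though defect removal cost per function point falls.

# Hypothetical illustration: fixed test preparation/execution costs plus a
# variable repair cost per defect, for a 1,000-function point application.
FUNCTION_POINTS = 1_000
FIXED_TEST_COST = 50_000       # writing and running test cases (invented figure)
REPAIR_COST_PER_DEFECT = 100   # invented figure

def quality_economics(defects_found: int) -> tuple[float, float]:
    """Return (cost per defect, defect removal cost per function point)."""
    total = FIXED_TEST_COST + defects_found * REPAIR_COST_PER_DEFECT
    return total / defects_found, total / FUNCTION_POINTS

# Low-quality release:  500 defects -> $200 per defect,   $100 per function point.
# High-quality release:  50 defects -> $1,100 per defect,  $55 per function point.
print(quality_economics(500))
print(quality_economics(50))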
Cost of quality (COQ) benchmarks
It is unfortunate that such an important idea as the "cost of quality" has such an inappropriate name. Quality is not only "free," as pointed out by Phil Crosby of ITT, but it also has economic value. The COQ measure should have been named something like the "cost of defects." In any case, the COQ approach is older than the software and computing industry and derives from a number of pioneers such as Joseph Juran, W. Edwards Deming, Kaoru Ishikawa, Genichi Taguchi, and others.
The traditional cost elements of COQ include prevention, appraisal, and failure costs. While these are workable for software, software COQ often uses cost buckets such as defect prevention, inspection, static analysis, testing, and delivered defect repairs. The ideas are the same, but the nomenclature varies to match software operations.
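As a simple illustration of how the software-oriented COQ buckets can be rolled up for a benchmark report, the Python sketch below sums hypothetical annual costs per bucket. The bucket names follow the paragraph above; every dollar figure is invented for illustration.

# Hypothetical cost-of-quality roll-up using software-oriented buckets.
# All dollar figures are invented for illustration.
coq_buckets = {
    "defect prevention": 150_000,        # e.g., training, prototyping
    "inspections": 120_000,
    "static analysis": 40_000,
    "testing": 600_000,
    "delivered defect repairs": 300_000,
}

total_coq = sum(coq_buckets.values())
print(f"Total cost of quality: ${total_coq:,}")
for bucket, cost in coq_buckets.items():
    print(f"  {bucket:<25} ${cost:>9,}  ({cost / total_coq:5.1%})")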
Many companies perform COQ benchmark studies of both software applications and engineered products. There is a substantial literature on this topic and dozens of reference books.

Six Sigma benchmarks
"Six Sigma" is a mathematical expression that deals with limiting defects to no more than 3.4 per 1 million opportunities. While this quantitative result appears to be impossible for software, the philosophy of Six Sigma is readily applied to software.
The Six Sigma approach uses a fairly sophisticated and complex suite of metrics to examine software defect origins, defect discovery methods, defects delivered to customers, and other relevant topics. However, the Six Sigma approach is also about using such data to improve both defect prevention and defect detection.

A number of flavors of Six Sigma exist, but the most important flavor circa 2009 is "Lean Six Sigma," which attempts a minimalist approach to the mathematics of defects and quality analysis.
The Six Sigma approach is not an actual benchmark in the traditional sense of the word. As commonly used, a benchmark is a discrete collection of data points gathered in a finite period, such as collecting data on 50 applications developed in 2009 by a telecommunications company.
The Six Sigma approach is not fixed in time or limited in number of applications. It is a continuous loop of data collection, analysis, and improvement that continues without interruption once it is initiated.

Although the ideas of Six Sigma are powerful and often effective, there is a notable gap in the literature and data when Six Sigma is applied to software. As of 2009, there is not a great deal of empirical data showing that the application of Six Sigma raises defect removal efficiency levels or lowers defect potentials.
The overall U.S. average for defect potentials circa 2009 is about 5.00 bugs per function point, while defect removal efficiency averages about 85 percent. This combination leaves a residue of 0.75 bug per function point when software is delivered to users.
Given the statistical nature of Six Sigma metrics, it would be interesting to compare all companies that use Lean Six Sigma or Six Sigma for software against U.S. averages. If so, one might hope that defect potentials would be much lower (say, about 3.00 bugs per function point), while removal efficiency was much higher (say, greater than 95 percent).
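The delivered-defect arithmetic behind these comparisons is straightforward: delivered defects per function point equal the defect potential multiplied by the share of defects not removed. The Python sketch below contrasts the U.S. averages cited above with the hoped-for Six Sigma profile and the theoretical 99.999 percent target discussed next; only the first scenario reflects the cited averages, the others are the illustrative assumptions from this discussion.

# Delivered defects per function point = defect potential x (1 - removal efficiency).

def delivered_defects(potential_per_fp: float, removal_efficiency_pct: float) -> float:
    return potential_per_fp * (1 - removal_efficiency_pct / 100.0)

scenarios = {
    "U.S. average circa 2009":       (5.00, 85.0),    # ~0.75 delivered per FP
    "Hoped-for Six Sigma adopter":   (3.00, 95.0),    # ~0.15 delivered per FP
    "Theoretical Six Sigma target":  (5.00, 99.999),  # ~0.00005 delivered per FP
}

for name, (potential, dre) in scenarios.items():
    print(f"{name:<30} {delivered_defects(potential, dre):.5f} defects per FP")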
Unfortunately, this kind of data is sparse and not yet available in sufficient quantity for a convincing statistical study.

As it happens, one way of achieving Six Sigma for software would be to achieve a defect removal efficiency rate of 99.999 percent, which has actually never occurred. However, it would seem useful to compare actual levels of defect removal efficiency against this theoretical Six Sigma target.
From a historical standpoint, defect removal efficiency calculations did not originate in the Six Sigma domain; rather, they seem to have originated at IBM in the early 1970s, when software inspections were being compared with other forms of defect removal activity.
ISO quality benchmarks
Organizations that need certification for the ISO 9000-9004 quality standards, or for other newer relevant ISO standards, undergo an on-site examination of their quality methods and procedures, and especially of their documentation for quality-control approaches. This certification is a form of benchmark and is actually fairly expensive to carry out. However, there is little or no empirical data showing that ISO certification improves software quality in the slightest.
In other words, neither the defect potentials nor the defect removal efficiency levels of ISO-certified organizations seem to be better than those of similar uncertified organizations. Indeed, there is anecdotal evidence that average software quality for uncertified companies may be slightly higher than for certified companies.
Security benchmarks
With the exception of studies by Homeland Security, the FBI, and more recently, the U.S. Congress, there is almost a total absence of security benchmarks at the corporate level. As the recession lengthens and security attacks increase, there is an urgent need for security benchmarks that can measure topics such as the resistance of software to attack; numbers of attacks per company and per application; costs of security flaw prevention; costs of recovery from security attacks and denial-of-service attacks; and evaluations of the most effective forms of security protection.
The software journals do include benchmarks for antivirus and antispyware applications and firewalls that show ease of use and viruses detected or let slip through. However, these benchmarks are somewhat ambiguous and casual.

So far as can be determined, there are no known benchmarks on topics such as the number of security attacks against Microsoft Vista, Oracle, SAP, Linux, Firefox, Internet Explorer, and the like. It would be useful to have monthly benchmarks on these topics. The lack of effective security benchmarks is a sign that the software industry is not yet fully up to speed on security issues.
Software personnel and skill benchmarks
Personnel and skills inventory benchmarks are a fairly new arrival on the software scene. Software has become one of the major factors in global business. Some large corporations have more than 50,000 software personnel of various kinds, and quite a few companies have more than 2,500. Over and above the large numbers of workers, the total complement of specific skills and occupation groups associated with software is now approaching 90.
As discussed in earlier chapters, large enterprises have many different categories of specialists in addition to their general software engineering populations: for example, quality assurance specialists, integration and test specialists, human factors specialists, performance specialists, customer support specialists, network specialists, database administration specialists, technical communication specialists, maintenance specialists, estimating specialists, measurement specialists, function point counting specialists, and many others.
There are important questions about how many specialists of various kinds are needed and how they should be recruited, trained, and perhaps certified in their areas of specialization. There are also questions about the best way of placing specialists within overall software organization structures. Benchmarking in this domain involves collecting information on how companies of various sizes in various industries deal with the increasing need for specialization in an era of downsizing and business process reengineering due to the continuing recession.
A new topic of increasing importance due to the recession is the distribution of foreign software workers who are working in the United States on temporary work-related visas. This topic has recently been in the press when it was noted that Microsoft and Intel were laying off U.S. workers at a faster rate than they were laying off foreign workers.
Software compensation benchmarks
Compensation benchmarks have been used for more than 25 years for nonsoftware studies, and software compensation was soon added to these partly open or blind benchmarks.
The way compensation benchmarks work is that many companies provide data on the compensation levels that they pay to various workers using standard job descriptions. A neutral consulting company analyzes the data and reports back to each company. Each report shows how specific companies compare with group averages. In the partly open form, the names of the other companies are identified, but of course their actual data is concealed. In the blind form, the number of participating companies is known, but none of the companies are identified. There are legal reasons for having these studies carried out in blind or partly open forms, which involve possible antitrust regulations or conspiracy charges.
Software turnover and attrition benchmarks
This form of benchmark was widely used outside of software before software became a major business function. Software organizations merely joined in when they became large enough for attrition to become an important issue.
Attrition and turnover benchmarks are normally carried out by human resource organizations rather than software organizations. They are classic benchmarks that are usually either blind or partly open. Dozens or even hundreds of companies report their attrition and turnover rates to a neutral outside consulting group, which then returns statistical results to each company. Each company's rate is compared with the group, but the specific rates for the other participants are concealed.
There are also internal attrition studies within large corporations such as IBM, Google, Microsoft, EDS, and the like. The author has had access to some very significant data from internal studies. The most important finding was that software engineers with the highest appraisal scores leave in the greatest numbers. The most common reason cited for leaving in exit interviews is that good technical workers don't like working for bad managers.
Software performance benchmarks
Software execution speed or performance is one of the older forms of benchmark and has been measured since the 1970s. These are highly technical benchmarks that consider application throughput or execution speed for various kinds of situations. Almost every personal computer magazine has benchmarks for topics such as graphics processing, operating system load times, and other performance issues.
Software data center benchmarks
This form of benchmark is probably the oldest for the computing and software industry and has been carried out continuously since the 1960s. Data center benchmarks are performed to gather information on topics such as availability of hardware and software, mean time to failure of software applications, and defect repair intervals. The new Information Technology Infrastructure Library (ITIL) includes a host of topics that need to be examined so they can be included in service agreements.

While data center benchmarks are somewhat separate from software benchmarks, the two overlap because poor data center performance tends to correlate with poor quality levels of installed software.
Customer satisfaction benchmarks
Formal customer satisfaction surveys have long been carried out by computer and software vendors such as IBM, Hewlett-Packard, Unisys, Google, and some smaller companies, too. These benchmark studies are usually carried out by the marketing organization and are used to suggest improvements in commercial software packages.

There are also in-house benchmarks of customer satisfaction within individual companies, such as insurance companies that have thousands of computer users. These studies may also be correlated with data center benchmarks.
Software usage benchmarks
As software becomes an important business and operational tool, it is obvious that software usage tends to improve the performance of various kinds of knowledge work and clerical work. In fact, prior to the advent of computers, the employment patterns of insurance companies included hundreds of clerical workers who handled applications, claims, and other clerical tasks. Most of these workers were displaced by computer software, and as a result the demographics of insurance companies changed significantly.

Function point metrics can be used to measure consumption of software just as well as they can measure production of software. Although usage benchmarks are rare in 2009, they are likely to grow in importance as the recession continues.
Usage benchmarks of software project managers, for example, indicate that managers who are equipped with about 3000 function points of cost estimating tools and 3000 function points of project management tools have fewer failures and shorter schedules for their projects than managers who attempt estimating and planning by hand.
Usage studies also indicate that many knowledge workers who are well equipped with software outperform colleagues who are not so well equipped. This is true for knowledge work such as law, medicine, and engineering, and also for work where data plays a significant role, such as marketing, customer support, and maintenance.

Software consumption benchmark studies are just getting started circa 2009, but they are likely to become major forms of benchmarks within ten years, especially if the recession continues.
Software litigation and failure benchmarks
In lawsuits for breach of contract, poor quality, fraud, cost overruns, or project failure, benchmarks play a major role. Usually in such cases, software expert witnesses are hired to prepare reports and testify about industry norms for topics such as quality control, schedules, costs, and the like. Industry experts are also brought in for tax cases if the litigation involves the value or replacement costs of software assets.
The expert reports produced for lawsuits attempt to compare the specifics of the case against industry background data for topics such as defect removal efficiency levels, schedules, productivity, costs, and the like.

The one key topic where litigation is almost unique in gathering data is that of the causes of software failure. Most companies that have internal failures don't go to court. But failures where the software was developed under contract go to court with high frequency. These lawsuits have extensive and thorough discovery and deposition phases, so the expert witnesses who work on such cases have access to unique data that is not available from any other source.

Benchmarks based on litigation are perhaps the most complete source of data on why projects are terminated, run late, exceed their budgets, or have excessive defect volumes after release.
Award benchmarks
There are a number of organizations that offer awards for outstanding performance. For example, the Baldrige Award is well known for quality and customer service. The Forbes annual issue on the 100 best companies to work for is another kind of award. J.D. Power and Associates issues awards for various kinds of service and support excellence. For companies that aspire to "best in class" status, a special kind of benchmark can be carried out dealing with the criteria of the Baldrige Awards.
If a company is a candidate for some kind of award, quite a bit of work is involved in collecting the necessary benchmark information. However, only fairly sophisticated companies that are actually doing a good job are likely to take on such expenses.
As of 2009, probably at least a dozen awards are offered by various corporations, government groups, and software journals. There are awards for customer service, for high quality, for innovative applications, and for many other topics as well.
Types of Software Benchmark Studies Performed
There are a number of methodologies used to gather the data for benchmark studies. These include questionnaires that are administered by mail or electronic mail, on-site interviews, or some combination of mailed questionnaires augmented by interviews.

Benchmarking studies can also be "open" or "blind" in terms of whether the participants know who else has provided data and information during the benchmark study.
Open benchmarks
In a fully open study, the names of all participating organizations are known, and the data they provide is also known. This kind of study is difficult to do between competitors, and is normally performed only for internal benchmark studies of the divisions and locations within large corporations.
Because of corporate politics, the individual business units within a corporation will resist open benchmarks. When IBM first started software benchmarks, there were 26 software development labs, and each lab manager claimed that "our work is so complex that we might be penalized." However, IBM decided to pursue open benchmarks, and that was a good decision because it encouraged the business units to improve.
Partly open benchmarks
One of the common variations of an open study is a limited benchmark, often between only two companies. In a two-company benchmark, both participants sign fairly detailed nondisclosure agreements and then provide one another with very detailed information on methods, tools, quality levels, productivity levels, schedules, and the like. This kind of study is seldom possible for direct competitors, but it is often used by companies that do similar kinds of software but operate in different industries, such as a telecommunications company sharing data with a computer manufacturing company.
In partly open benchmark studies, the names of the participating organizations are known, even though which company provided specific points of data is concealed. Partly open studies are often performed within specific industries such as insurance, banking, telecommunications, and the like. In fact, studies of this kind are performed for a variety of purposes besides software topics. Some of the other uses of partly open studies include exploring salary and benefit plans, office space arrangements, and various aspects of human relations and employee morale.
An example of a partly open benchmark is a study of the productivity and quality levels of insurance companies in the Hartford, Connecticut, area, where half a dozen are located. All of these companies are competitors, and all are interested in how they compare with the others. Therefore, a study gathered data from each and reported back on how each company compared with the averages derived from all of the companies. But information on how a company such as Hartford Insurance compared with Aetna or Travelers would not be provided.
Blind benchmarks
In blind benchmark studies, none of the participants know the names of the other companies that participate. In extreme cases, the participants may not even know the industries from which the other companies were drawn. This level of precaution would only be needed if there were very few companies in an industry, if the nature of the study demanded extraordinary security measures, or if the participants are fairly direct competitors.
When large corporations first start collecting benchmark data, it is obvious that the top executives of the various business units will be concerned. They all have political rivals, and no executive wants his or her business unit to look worse than a rival business unit. Therefore, every executive will want blind benchmarks that conceal the results of specific units. This is a bad mistake, because nobody will take the data seriously.
For internal benchmark and assessment studies within a company, it is best to show every unit by name and let corporate politics serve as an incentive to improve. This brings up the important point that benchmarks have a political aspect as well as a technical aspect.

Since executives and project managers have rivals, and corporate politics are often severe, nobody wants to be measured unless they are fairly sure the results will indicate that they are better than average, or at least better than their major political opponents.
Benchmark Organizations Circa 2009
A fairly large number of consulting companies collect benchmark data of various kinds. However, these consulting groups tend to be competitors, and therefore it is difficult to have any kind of coordination or consolidation of benchmark information.
As it happens, three of the more prominent benchmark organizations do collect activity-level data in similar fashions: the David Consulting Group, the Quality and Productivity Management Group (QPMG), and Software Productivity Research (SPR). This is due to the fact that the principals of all three organizations have worked together in the past. However, although the data collection methods are similar, there are still some differences. Even so, the total volume of data among these three is probably the largest collection of benchmark data in the industry. Table 6-13 shows examples of software benchmark organizations.

TABLE 6-13   Examples of Software Benchmark Organizations
1. Business Applications Performance Corporation (BAPco)
2. Construx
3. David Consulting Group
4. Forrester Research
5. Galorath Associates
6. Gartner Group
7. Information Technology Metrics and Productivity Institute (ITMPI)
8. International Software Benchmarking Standards Group (ISBSG)
9. ITABHI Corporation
10. Open Standards Benchmarking Collaborative (OSBC)
11. Process Fusion
12. Quality and Productivity Management Group (QPMG)
13. Quality Assurance Institute (QAI)
14. Quality Plus
15. Quantitative Software Management (QSM)
16. Software Engineering Institute (SEI)
17. Software Productivity Research (SPR)
18. Standard Performance Evaluation Corporation (SPEC)
19. Standish Group
20. Total Metrics

For all 20 of these example benchmark organizations, IFPUG function points are the dominant metric, followed by COSMIC function points as a distant second.
Reporting Methods for Benchmark and Assessment Data
Once assessment and benchmark data has been collected, two interesting questions arise: who gets to see the data, and what is it good for?

Normally, assessments and benchmarks are commissioned by an executive who wants to improve software performance. For example, benchmarks and assessments are sometimes commissioned by the CEO of a corporation, but more frequently by the CIO or CTO.
The immediate use of benchmarks and assessments is to show the executive who commissioned the study how the organization compares against industry data. The topics of interest at the executive level include:

Benchmark Contents (standard benchmarks)
Number of projects in benchmark sample
Country and industry identification codes
Application sizes
Methods and tools used
Growth rate of changing requirements
Productivity rates by activity
Net productivity for entire project
Schedules by activity
Net schedule for entire project
Staffing levels by activity
Specialists utilized
Average staff for entire project
Effort by activity
Total effort for entire project
Costs by activity
Total costs for entire project
Comparison to industry data
Suggestions for improvements based on data
Once an organization starts collecting assessment and benchmark data, it usually wants to improve. This implies that data collection will be an annual event, and that the data will be used as baselines to show progress over multiple years.

When improvement occurs, companies will want to assemble an annual baseline report that shows progress for the past year and the plans for the next year. These annual reports are produced on the same schedule as corporate annual reports for shareholders; that is, they are created in the first quarter of the next fiscal year.
The contents of such an annual report would include:

Annual Software Report for Corporate Executives and Senior Management
CMMI levels by business group
Completed software projects by type
   IT applications
   Systems software
   Embedded applications
   Commercial packages
   Other (if any)
Cancelled software projects (if any)
Total costs of software in current year
Unbudgeted costs in current year
   Litigation
   Denial of service attacks
   Malware attacks and recovery
Costs by type of software
Costs of development versus maintenance
Customer satisfaction levels
Employee morale levels
Average productivity
Ranges of productivity
Average quality
Discovered defects during development
Delivered defects reported by clients in 90 days
Cost of quality (COQ) for current year
Comparison of local results to ISBSG and other external benchmarks
Most of the data in the annual report would be derived from assessment and benchmark studies. However, a few topics, such as those dealing with security problems like denial of service attacks, are not part of either standard benchmarks or standard assessments. They require special studies.
Summary and Conclusions
Between about 1969 and today in 2009, software applications have increased enormously in size and complexity. In 1969, the largest applications were fewer than 1000 function points, while in 2009, they top 100,000 function points in size.
In 1969, programming or coding was the major activity for software applications and constituted about 90 percent of the total effort. Most applications used only a single programming language. The world total of programming languages was fewer than 25. Almost the only specialists in 1969 were technical writers and perhaps quality assurance workers.
Today in 2009, coding or programming is less than 40 percent of the effort for large applications, and the software industry now has more than 90 kinds of specialists. More than 700 programming languages exist, and almost every modern application uses at least two programming languages; some use over a dozen.
As the software industry increased in numbers of personnel, size of applications, and complexity of development, project management fell behind. Today in 2009, project managers are still receiving training that might have been effective in 1969, but it falls short of what is needed in today's more complicated world.

Even worse, as the recession increases in severity, there is an urgent need to lower software costs. Project managers and software engineers need enough solid empirical data to evaluate and understand every single cost factor associated with software. Unfortunately, poor measurement practices and a shortage of solid data on quality, security, and costs have put the software industry in a very bad economic position.
Software costs more than almost any other manufactured product; it is highly susceptible to security attacks; and it is filled with bugs or defects. Yet due to the lack of reliable benchmark and quality data, it is difficult for either software engineers or project managers to deal with these serious problems effectively.

The software industry needs better quality, better security, lower costs, and shorter schedules. But until solid empirical data is gathered on all important projects, neither software engineers nor project managers will be able to plan effective solutions to industrywide problems.
Many process improvement programs are based on nothing more than adopting the methodology du jour, such as Agile in 2009, without any empirical data on whether it will be effective. Better measurements and better benchmarks are the keys to software success.
Readings and References

Abran, Alain and Reiner R. Dumke. Innovations in Software Measurement. Aachen, Germany: Shaker-Verlag, 2005.
Abran, Alain, Manfred Bundschuh, Reiner Dumke, Christof Ebert, and Horst Zuse. Software Measurement News, Vol. 13, No. 2, Oct. 2008.
Boehm, Dr. Barry. Software Engineering Economics. Englewood Cliffs, NJ: Prentice Hall, 1981.
Booch, Grady. Object Solutions: Managing the Object-Oriented Project. Reading, MA: Addison Wesley, 1995.
Brooks, Fred. The Mythical Man-Month. Reading, MA: Addison Wesley, 1974, rev. 1995.
Bundschuh, Manfred and Carol Dekkers. The IT Measurement Compendium. Berlin: Springer-Verlag, 2008.
Capability Maturity Model Integration, Version 1.1. Software Engineering Institute, Carnegie-Mellon Univ., Pittsburgh, PA, March 2003. www.sei.cmu.edu/cmmi/
Charette, Bob. Application Strategies for Risk Management. New York: McGraw-Hill, 1990.
Charette, Bob. Software Engineering Risk Analysis and Management. New York: McGraw-Hill, 1989.
Cohn, Mike. Agile Estimating and Planning. Englewood Cliffs, NJ: Prentice Hall PTR, 2005.
DeMarco, Tom. Controlling Software Projects. New York: Yourdon Press, 1982.
Ebert, Christof and Reiner Dumke. Software Measurement: Establish, Extract, Evaluate, Execute. Berlin: Springer-Verlag, 2007.
Ewusi-Mensah, Kweku. Software Development Failures. Cambridge, MA: MIT Press, 2003.
Galorath, Dan. Software Sizing, Estimating, and Risk Management: When Performance is Measured Performance Improves. Philadelphia: Auerbach Publishing, 2006.
Garmus, David and David Herron. Function Point Analysis--Measurement Practices for Successful Software Projects. Boston: Addison Wesley Longman, 2001.
Garmus, David and David Herron. Measuring the Software Process: A Practical Guide to Functional Measurement. Englewood Cliffs, NJ: Prentice Hall, 1995.
Glass, R.L. Software Runaways: Lessons Learned from Massive Software Project Failures. Englewood Cliffs, NJ: Prentice Hall, 1998.
Harris, Michael, David Herron, and Stacia Iwanicki. The Business Value of IT: Managing Risks, Optimizing Performance, and Measuring Results. Boca Raton, FL: CRC Press (Auerbach), 2008.
Humphrey, Watts. Managing the Software Process. Reading, MA: Addison Wesley, 1989.
International Function Point Users Group (IFPUG). IT Measurement--Practical Advice from the Experts. Boston: Addison Wesley Longman, 2002.
Johnson, James, et al. The Chaos Report. West Yarmouth, MA: The Standish Group, 2000.
Jones, Capers. Assessment and Control of Software Risks. Englewood Cliffs, NJ: Prentice Hall, 1994.
Jones, Capers. Estimating Software Costs. New York: McGraw-Hill, 2007.
Jones, Capers. Patterns of Software System Failure and Success. Boston: International Thomson Computer Press, December 1995.
Jones, Capers. Program Quality and Programmer Productivity. IBM Technical Report TR 02.764. San Jose, CA: IBM, January 1977.
Jones, Capers. Programming Productivity. New York: McGraw-Hill, 1986.
Jones, Capers. Software Assessments, Benchmarks, and Best Practices. Boston: Addison Wesley Longman, 2000.
Jones, Capers. "Software Project Management Practices: Failure Versus Success." CrossTalk, Vol. 19, No. 6 (June 2006): 48.
Jones, Capers. "Why Flawed Software Projects are not Cancelled in Time." Cutter IT Journal, Vol. 10, No. 12 (December 2003): 1217.
Laird, Linda M. and Carol M. Brennan. Software Measurement and Estimation: A Practical Approach. Hoboken, NJ: John Wiley & Sons, 2006.
McConnell, Steve. Software Estimating: Demystifying the Black Art. Redmond, WA: Microsoft Press, 2006.
Park, Robert E., et al. Checklists and Criteria for Evaluating the Costs and Schedule Estimating Capabilities of Software Organizations. Technical Report CMU/SEI 95-SR-005. Pittsburgh, PA: Software Engineering Institute, January 1995.
Park, Robert E., et al. Software Cost and Schedule Estimating--A Process Improvement Initiative. Technical Report CMU/SEI 94-SR-03. Pittsburgh, PA: Software Engineering Institute, May 1994.
Parthasarathy, M.A. Practical Software Estimation--Function Point Metrics for Insourced and Outsourced Projects. Upper Saddle River, NJ: Infosys Press, Addison Wesley, 2007.
Putnam, Lawrence H. and Ware Myers. Industrial Strength Software--Effective Management Using Measurement. Los Alamitos, CA: IEEE Press, 1997.
Putnam, Lawrence H. Measures for Excellence--Reliable Software On Time, Within Budget. Englewood Cliffs, NJ: Yourdon Press, Prentice Hall, 1992.
Roetzheim, William H. and Reyna A. Beasley. Best Practices in Software Cost and Schedule Estimation. Saddle River, NJ: Prentice Hall PTR, 1998.
Stein, Timothy R. The Computer System Risk Management Book and Validation Life Cycle. Chico, CA: Paton Press, 2006.
Strassmann, Paul. Governance of Information Management: The Concept of an Information Constitution, Second Edition (eBook). Stamford, CT: Information Economics Press, 2004.
Strassmann, Paul. Information Payoff. Stamford, CT: Information Economics Press, 1985.
Strassmann, Paul. Information Productivity. Stamford, CT: Information Economics Press, 1999.
Strassmann, Paul. The Squandered Computer. Stamford, CT: Information Economics Press, 1997.
Stukes, Sherry, Jason Deshoretz, Henry Apgar, and Ilona Macias. Air Force Cost Analysis Agency Software Estimating Model Analysis. TR-9545/008-2, Contract F04701-95-D-0003, Task 008. Thousand Oaks, CA: Management Consulting & Research, Inc., September 30, 1996.
Stutzke, Richard D. Estimating Software-Intensive Systems. Upper Saddle River, NJ: Addison Wesley, 2005.
Symons, Charles R. Software Sizing and Estimating--Mk II FPA (Function Point Analysis). Chichester, UK: John Wiley & Sons, 1991.
Wellman, Frank. Software Costing: An Objective Approach to Estimating and Controlling the Cost of Computer Software. Englewood Cliffs, NJ: Prentice Hall, 1992.
Whitehead, Richard. Leading a Development Team. Boston: Addison Wesley, 2001.
Yourdon, Ed. Death March--The Complete Software Developer's Guide to Surviving "Mission Impossible" Projects. Upper Saddle River, NJ: Prentice Hall PTR, 1997.
Yourdon, Ed. Outsource: Competing in the Global Productivity Race. Englewood Cliffs, NJ: Prentice Hall PTR, 2005.