|
|||||
Lecture
No. 32
DWH
Lifecycle: Methodologies
Lay
Out the Project
A data warehouse
project is more like scientific
research than anything in traditional
IS!
The normal
Information System (IS)
approach emphasizes on knowing what
the expected results
are
before committing to action. In scientific
research, the results are
unknown up front, and
emphasis is
placed on developing a rigorous,
step-by-step process to uncover
the truth. The
activities
involve regular interactions
between the scientist and
the subject and also
among the
project
participants. It is advised to adopt an exploratory,
hands -on process involving
cross-
disciplinary
participation.
Building a
data warehouse is a very challenging
job because unlike software engineering
it is
quite a young discipline,
and therefore, does not yet
has well-established strategies
and
techniques
for the development process.
Majority of projects fail
due to the complexity of
the
development
process. To date there is no
common strategy for the
development of data
warehouses;
they are more of an art than
science. Current data warehouse
development methods
can
fall within three basic
groups: data -driven,
goal-driven and
user-driven.
Implementation
strategies
· Top
down approach
· Bottom Up
approach
Development
methodologies
· Waterfall
model
· Spiral
model
· RAD
Model
· Structured
Methodology
· Data
Driven
· Goal
Driven
· User
Driven
Implementation
Strategies
Top
Down & Bottom Up approach
: A
Top Down approach is generally useful
for projects where
the
technology is mature and
well understood, as well as where
the business problems that
must
be solved are
clear and well understood. A
Bottom Up approach is useful, on the other
hand, in
making
technology assessments and is a good
technique for organizations
that are not leading
edge technology
implementers. This approach is
used when the business
objectives that are to be
met by
the data warehouse are
unclear, or when the current or proposed
business process will
be
affected by the
data warehouse.
Development
Methodologies
A Development
Methodology describes t e
expected evolution and
management of the
h
engineering
system.
270
Waterfall
Model: The model is a
linear sequence of activities like
requirements definition,
system
design, detailed design, integration
and testing, and finally
operations and
maintenance.
The model is
used when the system
requirements and objectives are
known and clearly
specified.
While
one can use the
traditional waterfall approach to
developing a data warehouse,
there are
several
drawbacks. First and
foremost, the project is likely to
occur over an extended
period of
time,
during which the users
may not have had an opportunity to
review what will be
delivered.
Second, in
today's demanding competitive environment
there is a need to produce
results in a
much
shorter timeframe.
Spiral
Model: The model is a
sequence of waterfall models
which corresponds to a risk
oriented
iterative
enhancement, and recognizes
that requirements are not
always available and clear
when
the
system is first implemented.
Since designing and building
a data warehouse is an iterative
process,
the spiral method is one of
the development methodologies of
choice.
RAD:
Rapid
Application Development (RAD) is an
iterative model consisting of stages
like
scope,
analyze, design, construct,
test, implement, and review. It is
much better suited to
the
development of a
data warehouse because of
its iterative nature and
fast iterations. User
requirements
are sometimes difficult to
establish because business
analysts are too close to
the
existing infra-structure to
easily envision the larger
empowerment that data
warehousing can
offer.
Development and delivery of early
prototypes will drive future
requirements as business
users
are given direct access to
information and the ability
to manipulate it. Management of
expectations
requi res that the
content of the data
warehouse be clearly communicated for
each
iteration.
There
are 5 keys to a successful
rapid prototyping methodology:
1. Assemble a
small, very bright team of
database programmers, hardware
technicians,
designers,
quality as surance technicians,
documentation and decision
support specialists,
and a
single manager.
2. Define
and involve a small "focus
group" consisting of users
(both novice and
experienced)
and managers (both line
and upper). These are the
people who will
provide
the
feedback necessary to drive
the prototyping cycle. Listen to
them carefully.
3. Generate a
user's manual and user
interface first. These will
prove to be amazing in
terms
of user
feedback and requirements
specification.
4. Use
tools specifically designed for rapid
prototyping. Stay away from C,
C++, COBOL,
SQL, etc.
Instead use the visual
development tools included with the
database.
5. Remember a
prototype is NOT the final application.
It servers a means of making
the
user
more expressive about
requirements and developing in
them a clear
understanding
and
vision of the system.
Prototypes are meant to be
copied into production models.
Once
the
prototypes are successful, begin
the development processing using
development tools,
such as C,
C++, Java, SQL,
etc.
Structured
Development: When a project
has more than 10 people
involved or when
multiple
companies
are performing the development, a
more structured development
management
approach is
required. Note that rapid
prototyping can be a subset of
the struct ured
development
approach.
This approach applies a more
disciplined approach to the data
warehouse development.
Documentation
requirements are larger, quality
control is critical, and the number of
reviews
271
increases.
While some parts may
seem like overkill at the
time, they can save a
project from
problems,
especially late in the development
cycle.
Data-Driven
Methodologies: Bill
Inmon, the founder of data
warehousing argues that
data
warehouse
environments are data driven, in
comparison to classical systems,
which have a
requirement
driven development lifecycle. According to Inmon,
requirements are the last
thing to
be considered in
the decision support development
lifecycle. Requirements are
understood
AFTER
the data warehouse has
been populated with data and
results of queries have
been
analyzed by
the end users. Thus the
data warehouse development strategy is
based on the analysis
of the
corporate data model and
relevant transactions. This is an
extreme approach ignoring
the
needs of
data warehouse users a
priori. Consequently company
goals and user requirements
are
not reflected at
all in the first cycle,
and are integrated in the
second cycle.
Goal-Driven
Methodologies: In order to
derive the initial data
warehouse structure,
Böhnlein
and
Ulbrich-vom Ende have
presented a four-stage approach
based on the SOM (Semantic
Object
Model)
process modeling technique.
The first stage determines
goals and services the
company
provides to
its customers. In the second
step, the business process
is analyzed by applying the
SOM interaction
schema that highlights the
customers and their
transactions with the
process
under study. In
third step, sequences of
transactions are transformed into
sequences of existing
dependencies
that refer to information
systems. The last step
identifies measures and
dimensions,
by enforcing
(information request) transactions,
from existing dependencies.
This approach is
suitable
only well when business
processes are designed throughout
the company and
are
combined
with business goals.
Kimball
also proposes a four-step approach
where he starts to choose a
business process,
takes
the
grain of the process, and
chooses dimensions and
facts. He defines a business process as
a
major
operational process in the organization
that is supported by some
kind of legacy system
(or
systems). We
will discuss this in great
detail in lectures 33-34.
User-Driven
Methodologies: Westerman
describes an approach that
was developed at Wal-Mart
and
has its main focus on implementing
business strategy. The methodology
assumes that the
company goal is
the same for everyone and
the entire company will
therefore be pursuing the
same direction.
It is proposed to set up a first
prototype based on the needs of
the business.
Business
people define goals and
gather, priorities as well as
define business questions
supporting
these
goals. Afterwards the
business questions are
prioritized and the most
important business
questions
are defined in terms of data
elements, including the
definition of hierarchies.
Although
t h e Wal-Mart
approach focuses on business
needs, business goals that
are defined by the
organization are
not taken into consideration at
all.
Poe
proposes a catalogue for
conducting user interviews in
order to collect end user
requirements.
She
recommends int erviewing
different user groups in order to
get a complete understanding
of
the
business. The questions
should cover a very broad
field including topics like
job
responsibilities.
WHERE DO
YOU START?
The
majority of successful data
warehouses have started with
a clear understanding of a
business
problem and
the user requirements for
information analysis. It is strongly
recommended that the
team
assembled to create a data
warehouse be comprised of IT
professionals and business
users.
Projects
must have a clearly defined scope
for managing economic and
operational limitations.
272
The
process will be highly
iterative as IT and end
users work toward a
reasonable aggregation
level
for data in the
warehouse.
What
specific Problems the DWH
will solve?
Write
down all the problems. The
problems should be precise, clearly
stated and testable
i.e.
success criteria
is known or can easily be
specified. Make sure to get
user and management
feedback by
publicizing these written
problems.
What
criteria to use to measure
success?
This is an
often overlooked step in the
problem definition. For every problem
stated, you must
define a
means for determining the
success of the solution. If you can't
think of a success
criterion, then
the problem is not defined specifically en
ough. Stay away from
problem
statements
such as "The data warehouse
must hold all our
accounting data." Restate
the problem
in quantifiable
terms, like "The data
warehouse must handle the
current 20GB of accounting
data
including all
metadata and replicated data
with an expected 20% growth
per year."
How to
manage time and
money?
The
first data warehouse (first
iteration's output) should cover a
single subject area and
be
delivered at a
relatively low cost. To minimize
risk, the target platform
should be one where IT
has developed
some infrastructure. Existing technical
skills, operational skills and
database
experience
will help tremendously. The project
must be time boxed, with
guaranteed deliverables
every 90
days, and a project end date
in six to nine m nths. The
overall cost of the first
data
o
warehouse
should be in the $200K to
$500K range, with prototypes
completed for $10K to
$150K in 30 to
60 days (since local companies
keep their costs secret,
costs in dollar are
given
here as an
example). Increment al successes will
drive expansion of existing
data warehouses and
the
funding and creation of
additional ones.
What skills
are required?
The
level of complexity involved in
successfully designing and implementing a
data warehouse
must not be
underestimate d. Time must be
spent to acquire and develop
additional skills for
data
warehousing
developers and users. Some
options are:
· Invest in
just-in-time training (provided by data
warehousing tool vendors)
· Use
pilot projects as seeds for
new technology training
· Develop reward
systems that encourage
experimentation
· Use
outside system integrators
and individual
consultants
As additional
motivation for data warehousing
team members, a new class of
job titles is being
created.
Companies are beginning to
use dedicated titles such
as: Data Warehouse Steward,
Data
Warehouse
Architect, Data Quality Engineer
and Data Warehouse
Auditor.
273
Figure-32.1:
DWH Development
Cycle
Although
specific vocabularies vary from
organization to organization, the data
warehousing
industry is in
agreement of the fundamental data
warehouse lifecycle model as shown in
Figure
32.1.
The cyclic model consists of
5 major steps described as
follows
1.
Design: It involves
the development of robust star-schema
-based dimensional data
models
from
both available data and user
requirements. It is thought that
the best data
warehousing
practitioners
even work with available organizational
data and incompletely expressed
user
requirements.
Key activities in the phase
typically include end -user
interview cycles,
source
system
cataloguing, definition of key
performance indicators and other critical
business
definitions, and
logical and physical schema
design tasks which feed the
next phase of the
model
directly.
2.
Prototype: In this step a
working model of a data warehouse or
data mart design, suitable
for
actual
use, is deployed for a select
group of end users. The
prototyping purpose shifts, as
the
design
team moves design -prototype-design
sub-cycle. Primary objective is to
constrain and /or
reframe end-user
requirements by showing them precisely
what they had asked for in
the previous
iteration. As
difference between stated
needs and actual needs
narrows down over iterations
the
prototyping
shifts towards gaining
commitment to the project at hand
from opinion leaders in
the
end-user
communities to the design,
and soliciting their assistance in gaining
similar
commitment.
3. Deploy:
The
step includes traditional IT system
deployment activities like formalization
of
user
authenticated prototype fo r actual
production use, document development, and
training etc.
Deployment
involves two separate
deployments (i) prototype deployment
into a production test
environment
(ii) Stress- and performance-
tested production configuration
deployment into an
actual
production environment. The phase
also contains the most
important and often
neglected
component of
documentation. Lack of documentation may
stall system operations as
management
274
people
can not manage what they
don't know. Also, it may ultimately be
used for educating
the
end
users, prior to roll
out.
4.
Operation: The
phase includes data
warehouse/mart daily maintenance
and management
activities. The
operations are performed to maintain
data delivery services and
access tools, and
manage
ETL processes that keep
the data warehouse/mart
current with respect to the
authoritative
source
system.
5.
Enhancement: The
step involves modifications of physical
technological components,
operations
and management processes
(ETL etc.) and logical
schema diagrams in response
to
changing
business requirements. In situations of
discontinuous changes, enhancement
moves
back
into the fundamental design
phase.
275
Table of Contents:
|
|||||