|
|||||
Lecture
No. 33
DWH
Implementation: Goal Driven
Approach
(Lecture based
on "The data warehousing toolkit by
Ralph Kimball and Margy
Ross, 2nd
Edition)
Business
Dimensional Lifecycle: The
Road Map Ralph Kimball's
Approach
Implementing a
data warehouse requires
tightly integrated activities. As we
discussed earlier,
there
are differe nt DWH
implementation strategies, we will be
following Kimball's
Approach.
Kimball is
considered as an authority in the DWH
field, and his goal driven
approach is a result
of decades of
practical experience. This presentation
is a an overview of a data warehouse
project
lifecycle, based
on this approach, from inception through
ongoing maintenance, identifying
best
practices at
each step, as well as potential
vulnerabilities. It is believed that
everyone on the
project team,
including the business
analyst, architect, database
designer, data stager, and
analytic
application developer,
needs a high-level understanding of
the complete lifecycle of a
data
warehouse.
Technical
Product
Architecture
Selection
&
Design
Installation
Maintenance
Deployment
Business
Project
Physical
Dimensional
Data
Staging
and
Requirement
Design &
Modeling
Planning
Design
Growth
Development
Definition
Analytic
Analytic
Application
Application
Specification
Developmen
Project
Management
Figure
-33.1: Business Dimensional
Lifecycle (Kimball's Approach)
The
business dimensional lifecycle framework,
as shown in Figure 33.1, is depicted as a
road
map,
that is extremely useful if we're
about to embark on the
unfamiliar journey of data
warehousing.
The Kimball's iterative data
warehouse development approa ch drew on
decades of
experience to
develop the business
dimensional lifecycle. The name
was because it reinforced
several of
key tenets for successful
data warehousing. First and
foremost, data warehouse
projects
must
focus on the needs of the
business. Second, the data
presented to the business
users must be
dimensional.
Finally while data
warehousing is an ongoing process, each
implementation project
should
have a finite cycle with a
specific beginning and end.
Ongoing project management
serves
as a foundation
to keep the remainder of the
lifecycle on track.
276
DWH
Lifecycle: Key
steps
1. Project
Planning
2.
Business Requirements
Definition
3.
Parallel Tracks
3.1
Lifecycle Technology
Track
3.1.1
Technical
Architecture
3.1.2
Product Selection
3.2
Lifecycle Data
Track
3.2.1
Dimensional
Modeling
3.2.2
Physical Design
3.2.3
Data Staging design and
development
3.3
Lifecycle Analytic Applications
Track
3.3.1 Analytic
application specification
3.3.2 Analytic
application development
4.
Deployment
5.
Maintenance
Lifecycle
Key Steps
Lifecycle
begins with project planning during
which we assess the
organization's readiness for
a
data
warehouse initiative, establish
the preliminary scope and
justification, obtain resources,
and
launch
the project.
The
second major task focuses on
business requirements definition.
The two -way arrow
between
project planning
and business requirements
definition (as shown in Figure 33.1)
shows the much
interplay
between these two activities.
Data warehouse designers
must understand the needs
of
the
business and translate them
into design considerations.
Business users and their
requirements
have an
impact on almost every
design and implementation decision
made during the course of
a
warehouse
project. In road map, this
is reflected by the three
parallel tracks that
follow.
The top
track deals with technology.
Technical architecture design establishes
the overall
framework to
support the integration of multiple
technologies. Using the capabilities
identified in
the
architecture des ign as a
shopping list, we then
evaluate and select specific
products.
The
middle track emanating from
business requirements definition
focuses on data. We begin by
translating
the requirements into a dimensional model
which is then transformed into a
physical
structure.
Physical design activities focus on
performance tuning strategies, such as
aggregation,
indexing, and
partitioning. Last but not
least, data staging Extract
-Transform-Load (ETL)
processes
are designed and
developed.
The
final set of tasks spawned
by the business requirements
definition is the design
and
development of
analytic applications. The
data warehouse project isn't
done when we deliver
data.
Analytic applications, in the
form of parameter-driven templates and
analyses, will satisfy
a
large percentage
of the analytic needs of
business users.
277
Note:
1. Equally sized
boxes (as shown in Figure 33.1)
don't represent equally sized
efforts, there
is a vast
difference in the time and
effort required for each major
activity
2. Data
warehousing is an ongoing process, each
implementation project should have
a
cycle
with a specific beginning and an
end.
DWH
Lifecycle- Step 1:
Project Planning
§
Assessing
Readiness
§
Factors
§
Business
sponsor
§
Business
motivation
§
Feasibility
§
Business/IT
relationship
§
Culture
§
Scoping
The
DWH lifecycle begins with
the project planning phase. It
consists of multiple activities
that
must be
performed before proceeding
ahead in the lifecycle. Let's
discuss the planning phase
in
detail;
Readiness
and risk assessment: Before
proceeding ahead with significant
data warehouse
expenditures, it
is prudent to assess the
organization's readiness to proceed.
Five factors have
been
identified as leading indicators of data
warehouse success; any shortfalls
represent ris ks or
vulnerabilities.
Brief description in rank order of
importance follows.
Business
Sponsor: It is the
most critical factor for successful
data warehousing.
Business
sponsors should
have a clear vision for
the potential impact of a data warehouse
o the
n
organization.
They should be passionate
and personally convinced of the
project's value while
realistic at
the same time. Optimally,
the business sponsor has a
track record of success with
other
internal
initiatives. He or she should be a
politically astute leader
who can convince his or
her
peers to
support the
warehouse.
Business
motivation: The
second readiness factor is having a
strong, compelling business
motivation
for building a data
warehouse. This factor often
goes hand in hand with
sponsorship.
A data
warehouse project can't merely deliver a
nice-to-have capability; it needs to solve
critical
business
problems in order to garner the resources
required for a successful launch
and healthy
lifespan.
Feasibility:
There
are several aspects of feasibility,
such as technical or resource
feasibility, but
data feasibility
is the most crucial. Are we collecting
real data in real operational source
systems
to support
the business requirements?
Data feasibility is a major
concern because there is
no
short
-term fix if we're not
already collecting reasonably clean
source data at the right
granularity.
278
Business/IT
relationship: The
fourth factor focuses on the
relationship between the
busi ess and
n
IT
organizations. In your company, does
the IT organization understand and
respect the business?
Conversely,
does the business understand
and respect the IT organization?
The inability to
honestly
answer yes to these
questions doesn't mean that
you can't proceed. Rather,
the data
warehouse
initiative can be an opportunity to mend
the fence between these
organizations,
assuming
that both deliver.
Culture: The
final aspect of readiness is
the current
analytic culture within your
company. Do
business
analysts make decisions
based on facts and figures, or
are their decisions based
on
intuition,
anecdotal evidence, expert
judgment or gut feeling?
Scoping:
Requires
the joint input of both the
IT organization and business management.
The
scope should be
both meaningful in terms of its
value to the organization and
manageable. When
you are
first getting started, you should focus
on data from a single business
process. Save the
more
challenging, cross-process projects for a
later phase. Sometimes project teams
feel that the
delivery
schedule is cast in concrete before
project planning is even initiated. The
prioritization
process
can be used to convince IT and
business management that
adjustments are required.
Finally,
remember to avoid the law of
too when
scoping -too firm of a
commitment to too brief of
a timeline
involving too many source
systems and too many
users in too many locations
with too
diverse analytic
requirements.
DWH
Lifecycle- Step 1:
Project Planning
§
Justification
(cost vs. benefit)
§
Team
development
§
Project
Plan
§ Identifying
all tasks.
§ User
acceptance, milestones and
deliverables.
§ Making
and following a communication
matrix.
§ Avoiding
scope creep.
§ Partnership
with business user.
§
Keys to project
planning & Management
§ Business
sponsor
§ Scope
§ Best
team
§ Excellent
project manager
Justificati
on requires an
estimation of the benefits and
costs associated with a data
warehouse.
The
anticipated benefits grossly
outweigh the costs. IT
usually is responsible for
deriving the
expenses.
You need to determine
approximate costs for the
requisite hardwa re and software.
Data
warehouses
tend to expand rapidly, so be
sure the estimates allow
some room for
short-term
growth.
Team
Development (Staffing): Data
warehouse projects require the
integration of a cross
functional
team with resources from
both the business and IT
communities. From the
business
side of
the house, the roles
needed are business sponsor,
business driver, business
lead and
279
business
users. Several other positions are
staffed from either the business or IT
organizations.
These
are business system analyst,
business subject matter
expert, analytic application
developer
and
data warehouse educator. The
roles typically staffed from the IT
organization (or an external
consulting
firm) are project manager,
technical architect, technical
support specialists,
data
modeler,
database administrator, metadata
coordinator, data steward, data
staging designer,
data
staging
developer, and data warehouse
support.
Establishing
the core team
Hardest to
find talent.: The
talent you need on the core
team is the hardest to
find
1. Part-time:
Data modeler
and database analyst
(DBA)
2. Full-time:
Experienced
& educated people with
least one successful
implementation.
3. Full-time:
(i)
DSS Data analyst (ii)
4GL programmer with information
center or end -user
support
background (iii) experienced
datacentric developer.
·
Data
engineers to get started and
create potential.
·
Data
analysts or data usage
specialists to stay in
business.
So,
who should be on this small
core team? Typically, the
firs t talent hired is a data modeler
and
the
second is a database analyst or
DBA. While you need these
data specialists, you don't
need
them as
fulltime, permanent participants of
the core team. The
best individuals at the core
of your
team
are people who intimately understand
the broad issues of a complete
data warehouse
program.
Ideally, they studied the
field extensively and participated in at
least one successful
project. These
folks are hard to come
by.
Your
next choice should be those
with experience of packaging
data for consumer use: a
data
analyst
with DSS experience, a 4GL
programmer with information center or
end -user support
background, a
data centric developer with real
experience using spiral or interactive
methods.
You
need data engineers to get
started and to add new data
resources to the warehouse.
They
create
the potential. You will need
data analysts or data usage
specialists to stay in business.
They
deliver
results.
Project
Plan: Developing
the data warehouse project plan
involves identification of all the
tasks
necessary to
implement the data warehouse.
The project plan should identify a
user acceptance
checkpoint after
every major milestone and
deliverable to ensure that
the project is still on
track
and
that the business is still
intimately involved. The
data warehouse project demands
broad
communication.
During the project planning
phase a communication matrix is helpful
to make
certain
that the communication strategy is
executed. Data warehouse
projects are vulnerable
to
scope
creep largely due to our
strong desire to satisfy
users' requirements. We need to be
most
watchful
about the accumulation of
minor changes that snowball.
The most important thing
to
remember
about scope changes is that
they shouldn't be made in an IT vacuum.
The right answer
280
depends on
the situation. Now is the
time to leverage your partnership with
the business to
arrive
at an answer
with which everyone can
live.
The
keys to data warehouse project planning
and management include:
1. Having a
solid business sponsor
2. Balancing
high value and doability to
define the scope
3. Working
with the best team
possible to develop a detailed project
plan
4. Being an
excellent project manager by motivating,
managing, and communicating
up,
down,
and across the organization
.
DWH
Lifecycle-
Step: 2 Requirements
Definition
§
Requirements
preplanning
§
Requirements
collection
§
Post
collection
The
second phase in the DWH
lifecycle is requirements definition. Performing a
requirements
analysis is
critical to the success of
any project. W ithout a clear goal in mind,
success is dubious.
Establishing a
broad view of the business'
requirements should always be the
first step. The
understanding
gained will guide everything
that follows, and the
details can be filled in for
each
phase in turn.
The double headed arrow
between the planning and
requirements definition
phase
indicates
that user requirements
drives not only the
succeeding phases but the preceding
phase as
wee.
The changes in user
requirements may effect the
project plan. This phase is accomplished
in
three
steps
Requirements
preplanning: This
phase consists of activities like
choosing the forum,
identifying
and preparing the requirements
team and finally selecting,
scheduling and preparing
the
business representatives.
Business
requirements collection: The
requirements collection process flows
from an
introduction
through structured questioning to a final
wrap-up. The major activities involved
are,
launching,
determining interview flow, wrapping up
and conducting data centric
interviews.
Post
collection: The
phase consists of steps like
debriefing, documentation, prioritization
and
consensus.
Each of
the phases is discussed in detail in
the following slides
DWH
Lifecycle-
Step: 2 Requirements
Definition
Requirements
Preplanning
§
Forum
§ Interviews
§ Facilitated
sessions
§ Hybrid
281
§
Requirements
team
§
Business
representatives
§ Selecting
§ Scheduling
§ Preparing
Requirements
Preplanning
Before sitting
down with the business
community to gather requirements, it is
suggested to set
you up for a
productive session by considering
the following:
Choose
the Forum: There
are two primary techniques
for gathering requirements i.e.
interviews
or facilitated
sessions. Both have their
advantages and disadvantages.
Interviews encourage
lots
of individual
participation. They are also
easier to schedule. Facilitated
sessions may reduce
the
elapsed time to
gather requirements, although they
require more time commitment from
each
participant.
Kimball prefers using a
hybrid approach with
interviews to gather the
gory details
and
then facilitation to bring
the group to consensus.
However, the forum choice
depends on the
team's
skills, the organization's culture,
and what you've already
subjected your users to.
This is a
case in
which one size definitely
does not fit all.
Identify
and Prepare the Requirements
Team: Regardless of
the approach, you need to
identify
and
prepare the project team
members who are involved. If
you're doing interviews, you need
to
identify a
lead interviewer whose
primary responsibility is to ask the
great open -ended
questions.
Meanwhile,
the interview scribe takes
copious notes. Before you
sit down with users, you
need to
make
sure you're approaching the
sessions with the right
mindset. Since the key to
getting the
right
answers is asking the right
questions, we recommend that
questionnaires be formulated
before
user meetings. It is a tool to organize
your thoughts and serve as a
fallback device in
case
your
mind goes blank during the
interview session.
Business
Representatives Soliciting: If this is
first foray into data
warehousing, talk to
business
people
that represent horizontal breadth
across the organization. This coverage is
critical to
formulating
the data warehouse bus
matrix blueprint. Within the target
user c mmunity, one
o
should
cover the organization
vertically.
Scheduling:
Schedule
representatives nicely. The scheduler
needs to allow ½ hour
between
meetings
for debriefing and other
necessities. Interviewing is extremely
taxing because you must
be completely
focused for the duration of
the session. Consequently, it is
recommended to
schedule
three to four sessions in a
day because the interviewers
get very tired after that,
and
productivity
goes down.
Preparing:
The
optimal approach is to conduct a project
launch meeting with the
users. The
launch
meeting disseminates a consistent
message about the project.
The interview team
must
prepare
the interviewees by highlighting the
topics to be covered in the upcoming
session. It is
advised that do
not include a copy of the questionnaire,
which is not intended for
public
dissemination.
One can ask the interviewees
to bring copies of their key
reports and analyses.
DWH
Lifecycle-
Step : 2 Requirements
Definition
Requirements
Collection
282
§
Launch
§
Interview
flow
§ Rules
for effective
interviewing
§ Role
§ Flow
§ Consistency
§ Vocabulary
§
Wrap
up
§
Conducting data
centric interviews
Launch:
Responsibility
for introducing the interview
should be established prior to gathering
in a
conference room.
The designated kicko ff person should
script the primary points to
be conveyed
in the
first couple of minutes. The
introduction should convey a crisp,
business-centric message
rather
than rambling on about the
hardware, software, and other
technical jargon.
Interview Flow:
The
objective of an interview is to get
business users to talk about
what they do,
and
why they do it. Determining how
they track progress and
success translates directly into
the
dimensional
model. If we're meeting with a
person who has more
hands-on data experience,
we
indirectly probe
to better understand the dimensionality
of the business, along with
hierarchical
roll
-ups. If the interviewee is
more analytic, we ask about
the types of analyses currently
being
performed. If we
are meeting with business
executives , we usually don't
delve into the
details,
and
just ask about their vision
for better leveraging
information in the
organization.
Rules
for Effective interviewing
include: (i)
Remember your interview role
and listen and
absorb
like a
sponge. (ii) Strive for a
conversational flow, don't dive too
quickly. (iii) Verify
communication
and capture terminology
precisely because mostly its
inconsistent (iv) establish
a
peer
basis with the interviewee;
use his or her
vocabulary.
Wrap-up:
At
the conclusion of interview,
ask each interviewee about
his success criteria for
the
project which
should be measurable. At this point,
the interviewees must understand
that just
because we
discussed a capability in the meeting
doesn't guarantee that it'll
be included in the first
phase of
the project. We thank interviewees
for their brilliant insights
and let them know
what's
happening next
and what their involvement
will be.
Conducting
Data -Centric Interviews:
While
we're focused on understanding
the requirements o f
the
business, it is helpful to intersperse
sessions with the source
system data gurus or
subject
matter
experts to evaluate the feasibility of
supporting the business needs.
These data-focused
interviews
are quite different from the
ones just described. The
goal is to assess that the
necessary
core
data exists before momentum
builds behind the requirements. A more
complete data audit
will
occur during the dimensional
modeling process. We're trying to learn
enough at this point
to
manage
the organization's expectations
appropriately.
DWH
Lifecycle-
Step : 2 Requirements
Definition
§
Post
Collection
§
Debriefing
§ Same
"frequency"
283
§
Fill-in
gaps
§
Review
reports/material
§
Documentation
§ Meaningful
write-up of each individual
interview.
§ Identify
key business process
§ Requirements
of key processes
§ Process x
group/user matrix
Debriefing:
Immediately
following the interview, the
interview team should
debrief. You want to
ensure
that you're on the same page
about what was learned, as
well as being prepared fo r
any
surprises or
inconsistencies. It is also helpful to
review your notes quickly to
fill in any gaps
while
the interview is still fresh
in your memory. Likewise, you
should examine the
reports
gathered to gain
further offline insight into
the dimensionality that must be
supported in the data
warehouse.
Documentation:
While
documentation is everyone's least
favorite activity, it is critical for
both
user
validation and project team
reference materials. There are
two levels of documentation
that
typically result
from the requirements
process. The first is to
write up each individual
interview.
This
activity can be quite time -consuming
because the write-up should not be merely
a stream -of-
consciousness
transcript but should make sense to
someone who wasn't in the
interview. The
second
level of documentation is a consolidated
findings document. Organize
the document by
first
identifying the key business
processes. Consequently, it is logical to
organize the
requirements of
the business into the
same buckets that will, in
turn, become implementation
efforts.
Sometimes the processes are
brought together in a matrix to convey the
opportunities
across
the organization. The rows of
the opportunity matrix
identify the business
processes while
the columns
identify th organizational groups or
functions.
e
Prioritization
and Consensus: Four
Cell Quadrant
Technique
The
requirements findings document
serves as the basis for
presentations back to
senior
management
representatives. The requirements
wrap-up presentation is positioned as a
findings
review
and prioritization meeting.
Once the findings have been
reviewed, it is time to
prioritize.
The
four-cell quadrant technique (shown in
Figure 33.2) is an effective tool for
reaching consen-
sus on a
data warehouse developme nt plan that
focuses on the right initial
opportunities. The
quadrant's
vertical axis refers to the potential
impact or value to the
business. The horizontal
axis
conveys
feasibility of each of the
findings.
284
Figure
-33.2: The quadrant
method
So the proc
ess having higher
feasibility and impact is
given higher priority over
the process
having
lower feasibility and impact. In
example of Figure 33.2, process A
has highest priority
while
the process D has lowest
priority.
285
Table of Contents:
|
|||||