|
|||||
Lecture
No. 35
DWH
Life Cycle: Pitfalls,
Mistakes, Tips
In this
lecture we will discusses
the problems, troubles and
mistakes that commonly occur
while
building a
data warehouse. We will
discuss things to do, and
also things not to do. We
will
discuss
ways to avoid common
mistakes that may halt data
ware housing process.
5 Signs of
trouble
1.
Project
proceeded for two months
and nobody has touched the
data.
2.
End
users are not involved
hands-on from day one
throughout the program.
3.
IT team
members doing data design
(modelers and DBAs) have
never used the acc
ess
tools.
4.
Summary
tables defined before raw atomic
data is acquired and base
tables have been
built.
5.
Data
design finished before participants have
experimented with tools and
live data.
Signs of
trouble
First of all we
will discuss "5 signs of
trouble". Any of these signs
if present, will serve as a
key
indicator that
the data warehousing project is under
threat. The following
situations indicate a
project in
trouble:
·
The project
has proceeded for two
months and nobody has even
touched the data. Befo
re
even embarking
on the project, the team
should have had a thorough
understanding and
look
and feel of the data. As I
always tell my students,
"know your data
intimately".
·
The
future consumers are not
involved hands-on from day
one throughout the program.
Working in
isolation will result in systems
that no one will accept,
and consequently none
is going to
use, resulting in negative marketing for you
and your company. Thus
avoid
this at
all costs.
·
The
team members doing data
design (modelers and DBAs)
have never used the
access
tools.
You need experienced
campaigners not "green apples" or raw
hands. You need
people
who know the job, not
those who will learn on the
job, and in the process
sink the
ship.
·
The
summary tables are defined
before the raw atomic data
is acquired and base
tables
have
been built. A converse
process has been followed,
which again is a recipe
for
disaster.
·
The
data design is finished
before participants have
experimented with the tools
and live
data. As we
have discussed at length in lecture
no. 33, involve the
business users from
the
very beginning,
get user requirement
definition, record it and
follow it.
11-Possible
Pitfalls
§
1. Weak
business sponsor: Getting stuck by
office politics, need CXO on
your side.
§
2. Not
having multi ple servers : Penny
wise pound Foolish. (i) Both
going down (ii)
Performance
degradation
299
§
3. Modeling
without domain
expert:
§
4. Not
enough time for ETL:
Giving too
little time or over complicating by
including
everything
conceivable. Users will
forgive:
§ Less
Formatting, slow system, few
features, few reports BUT
NOT incorrect
results
11-Possible
pitfalls in DWH Life Cycle
& Development
Many early
data warehousing projects failed,
having fallen into one or
more of the traps we
will
discuss.
These pitfalls are still
difficult to avoid, unless
those steering the project
are able to
understand
and anticipate the
associated risks.
1. Weak
business sponsor
This
phase often turns out to be
the trickiest phase of the
data warehousing implementation and
is
also
the Part-II(a) of your
semester project. Because
data warehousing by definition includes
data
from
multiple sources spanning
many different departments
within the enterprise.
Therefore,
there
are often political battles
that center on the
willingness of information sharing.
Even though
a successful
data warehouse benefits the
enterprise, there are
occasions where departments
may
not feel
the same way. As a result of
unwillingness of certain groups to
release data or to
participate in
the data warehousing requirements
definition, the data
warehouse effort either
never
gets
off the ground, or could not get
started in the right
direction defined originally.
When this
happens, it
would be ideal to have a strong business
sponsor. If the sponsor is at
the CXO level
(X:
Information, Knowledge, Financial etc),
he/she can often exert
enough influence to make
sure
everyone
cooperates.
2. Not
having multiple
servers
This is a
classical example of penny
wise and pound foolish. To
save capital, often
data
warehousing
teams will decide to use
only a single database and a
single server for the
different
environments
i.e. development and production.
Environment separation is achieved by
either a
directory
structure or setting up distinct
instances of the database.
This is awkward for
the
following
reasons:
·
Sometimes it is
possible that the server
needs to be rebooted for the
development
environment.
Having a separate development environment
will prevent the
production
environment
from being effected by this.
·
There
may be interference while
having different database environments on
a single
server. For
example, having multiple
long queries running on the
development server
could affect
the performance on the production server, as both
are same.
3. Modeling
without domain
expert
It is essential
to have a subject -matter expert as
part of the data modeling
team. This person
can
be an outside
consultant or can be someone in
-house with extensive industry
experience. Without
this
person, it becomes difficult to
get a definitive answer on
many of the questions, and
the entire
project gets
dragged out, as the end
users may not always be
available.
4. Not
enough time for
ETL
This is
common everywhere, ETL getting
the least time, remember
data is always dirtier than
you
think. There is
a tendency to give this particular
phase of DWH too little time
and other
300
resources.
This can prove suicidal to
the project, as the end
users will usually tolerate
less
formatting,
longer time to run reports,
less functionality (slicing
and dicing), or fewer
delivered
reports;
one thing that they will
never ever tolerate is wrong
information.
A second
common problem is that some
people unnecessarily make the
ETL process
complicated. In
ETL design, the primary goal
should be to optimize load speed
without
sacrificing on
quality. This is, however,
sometimes not followed. There
are cases when the
design
goal is to cover
all possible future uses
and possible scenarios, some
of which may be
practical,
while
others just plain impractical.
When this happens, ETL
perfo rmance suffers, and
often so
does
the performance of the entire data
warehousing system.
11-Possible
Pitfalls (continued)
§
5. Low
priority for OLAP Cube
Construction: Giving it
the lowest priority.
§
6. Fixation
with technology: End
users impressed by timely
information NOT
advanced
infra-structure.
§
7. Wrong
test bench: Required
performance NOT on fast production
level machines.
§
8. QA people
NOT DWH literate: Ensure QA
people are educated about
DWH.
5. Low
priority for OLAP Cube
Construction
Make
sure your OLAP cube-building or
pre -calculation process is optimized
and given the
right
priority. It is
common for the data
warehouse to be on the bottom of the
nightly batch loads,
and
after
the loading the DWH, usually
there isn't much time left
for the OLAP cube to be
refreshed.
As a result, it
is worthwhile to experiment with
the OLAP cube generation
paths to ensure
optimal
performance.
6. Fixation
with technology
Just
remember that the end
users do not care how
complex or how technologically advanced
you r
front
end (or for that
matter back-end) infrastructure is.
All they care is that
they should receive
their
information in a timely manner and in
the way they
specified.
7. Wrong
test bench
Make sure
the development environment is very
similar to the production environment as
much
as possible -
Performance enhancements seen on
less powerful machines
sometimes do not
happen on
the larger, production -level
machines.
8. QA
people NOT DWH
literate
As mentioned
above, usually the QA team
members know little abo ut
data warehousing, and
some of
them may even resent
the need to have to learn
another tool or tools. Make sure
the QA
team
members get enough education
about data warehousing so
that they can complete
the testing
themselves.
11-Possible
Pitfalls (continued)
§
9.
Uneducated user: Take
care and address end
-user education needs.
Intuition does not
work.
301
§
10.
Improper documentation: Complete
documentation before developers
leave.
§
11.
Doing incremental enhancements:
Definite no-no.
Dev => QA =>Production
9.
Uneducated user
Take
care and address the
user education needs. There is nothing
more frustrating to spend
several
months to develop and QA the
data warehousing system, only to have
little usage because
the
users are not properly trained
and educated. Regardless of
how intuitive or easy the
interface
may
be, it is always a good idea to
send the users to at least a
one -day training course to
let them
understand what
they can achieve by properly
using the data
warehouse.
10.
Improper documentation
Usually by
this time most, if not all, of
the developers will have
left the project, so it is
essential
that proper
documentation is left for
those who are handling
production maintenance. There is
nothing more
frustrating than staring at
something another person
did, yet unable to figure it
out
due to
the lack of proper
documentation.
Another
pitfall is that the
maintenance phase is usually
boring. So, if there is
another phase of the
data
warehouse planned, start on
that as soon as
possible.
11. Doing
incremental enha
ncements
Because a
lot of times the changes
are simple to make, it is
very tempting to just go ahead
and
make
the change in production. This is a
definite no -no. Many unexpected
problems will pop up
if this is
done. It is very strongly recommend
that the typical cycle of Development
→
QA
→
Production
be
followed, regardless of how
simple the change may
seem.
Pitfall:
Searching for the "Silver
Bullet"
Abandon
completely even the desire
to find a silver bullet.
§
Wasting time on
that one access tool
that will handle all needs
of all users.
§
Many
tools in the market, instead
do a match-making.
§
Beware: Vendors
use ambiguous, confusing, non-standard
nomenclature, which
sometimes
serves their own
purpose.
§
For
any meaningful comparison, evaluate
tools by classifying on the basis
of
functionality.
Pitfall:
Searching for the "Silver
Bullet": Pitfalls of
Selecting the Tools
Abandon
completely even the desire
to find a silver bullet.
Many
data warehouse project teams
waste enormous amounts of time
searching in vain for
a
silver
bullet i.e. a panacea or Amratdhara
.
They believe their mission
is to find the one
access
tool
that will handle all
the needs of all their
users. Don't even try. It
can't be done. One size
does
not fit
all.
302
There is a wide
and eve r-growing diversity
of tools for data access,
exploration, analysis, and
presentation.
The appropriate mission of a data
warehouse or decision support
team is to
understand
these diverse alternatives and properly
match the tool to the
intended usage.
The
names typically applied to tools are
ambiguous and confusing. Generally every
new name
only
obscures the issue more.
What is the definitive
definition of ad hoc query tool,
decision
support
system, executive information system or
online analytic processing?
Some terms, like
EIS,
carry the burden of years of
overzealous selling and underwhelming
results.
To evaluate
tools, you need to slot the
alternative into categories that
allow for meaningful
comparison.
Since the traditional terms
add little discrimi natory power, where do you
turn? The
first
part of the answer is to
create purely functional
categories.
Pitfall:
Extremes of Tech. Arch.
Design
Common
mistake: Attacking the problem from
two extremes, neither is
correct.
§
Focusing on
data warehouse delivery,
architecture feels like a
distraction and
impediment to
progress and often end up
rebuilding.
§
Investing
years in architecture, forgetting
primary purpose is to solve
business
problems, not to
address any plausible (and not so plausible)
technical c hallenge.
Pitfall:
Extremes of Tech. Arch.
Design Data
warehouse teams approach the
technical
architecture
design process from opposite
ends of the spectrum. Some
teams are so focused
on
data
warehouse delivery that the
architectures feels like a distraction
and impediment to progress
and eventually,
these teams often end up
rebuilding. At the other extreme,
some teams want to
invest two
years designing the architecture
while forgetting that the
primary purpose of a
data
warehouse is to
solve business p oblems, not
address any plausible (and
not so plausible)
r
technical
challenge. Neither end of
the architecture spectrum is
healthy; the most
appropriate
response
lies somewhere in the
middle.
Top
10-Common Mistakes to
Avoid...
§
Mistake 1
:
Not
interacting directly with
the end users.
§
Mistake 2
:
Promising an
ambitious data mart as the
first deliverable.
§
Mistake 3
:
Never
freezing the requirements
i.e. being an accommodating
person.
§
Mistake 4:
Working
without senior executives in loop,
waiting to include them after
a
significant
success.
§
Mistake 5:
Doing a
very comprehensive and detailed first
analysis to do the DWH
right
the
very first time.
10-Common
Data warehouse mistakes to
avoid
303
So far you
have been told what to do,
however now we'll balance t
hose recommendations with
a
list of what not
to do. When building and
managing a data warehouse, the
common mistakes to
avoid
are listed. These mistakes
are described as a series of negative
caricatures. The goal is
for
you to learn
from these as George S
ntayana said, "Those who
cannot remember the past
are
a
condemned to
repeat it." Let's all agree
not to repeat any of these
mistakes. Each of the
mistakes
will be
discussed one by one.
Mistake 1:
Not interacting
directly with the end
users; your
job is to be the publisher of the
right
data. To achieve
your job objectives, you must
listen to the business
users, who are always
right.
Nothing
substitutes for direct interaction with
the users. Develop the
ability to listen.
Mistake 2:
Promising an
ambitious data mart as the
first deliverable; these
kinds of data marts
are
'consolidated, second-level marts with
serious dependencies on multiple
sources of data.
Customer
profitability requires all
the sources of revenue and
all the sources of cost, as
well as an
allocation
scheme to map costs onto the
revenue! For the first
deliverable, focus instead on
a
single
source of data, and do the
more ambitious data marts
later.
Mistake 3:
Never
freezing the requirements
i.e. being an accommodating
person ; You
need to
think
like a software developer and
manage three very visible
stages of developing each
data
mart: (1)
the business requirements gathering
stage, where every suggestion is
considered
seriously,
(2) the implementation stage, where
changes can be accommodated~
but must be
negotiated
and generally will cause the
schedule to slip, and (3)
the rollout stage, where
project
features
are frozen. In the second
and third stages, you must
avoid insidious scope creep
(and stop
being such an
accommodating person).
Mistake 4:
Working
without senior executives in loop,
waiting to include them after a
significant
success;
the
senior executives must
support the data warehouse
effort from the very
beginning. If
they don't,
your organization likely will not be
able to use the data
warehouse effectively.
Get
their
support prior to launching the
project.
Mistake 5:
Doing a
very comprehensive and detailed first
analysis to do the DWH right
the very
first
time; Very
few organizations and human
beings can develop the
perfect comprehensive plan
for a
data warehouse upfront. Not
only are the data
assets of an organization too vast and
complex
to describe
completely, but also the urgent
business drivers will change
significantly over the
life
of the
data warehouse. Start with
lightweight data warehouse
bus architecture of conformed
dimensions
and conformed facts, and
then build your data
warehouse iteratively. You
will keep
altering and
building it forever.
Top
10-Common Mistakes to
Avoid
§
Mistake 6:
Assuming
the business users will
develop their own "killer
application" on
their
own.
§
Mistake 7:
Training
users on the detailed features of
the tool using dummy data
and
consider it a
success.
§
Mistake 8:
Isolating
the IT support people from
the end or business
users.
§
Mistake 9:
After DWH i s
finished, holding a planning and
communications meeting
with
end
users.
§
Mistake 10:
Shying
away from operational source
systems people, assuming
they are too
304
busy.
Mistake 6:
Assuming
the business users will
develop their own "killer
application" o n their
own;
Business
users are not application developers.
They will embrace the
data warehouse only if a
set
of prebuilt
analytic applications is beckoning
them.
Mistake 7:
Training
users on the detailed features of
the tool using dummy data
and consider it a
success;
Delay
training until your first
data mart is ready to go
live on real data. Keep the
first
training
session short, and focus
only on the simple uses of
the access tool. Allocate more time
to
the
data content and analytic applications
rather than to the tool.
Plan on a permanent series
of
beginning
training classes and follow
-up training classes as
well. Take credit for the
user
acceptance
milestone when your users
are still using the
data warehouse six months
after they
have
been trained.
Mistake 8:
Isolating
the IT support people from the
end or business users; Data
warehouse
support
people should be physically
located in the business
departments, and while
on
assignment,
they should spend all
their waking hours devoted to
the business content of
the
departments
they serve. Such a relationship
engenders trust and
credibility with the
business
users.
Mistake 9:
After
DWH is finished, holding a
planning and communications meeting
with end
users;
Newsletters,
training sessions, and ongoing pe rsonal
support of the business
community
should be to
gather items for the
first rollout of the data
warehouse.
Mistake 10:
Shying
away from operational source
systems people, assuming
they are too
busy;
Certainly, they
cannot alter their operational proc
edures significantly for passing
data to or from
the
warehouse. If your organization really
understands and values the
data warehouse, then
the
operational
source systems should be
effective partners with you in
downloading the data
needed
and in
uploading cleaned data as
appropriate.
Top
7-Key Steps for a smooth
DWH implementation ...
§
Step-1:
Assigning a
full-time project manager, or
doing it yourself full-time.
§
Step-2:
Consider
handing-off project
management.
§
Step-3:
During
user interview don't go
after answers, let the
answers come to you.
7-Tips to a
smooth data warehouse
implementation
There's a
long list of things not to do in managing a
data warehouse project, but there
are also a
number of
positive, proactive steps
that can increase your
chances of a smooth
implementation.
Resolve to be
open to new ideas, and
seek creative inspiration in radically modifying
your tried -
and-true
practices to fit this new
way of thinking.
1.
Assigning a full-time project
manager, or doing it yourself
full-time
It's
common, and often unavoidable,
that project managers ride herd on
several projects at once.
The
economics of IT resources make
this a fact of life. When it
comes to building a
data
warehouse,
however, don't even think
about it. You are entering a
domain unlike anything
else
you
and your crew have worked on.
Everything about it--analysis, design,
programming, testing,
305
modifications,
maintenance--will be new. You, or whoever
you assign as project
manager, will
have a much
better shot at success if allowed t o
get into that "new"
mode and stay
there.
2. Consider
handing -off project
management
Because
the phases of a data
-warehouse build are so very
different, you do yourself no
disservice
by handing off
to another project manager when a
phase is complete, provided
you adhere to Step
One above.
Why is it reasonable to do this?
First, any phase of a data
warehouse implementation
can be
exhausting, from a project
management standpoint. From the
deployment of physical
storage to
implementing the Extract -Transform-Load,
from designing and
developing schemas to
OLAP,
the phases of a warehouse
build are also markedly
different from one another.
Each not
only could
use a fresh hand, management -wise, but a
fresh creative perspective. Handing
off
management not
only do esn't necessarily hurt, it
may even help.
3. During
user interviews don't go
after answers let the
answers come to
you.
This is
important enough to be an article in itself. You
must understand, going into
the design
process
that your potential warehouse
users aren't going to be
able to clearly articulate what it
is
they
want the warehouse to do for
them. They're going to have
to explore and discover it as they
go--and so
will your development team, in conducting
interviews. Make your interviews
open -
ended,
with lots of note -taking,
and have your development -team
interviewer's focus more on
the
consequences of
processes than the processes
themselves.
Since
you're conducting these interviews in
order to get some idea of
what data to store and
how
to efficiently
store it, you need to (in
partnership with your users)
come up with new ways to
look
at data,
not process it. You're
trying to find potential information
that can be gleaned, not
from
transactional
data itself, but from the
information behind it: the
rise and fall of numbers
over time,
etc.
Don't chase answers in these
interviews. Let answers come to
you.
Top
7-Key Steps for a smooth
DWH implementation
§
Step-4:
Assigning
responsibilities to oversee and ensure
continuity.
§
Step-5:
Accept
the "fact" t hat DWH
will require many iterations before it is
ready.
§
Step-6:
Assign
significant resources for
ETL.
§
Step-7:
Be a
diplomat NOT a technologist.
4.
Assigning responsibilities to oversee
and ensure
continuity.
These
don't need to be full -time
assignments, but because the
phases of a data
warehouse
implementation
differ so greatly, you're
going to need people out
there assuring continuity.
There
are
three important areas: (i)
architecture, (ii) technology,
and (iii) business. Assign
an
architecture
lead to ensure that the
generally agreed-upon architecture of
the data warehouse,
from
the physical level on up, is
maintained throughout the project. A technology lead
should be
appointed,
because your developers and
key users will all be
using tools they 've
never used
before--someone
needs to oversee the
deployment and consistent
use of these tools.
Finally,
the business needs that
will be met through use of
the warehouse must be
carefully
observed
and documented, to spur
continued development. Since
the analytics and metrics to
be
306
derived
from the process are
developed over time, by users
who will not necessarily
communicate
well
with one another, someone
must watch this development,
encourage its continuation,
and
nurture it into
progressing to higher levels.
5. Accept
the "fact" that DWH will
require many iterations
before it is ready.
A data
warehouse will never, ever be
right the first time.
Why? You don't know what
you're
really
looking for until you see
it. Or, to say it more
precisely, the ultimate users of
the system
won't
know what they're really
going to use it for until
they've used it for awhile.
As contrary as
that
may seem to all that
you've sworn by throughout your career,
it really is the way to
go:
business
intelligence is an infant scie nce,
and different for every
company.
You'll
have to fish around for the
right data in the right
format, and things will
change often. BI
is very
"personal," unique to your environment, your
market, and your
partnerships. What
does
this
mean? First of all, it means
you need to lock your
database administrator in a room
somewhere
and break the news
that the data warehouse
data structures are going to
change and
change
and change, as will the
ETL procedures. There is no
way around this. Make your
peace
with it
now, and save both yourself
and the DBA a lot of
stress.
6. Assign
significant resources for
ETL
You're
going to be stepping in it again
and again as you wade
through oceans of old data,
in old
databases, on
old magnetic tape, from
remote sources. Much of it
will be dirty. Much of it
will be
hard to get
to. You're going to be doing
a lot of this, and you're
going to be devising
ETL
procedures to
seek out and retrieve
information like this
forevermore. You do yourself
and the
project a great
service by establishing a method of
doing this right the
first time. Have
your
development
people put in the extra time to explore
old data thoroughly, characterize
"dirty" data
issues
realistically, and to design and
implement robust extraction and
transformation procedures
exhaustively.
The ETL portion of a data
warehouse can consume as
much as 80 percent of
your
total project
resources! Make sure you spend
wisely.
7. Be a
diplomat NOT a
technologist
The
biggest problem you will
face during a warehouse
implementation will be people,
not the
technology or
the development. You're going to
have senior management
complaining about
completion dates
and unclear objectives. You're
going to have development people
protesting
that
everything takes too long
and why can't they do it
the old way? You're going to
have users
with
outrageously unrealistic expectations,
who are used to systems
that require
mouse-clicking
but not much
intellectual investment on their part.
And you're going to grow
exhausted,
separating out
Needs from Wants at all
levels. Commit from the
outset to work very hard
at
communicating
the realities, encouraging
investment, and cultivating
the development of new
skills in
your team and your
users (and even your
bosses).
Most of all,
keep smiling. When all is
said and done, you'll
have a resource in place
that will do
magic,
and your grief will be
long past. Eventually, your
smile will be effortless and
real.
Conclusions
§
DWH is
not simple.
§
DWH is
very expensive.
§
DWH is
not ONLY about
technology.
307
Table of Contents:
|
|||||