|
|||||
§
DWH
designers must be capable of
working across the
organization.
§
DWH
team requires a combination of many
experiences and
expertise.
Conclusions
By now you
have realized that a
building a data warehouse is not an
easy task. DWH are
very
expensive to
build, with the average
cost of a system valued at
around US$ 2 million. Hence,
the
right
people, methodology and experience is
critical. The dependence on technology is
only a
small part in
realizing the true business
value buried within the
mountain of data collected
and
stored
within organizations business
systems and operational databases.
Data warehouses touch
the organization
at all levels, and the
people that design and
build the data warehouse
must be
capable of
working across the organization at
all leve ls as well, thus
communication skills of the
people
are of utmost importance. Thus
the key requirements are
industry and product
experience
of a diverse
team, coupled with a business
focus and proven methodology.
This will make
the
difference
between just a functional
system and true success
story.
Lecture-36
Course
Project
This will
be a group project, consisting of
not more than three
students per
group.
308
Two
Parts
·
Part -I outline
given in 1st lecture. (about
coding)
·
Part
-II b riefly mentioned in
33rd lecture. (about planning
and req definition)
Objective:
Give a
head-start by identifying the
work in advance.
Approach:
Combine both
parts.
The
course project consists of
two parts, and both
parts have already been
briefly touched upon
in
the
earlier lectures. The first
part of the project was discussed in
the very first lecture and
we call
it Part
-I, while the second
part of the project was
discussed in lecture no. 33
and we will call it
Part
-II. In this lecture, we
will discuss both parts in detail.
The purpose of an early discussion
was
to give you a
head start on the projects
so that you start thinking/working
about them, such as
polishing
your programming skills for
Part -I and identifying the
organization for Part -II.
Our
approach
will be to combine both parts of
the projects to come up with
a single semester
project.
Part
-I
1. Code
the BSN Method for finding
siblings and duplicates
(Lect-20).
2. Use
4GL or a high level programming
language.
3. Must have
GUI for I/P and
O/P
4. For input,
use data of Lahore campus
provided as part of lab
work.
5. Submit project
report_1.
Part -I of
the project basically deals
with the implementation of the
BSN (Basic Sorted
Neighborhood)
Method that we discussed in lecture
no. 20. You already have
the paper that
was
discussed in
the said lecture. As you
know that the BSN
method can be used for
dedupication i.e.
removal of
duplicates, this is one
aspect that you will have to
code and test. The other
application
of BSN
will be to identify the siblings, people
who have the same
father i.e. they are
brothers or
sisters of
each other. To implement this
part of the project, you can
either use a 4GL such
as
PL/SQL or any
other high level programming
language such as C++, or
Java. Your application
must
have a GUI (Graphical User Interface),
console based application will
not be acceptable.
The
user must have the facility
of reading from a data source,
which could be a text file or
a
database or a
spreadsheet. In case of a database or
spread sheet, OLE DB connectivity
must be
supported by
your application.
There
can be basically two options
for the data input. Either you
can use data collected as
part of
Project-II
(after approval of instructor) or use
the text files supplied
for the Lahore campus as
part
of the
SQL DTS (Data Transformation
Services) Lab Lectures. If you
use data other than
the lab
data, it
must have a dozen relevant columns
and several thousand rows
too.
309
Part-II:
Implementation Life-Cycle
The
second part of the semester project
i.e. Part -II deals
with the DWH implementation
life cycle
that we
have already discussed in
great detail in lectures 32-35. In
lectures 33 and 34 we
discussed
the Ralph Kimball's approach, you
are not required to do any development
and
deployment
work in project Pa rt-II.
During the lectures, the
DWH life cycle road -map
was
divided
into three parts, you only
have to cover these parts
i.e. (i) project planning
(ii) user
requirement
definition and (iii) three
parallel tracks. You are NOT
required to discuss or do DWH
deployment or do
any analytics
development.
Part-II(a):
Identify Organization
1. Do a complete
data warehouse implementation life
cycle study (and design) as
discussed
in lectures
32-35.
2. Identify a
large company/organization that is a
prime candidate for a
DWH.
3. What is a
large company/organization? Lot of
customers and large number of
transactions.
4. Submit
report_2 giving and explaining 4-reasons
for selecting a
company.
5. Get
the company/organization selected
approved by the instructor before
proceeding
ahead.
As stated
earlier, the lifecycle study
for project Part -II ends
after the three parallel
tracks of the
Kimball's
approach; this was also
stated in lecture 33. You
are not required to do any
deployment
or the
study of deployment. Part
-II of the project consist of
two parts i.e. Part
-II(a) which consists
of
identification of a company for
the lifecycle study, and
Part -II(b) actually doing
the lifecycle
study. To do
Part -II(a), you have to
identify a large company or an organization,
this could be
private or
government owned, but must be a prime
candidate for such a
study.
A shop at
the corner of your street
with ship-owner being the
salesman too is not a prime
target
for a
DWH. He may actually do not even
need a database system. So
what is a large company or
an organization?
Well one which has large
number of customers and large
number of
transactions. So
what is large is this circular logic or a trick
question? Neither. Large customers
mean
tens of thousands of customers
and similarly tens and thou
sands of transactions per
week.
Note that what
we call large on our country, may be
small for the developed
world. Once you
have
identified such a company,
submit a report, we call this report_2.
The report should
have
four
reasons why you have
selected a particular company or organization.
What could be those
four
reasons?, this is a good question,
the four reasons are
(i) the number of customers
(ii) the
number of
transactions (iii) typical early adopter
(from next slide) and (iv)
any other reason.
Once
you
have submitted the report, you
can not just by default move
on to Part -II(b) of the
project.
The
company selected must be
approved by the instructor before you
can proceed ahead.
Large
and Typical Early
Adopters
310
1.
Financial
service/insurance.
2.
Telecommunications.
3.
Transportation.
4.
Government.
5.
Educational.
Here you would
be asking the question, what is
meant by an early adopter? Will
discuss this at
the
end of the lecture, but if
you can't wait, please
check fig -36.1. For a
developing country
like
ours, we
are in the phase of talking
about typical early adopters of DWH.
However, in the
developed world;
this stage is no longer
there for many large
companies, as DWH are now in
the
mainstream.
The types of organizations
and business listed are
typically thos e that have a large
number of
customers and large number of
transactions.
Example
DWH Target
Organizations
·
Financial
service/insurance.
Union Bank
State Bank of
Pakistan
·
Telecommunications.
UFone
PTCL
PAKNET
·
Transportation.
PIA
·
Government.
NADRA
For
example, as per www.paksearch.com
Muslim
Commercial Bank has 900+ branches
all over
Pakistan.
With an average of 500 customers per
branch, the total number of customers is
in the
order of
half a million. It would not
be surprising if the weekly ATM
transactions all over
Pakistan
run into millions. Such
banks are potential
candidates for a data
warehouse. Same is
true
for
telecommunication companies. As per
recent government figures, there are 10+
million
mobile
phone users in Pakistan, and
as per www.fdi.com
the number of
mobile phone users of
Mobilink is
3.7 million. Again, it would
not be surprising to have literally
millions of mobile
phone calls
made/received per day. So these
businesses fall under the
category which you
should
be looking at to
select and study as part of
your semester project.
NADRA
(National Database and Registration
Authority) probably has the
largest data
warehouse
in Pakistan
and is the repository of the 1 98
census which covered the
entire population of
9
Pakistan,
which at time stood at 130 millions. As
part of the census (source:
www.nadra.gov.pk)
64 million
NDFs or National Data Forms
were collected and scanned which
are presently stored
in a 4.2
Tera byte DWH in
NADRA.
311
Part-II(b):
Life-Cycle Study...
Project
Planning
§ By now you
already know the
company.
§ Prepare a
questionnaire (at least 15
non-trivial questions).
§ Identify
and contact a key person
who will help you.
Submit
report_3
User
requirement definition
Set an
appointment to meet business
users.
§ Collect
answers to questions.
§ Get a
copy of sample input.
§ Get a
copy of sample output.
§ Compile
report of interview.
§ Identify
business processes.
§ Identify
problems.
§ Identify
measures of success.
Submit
report_4
Once report_2
has been approved, you are
all set to move on to Part
-II(b) of the project i.e.
the
actual
lifecycle case study. The
first part of this study is project
planning, that we discussed
in
great detail in
lecture no. 33. Do the
necessary preparations, which include
development of a
questionnaire of
atleast 15 questions before you
meet the key person in
the company. The
questions
MUST be non -trivial i.e.
must not be overly simplistic
i.e. asking for
information
already
available in the company brochure or
their website. Before preparing
the questionnaire,
you must
have done the necessary
background study about the
company, so that you
make
relevant
and probing questions. You
must also have identified
and set an appointment with
the
key
person who is willing to help
you. Note that any person
will not suffice, but a person who
is
in a position of
influence and wants to help
you. IF you can't find such
a person, don't
worry,
keep on reading,
I will provide you guidance on
how to resolve this problem. After you
have
successfully
gone through the project planning phase,
you should submit report_3.
Now
the next part is user
requirement definition, we have discussed
this too in great detail
in
lecture
no. 33. I would suggest
that before you meet the
user for requirement collection, call
and
set an
appointment, reach there in time
and in formal attire. Also
take along a micro
cassette
reorder or use
your mobile phone for
recording the session. Do the recording
with permission
from
the business user. Along
with recording the answers of
the user (don't forget to
take notes)
312
get a
copy of the sample input
used in the organization, and a
copy of the typical output,
that
could be in the
form of a report etc. Specifically ask
questions and understand the
business
process,
the problems and the
measures of success. After you
are done, go through the
entire
process of
debriefing etc, and submit
report_4.
Part-II(b):
Life-Cycle Study...
Do a technical
track study and submit
report_5.
Do a data
track design and submit
report_6.
Do an analytic
track study and submit
report_7.
About
reports:
§ Each
report A4 page size and NOT
more than 5 pages (excluding
diagrams and tables).
§ 12-pt Times
New Roman font
§ Single
space and 1" top, bottom,
left and right
spacing
What
will be covered in report 5 through 7 was covered in
detail in lecture no. 34,
read the notes
and
the reference book for these
lectures i.e. "The Data
Warehouse Toolkit by Ralph
Kimball".
The
only difference in the project
work w.r.t to the lecture is
that you don't have to do
the
analytics
development, and don't have to go
beyond the three parallel
tracks.
What if
you are not
entertained?
·
There
are prospects, then there
are customers.
·
Typically 1 out
of 10 prospects is a buyer or customer
i.e. 10% success.
·
The
more prospects you meet and
they are not interested,
the more close you actually
get
to your
customer.
·
Quitters
never win, and winners never
quit.
·
Thomas
Edison made 2,000 attempts
to make the filament for
the light bulb that
actually
worked.
·
When
asked how did he managed
2,000 failures? He said they were
not failures; he just
identified
WHICH 2,000 filaments don't
work.
·
DON'T
GIVE UP
313
In the
context of sales and
marketing, a prospect is someone
who can become a customer.
A
customer is
someone who will buy a product or
service form you. So every
sales person is
looking
for a customer or wants to convert a
prospect into a customer.
However, on an average
10% of
the prospects become a
customer, it means if you try 10
prospects, in the worst case
the
last
one will be the customer.
Look at positively, meaning
the more failures you have, the
more
close
you get to your customer.
This is why the saying
goes that "quitters never
win, and winners
never
quit". Similarly look at the
work of Thomas Edison, he
kept trying and untimely
become
immortal
i.e. his name still
lives and his invention
still used.
What if
you are never
entertained?
It may so
happen that no one ente
rtains/helps you, maybe you
did not tried hard, maybe
you had
the
wrong attitude, maybe you
did not meet the right
people , maybe you were not
very
convincing ,
maybe the end user
was apprehensive etc.
Upto
10% less credit if this
approach is adopted
In such a
case, search the web, read
books/magazines and pick any
one of the 5 types of
organizations
discussed and collect
reference/related material (not beyond
year 2000).
Use
the material collected to
write reports 2 to 6.
Now it
may so happen, that you tried,
and tried and tried
very hard, but still
unable to get a key
person
interested and willing to help you so
that you could write to write report_2.
There could be
several reasons
for this, maybe you don't
tried hard, maybe it was
because of the other person
etc.
In such a
case, you will do a lifecycle
case study using the
Internet. In report_2 you will list
all
the
reasons for failure and
how you tried, and then we
will decide about how
much credit to
deduct
from the semester project.
The amount of deduction could be upto
10%. But you still
have
to write
all the reports and
use the material from the
case studies tailored to the
requirements of
the
reports. For this purpose
you will have to download a number of
focused case studies, better
if
about the
same organization, and compile
the results in the form of
the reports as per the
Ralph
Kimball's
road map.
Contents of
Project Reports
The project
reports to include, but is not limited
to, the following:
§
Narrative
summary of results produced
(report_1).
§
Listings of
computer models/programs coded
and utilized (report_1).
§
Reports
displaying results (report_1).
§
System
usage instructions (report_1).
§
Narrative
description of business and tables of
appropriate data
(report_4).
§
Descriptions of
decisions to be supported by information
produced by
system
(report_4).
314
§
Structure
charts, dataflow diagrams and/or other
diagrams to document
the
structure of the system
(report 4-7).
§
Recommendations
(reports 5-7)
This is self
explanatory, and I have also
explained in the lecture.
Follow the guidelines to
the
word, as
your work will be graded
based on these guidelines. Note
that system usage
instructions
have to be
specific with screen shots of
the application developed, so that your
applications can
be executed
using the instructions and graded.
Don't forget to submit the
entire source code,
along with
the compiled code with all
necessary libraries DLLs
attached.
Format of
Project Reports:
Main
·
Report
No.
·
Title of
course, semester & submission
date
·
Names
and roll no. of project
members.
·
Campus
and name of city.
·
Table of
contents.
·
1-page executive
summary for each
report.
·
Attach
(scanned) hard/soft copies of all
related material collected and
referenced with
each
report.
·
At the
end of semester, combine ALL
reports and submit as a
single report.
Again
this is self explanatory, and I
have also explained in the
lecture. Follow the guidelines
to
the
word, as your work will be
graded based on these
guidelines.
Format of
Project Reports: Other
·
No spelling or
grammar mistakes.
·
Make sections
and number them (as per
contents of report discussed).
·
Pages
should be numbered (bottom
center).
·
Add an
index at the end of
report.
·
File
name:
RPT_no_semester_campus_rollno_CS614.doc
·
e.g.
RPT_1_F05_BestComputersWah_234,235_CS614.doc
·
Email
copy of report.
·
Warning:
Do not
copy-paste, I can and I will
catch you.
MS Word
has a facility to create an index.
You begin by creating a file
with a list of keywords to
be listed in
the index, and then go to
Insert then Reference then In
dex and Tables,
press
AutoMark
and select the file
with keywords.
315
The naming
convention of the reports is important,
so that your reports can be
easily identified as
there
are about half a dozen
reports for each group. The
following convention to be used in
report
naming:
RPT:
This
will be repeated for each
report and will not
change.
no:
is
the report number i.e. 1 through 7
semester:
is
the semester, some possible
semesters are F05 i.e.
Fall 2005 or SP06 i.e.
Spring
2006 or SPL05
i.e. special semester 2005
or SM06 i.e. summer
2006.
campus:
The
name of the VU campus where you
are registered or taking the
course along with
the
name of the city or town.
Write full name of the
campus, do not use underscore
i.e. _ or
dashes or
space in the campus
name.
rollno: Since it is a
group project, so roll numbers of
students in the group
separated by a comma.
CS614:
This
will be at the end of every
file name i.e. the
course code.
Don't
try to copy -paste or use
someone else's work with
your name, there are
smart tools to catch
this,
once you are caught, this
can result into zero
credit.
Why
would companies entertain
you?
·
You
are students, and whom you
meet were also once
students.
·
You
can do an assessment of the
company for DWH potential at no
cost.
·
Since you
are only interested in your
project, so your analysis
will be neutral.
·
Your report
can form a basis for a
professional detailed assessment at a
later stage.
·
If a DWH
already exists, you can do an
independent audit.
If you present
your case well you are
likely to be entertained i.e. welcome by
the organization.
The
first and the foremost
reason to get help is, whom
you are talking to was once
a student too
as you are
currently, so there is a common bond.
Since you have studied well
DWH (hopefully),
therefore, you
can do a requirement assessment (in
the form of lifecycle study)
of the company at
no cost;
the company has nothing to
lose. Your only interest is
completion of your project and
grade, so you
are going to be very
objective and very neutra l;
hence the company has
still nothing
to lose.
After you have done
the lifecycle study, the
same can be used as a seed
or input by a
professional
organization for an in depth study,
thus saving in money to the
company for which
you
have done the work.
Again the company has
nothing to lose. It may so
happen that the
company you
contact already has a data
warehouse in place, in such a
case doing the
lifecycle
development
study can be used as a internal audit of
the DWH implementation.
Hence in short, if
the
company allows and helps you
with the study, it will be a
win -win scenario for both
the
parties.
Why
you may be
entertained?
316
Figure-36.1:
Adoption/Innovation curve
Fig-36.1
shows a typical adoption curve when a
new item, product or service is
introduced and
the
ratio of people responding to it. As you
can see the people who
are the first ones to
adopt or
embrace it
are the innovators, and they
are very few. This is
followed by early adopters, which
is
a sizeable
figure, and this is the
category of companies you
are supposed to target as part of
your
project. Note
that in our country, Data
Warehousing is not yet in
the category of early majority,
so there
will be more companies who
are prime candidates for
lifecycle study. You may
also
come
across people who may be
genuinely interested in a DWH, but don't
know enough about
it,
and want to
know. In such a case, be prepared to
educate them or enlighten them. It
would be best
if you look at a
number of case studies or reports
about their line of business
before meeting
them.
Interestingly,
the bell shaped curve
(which is not a perfect bell) divides
the prospects into
two
equal
parts i.e. 50% each.
You should be looking at that
50% which can and
will help you,
instead of
those who are likely to help
you, but at a later stage.
Finally
this project has enough
challenges to become your
final year BSc project. In
such a case
be prepared to
do coding leading to system deployment
and completion of all the remaining
steps.
Remember,
many large organizations/businesses
may not need a data
warehouse today, but
they
will
need one surely tomorrow.
And when that happens, they
will be looking for you to help
them
achieve
their objectives.
The project
has more than enough
potential to become a final year project,
which will cover an
implementation
and deployment also.
317
Table of Contents:
|
|||||