Research Methods STA630 VU
Lesson 37
USE OF SECONDARY DATA
Existing statistics/documents
Prior to the discussion of secondary data, let us look at the advantages and disadvantages of content analysis, which was covered in the last lecture. In a way, content analysis is also the study of documents through which writers try to communicate, though some documents (like a population census) may simply contain figures.
Advantages
1. Access to inaccessible subjects: One of the basic advantages of content analysis is that it allows research on subjects to which the researcher does not have physical access. These could be people of old civilizations (say, their marriage patterns). They could also be documents from the archives, speeches of past leaders who are no longer alive (e.g., Quaid-e-Azam), suicide notes, old films, dramas, poems, etc.
2. Non-reactivity: Document study shares with certain types of observation (e.g., indirect observation or non-participant observation through a one-way mirror) the advantage of little or no reactivity, particularly when the document was written for some other purpose. The method is unobtrusive: the creator of the document, who may no longer be alive, and for that matter the characters in the document, are not in contact with the researcher.
3. Can do longitudinal analysis: Like observation and unlike experiments and surveys, document study is especially well suited to study over a long period of time. Often the objective of the research is to determine a trend. One could pick different periods in the past, make comparisons, and figure out the changes (e.g., in the status of women) that may have occurred over time. For example, take two martial law periods in Pakistan, study the newspapers, and compare the crime reported in the press.
4. Use sampling: The researcher can use random sampling. One could decide on the population, develop a sampling frame, and draw a random sample by following the appropriate procedure. For example, to study how women are portrayed in weekly English news magazines, one could pick the weekly English news magazines, make a listing of the articles that have appeared in them (the sampling frame), and draw a simple random sample.
5. Can use large sample size: The larger the sample, the closer its results are likely to be to those of the population. In experimentation as well as in survey research, sample size may be limited by the availability of subjects or of resources, but in document analysis the researcher can increase the sample and have more confidence in generalization. If a researcher is studying the matrimonial advertisements in newspapers over a long period of time, there should be no problem in drawing a sample of several thousand or more.
6. Spontaneity: Spontaneous actions or feelings can be recorded when they occur rather than at a time specified by the researcher. If the respondent was keeping a diary, he or she may have been recording spontaneous feelings about a subject whenever he or she was inspired to do so. The contents of such personal recordings can be analyzed later on.
7. Confessions: A person may be more likely to confess in a document, particularly one to be read only after his or her death, than in an interview or mailed questionnaire study. Thus a study of documents such as diaries, posthumously published autobiographies, and suicide notes may be the only way to obtain such information.
8. Relatively low cost: Although the cost of documentary analysis can vary widely depending on the type of document analyzed, how widely the documents are dispersed, and how far one must travel to gain access to them, documentary analysis can be inexpensive compared to large-scale surveys. Documents are often gathered together in a centralized location such as a library, where the researcher can study them for only the cost of travel to the repository.
9. High quality: Although documents vary tremendously in quality, many documents, such as newspaper columns, are written by skilled commentators and may be more valuable than, for example, poorly written responses to mailed questionnaires.
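The sampling procedure described under point 4 above (decide on the population, list the articles as a sampling frame, draw a simple random sample) can be sketched in a few lines of Python. This is only an illustrative sketch: the magazine labels, the number of issues and articles, the sample size of 100, and the helper name `draw_simple_random_sample` are assumptions made for the example, not details from the lecture.

```python
import random


def draw_simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw a simple random sample (without replacement) from a sampling frame."""
    if sample_size > len(sampling_frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(sampling_frame, sample_size)


# Hypothetical sampling frame: a listing of every article in two weekly
# magazines, 52 issues each, 10 articles per issue (all invented numbers).
sampling_frame = [
    f"magazine-{m}/issue-{i}/article-{a}"
    for m in ("A", "B")
    for i in range(1, 53)
    for a in range(1, 11)
]

sample = draw_simple_random_sample(sampling_frame, 100, seed=42)
print(len(sampling_frame), len(sample))  # prints: 1040 100
```

Because every article in the frame has the same chance of being drawn, the sample can be enlarged simply by raising `sample_size`, which is the practical point made under point 5.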
Disadvantages
1. Bias: Many documents used in research were not originally intended for research purposes. The various goals and purposes for which documents are written can bias them in various ways. For example, personal documents such as confessional articles or autobiographies are often written by famous people or by people who had some unusual experience, such as having been a witness to a specific event. While often providing unique and valuable research data, these documents are usually written for the purpose of making money. Thus they tend to exaggerate and even fabricate to make a good story. They also tend to include those events that make the author look good and exclude those that cast him or her in a negative light.
2. Selective survival: Since documents are usually written on paper, they do not withstand the elements well unless care is taken to preserve them. Thus, while documents written by famous people are likely to be preserved, day-to-day documents such as letters and diaries written by common people tend either to be destroyed or to be placed in storage and thus become inaccessible. It is relatively rare for common documents that are not about some event of immediate interest to researchers (e.g., suicide), not about a famous occurrence, and not by a famous person to be gathered together in a public repository that is accessible to researchers.
3. Incompleteness: Many documents provide an incomplete account to a researcher who has had no prior experience with or knowledge of the events or behavior discussed. A problem with many personal documents such as letters and diaries is that they were not written for research purposes but were designed to be private or even secret. Both kinds of documents often assume specific knowledge that a researcher unfamiliar with certain events will not possess. Diaries are probably the worst in this respect, since they are usually written to be read only by the author and can consist more of "soul searching" and confession than of description. Letters tend to be a little more complete, since they are addressed to a second person, but many letters still assume a great amount of prior information on the part of the reader.
4. Lack of availability of documents: In addition to the bias, incompleteness, and selective survival of documents, there are many areas of study for which no documents are available. In many cases the information simply was never recorded. In other cases it was recorded, but the documents remain secret or classified, or have been destroyed.
5. Sampling bias: One of the problems of bias occurs because persons of lower educational or income levels are less likely to be represented in the sampling frames. The problem of sampling bias by educational level is more acute for document study than for survey research. It is a safe generalization that poorly educated people are much less likely than well-educated people to write documents.
6. Limited to verbal behavior: By definition, documents provide information only about verbal behavior. They provide no direct information on nonverbal behavior, either that of the document's author or that of other characters in the document.
7. Lack of standardized format: Documents differ quite widely in the standardization of their format. Some documents, such as newspapers, appear frequently in a standard format. Large dailies always contain such standard components as an editorial page, business page, sports page, and weather report. Standardization facilitates comparison across time for the same newspaper and comparison across different newspapers at one point in time. However, many other documents, particularly personal documents, have no standard format. Comparison is difficult or impossible, since valuable information contained in a document at one point in time may be entirely lacking in an earlier or later document.
8. Coding difficulties: For a number of reasons, including differences in the purposes for which the documents were written, differences in content or subject matter, lack of standardization, and differences in length and format, coding is one of the most difficult tasks facing the content analyst. Documents generally consist of words rather than numbers and are therefore quite difficult to quantify. Thus analysis of documents is similar to analysis of open-ended survey questions.
9. Data must be adjusted for comparability over time: One of the advantages of document study is that comparisons may be made over a long period of time. However, external events can cause changes so drastic that, even if a common unit of measure is used for the entire period, the value of this unit may have changed so much over time that comparisons are misleading unless corrections are made. Look at the changes in the measurement of distance, temperature, currency, and even literacy in Pakistan.
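For currency, the kind of correction mentioned in point 9 can be illustrated with a small Python sketch that re-expresses amounts in the prices of a single base period. Everything numeric here is invented for illustration: the price index values, the wage figures, and the helper name `to_base_year_prices` are assumptions, not data from any official source.

```python
def to_base_year_prices(nominal_amount, index_in_year, index_in_base_year):
    """Re-express a nominal amount in base-period prices using a price index."""
    if index_in_year <= 0 or index_in_base_year <= 0:
        raise ValueError("price index values must be positive")
    return nominal_amount * index_in_base_year / index_in_year


# Hypothetical price index (base period = 100) and hypothetical monthly wages
# quoted in documents from two periods.
price_index = {"earlier period": 25.0, "base period": 100.0}
nominal_wage = {"earlier period": 1000.0, "base period": 3000.0}

adjusted_wage = {
    period: to_base_year_prices(wage, price_index[period], price_index["base period"])
    for period, wage in nominal_wage.items()
}
print(adjusted_wage)  # prints: {'earlier period': 4000.0, 'base period': 3000.0}
```

In this made-up case the wage appears to triple in nominal terms, yet after adjustment the earlier wage is worth more than the later one, which is exactly the kind of misleading comparison the correction is meant to prevent.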
Use of Secondary Data: Existing Statistics/Documents
Secondary Data
Secondary data refer to information gathered by someone other than the researcher conducting the present study. Secondary data are usually historical, already assembled, and do not require access to respondents or subjects. Many types of information about the social and behavioral world have been collected and are available to the researcher. Some information is in the form of statistical documents (books, reports) that contain numerical information. Other information is in the form of published compilations available in a library or on computerized records. In either case the researcher can search through collections of information with a research question and variables in mind, and then reassemble the information in new ways to address the research question.
Secondary data may be collected by large bureaucratic organizations like the Bureau of Statistics or by other government or private agencies. These data may have been collected for policy decisions or as part of a public service.
The data may be a time-bound collection of information (a population census) or may be spread over long periods of time (unemployment trends, crime rates). Secondary data are used for making comparisons over time within a country (population trends in the country) as well as across countries (world population trends).
Selecting Topic for Secondary Analysis
Search through the collections of information with a research question and variables in mind, and then reassemble the information in new ways to address the research question.
It is difficult to specify topics that are appropriate for existing statistics research because they are so varied. Any topic on which information has been collected and is publicly available can be studied. In fact, existing statistics projects may not fit neatly into a deductive model of research design. Rather, researchers creatively reorganize the existing information into the variables for a research question after first finding what data are available.
Experiments are best for topics where the researcher controls a situation and manipulates an independent variable. Survey research is best for topics where the researcher asks questions and learns about reported attitudes and behavior. Content analysis is for topics that involve the content of messages in cultural communication.
Existing statistics research is best for topics that involve information collected by large bureaucratic organizations. Public or private organizations systematically gather many types of information. Such information is collected for policy decisions or as a public service. It is rarely collected for purposes directly related to a specific research question. Thus existing statistics research is appropriate when a researcher wants to test hypotheses involving variables that also appear in official reports of social, economic, and political conditions. These include descriptions of organizations or of the people in them. Often, such information is collected over long periods. For example, existing statistics can be used by a researcher who wants to see whether unemployment and crime rates are associated in 100 cities across a 20-year period.
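One simple way to check whether two such rates are associated is to compute a correlation coefficient over the city-year observations. The sketch below is only an illustration under stated assumptions: the six observations are invented, the city labels are hypothetical, and Pearson's r is just one possible measure of association (the lecture does not prescribe a particular statistic).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return covariation / sqrt(variation_x * variation_y)


# Hypothetical observations: (city, year, unemployment rate %, crimes per 1,000).
observations = [
    ("City A", 2001, 5.0, 40.0),
    ("City A", 2002, 6.5, 46.0),
    ("City B", 2001, 8.0, 51.0),
    ("City B", 2002, 7.0, 55.0),
    ("City C", 2001, 4.0, 38.0),
    ("City C", 2002, 4.5, 35.0),
]

unemployment = [row[2] for row in observations]
crime = [row[3] for row in observations]
print(f"r = {pearson_r(unemployment, crime):.2f}")
```

A coefficient near +1 or -1 indicates a strong association and one near 0 a weak one. It says nothing by itself about which variable, if either, causes the other.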
As part of the study of trends, say in development, researchers try to develop social indicators for measuring the well-being of the people. A social indicator is any measure of well-being used in policy. There are many specific indicators that are operationalizations of well-being. It is hoped that information about social well-being could be combined with widely used indicators of economic performance (e.g., gross national product) to better inform government and other policy-making officials.
The main sources of existing statistics are government or international agencies and private sources. An enormous volume and variety of information exists. If you plan to conduct existing statistics research, it is wise to discuss your interests with an information professional (in this case, a reference librarian) who can point you in the direction of possible sources.
Many existing documents are "free", that is, publicly available at libraries, but the time and effort it takes to search for specific information can be substantial. Researchers who conduct existing statistics research spend many hours in libraries or on the internet.
There are many sources of existing statistics, such as UN publications, the UNESCO Statistical Yearbook, the UN Statistical Yearbook, the Demographic Yearbook, the Labor Force Survey of Pakistan, and Population Census data.
Secondary Survey Data
Secondary analysis is a special case of existing statistics; it is the reanalysis of previously collected survey or other data that were originally gathered by others. As opposed to primary research (e.g., experiments, surveys, and content analysis), the focus is on analyzing rather than collecting data.
Secondary analysis is increasingly used by researchers. It is relatively inexpensive; it permits comparisons across groups, nations, or time; it facilitates replication; and it permits asking about issues not thought of by the original researchers. There are several questions the researcher interested in secondary research should ask: Are the secondary data appropriate for the research question? What theory and hypothesis can a researcher use with the data? Is the researcher already familiar with the substantive area? Does the researcher understand how the data were originally gathered and coded?
Large-scale data collection is expensive and difficult. The cost and time required for major national surveys that use rigorous techniques are prohibitive for most researchers. Fortunately, the organization, preservation, and dissemination of major survey data sets have improved. Today, there are archives of past surveys open to researchers (e.g., data from the Population Census of Pakistan and the Demographic Survey of Pakistan).
Reliability and Validity
Existing statistics and secondary data are not trouble free just because a government agency or other source gathered the original data. Researchers must be concerned with validity and reliability, as well as with some problems unique to this research technique.
A common error is the fallacy of misplaced concreteness. It occurs when someone gives a false impression of accuracy by quoting statistics in greater detail than is warranted by how the statistics were collected, overloading the audience with detail. For example, in order to impress an audience, a politician might say that 3,010,534 persons, instead of 3 million persons, are being added to the population of Pakistan every year.
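One way to guard against this fallacy when reporting a figure is to round it to the number of significant figures the data collection can actually support. The Python sketch below is an illustration only: the helper name `round_to_significant` is made up for this example, the politician's figure is read as 3,010,534, and how many significant figures are justified is a judgment the researcher must make from how the statistic was gathered.

```python
from math import floor, log10


def round_to_significant(value, figures):
    """Round a number to the given count of significant figures."""
    if figures < 1:
        raise ValueError("figures must be at least 1")
    if value == 0:
        return 0
    magnitude = floor(log10(abs(value)))        # position of the leading digit
    factor = 10 ** (magnitude - figures + 1)    # size of the last digit kept
    return round(value / factor) * factor


reported = 3_010_534  # the over-precise figure from the example above
print(round_to_significant(reported, 1))  # prints: 3000000
```

Quoting "about 3 million" conveys the same information without implying that the count is accurate to the last person.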
Validity:
Validity problems occur when the researcher's theoretical definition does not match that of the government agency or organization that collected the information. Official policies and procedures specify definitions for official statistics. For example, a researcher defines a work injury as including minor cuts, bruises, and sprains that occur on the job, but the official definition in government reports only includes injuries that require a visit to a physician or hospital. Many work injuries as defined by the researcher will not be in the official statistics. Another example occurs when a researcher defines people as unemployed if they would work if a good job were available, if they have to work part-time when they want full-time work, or if they have given up looking for work. The official definition, however, counts as unemployed only those who are now actively seeking work (full or part-time). The official statistics exclude those who stopped looking, who work part-time out of necessity, or who do not look because they believe no work is available. In both cases the researcher's definition differs from that in the official statistics.
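The effect of the two unemployment definitions can be made concrete with a small Python sketch that counts the same people under each rule. The five records, the field names, and the two function names are hypothetical and exist only to illustrate the mismatch; they are not taken from any official classification.

```python
from dataclasses import dataclass


@dataclass
class Person:
    actively_seeking_work: bool = False
    gave_up_looking: bool = False            # stopped looking or believes no work exists
    part_time_wants_full_time: bool = False  # works part-time out of necessity


def unemployed_official(person):
    """Official definition: only those now actively seeking work."""
    return person.actively_seeking_work


def unemployed_researcher(person):
    """Researcher's broader definition from the example in the text."""
    return (
        person.actively_seeking_work
        or person.gave_up_looking
        or person.part_time_wants_full_time
    )


# Hypothetical records for five people.
people = [
    Person(actively_seeking_work=True),
    Person(actively_seeking_work=True),
    Person(gave_up_looking=True),
    Person(part_time_wants_full_time=True),
    Person(),  # fits neither definition
]

official_count = sum(unemployed_official(p) for p in people)
researcher_count = sum(unemployed_researcher(p) for p in people)
print(official_count, researcher_count)  # prints: 2 4
```

With these invented records the official rule counts two people and the researcher's rule counts four, so a hypothesis framed in the researcher's terms would be tested against a measure that misses half of the cases of interest.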
Another validity problem arises when official statistics are a proxy for a construct in which the researcher is really interested. This is necessary because the researcher cannot collect original data. For example, the researcher wants to know how many people have been robbed, so he or she uses police statistics on robbery arrests as a proxy. But the measure is not entirely valid because many robberies are not reported to the police, and reported robberies do not always result in an arrest.
Another validity problem arises because the researcher lacks control over how information is collected. All information, even that in official government reports, is originally gathered by people in bureaucracies as part of their job. A researcher depends on them for collecting, organizing, reporting, and publishing data accurately. Systematic errors in collecting the initial information (e.g., census takers who avoid poor neighborhoods and make up information, or people who put a false age on their ID card), errors in organizing and reporting information (e.g., a police department that is sloppy about filing crime reports and loses some), and errors in publishing information (e.g., a typographical error in a table) all reduce measurement validity.
Reliability:
Stability reliability problems develop when official definitions or the methods of collecting information change over time. Official definitions of work injury, disability, unemployment, literacy, poverty, and the like change periodically. Even if the researcher learns of such changes, consistent measurement over time may be impossible.
Equivalence reliability can also be a problem. For example, studies of police departments suggest that the number of arrests is closely related to political pressures. Political pressures in one city may increase arrests (e.g., a crackdown on crime), whereas pressures in another city may decrease arrests (e.g., to show a drop in crime shortly before an election in order to make officials look better).
Researchers often use official statistics for international comparisons, but national governments collect data differently and the quality of data collection varies.
Inferences from Non-Reactive Data:
A researcher's ability to infer causality or to test a theory on the basis of non-reactive data is limited. It is difficult to use unobtrusive measures to establish temporal order and eliminate alternative explanations. In content analysis, a researcher cannot generalize from the content to its effects on those who read the text, but can only use the correlational logic of survey research to show an association among variables. Unlike the case of survey research, a researcher does not ask respondents direct questions to measure variables, but relies on the information available in the text.