Research Methods STA630 VU
Lesson 29
DATA ANALYSIS
Once the data begin to flow in, attention turns to data analysis. If the project has been done correctly, the analysis planning is already done. Back at the research design stage, or at least by the completion of the proposal or the pilot test, decisions should have been made about how to analyze the data. During the analysis stage, several interrelated procedures are performed to summarize and rearrange the data.
The goal of most research is to provide information. There is a difference between raw data and information. Information refers to a body of facts that are in a format suitable for decision making, whereas data are simply recorded measures of certain phenomena. The raw data collected in the field must be transformed into information that will answer the sponsor's (e.g., the manager's) questions.
The conversion of raw data into information requires that the data be edited and coded so that the data may be transferred to a computer or other data storage medium. If the database is large, there are many advantages to utilizing a computer. Assuming a large database, entering the data into the computer follows the coding procedure.
Editing
Occasionally, a fieldworker makes a mistake and records an improbable answer (e.g., birth year: 1843) or interviews an ineligible respondent (e.g., someone too young to qualify). Seemingly contradictory answers, such as "no" to automobile ownership but "yes" to an expenditure on automobile insurance, may appear on a questionnaire. There are many problems like these that must be dealt with before the data can be coded. Editing procedures are conducted to make the data ready for coding and transfer to data storage.
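As a rough illustration of such checks, the following Python sketch flags an improbable birth year and the car-ownership/insurance contradiction so that the editor can resolve them. The field names (`birth_year`, `owns_car`, `car_insurance`) and the 1900 lower bound are assumptions made for this example, not rules from any particular study.

```python
from datetime import date

def edit_flags(record, min_birth_year=1900, current_year=None):
    """Return the problems an editor should resolve before the record is coded.

    Field names and the 1900 lower bound are hypothetical choices for this sketch.
    """
    if current_year is None:
        current_year = date.today().year
    flags = []

    # Range check: an improbable answer such as a birth year of 1843.
    year = record.get("birth_year")
    if year is not None and not (min_birth_year <= year <= current_year):
        flags.append(f"improbable birth_year: {year}")

    # Consistency check: "no" to car ownership but "yes" to car insurance spending.
    if record.get("owns_car") == "no" and record.get("car_insurance") == "yes":
        flags.append("car insurance reported without car ownership")

    return flags

print(edit_flags({"birth_year": 1843, "owns_car": "no", "car_insurance": "yes"}))
# ['improbable birth_year: 1843', 'car insurance reported without car ownership']
```

The function only flags problems; deciding how each one is corrected remains the editor's job.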
Editing is the process of checking and adjusting the data for omissions, legibility, and consistency. Editing may be differentiated from coding, which is the assignment of numerical scales or classifying symbols to previously edited data. The purpose of editing is to ensure the completeness, consistency, and readability of the data to be transferred to data storage. The editor's task is to check for errors and omissions on the questionnaires or other data collection forms.
The editor may have to reconstruct some data. For instance, a respondent may indicate weekly income rather than monthly income, as requested on the questionnaire. The editor must convert the information to monthly data without adding any extraneous information. The editor "should bring to light all hidden values and extract all possible information from a questionnaire, while adding nothing extraneous."
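Converting a weekly figure into the monthly figure the questionnaire asked for is simple arithmetic. The sketch below assumes a month of 52/12 (about 4.33) weeks; that conversion rule is an assumption here and should be fixed in the study's editing instructions.

```python
def weekly_to_monthly(weekly_amount):
    """Convert a weekly amount to a monthly one, assuming 52 weeks per 12 months.

    Nothing else is added: only the reporting period of the answer changes.
    """
    return round(weekly_amount * 52 / 12, 2)

print(weekly_to_monthly(3000))  # 13000.0
```

Simply multiplying by 4 would understate monthly income, because most months are slightly longer than four weeks.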
Field Editing
In large projects, field supervisors are often responsible for conducting preliminary field edits. The purpose of field editing on the same day as the interview is to catch technical omissions (such as a blank page), check the legibility of the handwriting, and clarify responses that are logically or conceptually inconsistent. If daily field editing is conducted, a supervisor who edits completed questionnaires will frequently be able to question the interviewers, who may be able to recall the interview well enough to correct any problems. The number of "no answers" or incomplete answers can be reduced with a rapid follow-up stimulated by a field edit. The daily edit also allows fieldworkers to re-contact the respondent to fill in omissions before the situation has changed. The field edit may also indicate the need for further training of interviewers.
In-House Editing
Although almost simultaneous editing in the field is highly desirable, in many situations (particularly with mail questionnaires), early reviewing of the data is not possible. In-house editing rigorously investigates the results of data collection.
Editing for Consistency: The in-house editor's task is to ensure that inconsistent or contradictory responses are adjusted and that answers will not be a problem for coders and keyboard punchers. Consider the situation in which a telephone interviewer has been instructed to interview only registered voters, and registration requires voters to be 18 years old. If the editor's review of a questionnaire indicates that the respondent was only 17 years of age, the editor's task is to eliminate this obviously incorrect sampling unit. Thus, in this example, the editor's job is to make sure that the sampling unit is consistent with the objectives of the study.
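A minimal Python sketch of this eligibility edit: questionnaires from respondents below the required age are separated out as incorrect sampling units. The `id` and `age` field names are hypothetical; the age limit of 18 comes from the example above.

```python
MIN_VOTING_AGE = 18  # eligibility rule from the registered-voter example

def drop_ineligible(questionnaires, min_age=MIN_VOTING_AGE):
    """Split questionnaires into eligible ones and ones to be eliminated."""
    kept, dropped = [], []
    for q in questionnaires:
        # A respondent below the minimum age is not a valid sampling unit.
        (kept if q["age"] >= min_age else dropped).append(q)
    return kept, dropped

kept, dropped = drop_ineligible([{"id": 1, "age": 34}, {"id": 2, "age": 17}])
print([q["id"] for q in kept], [q["id"] for q in dropped])  # [1] [2]
```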
Editing requires checking for logically consistent responses. The in-house editor must determine if the answers given by a respondent to one question are consistent with those given to other, related questions. Many surveys utilize filter questions or skip questions that direct the sequence of questions, depending upon the respondent's answer. In some cases the respondent will have answered a sequence of questions that should not have been asked. The editor should adjust these answers, usually to "no answer" or "inapplicable," so that the responses will be consistent.
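The skip-question adjustment can be sketched as follows. The filter item `owns_car`, its follow-up items, and the `"inapplicable"` marker are hypothetical names chosen for this illustration; a real study would take them from its questionnaire and editing instructions.

```python
INAPPLICABLE = "inapplicable"

def apply_skip_rule(record, filter_item, required_answer, follow_ups):
    """Mark follow-up answers as inapplicable when the filter answer routes past them."""
    cleaned = dict(record)  # work on a copy so the original record is kept
    answer = cleaned.get(filter_item)
    # Adjust only when the filter question was answered and did not lead to the follow-ups.
    if answer is not None and answer != required_answer:
        for item in follow_ups:
            cleaned[item] = INAPPLICABLE
    return cleaned

record = {"owns_car": "no", "car_make": "Suzuki", "car_age": 5}
print(apply_skip_rule(record, "owns_car", "yes", ["car_make", "car_age"]))
# {'owns_car': 'no', 'car_make': 'inapplicable', 'car_age': 'inapplicable'}
```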
Editing for Completeness: In some cases the respondent may have answered only the second portion of a two-part question. An in-house editor may have to adjust the answers to the following question for completeness.
Does your organization have more than one Internet Web site?
Yes ____   No ____
If a respondent checked neither "Yes" nor "No" but indicated three Internet Web sites, the editor may check "Yes" to ensure that this answer is not missing from the questionnaire.
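The completeness edit for this two-part question can be expressed as a small rule. In this sketch the field names `more_than_one_site` and `num_web_sites` are hypothetical, and a blank is represented as `None`; the rule fills in "Yes" only when the reported count clearly implies it.

```python
def fill_two_part_answer(record):
    """Check "Yes" when the yes/no part is blank but more than one Web site is reported."""
    filled = dict(record)  # keep the original record unchanged
    count = filled.get("num_web_sites")
    if filled.get("more_than_one_site") is None and count is not None and count > 1:
        filled["more_than_one_site"] = "Yes"
    return filled

print(fill_two_part_answer({"more_than_one_site": None, "num_web_sites": 3}))
# {'more_than_one_site': 'Yes', 'num_web_sites': 3}
```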
Item Non-response: Item non-response is the technical term for an unanswered question on an otherwise complete questionnaire. Specific decision rules for handling this problem should be meticulously outlined in the editorial instructions. In many situations the decision rule will be to do nothing with the unanswered question: the editor merely indicates the item non-response by writing a message instructing the coder to record a "missing value" or blank as the response. However, when a response is necessary, the editor uses a plug value. The decision rule may be to "plug in" an average or neutral value in each case of missing data. For a blank response to an interval-scale item with a midpoint, one option is to assign the midpoint of the scale as the response to that particular item. Another way is to assign to the item the mean value of the responses of all those who have responded to that particular item. Another choice is to give the item the mean of this particular respondent's responses to all other questions measuring the same variable.
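The three plug-value rules just described can be sketched in Python with the standard `statistics` module. Missing answers are represented as `None`, and the item names `q1` to `q3` (assumed to measure the same variable on a 1 to 5 scale) are hypothetical.

```python
from statistics import mean

def scale_midpoint(low, high):
    """Rule 1: use the midpoint of the interval scale."""
    return (low + high) / 2

def item_mean(responses, item):
    """Rule 2: use the mean of everyone who answered this item."""
    return mean(r[item] for r in responses if r.get(item) is not None)

def respondent_mean(respondent, other_items):
    """Rule 3: use this respondent's mean on the other items measuring the same variable."""
    return mean(respondent[i] for i in other_items if respondent.get(i) is not None)

responses = [
    {"q1": 4, "q2": 5, "q3": 4},
    {"q1": 2, "q2": None, "q3": 3},  # q2 left blank
    {"q1": 5, "q2": 3, "q3": 4},
]
print(scale_midpoint(1, 5))                         # 3.0
print(item_mean(responses, "q2"))                   # 4
print(respondent_mean(responses[1], ["q1", "q3"]))  # 2.5
```

The three rules can give quite different plug values for the same blank (3.0, 4, and 2.5 here), which is why the chosen rule belongs in the written editing instructions.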
Another decision rule may be to alternate the choice of the response categories used as plug values (e.g., "yes" the first time, "no" the second time, "yes" the third time, and so on).
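Alternating plug values can be produced with `itertools.cycle`. In this sketch `None` stands for a missing answer and the categories ("yes", "no") follow the example in the text; both representations are assumptions.

```python
from itertools import cycle

def alternating_plugs(answers, categories=("yes", "no")):
    """Replace each missing answer (None) with the next category in turn."""
    plugs = cycle(categories)  # yields "yes", "no", "yes", "no", ...
    return [next(plugs) if a is None else a for a in answers]

print(alternating_plugs(["yes", None, "no", None, None]))
# ['yes', 'yes', 'no', 'no', 'yes']
```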
The editor must also decide whether or not an entire questionnaire is "usable." When a questionnaire has too many answers missing (say 25%), it may not be suitable for the planned data analysis. In such a situation the editor simply records the fact that a particular incomplete questionnaire has been dropped from the sample.
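A sketch of the usability decision, treating a questionnaire as unusable once 25% or more of its answers are missing. The cutoff follows the "say 25%" figure in the text but is only illustrative; `None` marks a missing answer, and the questionnaire IDs are hypothetical.

```python
def is_usable(answers, max_missing_share=0.25):
    """Return False when the share of missing answers (None) reaches the cutoff."""
    missing = sum(1 for a in answers if a is None)
    return missing / len(answers) < max_missing_share

def screen_questionnaires(questionnaires, max_missing_share=0.25):
    """Split questionnaire IDs into usable ones and ones dropped from the sample."""
    usable, dropped = [], []
    for qid, answers in questionnaires.items():
        (usable if is_usable(answers, max_missing_share) else dropped).append(qid)
    return usable, dropped

sample = {
    "Q001": [3, 4, 2, 5, 1],     # nothing missing
    "Q002": [3, None, None, 5],  # 50% missing
    "Q003": [2, 4, None, 5, 3],  # 20% missing
}
print(screen_questionnaires(sample))  # (['Q001', 'Q003'], ['Q002'])
```

Returning the dropped IDs, rather than discarding them silently, keeps the record of which questionnaires left the sample.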
Editing Questions Answered out of Order: Another situation an editor may face is the need to rearrange the answers to an open-ended response question. For example, a respondent may have provided the answer to a subsequent question in his answer to an earlier open-ended question. Because the respondent had already clearly identified his answer, the interviewer may have avoided asking the subsequent question: the interviewer may have wanted to avoid hearing "I have already answered that earlier" and to maintain rapport with the respondent, and therefore skipped the question. To make the responses appear in the same order as on other questionnaires, the editor may move the out-of-order answer to the section related to the skipped question.
Coding
Coding involves assigning numbers or other symbols to answers so the responses can be grouped into a limited number of classes or categories. Classifying data into limited categories sacrifices some data detail but is necessary for efficient analysis. Nevertheless, it is advisable to keep the data in raw form as far as possible. When the data have been entered into the computer in raw form, you can always ask the computer to group and regroup the categories. If the data have been entered into the computer in grouped form, however, it will not be possible to disaggregate them.
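The advantage of storing raw values can be shown with a short Python sketch: raw ages can be grouped one way and later regrouped another way, whereas grouped codes alone could never be turned back into ages. The ages and class boundaries are made-up illustrative values.

```python
from bisect import bisect_left

def group(values, upper_bounds):
    """Code each value as a class number 1, 2, ...

    upper_bounds holds the inclusive upper limit of every class except the last,
    which is open-ended.
    """
    return [bisect_left(upper_bounds, v) + 1 for v in values]

raw_ages = [19, 23, 35, 41, 58, 62]
print(group(raw_ages, [29, 49]))  # [1, 1, 2, 2, 3, 3]  (up to 29, 30-49, 50+)
print(group(raw_ages, [39]))      # [1, 1, 1, 2, 2, 2]  (up to 39, 40+)
```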
Although codes are generally considered to be numerical symbols, they are more broadly defined as the rules for interpreting, classifying, and recording data. Codes allow data to be processed in a computer.
Researchers organize data into fields, records, and files. A field is a collection of characters (a character is a single number, letter of the alphabet, or special symbol such as the question mark) that represents a single type of data. A record is a collection of related fields. A file is a collection of related records. Files, records, and fields are stored on magnetic tapes, floppy disks, or hard drives.
Researchers use a coding procedure and a codebook. A coding procedure is a set of rules stating that certain numbers are assigned to variable attributes. For example, a researcher codes males as 1 and females as 2. Each category of a variable, as well as missing information, needs a code. A codebook is a document (i.e., one or more pages) describing the coding procedure and the location of data for variables in a format that computers can use.
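A coding procedure is essentially a lookup table. The sketch below encodes the male = 1, female = 2 rule from the text; the code 9 for missing information is an assumption for this example, and any answer not covered by the rules raises an error so it can be referred back to the editor rather than silently miscoded.

```python
SEX_CODES = {"male": 1, "female": 2}  # coding rule from the text
MISSING_CODE = 9                      # assumed code for missing information

def code_sex(answer):
    """Return the numeric code for a recorded answer to the sex question."""
    if answer is None or not answer.strip():
        return MISSING_CODE
    # A KeyError here signals an answer that the coding procedure does not cover.
    return SEX_CODES[answer.strip().lower()]

print([code_sex(a) for a in ["Male", "female", "", None]])  # [1, 2, 9, 9]
```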
When you code data, it is very important to create a well-organized, detailed codebook and make multiple copies of it. If you do not write down the details of the coding procedure, or if you misplace the codebook, you have lost the key to the data and may have to recode the data.
Researchers begin thinking about a coding procedure and a codebook before they collect data. For example, a survey researcher pre-codes a questionnaire before collecting the data. Pre-coding means placing the code categories (e.g., 1 for male, 2 for female) on the questionnaire. Sometimes, to reduce dependence on codebooks, researchers also place the location in the computer format on the questionnaire.
If the researcher does not pre-code, his or her first step after collecting and editing the data is to create a codebook. He or she also gives each case an identification number to keep track of the cases. Next, the researcher transfers the information from each questionnaire into a format that computers can read.
Code Construction
When the question has a fixed-alternative (closed-ended) format, the number of categories requiring codes is determined during the questionnaire design stage. The codes 8 and 9 are conventionally given to "don't know" (DK) and "no answer" (NA) respectively. However, many computer programs recognize a blank field or a certain character symbol, such as a period (.), as indicating a missing value (no answer).
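Before analysis, conventional DK/NA codes are usually converted to the program's own missing-value marker so they are not treated as real scores. In this Python sketch `None` plays the role of the blank or period; it assumes 8 and 9 are never legitimate answers to the item (for example, a 1 to 5 rating scale).

```python
DK, NA = 8, 9  # conventional codes for "don't know" and "no answer"

def to_missing(codes, missing_codes=(DK, NA)):
    """Replace DK/NA codes with None so they are excluded from calculations."""
    return [None if c in missing_codes else c for c in codes]

ratings = [4, 8, 2, 9, 5]
cleaned = to_missing(ratings)
print(cleaned)  # [4, None, 2, None, 5]

# Averaging the raw codes would wrongly treat DK and NA as scores of 8 and 9.
valid = [c for c in cleaned if c is not None]
print(sum(valid) / len(valid))
```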
There are two basic rules for code construction. First, the coding categories should be exhaustive; that is, coding categories should be provided for all subjects, objects, or responses. With a categorical variable such as sex, making categories exhaustive is not a problem. However, when a response represents a small number of subjects, or when responses might be categorized in a class not typically found, there may be a problem.
Second, the coding categories should also be mutually exclusive and independent. This means that there should be no overlap between the categories, to ensure that a subject or response can be placed in only one category. This frequently requires that an "other" code category be included, so that the categories are all inclusive and mutually exclusive. For example, managerial span of control might be coded 1, 2, 3, 4, and "5 or more." The "5 or more" category ensures everyone a place in a category.
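The two rules of code construction can be checked mechanically. This sketch represents each category as an inclusive range, with `None` as an open upper limit for "5 or more"; a value that fits no category reveals a non-exhaustive scheme, and a value that fits two reveals overlapping categories. The dictionary layout is an assumption of this example.

```python
# Span-of-control categories from the text: 1, 2, 3, 4, and "5 or more".
SPAN_CATEGORIES = {1: (1, 1), 2: (2, 2), 3: (3, 3), 4: (4, 4), 5: (5, None)}

def code_value(value, categories=SPAN_CATEGORIES):
    """Return the single code whose range contains value, enforcing both rules."""
    matches = [code for code, (low, high) in categories.items()
               if value >= low and (high is None or value <= high)]
    if not matches:
        raise ValueError(f"{value} fits no category: the codes are not exhaustive")
    if len(matches) > 1:
        raise ValueError(f"{value} fits codes {matches}: the categories overlap")
    return matches[0]

print([code_value(n) for n in (1, 4, 5, 12)])  # [1, 4, 5, 5]
```

A manager with no subordinates (value 0) would raise the "not exhaustive" error, showing that the scheme assumes every manager supervises at least one person.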
When a questionnaire is highly structured, pre-coding of the categories typically occurs before the data are collected. In many cases, such as when researchers are using open-ended response questions, a framework for classifying responses to questions cannot be established before data collection. This situation requires some careful thought concerning the determination of categories after the editing process has been completed. This is called post-coding or simply coding.
The purpose of coding open-ended response questions is to reduce the large number of individual responses to a few general categories of answers that can be assigned numerical scores. Code construction in these situations necessarily must reflect the judgment of the researcher. A major objective in the code-building process is to accurately transfer the meaning from written answers to numeric codes.
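Post-coding relies on the researcher's judgment, but once a code frame has been built, a keyword pass can give coders a first suggestion to review. Everything in this sketch is hypothetical: the open-ended question, the code frame, its keywords, and the code 7 for "other."

```python
import re

# Hypothetical code frame for "Why did you choose this bank?", built after reading answers.
CODE_FRAME = {
    1: ("location", {"near", "close", "location", "convenient"}),
    2: ("service", {"staff", "service", "friendly", "helpful"}),
    3: ("cost", {"fee", "fees", "charges", "cheap", "profit"}),
}
OTHER = 7  # assumed code for the "other" category

def suggest_code(answer):
    """Suggest a code for a written answer; the first matching category wins."""
    words = set(re.findall(r"[a-z]+", answer.lower()))
    for code, (_label, keywords) in CODE_FRAME.items():
        if words & keywords:
            return code
    return OTHER

answers = ["The branch is close to my office", "Friendly staff", "Low charges",
           "My father banks there"]
print([suggest_code(a) for a in answers])  # [1, 2, 3, 7]
```

An answer that mentions two reasons receives only the first matching code, so a coder still has to review the suggestions against the meaning of each written answer.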
Code Book
A code book is a book identifying each variable in a study and its position in the data matrix. The book is used to identify a variable's description, code name, and field. Here is a sample:
Q/V No.    Field/col. No.   Code values
--         1-5              Study number
-          6                City: 1 = Lahore, 2 = Rawalpindi, 3 = Karachi
           7-9              Interview No.
Sex        10               1 = Male, 2 = Female
Age        11-12            Actual
Education  13               1 = Non literate, 2 = Literate
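The column positions in a codebook are what allow a program to read fixed-width data lines. The following sketch uses the positions and value labels from the sample codebook; the Python variable names and the sample data line are hypothetical, and blank (missing) fields are not handled here.

```python
# Column positions (1-based, inclusive) and value labels taken from the sample
# codebook; the Python variable names are hypothetical.
CODEBOOK = {
    "study_number": (1, 5, None),
    "city": (6, 6, {1: "Lahore", 2: "Rawalpindi", 3: "Karachi"}),
    "interview_no": (7, 9, None),
    "sex": (10, 10, {1: "Male", 2: "Female"}),
    "age": (11, 12, None),  # actual age in years
    "education": (13, 13, {1: "Non literate", 2: "Literate"}),
}

def read_record(line):
    """Cut one fixed-width data line into numeric fields using the codebook."""
    return {name: int(line[start - 1:end])
            for name, (start, end, _labels) in CODEBOOK.items()}

def describe(record, name):
    """Translate a code back into its label when the codebook defines one."""
    labels = CODEBOOK[name][2]
    return labels[record[name]] if labels else record[name]

record = read_record("0001220471352")
print(record)
print(describe(record, "city"), describe(record, "sex"), describe(record, "age"))
# Rawalpindi Male 35
```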
Production Coding
Transferring the data from the questionnaire or data collection form after the data have been collected is called production coding. Depending upon the nature of the data collection form, codes may be written directly on the instrument or on a special coding sheet.
Data Entry
The use of scanner sheets for data collection may facilitate the entry of the responses directly into the computer without manually keying in the data. In studies involving highly structured paper questionnaires, an optical scanning system may be used to read material directly into the computer's memory. Optical scanners process the mark-sensed questionnaires and store the answers in a file.
Cleaning Data
The final stage in the coding process is the error checking and verification, or "data cleaning," stage, which is a check to make sure that all codes are legitimate.
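A simple form of this check compares every stored code with the codes the codebook allows. In this sketch the allowed values mirror the sample codebook; the dictionary layout and the sample records are assumptions. If a missing-value code such as 9 is in use, it would have to be added to the allowed sets.

```python
# Legitimate codes per variable, mirroring the sample codebook (layout assumed).
VALID_CODES = {
    "city": {1, 2, 3},
    "sex": {1, 2},
    "education": {1, 2},
}

def find_illegitimate(records, valid=VALID_CODES):
    """Return (record index, variable, value) for every code not allowed by the codebook."""
    problems = []
    for i, rec in enumerate(records):
        for var, allowed in valid.items():
            value = rec.get(var)  # a missing variable shows up as None and is flagged
            if value not in allowed:
                problems.append((i, var, value))
    return problems

data = [{"city": 2, "sex": 1, "education": 2},
        {"city": 4, "sex": 1, "education": 2},   # city 4 is not a legal code
        {"city": 1, "sex": 3, "education": 1}]   # sex 3 is not a legal code
print(find_illegitimate(data))  # [(1, 'city', 4), (2, 'sex', 3)]
```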
Accuracy is extremely important when coding data. Errors made when coding or entering data into a computer threaten the validity of measures and cause misleading results. A researcher who has a perfect sample, perfect measures, and no errors in gathering data, but who makes errors in the coding process or in entering data into a computer, can ruin a whole research project.