Research Methods STA630
VU
Lesson 29
DATA ANALYSIS
Once the data begin to flow in, attention turns to data analysis. If the project has been done correctly,
the analysis planning is already done. Back at the research design stage or at least by the completion of
the proposal or the pilot test, decisions should have been made about how to analyze the data.
During the analysis stage several interrelated procedures are performed to summarize and rearrange the
data. The goal of most research is to provide information. There is a difference between raw data and
information.
Information refers to a body of facts that are in a format suitable for decision making, whereas data are
simply recorded measures of certain phenomena. The raw data collected in the field must be
transformed into information that will answer the sponsor's (e.g. manager's) questions. The conversion
of raw data into information requires that the data be edited and coded so that the data may be
transferred to a computer or other data storage medium.
If the database is large, there are many advantages to utilizing a computer. Assuming a large database,
entering the data into the computer follows the coding procedure.
Editing
Occasionally, a fieldworker makes a mistake and records an improbable answer (e.g., birth year: 1843)
or interviews an ineligible respondent (e.g., someone too young to qualify). Seemingly contradictory
answers, such as "no" to automobile ownership but "yes" to an expenditure on automobile insurance,
may appear on a questionnaire. There are many problems like these that must be dealt with before the
data can be coded. Editing procedures are conducted to make the data ready for coding and transfer to
data storage.
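The kinds of problems listed above can be screened for mechanically before an editor reviews them. The following is a minimal sketch, not a prescribed procedure: the field names (`birth_year`, `owns_car`, `car_insurance_spend`), the survey year, and the age limits are all hypothetical assumptions introduced for illustration.

```python
def flag_problems(record, survey_year, min_age=18, max_age=110):
    """Return a list of editing problems found in one questionnaire record.

    All field names and age limits are illustrative assumptions.
    """
    problems = []
    age = survey_year - record["birth_year"]
    if age > max_age:
        problems.append("improbable birth year")
    elif age < min_age:
        problems.append("ineligible respondent (too young)")
    # "No" to car ownership but spending on car insurance looks contradictory.
    if record["owns_car"] == "no" and record["car_insurance_spend"] > 0:
        problems.append("contradictory automobile answers")
    return problems


record = {"birth_year": 1843, "owns_car": "no", "car_insurance_spend": 1200}
print(flag_problems(record, survey_year=2000))
```

The function only flags records; deciding how to resolve each flagged problem remains the editor's job.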
Editing is the process of checking and adjusting the data for omissions, legibility, and consistency.
Editing may be differentiated from coding, which is the assignment of numerical scales or classifying
symbols to previously edited data.
The purpose of editing is to ensure the completeness, consistency, and readability of the data to be
transferred to data storage. The editor's task is to check for errors and omissions on the questionnaires
or other data collection forms.
The editor may have to reconstruct some data. For instance, a respondent may indicate weekly income
rather than monthly income, as requested on the questionnaire. The editor must convert the information
to monthly data without adding any extraneous information. The editor "should bring to light all hidden
values and extract all possible information from a questionnaire, while adding nothing extraneous."
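As a small worked example of such a reconstruction, the weekly-to-monthly conversion might look like the sketch below. The conversion factor (52 weeks spread over 12 months) is an assumption; the editorial instructions of a real study would state which factor to use.

```python
def to_monthly(amount, period):
    """Convert a reported income to a monthly figure."""
    if period == "monthly":
        return float(amount)
    if period == "weekly":
        # Assumption: 52 weeks per year spread evenly over 12 months.
        return amount * 52 / 12
    raise ValueError(f"unknown reporting period: {period!r}")


print(to_monthly(1200, "weekly"))  # 5200.0
```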
Field Editing
In large projects, field supervisors are often responsible for conducting preliminary field edits. The
purpose of field editing the same day as the interview is to catch technical omissions (such as a blank
page), check legibility of the handwriting, and clarify responses that are logically or conceptually
inconsistent. If daily field editing is conducted, a supervisor who edits completed questionnaires will
frequently be able to question the interviewers, who may be able to recall the interview well enough to
correct any problems. The number of "no answers" or incomplete answers can be reduced with a rapid
follow-up stimulated by a field edit. The daily edit also allows fieldworkers to re-contact the respondent
to fill in omissions before the situation has changed. The field edit may also indicate the need for
further training of interviewers.
In-House Editing
Although almost simultaneous editing in the field is highly desirable, in many situations (particularly
with mail questionnaires), early reviewing of the data is not possible. In-house editing rigorously
investigates the results of data collection.
Editing for Consistency:
The in-house editor's task is to ensure that inconsistent or contradictory responses are adjusted and that
answers will not be a problem for coders and keyboard punchers. Consider the situation in which a
telephone interviewer has been instructed to interview only registered voters, where voters are required
to be at least 18 years old. If the editor's review of a questionnaire indicates that the respondent was
only 17 years of age, the editor's task is to eliminate this obviously incorrect sampling unit. Thus, in
this example, the editor's job is to make sure that the sampling unit is consistent with the objectives of the study.
Editing requires checking for logically consistent responses. The in-house editor must determine if the
answers given by a respondent to one question are consistent with those given to other, related
questions. Many surveys utilize filter questions or skip questions that direct the sequence of questions,
depending upon the respondent's answer. In some cases the respondent will have answered a sequence of
questions that should not have been asked. The editor should adjust these answers, usually to "no
answer" or "inapplicable," so that the responses will be consistent.
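The skip-question adjustment can be expressed as a simple rule. The sketch below is illustrative only; the question names (`owns_car`, `car_make`, `car_age_years`) are hypothetical and do not come from any particular questionnaire.

```python
INAPPLICABLE = "inapplicable"


def apply_skip_rule(record, filter_question, skip_answer, dependent_questions):
    """Recode answers to questions that should not have been asked.

    If the respondent gave `skip_answer` to `filter_question`, every question
    in `dependent_questions` is set to "inapplicable".
    """
    edited = dict(record)  # leave the original record untouched
    if edited.get(filter_question) == skip_answer:
        for question in dependent_questions:
            edited[question] = INAPPLICABLE
    return edited


record = {"owns_car": "no", "car_make": "Toyota", "car_age_years": 4}
edited = apply_skip_rule(record, "owns_car", "no", ["car_make", "car_age_years"])
print(edited)
```

Working on a copy keeps the respondent's original answers available in case the edit has to be reviewed later.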
Editing for Completeness: In some cases the respondent may have answered only the second portion
of a two-part question. An in-house editor may have to adjust the answers to the following question for
completeness.
Does your organization have more than one Internet Web site? Yes ____ No ____
If a respondent checked neither "Yes" nor "No" but indicated three Internet Web sites, the editor may
check "Yes" to ensure that this answer is not missing from the questionnaire.
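This completeness edit can be written as a small rule. The sketch below assumes the questionnaire also records the number of Web sites the respondent indicated (`site_count`, a hypothetical field name); it fills in "yes" only when that count settles the matter and otherwise leaves the answer alone.

```python
def edit_multi_site_answer(answer, site_count):
    """Fill in a missing yes/no answer when the indicated count implies "yes"."""
    if answer is None and site_count is not None and site_count > 1:
        return "yes"
    return answer  # existing answers and undecidable cases are left unchanged


print(edit_multi_site_answer(None, 3))
```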
Item Non-response: This is the technical term for an unanswered question on an otherwise complete
questionnaire. Specific decision rules for handling this problem should be meticulously outlined in the
editorial instructions. In many situations the decision rule will be to do nothing with the unanswered
question: the editor merely indicates an item non-response by writing a message instructing the coder to
record a "missing value" or blank as the response. However, if a response is necessary, the
editor uses a plug value. The decision rule may be to "plug in" an average or neutral value in each case
of missing data. For a blank response to an interval-scale item with a midpoint, one option is to assign
the midpoint of the scale as the response to that item. Another way is to assign to the item the mean
value of the responses of all those who have responded to that particular item. Another choice is to give
the item the mean of this particular respondent's responses to all other questions measuring the same
variable. Another decision rule may be to alternate the choice of the response categories used as plug
values (e.g. "yes" the first time, "no" the second time, "yes" the third time, and so on).
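The plug-value rules described above can be compared side by side in code. This is a minimal sketch under stated assumptions: missing answers are represented as `None`, and the function names and sample numbers are invented for illustration.

```python
from statistics import mean


def midpoint_plug(scale_min, scale_max):
    """Midpoint of an interval scale, e.g. 3 on a 1-to-5 scale."""
    return (scale_min + scale_max) / 2


def mean_plug(values):
    """Mean of the answered values, ignoring missing ones (None)."""
    return mean([v for v in values if v is not None])


def alternating_plugs(categories, n_missing):
    """Cycle through the response categories for successive missing answers."""
    return [categories[i % len(categories)] for i in range(n_missing)]


# Rule 1: midpoint of the scale.
print(midpoint_plug(1, 5))
# Rule 2: mean of all respondents who answered this item.
print(mean_plug([4, None, 2, 3]))
# Rule 3: mean of this respondent's answers to other items on the same variable.
print(mean_plug([5, 4, None]))
# Rule 4: alternate the categories used as plug values.
print(alternating_plugs(["yes", "no"], 3))
```

Rules 2 and 3 use the same arithmetic; they differ only in which set of answers is averaged.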
The editor must also decide whether or not an entire questionnaire is "usable." When a questionnaire
has too many (say 25%) answers missing, it may not be suitable for the planned data analysis. In such a
situation the editor simply records the fact that a particular incomplete questionnaire has been dropped
from the sample.
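A usability rule of this kind is easy to state precisely. The sketch below assumes missing answers are stored as `None` and treats a missing share of 25% or more as "too many"; both choices are assumptions, and the actual cut-off belongs in the editorial instructions.

```python
def is_usable(record, max_missing_share=0.25):
    """Return True if the share of missing answers is below the cut-off."""
    if not record:
        return False  # an empty questionnaire cannot be used
    missing = sum(1 for value in record.values() if value is None)
    return missing / len(record) < max_missing_share


complete = {"q1": 3, "q2": 5, "q3": 1, "q4": 2}
patchy = {"q1": 3, "q2": None, "q3": None, "q4": 2}
print(is_usable(complete), is_usable(patchy))
```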
Editing Questions Answered out of Order: Another situation an editor may face is the need to
rearrange the answers to an open-ended response question. For example, a respondent may have
provided the answer to a subsequent question in his answer to an earlier open-ended response question.
Because the respondent had already clearly identified his answer, the interviewer may have avoided
asking the subsequent question. The interviewer may have wanted to avoid hearing "I have already
answered that earlier" and to maintain rapport with the respondent, and therefore skipped the question.
To make the responses appear in the same order as on other questionnaires, the editor may move the
out-of-order answer to the section related to the skipped question.
Coding
Coding involves assigning numbers or other symbols to answers so the responses can be grouped into
a limited number of classes or categories. Classifying data into limited categories sacrifices some
data detail but is necessary for efficient analysis. Nevertheless, it is recommended that you keep the
data in raw form as far as possible. Once the data have been entered into the computer, you can
always ask the computer to group and regroup the categories. If the data have been entered into the
computer in grouped form, it will not be possible to disaggregate them.
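The advantage of storing raw values can be seen in a short example. The sketch below uses invented ages and invented group boundaries: because the raw ages are kept, the same data can be grouped one way and then regrouped another way, which would be impossible if only the group codes had been entered.

```python
def group_age(age, upper_bounds):
    """Return the 1-based group code for `age`, given sorted upper bounds."""
    for code, bound in enumerate(upper_bounds, start=1):
        if age <= bound:
            return code
    return len(upper_bounds) + 1  # open-ended top group


ages = [19, 34, 52, 67]  # raw data, kept as entered

three_groups = [group_age(a, [29, 49]) for a in ages]      # up to 29, 30-49, 50+
four_groups = [group_age(a, [24, 44, 64]) for a in ages]   # up to 24, 25-44, 45-64, 65+
print(three_groups)
print(four_groups)
```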
Although codes are generally considered to be numerical symbols, they are more broadly defined as the
rules for interpreting, classifying, and recording data. Codes allow data to be processed in a computer.
Researchers organize data into fields, records, and files. A field is a collection of characters (a character
is a single number, letter of the alphabet, or special symbol such as the question mark) that represent a
single type of data. A record is a collection of related fields. A file is a collection of related records.
Files, records, and fields are stored on magnetic tapes, floppy disks, or hard drives.
Researchers use a coding procedure and codebook. A coding procedure is a set of rules stating that
certain numbers are assigned to variable attributes. For example, a researcher codes males as 1 and
females as 2. Each category of a variable, as well as missing information, needs a code. A codebook is a
document (i.e. one or more pages) describing the coding procedure and the location of data for variables
in a format that computers can use.
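A coding procedure amounts to a lookup from answers to numbers. The following sketch uses the male = 1, female = 2 example from the text; the code 9 for missing information and the function name `code_value` are assumptions made for illustration.

```python
# Coding procedure: variable -> {attribute: code}
CODING = {"sex": {"male": 1, "female": 2}}
MISSING = 9  # assumed code for missing information


def code_value(variable, answer):
    """Translate a written answer into its numeric code."""
    if answer is None:
        return MISSING
    return CODING[variable][answer.strip().lower()]


print(code_value("sex", "Male"), code_value("sex", "female"), code_value("sex", None))
```

An answer that is not in the coding procedure raises a `KeyError`, which signals that the procedure lacks a category for it.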
When you code data, it is very important to create a well-organized, detailed codebook and make
multiple copies of it. If you do not write down the details of the coding procedure, or if you misplace
the codebook, you have lost the key to the data and may have to code the data again.
Researchers begin thinking about a coding procedure and a codebook before they collect data. For
example, a survey researcher pre-codes a questionnaire before collecting the data. Pre-coding means
placing the code categories (e.g. 1 for male, 2 for female) on the questionnaire. Sometimes to reduce
dependence on codebooks, researchers also place the location in the computer format on the
questionnaire.
If the researcher does not pre-code, his or her first step after collecting and editing the data is to create a
codebook. He or she also gives each case an identification number to keep track of the cases. Next, the
researcher transfers the information from each questionnaire into a format that computers can read.
Code Construction
When the question has a fixed-alternative (closed ended) format, the number of categories requiring
codes is determined during the questionnaire design stage. The codes 8 and 9 are conventionally given
to "don't know" (DK) and "no answer" (NA) respectively. However, many computer programs
recognize a blank field or a certain character symbol, such as a period (.), as indicating a missing value
(no answer).
There are two basic rules for code construction. First, the coding categories should be exhaustive; that
is, coding categories should be provided for all subjects or objects or responses. With a categorical
variable such as sex, making categories exhaustive is not a problem. However, when the response
represents a small number of subjects or when the responses might be categorized in a class not
typically found, there may be a problem.
Second, the coding categories should also be mutually exclusive and independent. This means that
there should be no overlap between the categories, to ensure that a subject or response can be placed in
only one category. This frequently requires that an "other" code category be included, so that the
categories are all inclusive and mutually exclusive. For example, managerial span of control might be
coded 1, 2, 3, 4, and "5 or more." The "5 or more" category ensures everyone a place in a category.
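The span-of-control example can be checked in code: every count maps to exactly one code, and the open-ended top category catches everything above 4. This is a minimal sketch; rejecting counts below 1 is an assumption, since the example provides no category for them.

```python
def code_span_of_control(n_subordinates):
    """Code span of control as 1, 2, 3, 4, or 5 (meaning "5 or more")."""
    if n_subordinates < 1:
        raise ValueError("no category is defined for fewer than 1 subordinate")
    return min(n_subordinates, 5)


print([code_span_of_control(n) for n in [1, 3, 5, 12]])
```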
When a questionnaire is highly structured, pre-coding of the categories typically occurs before the data
are collected. In many cases, such as when researchers are using open-ended response questions, a
framework for classifying responses to questions cannot be established before data collection. This
situation requires some careful thought concerning the determination of categories after the editing process
has been completed. This is called post-coding or simply coding. The purpose of coding open-ended
response questions is to reduce the large number of individual responses to a few general categories of
answers that can be assigned numerical scores. Code construction in these situations necessarily
reflects the judgment of the researcher. A major objective in the code-building process is to accurately
transfer the meaning from written answers to numeric codes.
Code Book
A code book identifies each variable in a study and its position in the data matrix. The book is used to
identify a variable's description, code name, and field. Here is a sample:
Q/V No.      Field/col. No.   Code values
--           1-5              Study number
--           6                City: 1 = Lahore, 2 = Rawalpindi, 3 = Karachi
--           7-9              Interview No.
Sex          10               1 = Male, 2 = Female
Age          11-12            Actual
Education    13               1 = Non literate, 2 = Literate
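To show how such a code book locates each variable in a record, the sketch below reads one fixed-column record using the column positions from the sample. The variable names and the example record are hypothetical; only the column ranges come from the sample code book.

```python
# Column positions (1-based, inclusive) taken from the sample code book.
LAYOUT = {
    "study_no": (1, 5),
    "city": (6, 6),
    "interview_no": (7, 9),
    "sex": (10, 10),
    "age": (11, 12),
    "education": (13, 13),
}


def parse_record(line):
    """Cut one fixed-column record into its fields."""
    return {name: line[start - 1:end] for name, (start, end) in LAYOUT.items()}


# Hypothetical record: study 00630, city 2, interview 042, sex 1, age 35, education 2.
line = "00630" + "2" + "042" + "1" + "35" + "2"
print(parse_record(line))
```

Fields are returned as strings so that leading zeros in identification numbers are preserved.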
Production Coding
Transferring the data from the questionnaire or data collection form after the data have been collected is
called production coding. Depending upon the nature of the data collection form, codes may be written
directly on the instrument or on a special coding sheet.
Data Entries
Use of scanner sheets for data collection may facilitate the entry of the responses directly into the
computer without manually keying in the data. In studies involving highly structured paper
questionnaires, an optical scanning system may be used to read material directly into the computer's
memory. Optical scanners process the mark-sensed questionnaires and
store the answers in a file.
Cleaning Data
The final stage in the coding process is the error checking and verification, or "data cleaning" stage,
which is a check to make sure that all codes are legitimate. Accuracy is extremely important when
coding data. Errors made when coding or entering data into a computer threaten the validity of
measures and cause misleading results. A researcher who has a perfect sample, perfect measures, and no
errors in gathering data, but who makes errors in the coding process or in entering data into a computer,
can ruin a whole research project.
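One common form of this check is to compare every entered code against the set of legitimate codes for its variable. The sketch below is illustrative: the legal codes follow the sample code book (city 1 to 3, sex 1 to 2, education 1 to 2), while the records and the function name are invented.

```python
LEGAL_CODES = {
    "city": {1, 2, 3},
    "sex": {1, 2},
    "education": {1, 2},
}


def find_illegal_codes(records):
    """Return (row number, variable, value) for every code that is not legitimate."""
    errors = []
    for row_no, record in enumerate(records, start=1):
        for variable, legal in LEGAL_CODES.items():
            value = record.get(variable)
            if value not in legal:
                errors.append((row_no, variable, value))
    return errors


records = [
    {"city": 1, "sex": 2, "education": 1},
    {"city": 4, "sex": 1, "education": 2},  # city code 4 does not exist
]
print(find_illegal_codes(records))
```

A check like this catches only impossible codes; a legitimate code entered for the wrong respondent needs verification against the original questionnaire.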