DATA ANALYSIS: Information, Editing, Editing for Consistency


Research Methods STA630

Lesson 29

DATA ANALYSIS

Once the data begin to flow in, attention turns to data analysis. If the project has been done correctly,

the analysis planning is already done. Back at the research design stage or at least by the completion of

the proposal or the pilot test, decisions should have been made about how to analyze the data.

During the analysis stage several interrelated procedures are performed to summarize and rearrange the

data. The goal of most research is to provide information. There is a difference between raw data and

information.

Information refers to a body of facts that are in a format suitable for decision making, whereas data are

simply recorded measures of certain phenomena. The raw data collected in the field must be

transformed into information that will answer the sponsor's (e.g. manager's) questions. The conversion

of raw data into information requires that the data be edited and coded so that the data may be

transferred to a computer or other data storage medium.

If the database is large, there are many advantages to utilizing a computer. Assuming a large database,

entering the data into the computer follows the coding procedure.

Editing

Occasionally, a fieldworker makes a mistake and records an improbable answer (e.g., birth year: 1843)

or interviews an ineligible respondent (e.g., someone too young to qualify). Seemingly contradictory

answers, such as "no" to automobile ownership but "yes" to an expenditure on automobile insurance,

may appear on a questionnaire. There are many problems like these that must be dealt with before the

data can be coded. Editing procedures are conducted to make the data ready for coding and transfer to

data storage.
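The kinds of checks described above can be sketched in a few lines of Python. This is only an illustration: the field names (`birth_year`, `owns_car`, `pays_car_insurance`), the `max_age` limit, and the function name are assumptions made for the example, not part of any survey described in this lesson.

```python
def find_edit_problems(answers, survey_year, max_age=110):
    """Return a list of problems found in one questionnaire (a dict of answers)."""
    problems = []

    # Improbable answer: a birth year implying an impossible age.
    birth_year = answers.get("birth_year")
    if birth_year is not None:
        if not (survey_year - max_age <= birth_year <= survey_year):
            problems.append(f"improbable birth year: {birth_year}")

    # Seemingly contradictory answers: no automobile, yet automobile insurance.
    if answers.get("owns_car") == "no" and answers.get("pays_car_insurance") == "yes":
        problems.append("automobile insurance reported without automobile ownership")

    return problems


questionnaire = {"birth_year": 1843, "owns_car": "no", "pays_car_insurance": "yes"}
for problem in find_edit_problems(questionnaire, survey_year=2020):
    print(problem)
```

The function only flags problems; deciding how to resolve each one remains the editor's job.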

Editing is the process of checking and adjusting the data for omissions, legibility, and consistency.

Editing may be differentiated from coding, which is the assignment of numerical scales or classifying

symbols to previously edited data.

The purpose of editing is to ensure the completeness, consistency, and readability of the data to be

transferred to data storage. The editor's task is to check for errors and omissions on the questionnaires

or other data collection forms.

The editor may have to reconstruct some data. For instance, a respondent may indicate weekly income

rather than monthly income, as requested on the questionnaire. The editor must convert the information

to monthly data without adding any extraneous information. The editor "should bring to light all hidden

values and extract all possible information from a questionnaire, while adding nothing extraneous."
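As a worked illustration of reconstructing data, the weekly-to-monthly conversion might look like the sketch below. The conversion factor is an assumption (a 52-week year spread over 12 months); the lesson itself does not prescribe one, and the function name is hypothetical.

```python
WEEKS_PER_YEAR = 52    # assumption: a 52-week year
MONTHS_PER_YEAR = 12


def weekly_to_monthly(weekly_income):
    """Convert a weekly income figure to its monthly equivalent."""
    return weekly_income * WEEKS_PER_YEAR / MONTHS_PER_YEAR


# A respondent who wrote 300 per week is recorded as 1300 per month.
print(weekly_to_monthly(300))
```

Whatever factor is chosen, the editorial instructions should state it so that every editor converts in the same way.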

Field Editing

In large projects, field supervisors are often responsible for conducting preliminary field edits. The

purpose of field editing the same day as the interview is to catch technical omissions (such as a blank

page), check legibility of the handwriting, and clarify responses that are logically or conceptually

inconsistent. If a daily field editing is conducted, a supervisor who edits completed questionnaires will

frequently be able to question the interviewers, who may be able to recall the interview well enough to

correct any problems. The number of "no answers" or incomplete answers can be reduced with a rapid

follow-up stimulated by a field edit. The daily edit also allows fieldworkers to re-contact the respondent

to fill in omissions before the situation has changed. The field edit may also indicate the need for

further training of interviewers.


In-House Editing

Although almost simultaneous editing in the field is highly desirable, in many situations (particularly

with mail questionnaires), early reviewing of the data is not possible. In-house editing rigorously

investigates the results of data collection.

Editing for Consistency:

The in-house editor's task is to ensure that inconsistent or contradictory responses are adjusted and that

answers will not be a problem for coders and keyboard punchers. Consider the situation in which a

telephone interviewer has been instructed to interview only registered voters, where voters must be at least

18 years old. If the editor's review of a questionnaire indicates that the respondent was only 17 years of

age, the editor's task is to eliminate this obviously incorrect sampling unit. Thus, in this example, the

editor's job is to make sure that the sampling unit is consistent with the objectives of the study.

Editing requires checking for logically consistent responses. The in-house editor must determine if the

answers given by a respondent to one question are consistent with those given to other, related

questions. Many surveys utilize filter questions or skip questions that direct the sequence of questions,

depending upon the respondent's answer. In some cases the respondent will have answered a sequence of

questions that should not have been asked. The editor should adjust these answers, usually to "no

answer" or "inapplicable," so that the responses will be consistent.
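The adjustment for filter (skip) questions can be sketched as follows. The question names and the helper `enforce_skip_pattern` are hypothetical, chosen only to illustrate the rule that answers to questions which should not have been asked are reset to "inapplicable."

```python
INAPPLICABLE = "inapplicable"


def enforce_skip_pattern(answers, filter_question, required_answer, dependent_questions):
    """Return an edited copy of the answers.

    If the filter question was not answered with required_answer, every
    dependent question is set to INAPPLICABLE, whatever was recorded.
    """
    edited = dict(answers)  # leave the original questionnaire untouched
    if edited.get(filter_question) != required_answer:
        for question in dependent_questions:
            edited[question] = INAPPLICABLE
    return edited


raw = {"owns_car": "no", "car_make": "Toyota", "car_age": 3}
print(enforce_skip_pattern(raw, "owns_car", "yes", ["car_make", "car_age"]))
```

Working on a copy keeps the original responses available in case the edit itself needs to be reviewed later.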

Editing for Completeness: In some cases the respondent may have answered only the second portion

of a two-part question. An in-house editor may have to adjust the answers to the following question for

completeness.

Does your organization have more than one Internet Web site? Yes ____ No ____

If a respondent checked neither "Yes" nor "No", but indicated three Internet Web sites, the editor may

check the "Yes" to ensure that this answer is not missing from the questionnaire.

Item Non-response: This is the technical term for an unanswered question on an otherwise complete

questionnaire. Specific decision rules for handling this problem should be meticulously outlined in the

editorial instructions. In many situations the decision rule will be to do nothing with the unanswered

question: the editor merely indicates an item non-response by writing a message instructing the coder to

record a "missing value" or blank as the response. However, when a response is necessary, the

editor uses a plug value. The decision rule may be to "plug in" an average or neutral value in each case

of missing data. For a blank response to an interval-scale item with a midpoint, one rule is to assign the

midpoint of the scale as the response to that item. Another way is to assign to the item the mean

value of the responses of all those who have responded to that particular item. Another choice is to give

the item the mean of the responses of this particular respondent to all other questions measuring the same

variables. Another decision rule may be to alternate the choice of the response categories used as plug

values (e.g. "yes" the first time, "no" the second time, "yes" the third time, and so on).
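The plug-value rules listed above can be written out as small Python helpers. This is a sketch only: the function names are hypothetical, and missing answers are assumed to be stored as `None`.

```python
from itertools import cycle
from statistics import mean


def scale_midpoint(low, high):
    """Rule 1: the midpoint of the scale, e.g. 3 on a 1-to-5 scale."""
    return (low + high) / 2


def item_mean(item_responses):
    """Rule 2: mean of everyone who did answer this item."""
    return mean(r for r in item_responses if r is not None)


def respondent_mean(related_answers):
    """Rule 3: mean of this respondent's answers to the related items."""
    return mean(a for a in related_answers if a is not None)


def alternating_plugs(categories):
    """Rule 4: cycle through the response categories ("yes", "no", "yes", ...)."""
    return cycle(categories)


print(scale_midpoint(1, 5))
print(item_mean([4, None, 2, 3]))
print(respondent_mean([5, 4, None, 3]))
plugs = alternating_plugs(["yes", "no"])
print(next(plugs), next(plugs), next(plugs))
```

Each rule can give a different plug value for the same blank, which is why the editorial instructions should name one rule in advance.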

The editor must also decide whether or not an entire questionnaire is "usable." When a questionnaire

has too many (say 25%) answers missing, it may not be suitable for the planned data analysis. In such a

situation the editor simply records the fact that a particular incomplete questionnaire has been dropped

from the sample.
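A usability rule of this kind could be sketched as below. The 25% threshold follows the example in the text; treating exactly 25% as "too many," storing missing answers as `None`, and the function name are assumptions made for the illustration.

```python
def is_usable(answers, max_missing_share=0.25):
    """Return True if the share of missing answers is below the threshold."""
    if not answers:
        return False  # an empty questionnaire cannot be analysed
    missing = sum(1 for value in answers.values() if value is None)
    return missing / len(answers) < max_missing_share


complete_enough = {"q1": 1, "q2": 2, "q3": None, "q4": 4, "q5": 1, "q6": 3, "q7": 2, "q8": 1}
too_sparse = {"q1": 1, "q2": None, "q3": None, "q4": 4}
print(is_usable(complete_enough), is_usable(too_sparse))
```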

Editing Questions Answered out of Order: Another situation an editor may face is the need to

rearrange the answers to an open-ended response question. For example, a respondent may have

provided the answer to a subsequent question in his answer to an earlier open-ended response question.

Because the respondent had already clearly identified his answer, the interviewer may have avoided

asking the subsequent question. The interviewer may have wanted to avoid hearing "I have already

answered that earlier" and to maintain rapport with the respondent and therefore skipped the question.

To make the response appear in the same order as on other questionnaires, the editor may move the

out-of-order answer to the section related to the skipped question.


Coding

Coding involves assigning numbers or other symbols to answers so the responses can be grouped into

a limited number of classes or categories. The classifying of data into limited categories sacrifices some

data detail but is necessary for efficient analysis. Nevertheless, it is recommended that you keep the

data in raw form as far as possible. When the data have been entered into the computer you can

always ask the computer to group and regroup the categories. If the data have been entered into the

computer in grouped form, it will not be possible to disaggregate them.

Although codes are generally considered to be numerical symbols, they are more broadly defined as the

rules for interpreting, classifying, and recording data. Codes allow data to be processed in a computer.

Researchers organize data into fields, records, and files. A field is a collection of characters (a character

is a single number, letter of the alphabet, or special symbol such as the question mark) that represent a

single type of data. A record is a collection of related fields. A file is a collection of related records.

Files, records, and fields are stored on magnetic tapes, floppy disks, or hard drives.
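The hierarchy of characters, fields, records, and files can be pictured with ordinary Python structures. The field names and values below are invented for illustration.

```python
# One record per respondent; each key names a field (illustrative names).
record_1 = {"case_id": "001", "sex": "1", "age": "34"}
record_2 = {"case_id": "002", "sex": "2", "age": "27"}

# A file is a collection of related records.
data_file = [record_1, record_2]

# A field is made up of characters: the age field of the first record
# holds the two characters "3" and "4".
age_characters = list(data_file[0]["age"])
print(age_characters)
```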

Researchers use a coding procedure and codebook. A coding procedure is a set of rules stating that

certain numbers are assigned to variable attributes. For example, a researcher codes males as 1 and

females as 2. Each category of a variable and missing information needs a code. A codebook is a

document (i.e. one or more pages) describing the coding procedure and the location of data for variables

in a format that computers can use.
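A coding procedure of this kind can be sketched as a small lookup. The codes 1 and 2 follow the example above; using 9 for missing information and the function name `code_sex` are assumptions made for the illustration.

```python
SEX_CODES = {"male": 1, "female": 2}
MISSING = 9  # assumed code for missing information


def code_sex(answer):
    """Translate a written answer into its numeric code."""
    if answer is None:
        return MISSING
    return SEX_CODES.get(answer.strip().lower(), MISSING)


print([code_sex(a) for a in ["Male", " female ", None]])
```

Keeping the mapping in one named table, rather than scattered through the analysis, is what lets the codebook and the data stay in agreement.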

When you code data, it is very important to create a well-organized, detailed codebook and make

multiple copies of it. If you do not write down the details of the coding procedure, or if you misplace

the codebook, you have lost the key to the data and may have to recode the data.

Researchers begin thinking about a coding procedure and a codebook before they collect data. For

example, a survey researcher pre-codes a questionnaire before collecting the data. Pre-coding means

placing the code categories (e.g. 1 for male, 2 for female) on the questionnaire. Sometimes to reduce

dependence on codebooks, researchers also place the location in the computer format on the

questionnaire.

If the researcher does not pre-code, his or her first step after collecting and editing the data is to create a

codebook. He or she also gives each case an identification number to keep track of the cases. Next, the

researcher transfers the information from each questionnaire into a format that computers can read.

Code Construction

When the question has a fixed-alternative (closed ended) format, the number of categories requiring

codes is determined during the questionnaire design stage. The codes 8 and 9 are conventionally given

to "don't know" (DK) and "no answer" (NA), respectively. However, many computer programs

recognize a blank field or a certain character symbol, such as a period (.), as indicating a missing value

(no answer).
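One practical consequence of the 8 and 9 convention is that these codes must be set aside before any arithmetic is done on the variable. A minimal sketch, assuming a single-digit item whose legitimate answers never use 8 or 9 (the function name is hypothetical):

```python
from statistics import mean

DONT_KNOW = 8
NO_ANSWER = 9


def valid_values(coded_answers):
    """Drop the DK and NA codes so they are not treated as real answers."""
    return [c for c in coded_answers if c not in (DONT_KNOW, NO_ANSWER)]


coded = [1, 2, 8, 3, 9]
print(mean(coded))                # distorted: 8 and 9 counted as answers
print(mean(valid_values(coded)))  # based on the three real answers only
```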

There are two basic rules for code construction. First, the coding categories should be exhaustive; that

is, coding categories should be provided for all subjects or objects or responses. With a categorical

variable such as sex, making categories exhaustive is not a problem. However, when the response

represents a small number of subjects or when the responses might be categorized in a class not

typically found, there may be a problem.

Second, the coding categories should also be mutually exclusive and independent. This means that

there should be no overlap between the categories, to ensure that a subject or response can be placed in

only one category. This frequently requires that an "other" code category be included, so that the


categories are all inclusive and mutually exclusive. For example, managerial span of control might be

coded 1, 2, 3, 4, and "5 or more." The "5 or more" category ensures everyone a place in a category.
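The span-of-control example can be expressed as a short function in which every valid response falls into exactly one category. Rejecting values below 1 is an assumption made here, since the example in the text starts at 1.

```python
def code_span_of_control(subordinates):
    """Code 1 to 4 as themselves and anything larger as 5 ("5 or more")."""
    if subordinates < 1:
        raise ValueError("span of control must be at least 1")
    return min(subordinates, 5)


print([code_span_of_control(n) for n in [1, 4, 5, 12]])
```

Because `min` caps every larger value at 5, the categories are exhaustive, and because each number maps to a single code, they are mutually exclusive.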

When a questionnaire is highly structured, pre-coding of the categories typically occurs before the data

are collected. In many cases, such as when researchers are using open-ended response questions, a

framework for classifying responses to questions cannot be established before data collection. This

situation requires some careful thought concerning the determination of categories after the editing process

has been completed. This is called post-coding or simply coding. The purpose of coding open-ended

response questions is to reduce the large number of individual responses to a few general categories of

answers that can be assigned numerical scores. Code construction in these situations necessarily must

reflect the judgment of the researcher. A major objective in the code-building process is to accurately

transfer the meaning from written answers to numeric codes.

Code Book

A codebook identifies each variable in a study and its position in the data matrix. The book is used to

identify a variable's description, code name, and field. Here is a sample:


Variable (Q/V)   Field/col. No.   Code values
Study number     1-5
City                              1 = Lahore, 2 = Rawalpindi, 3 = Karachi
Interview No.    7-9
Sex                               1 = Male, 2 = Female
Age              11-12            Actual
Education                         1 = Non literate, 2 = Literate
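To show how a codebook like the sample is used, the sketch below reads one fixed-column data line. The columns for study number (1-5), interview number (7-9), and age (11-12) come from the sample; the columns assumed here for city (6), sex (10), and education (13) are guesses that fill the gaps and are not given in the sample. The data line itself is invented.

```python
# (first column, last column), 1-based and inclusive, as in a codebook.
LAYOUT = {
    "study_number": (1, 5),
    "city": (6, 6),          # assumed column
    "interview_no": (7, 9),
    "sex": (10, 10),         # assumed column
    "age": (11, 12),
    "education": (13, 13),   # assumed column
}

CODE_LABELS = {
    "city": {"1": "Lahore", "2": "Rawalpindi", "3": "Karachi"},
    "sex": {"1": "Male", "2": "Female"},
    "education": {"1": "Non literate", "2": "Literate"},
}


def parse_record(line):
    """Split one data line into named fields and decode the coded ones."""
    record = {}
    for name, (first, last) in LAYOUT.items():
        raw = line[first - 1:last]
        # Variables without a label table (e.g. age) keep their raw value.
        record[name] = CODE_LABELS.get(name, {}).get(raw, raw)
    return record


print(parse_record("0000120171342"))
```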

Production Coding

Transferring the data from the questionnaire or data collection form after the data have been collected is

called production coding. Depending upon the nature of the data collection form, codes may be written

directly on the instrument or on a special coding sheet.

Data Entries

Use of scanner sheets for data collection may facilitate the entry of the responses directly into the

computer without manually keying in the data. In studies involving highly structured paper

questionnaires, an optical scanning system may be used to read material directly into the computer's

memory. Optical scanners process the mark-sensed questionnaires and

store the answers in a file.

Cleaning Data

The final stage in the coding process is the error checking and verification, or "data cleaning" stage,

which is a check to make sure that all codes are legitimate. Accuracy is extremely important when

coding data. Errors made when coding or entering data into a computer threaten the validity of

measures and cause misleading results. A researcher who has a perfect sample, perfect measures, and no

errors in gathering data, but who makes errors in the coding process or in entering data into a computer,

can ruin a whole research project.
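A check that all codes are legitimate can be sketched as follows. The variables and their allowed codes are illustrative assumptions (with 9 assumed to mean "no answer"); in practice they would be taken from the codebook.

```python
LEGITIMATE_CODES = {
    "sex": {1, 2, 9},
    "city": {1, 2, 3, 9},
}


def find_illegitimate_codes(records, legitimate=LEGITIMATE_CODES):
    """Return (record index, variable, value) for every code not in the codebook."""
    errors = []
    for index, record in enumerate(records):
        for variable, allowed in legitimate.items():
            value = record.get(variable)
            if value not in allowed:
                errors.append((index, variable, value))
    return errors


records = [{"sex": 1, "city": 3}, {"sex": 3, "city": 2}]
print(find_illegitimate_codes(records))
```

A check like this catches only impossible codes; a wrong but legitimate code (a 1 keyed in place of a 2) can be found only by comparing the entry with the original questionnaire.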
