Research Methods STA630 VU
Lesson 30
DATA TRANSFORMATION
Data transformation is the process of changing data from their original form to a format that is more suitable for performing a data analysis that will achieve the research objectives. Researchers often modify the values of scalar data or create new variables. For example, many researchers believe that response bias will be lower if interviewers ask consumers for their year of birth rather than their age, even though the objective of the data analysis is to investigate respondents' age in years. This does not present a problem for the research analyst, because a simple data transformation is possible. The raw data coded as birth year can easily be transformed to age by subtracting the birth year from the current year.
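The birth-year transformation described above can be sketched in a few lines of Python. This is only an illustration: the function name `age_from_birth_year`, the sample birth years, and the fixed reference year 2024 are assumptions made for the example, not part of the lesson.

```python
from datetime import date
from typing import Optional


def age_from_birth_year(birth_year: int, current_year: Optional[int] = None) -> int:
    """Approximate age in years: current year minus birth year."""
    if current_year is None:
        current_year = date.today().year
    return current_year - birth_year


# Hypothetical raw data coded as birth year (illustrative values only).
birth_years = [1975, 1988, 2001]

# A fixed reference year keeps the example reproducible.
ages = [age_from_birth_year(y, current_year=2024) for y in birth_years]
print(ages)  # [49, 36, 23]
```

Because only the year is known, the result can be off by one for respondents whose birthday has not yet occurred in the current year.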
Collapsing or combining categories of a variable is a common data transformation that reduces the number of categories. For example, the five response categories of a Likert scale question may be combined as follows: the "strongly agree" and "agree" response categories are combined into a single category, and the "strongly disagree" and "disagree" response categories are combined into a single category. The result is the collapsing of the five-category scale down to three.
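A minimal sketch of collapsing a five-category Likert item into three categories is shown below. The middle category label "undecided" and the sample responses are assumptions made for the illustration.

```python
# Mapping from the five original categories to three collapsed ones.
# The middle label "undecided" is assumed for this example.
COLLAPSE = {
    "strongly agree": "agree",
    "agree": "agree",
    "undecided": "undecided",
    "disagree": "disagree",
    "strongly disagree": "disagree",
}


def collapse(responses):
    """Return the responses recoded into the three collapsed categories."""
    return [COLLAPSE[r] for r in responses]


# Hypothetical raw responses (illustrative only).
responses = ["strongly agree", "disagree", "undecided", "agree", "strongly disagree"]
collapsed = collapse(responses)
print(collapsed)
# ['agree', 'disagree', 'undecided', 'agree', 'disagree']
```

An explicit lookup table makes the recoding easy to audit, and an unexpected response label raises a `KeyError` instead of being recoded silently.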
Creating new variables by re-specifying the data with numeric or logical transformations is another important data transformation. For example, Likert summated scales reflect the combination of scores (raw data) from various attitudinal statements. The summative score for an attitude scale with three statements is calculated as follows:
Summative Score = Variable 1 + Variable 2 + Variable 3
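The summative score formula can be sketched in Python as follows. The dictionary keys `variable_1` to `variable_3` and the sample scores are hypothetical and serve only to illustrate the arithmetic.

```python
# Hypothetical raw scores for two respondents on three attitude statements.
respondents = [
    {"variable_1": 4, "variable_2": 5, "variable_3": 3},
    {"variable_1": 2, "variable_2": 1, "variable_3": 2},
]

# Create the new variable "summative_score" for each respondent.
for r in respondents:
    r["summative_score"] = r["variable_1"] + r["variable_2"] + r["variable_3"]

summative_scores = [r["summative_score"] for r in respondents]
print(summative_scores)  # [12, 5]
```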
This calculation can be accomplished by using simple arithmetic or by programming a computer with a data transformation equation that creates the new variable "summative score."
Researchers have created numerous scales and indexes to measure social phenomena. For example, scales and indexes have been developed to measure the degree of formalization in bureaucratic organizations, the prestige of occupations, the adjustment of people in marriage, the intensity of group interaction, the level of social activity in a community, and the level of socio-economic development of a nation.
Keep two points in mind. First, every social phenomenon can be measured. Some constructs can be measured directly and produce precise numerical values (e.g. family income). Other constructs require the use of surrogates or proxies that indirectly measure a variable (e.g. job satisfaction). Second, a lot can be learned from measures used by other researchers. We are fortunate to have the work of thousands of researchers to draw on. It is not always necessary to start from scratch. We can use a past scale or index, or we can modify it for our own purposes. The process of creating measures for a construct evolves over time. Measurement is an ongoing process with constant change: new concepts are developed, theoretical definitions are refined, and scales or indexes that measure old or new constructs are improved.
Indexes and Scales
The terms scale and index are often used interchangeably. One researcher's scale is another's index. Both produce ordinal- or interval-level measures of a variable. To add to the confusion, scale and index techniques can be combined in one measure. Scales and indexes give a researcher more information about variables and make it possible to assess the quality of measurement. Scales and indexes increase reliability and validity, and they aid in data reduction; that is, they condense and simplify the information that is collected.
A scale is a measure in which the researcher captures the intensity, direction, level, or potency of a variable construct. It arranges responses or observations on a continuum. A scale can use a single indicator or multiple indicators. Most scales are at the ordinal level of measurement.
An index is a measure in which a researcher adds or combines several distinct indicators of a construct into a single score. This composite score is often a simple sum of multiple indicators. It is used for content or convergent validity. Indexes are often measured at the interval or ratio level.
Researchers sometimes combine the features of scales and indexes in a single measure. This is common when a researcher has several indicators that are scales. He or she then adds these indicators together to yield a single score, thereby creating an index.
Unidimensionality: It means that all the items in a scale or index fit together, or measure a single construct. Unidimensionality says: if you combine several specific pieces of information into a single score or measure, have all the pieces measure the same thing (each subdimension is part of the construct's overall content).
For example, we define the construct "feminist ideology" as a general ideology about gender. Feminist ideology is a highly abstract and general construct. It includes specific beliefs and attitudes towards social, economic, political, family, and sexual relations. The ideology's five belief areas are parts of a single general construct. The parts are mutually reinforcing and together form a system of beliefs about the dignity, strength, and power of women.
Index Construction
You may have heard about the consumer price index (CPI). The CPI, which is a measure of inflation, is created by totaling the cost of buying a list of goods and services (e.g. food, rent, and utilities) and comparing the total to the cost of buying the same list in the previous year. An index is a combination of items into a single numerical score. Various components or subgroups of a construct are each measured, and then combined into one measure.
There are many types of indexes. For example, if you take an exam with 25 questions, the total number of questions correct is a kind of index. It is a composite measure in which each question measures a small piece of knowledge, and all the questions scored correct or incorrect are totaled to produce a single measure.
One way to demonstrate that indexes are not very complicated is to use one. Answer yes or no to the seven questions that follow on the characteristics of an occupation. Base your answers on your thoughts regarding the following four occupations: long-distance truck driver, medical doctor, accountant, and telephone operator. Score each answer 1 for yes and 0 for no.
1. Does it pay a good salary?
2. Is the job secure from layoffs or unemployment?
3. Is the work interesting and challenging?
4. Are its working conditions (e.g. hours, safety, time on the road) good?
5. Are there opportunities for career advancement and promotion?
6. Is it prestigious or looked up to by others?
7. Does it permit self-direction and the freedom to make decisions?
Total the seven answers for each of the four occupations. Which had the highest and which had the lowest score? The seven questions are our operational definition of the construct "good occupation." Each question represents a subpart of our theoretical definition.
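The occupation exercise can be tallied with a short Python sketch. The yes/no answers below are invented for one imaginary respondent purely to show the mechanics; they are not results from the lesson, and your own answers may differ.

```python
# Hypothetical answers (1 = yes, 0 = no) to the seven questions,
# in question order, for one imaginary respondent.
answers = {
    "long-distance truck driver": [1, 0, 0, 0, 0, 0, 1],
    "medical doctor": [1, 1, 1, 0, 1, 1, 1],
    "accountant": [1, 1, 0, 1, 1, 0, 0],
    "telephone operator": [0, 0, 0, 1, 0, 0, 0],
}

# The index score for each occupation is the simple sum of its seven items.
scores = {job: sum(items) for job, items in answers.items()}

highest = max(scores, key=scores.get)
lowest = min(scores, key=scores.get)

print(scores)
print("highest:", highest, "lowest:", lowest)
```

With these made-up answers the index ranges from 0 to 7, and each occupation receives one composite "good occupation" score.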
Creating indexes is so easy that it is important to be careful that every item in the index has face validity. Items without face validity should be excluded. Each part of the construct should be measured with at least one indicator. Of course, it is better to measure the parts of a construct with multiple indicators.
Another example of an index is a college quality index. Our theoretical definition says that a high-quality college has six distinguishing characteristics: (1) fewer students per faculty member, (2) a highly educated faculty, (3) more books in the library, (4) fewer students dropping out of college, (5) more students who go on to advanced degrees, and (6) faculty members who publish books or scholarly articles. We score 100 colleges on each item, and then add the scores for each college to create an index score of college quality that can be used to compare colleges.
Indexes can be combined with one another. For example, in order to strengthen the college quality index, we add a sub-index on teaching quality. The sub-index contains eight elements: (1) average size of classes, (2) percentage of class time devoted to discussion, (3) number of different classes each faculty member teaches, (4) availability of faculty to students outside the classroom, (5) currency and amount of reading assigned, (6) degree to which assignments promote learning, (7) degree to which faculty get to know each student, and (8) student ratings of instruction. Similar sub-index measures can be created for other parts of the college quality index. They can be combined into a more global measure of college quality. This further elaborates the definition of the construct "quality of college."
Weighting
An important issue in index construction is whether to weight items. Unless it is otherwise stated, assume that an index is unweighted. Likewise, unless we have a good reason for assigning different weights, use equal weights. An unweighted index gives each item equal weight. It involves adding up the items without modification, as if each were multiplied by 1 (or -1 for items that are negative).
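The difference between an unweighted and a weighted index can be sketched as follows. The function names, the item values, and the weights are all hypothetical; the lesson does not prescribe any particular weights.

```python
def unweighted_index(items, negative=()):
    """Sum the items, multiplying each by 1, or by -1 if its position is in `negative`."""
    return sum(-value if i in negative else value for i, value in enumerate(items))


def weighted_index(items, weights):
    """Sum the items after multiplying each by its own weight."""
    if len(items) != len(weights):
        raise ValueError("items and weights must have the same length")
    return sum(w * value for w, value in zip(weights, items))


# Hypothetical item scores for one case (illustrative only).
items = [3, 4, 2]

plain = unweighted_index(items)                  # 3 + 4 + 2
with_negative = unweighted_index(items, {2})     # 3 + 4 - 2
weighted = weighted_index(items, [2, 1, 1])      # 2*3 + 1*4 + 1*2

print(plain, with_negative, weighted)  # 9 5 12
```

Setting every weight to 1 makes `weighted_index` return the same total as `unweighted_index`, which is why equal weights are the default assumption.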
Scoring and Score Index
In one of our previous discussions we tried to measure job satisfaction. It was operationalized with the help of dimensions and elements. We constructed a number of statements on each element with five response categories using a Likert scale, i.e. strongly agree, agree, undecided, disagree, and strongly disagree. We could score each of these items from 1 to 5 depending upon the degree of agreement with the statement. The statements were both positive and negative. For positive statements we can score straight away from 5 to 1, i.e. from strongly agree to strongly disagree. For negative statements we have to reverse the score, i.e. 1 for "strongly agree," 2 for "agree," 3 for "undecided," 4 for "disagree," and 5 for "strongly disagree." The reason is that a negative multiplied by a negative becomes positive, i.e. a person strongly disagreeing with a negative statement is giving a positive response, so we give a score of 5 in this example.
In our example, let us say there were 23 statements measuring the different elements and dimensions of job satisfaction. Since on each statement the respondent could get a minimum score of 1 and a maximum score of 5, on 23 statements a respondent could get a minimum score of 23 (23 x 1) and a maximum score of 115 (23 x 5). In this way the score index ranges from 23 to 115, the lower end of the score index showing minimum job satisfaction and the upper end the highest job satisfaction. In reality we may not find any respondents at the extremes; rather, the respondents would be spread along this continuum.
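The scoring rules above, including reverse scoring of negative statements and the 23 to 115 range, can be sketched in Python. The function names and the three sample responses are assumptions made for the illustration.

```python
SCALE_MIN, SCALE_MAX = 1, 5

# Scores for a positive statement: strongly agree = 5 ... strongly disagree = 1.
POSITIVE_SCORES = {
    "strongly agree": 5,
    "agree": 4,
    "undecided": 3,
    "disagree": 2,
    "strongly disagree": 1,
}


def score_item(response, negative=False):
    """Score one statement; reverse the score if the statement is negative."""
    score = POSITIVE_SCORES[response]
    return SCALE_MIN + SCALE_MAX - score if negative else score


def score_index(responses, negative_flags):
    """Total score over all statements for one respondent."""
    return sum(score_item(r, n) for r, n in zip(responses, negative_flags))


# Range of the score index for 23 statements.
n_statements = 23
min_score = n_statements * SCALE_MIN  # 23
max_score = n_statements * SCALE_MAX  # 115

# Hypothetical three-statement example; the second statement is negative.
example_total = score_index(
    ["agree", "strongly disagree", "undecided"],
    [False, True, False],
)
print(min_score, max_score, example_total)  # 23 115 12
```

Note that "undecided" scores 3 whether the statement is positive or negative, since it sits at the midpoint of the scale.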
We could use the raw scores of the independent and dependent variables and apply appropriate statistics for testing the hypothesis. We could also divide the score index into categories such as "high job satisfaction" and "low job satisfaction" for presentation in a table. We can then cross-classify job satisfaction with some other variable and apply appropriate statistics for testing the hypothesis.
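Dividing the score index into categories and cross-classifying it with another variable can be sketched as below. The cut point at the midpoint of the range, the second variable (employment type), and the sample respondents are all hypothetical choices made for the illustration; the lesson does not specify a cut point.

```python
from collections import Counter

MIN_SCORE, MAX_SCORE = 23, 115

# Assumed cut point: the midpoint of the score range (69.0).
CUT_POINT = (MIN_SCORE + MAX_SCORE) / 2


def satisfaction_level(score):
    """Classify a score-index value as 'high' or 'low' job satisfaction."""
    if not MIN_SCORE <= score <= MAX_SCORE:
        raise ValueError("score outside the 23-115 range")
    return "high" if score > CUT_POINT else "low"


# Hypothetical respondents: (score index, employment type).
respondents = [
    (98, "permanent"),
    (41, "contract"),
    (77, "contract"),
    (60, "permanent"),
]

# Cross-classification: count respondents in each (level, employment type) cell.
table = Counter((satisfaction_level(s), group) for s, group in respondents)

for cell, count in sorted(table.items()):
    print(cell, count)
```

The resulting cell counts are the raw material for a contingency table, to which an appropriate statistical test can then be applied.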