Day 1: Before data collection
Measurement is a pre-number-crunching activity in statistics. No math is required! But to do research, you must know, at least in general, what you're trying to prove. Let's summarize it in five questions:
Before you conduct a study, use your theory to answer five questions: (1) what are you trying to prove, (2) what is it like in practice, (3) who is predicting whom, (4) who is being studied, and (5) what do the numbers mean? Theories are used to guide research; models are used to test theories. Theories are composed of constructs, which are untested mental abstractions; models are composed of variables and are built for the purpose of being tested.
1. What Are You Trying To Prove?
Research begins in your head. It starts with your ideas (constructs). These constructs (ideas about the way life is) are mental abstractions of reality. They may be informed by your prior experience but they are not directly measurable. No one knows how to measure an idea, so later on you’ll have to convert them to practical, observable operations. But at the beginning, you get to avoid reality.
A theory is a collection of ideas. You use these idea clusters to explain the world you know and to make predictions about things you haven't yet seen. Your theory of "self-concept" is not directly measurable, but it can help inform your inquiry into why people act as they do. Theories give structure to our understanding: they determine the questions we ask, how we ask them, and who is studied.
Although there are many views of what makes a good theory, let’s use the acronym CUSSIT. Theories should be Clear, Useful, Summarize facts, Simple (small number of assumptions), Internally consistent, and Testable.
Clear. Think of this as understandable. Do you know what the theory is? Skinner's theory of operant conditioning is clear. Piaget's theory of child development changed over time, so it is less clear.
Useful. Piaget's theory is, surprisingly, quite useful. It reminds us that children aren't small adults. They have their own way of thinking. To understand kids, you have to think like a child.
Summarize Facts. One of the ways theories are useful is that they summarize facts. Although there may be facts which cannot be explained, a theory should summarize as many facts as possible. In that sense, there should be evidence for and against theories. Indeed, theories serve the useful function of organizing all of the evidence currently available in an area of knowledge. But summarizing facts with only a few assumptions is extremely difficult. The more area a theory covers and the more things it explains, the less likely it is to be simple. Therein lies the challenge.
Simple. Freud's theory is not simple. Information processing is not simple. In fact, most major theories in science are not simple. But if they were, they would be better.
Internally Consistent. This is something people can't do: agree with themselves. Our views are remarkably inconsistent. We believe in honesty, truth, and civility for others, yet we fudge our travel expenses, cheat on our spouses, and yell at people who cut us off in traffic. Theories should be better than people.
Testable Hypotheses. It is not enough to have hypotheses; they must be testable. A good theory leads to the creation and testing of many models, each of which can be examined by collecting data in the real world.
2. What’s It Like In Practice?
In order to test a theory, we convert it into a model. Like Plato’s world of ideas, theories live in their own world and do not necessarily correspond to reality. That’s why you hear the expression “It sounded good in theory.” Many theories sound good until they are tested for their performance in reality.
Although theories are pure ideas and abstractions of reality, models are much more reality based. Models give you a practical way of testing theories. Models differ from theories in their nature, their scope and their use. By their nature, theories are composed of constructs (ideas), while models are composed of variables. That is, the basic element of a model is a factor upon which people vary.
After only a few seconds of being with me, you would be able to describe some of my characteristics. Your list might include my sense of humor, my gracefulness (or lack thereof), observations about how I am dressed (shoes tied, Dockers ironed, Polo shirt) or my grooming and appearance (hair standing on end, wild look in my eyes, etc.). You might guess my marital status, ethnic background, geographical upbringing and the number of languages I speak.
Whatever would be on your list, each item is a variable. It is something on which people vary. Not everyone has my level of musical ability, honesty and silliness. Some people are very musical, some very unmusical; most people, however, are in the middle. Some people are totally dishonest, some totally honest, most are in the middle.
In fact, as researchers, we believe that in every variable we can measure, there is a middle area in which we will find most people. We believe that in general people are much alike. Each of us is different in our combination of variables (high on musical ability, low on visual acuity, medium on honesty, etc.). But the variables themselves, when everyone is measured, will show that most people are similar on that variable.
Characteristics that do not vary are called constants. In Einstein's famous E = mc², c (the speed of light) is a constant. But constants are unusual. Particularly in social science, our models are frequently, mostly, almost entirely composed of variables: factors on which people vary.
Models also differ from theories in their scope. Just as model bridges and model trains are smaller, scaled approximations of the real thing, theoretical models often test segments of a theory. Often it is impossible to test a complete model. When the underlying theory is too big, measures are unavailable or intervention is inappropriate, large models are routinely broken into smaller segments and tested separately.
Theories are used to guide research; models are used to test theories. One characteristic of variables is that they are measurable. Theories are composed of constructs, which remain untested abstractions; models are built for the purpose of being tested.
We convert theories to models by operationally defining what we mean. Operational definitions are explanations of what exactly was done. An operational definition for intelligence, for example, could be the score on an intelligence test. Or we could ask people to rate how intelligent they are. Or we might measure brain activity, age or brain weight. Each could be a definition of intelligence. It is up to the experimenter to identify and define what intelligence is in that particular study.
The purpose of operational definitions is clear communication. We want people to know exactly what we do in an experiment so that they could replicate it if they wanted.
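As a toy sketch, an operational definition can be thought of as a concrete, repeatable procedure. The five-item test and answer key below are invented for illustration and are not a real measure of intelligence:

```python
# Hypothetical sketch: "intelligence" operationally defined as the
# score on a made-up five-item test. The answer key is invented.

def intelligence_score(answers, key=("b", "d", "a", "c", "b")):
    """Operational definition: intelligence = number of items correct."""
    return sum(a == k for a, k in zip(answers, key))

print(intelligence_score(("b", "d", "a", "a", "b")))  # 4
```

Anyone reading this definition could replicate the measurement exactly, which is the point of operationalizing.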
In essence we use models in two ways: descriptive and inferential studies. Descriptive studies do not intentionally manipulate the environment; they simply see the world as it is. Our observations are guided by a theory but we do not control what our subjects do or manipulate the environment to ascertain its effect on their behavior.
In contrast, inferential studies have a clear hypothesis and often restrict or control environmental factors. A hypothesis is part of that communication process. We specify what we hope to find, what we expect to find, so we can compare what we see with the model we are testing.
That’s an important point: we typically begin with a model and test it. We do not begin with observations and create a model. We begin with a model of a theory and test it. The reason is that it is easy to decide that we see relationships between events we observe when in fact none exist. This error in judgment is unacceptable; it is like hallucinating: seeing things that don’t exist.
To avoid that problem, our theories guide our studies. Theories determine which questions we ask, which variables we include, and how we operationally define factors. Even naturalistic observations are guided by theories. We are very careful. We don’t want to infer causation where none exists.
In order to test a theory, you convert it into a model. Models differ from theories in their nature, their scope and their use. To convert a theory to a model requires operational definitions. Research (and statistics) can be described as being either descriptive or inferential. Inferential studies have a clear hypothesis.
3. Who Is Predicting Whom?
In general, we believe that most variables are continuous, though they sometimes appear to be discrete. A dependent variable depends on the performance of the subjects. It is an outcome, and is usually a continuous variable. An independent variable is independent of the subjects’ control. It is something the researcher selects, manipulates or induces, and is a discrete variable. Predictors and criteria can be either continuous or discrete.
People aren’t just smart and stupid, they vary on a continuous scale of intelligence. People are not just rich and poor, their earnings are better described by a continuous variable. Even drug abuse can be considered on a continuous scale (amount of drugs consumed).
Although the underlying variable is continuous, how the questions are worded can make the data appear discrete. Although years of school is a continuous variable, the question "have you ever gone to school?" would result in non-continuous (discrete) data. "Are you employed?" produces discrete information, but the number of days worked is continuous. It is possible to study a continuous variable in a way which makes it look discrete.
Continuous data, then, describe people on a large scale with small steps; discrete data come from a continuous variable chopped up into parts (high, medium, low; fast, slow). A discrete variable with only two levels (e.g., yes, no) has its own name: dichotomous.
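The wording effect can be sketched in code. The years-of-school figures are invented; the point is only that one continuous variable yields dichotomous or discrete data depending on the question:

```python
# Invented data: years of schooling (a continuous variable).
years_of_school = [0.0, 8.0, 11.5, 12.0, 16.0, 19.5]

# Dichotomous question: "Have you ever gone to school?"
ever_attended = ["yes" if y > 0 else "no" for y in years_of_school]

# Discrete question: chop the continuum into low / medium / high.
def level(years):
    if years < 9:
        return "low"
    elif years < 13:
        return "medium"
    return "high"

levels = [level(y) for y in years_of_school]
print(ever_attended)  # ['no', 'yes', 'yes', 'yes', 'yes', 'yes']
print(levels)         # ['low', 'low', 'medium', 'medium', 'high', 'high']
```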
Traditionally, a differentiation is made between independent and dependent variables. It is a characterization based on locus of control. A dependent variable depends on the performance of the subjects. It is anything that we measure, observe or record. A dependent variable is an outcome. In contrast, an independent variable is independent of the subjects’ control. It is something the researcher selects, manipulates or induces.
The distinction is clearest in a traditional experiment: an independent variable is manipulated and a dependent variable is measured. Such a structure provides confidence in making inferences of causation. You stomp on a foot, the person says “ouch.” You don’t stomp on the foot and the person says nothing. The clear inference is that stomping on a foot causes a person to say “ouch.”
Notice that the independent variable is a discrete variable: stomp or not-stomp. It is not measured in continuous increments of pressure but is either there or absent. A variation of this theme is to select high, medium and low levels of an independent variable but, again, the independent variable is a discrete variable that is manipulated to see what impact it has on a continuous dependent variable.
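The stomp study can be sketched as a toy simulation: a discrete independent variable (stomp or not) manipulated to see its effect on a continuous dependent variable. The loudness units, effect size, and noise level are all invented for illustration:

```python
import random
from statistics import mean

random.seed(0)  # reproducible toy data

def ouch_loudness(stomped):
    """Continuous DV: loudness of the 'ouch' (invented units)."""
    base = 60.0 if stomped else 5.0   # assumed effect, purely illustrative
    return base + random.gauss(0, 3)  # measurement noise

stomp_group = [ouch_loudness(True) for _ in range(20)]     # IV level 1
control_group = [ouch_loudness(False) for _ in range(20)]  # IV level 2

# A large mean difference supports the inference that stomping
# causes the "ouch."
print(round(mean(stomp_group) - mean(control_group), 1))
```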
In many areas of research, variables cannot be easily manipulated, if at all. It would be ridiculous and unethical to assign children to abusive and non-abusive environments to see what impact the independent variable (abuse) has on the dependent variable (self-esteem, for instance). Consequently, the independence of many "independent variables" is in question.
Also, the more complicated models of human behavior include many variables, each impacting and being impacted upon by others. These experimental designs do not lend themselves to the independent-dependent variable distinction. Consequently, there is much to recommend the replacement of independent-dependent variables with the designation of predictor-criterion.
As an alternative to the independent-dependent variable characterization, the predictor-criterion designation provides more flexibility and more accurately depicts the relationships between model components.
Predictor-criterion is more flexible because it includes discrete and continuous variables. Although a discrete predictor (stomp or don't stomp) is good, a continuous predictor would give more information about the amount of pressure needed before you said "ouch." Also, when it is impossible to manipulate a situation (such as height, gender, or personality type), the term "independent" doesn't aptly describe the variable. Predictors can be discrete (like a traditional independent variable) or continuous (like a correlation or regression).

The predictor-criterion distinction is also a better description of the relationship between the variables. When subjects cannot be randomly assigned to treatments, the independence of variables is in question. It is clearer to note that a particular variable is being used as a predictor of another.
This approach accommodates both traditional experimental designs and complex correlational and causal modeling designs. In addition to simple discrete predictor and continuous criterion, the same nomenclature can be used for continuous predictors, moderator variables (ones that influence only part of a model), intervening variables (variables stuck between a predictor and a criterion) and suppressor variables (variables that filter out noise).
It is important to note that in an actual research study, any variable can be a predictor or a criterion. Annual income, level of education, self-esteem, intelligence: any could be used as a predictor of another. And each could be a criterion. Since the choice is arbitrary, the choice of model components and the hypothesized interrelationships should be determined by the theory being studied. The type of relationships between model components is determined by our theoretical questions.
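One way to see that the predictor-criterion choice is arbitrary: the correlation between two variables is identical no matter which one is treated as the predictor. The education and income figures below are invented:

```python
# Invented paired data.
education = [10, 12, 12, 14, 16, 18]   # years of school
income = [25, 30, 28, 40, 52, 60]      # thousands per year

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Same coefficient either way round: the math doesn't care which
# variable "predicts" the other; only the theory does.
print(pearson_r(education, income) == pearson_r(income, education))  # True
```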
4. Who Are You Going To Study?
Sometimes researchers want to study an entire population: the total number of subjects in a particular area of interest. As the focus of interest changes, the size of the population being studied changes. If you're only interested in what happens to you, the population of interest is 1.
Although we think of population as the number of people in a city or country, in research, a population is any group of interest. It can be the number of people in a family, the number of dogs in a town, or the number of lights on a Christmas tree. Sometimes the population of interest is too large to measure directly. It is usually not convenient to talk to all of the people in a county or inspect all of the paper clips made daily. When the population is too large, a sample is chosen.
A selected part of a larger group is called a sample. Any group can be thought of as both a collection of smaller groups (a population) and a sample of a larger group. The students in Ms. Mendoza’s class are a population to her, a sample of all the 4th graders in the school, a sample of all of the students in the school district, etc.
Obviously, how a sample is chosen determines how well it represents the population. If the first 10 children who enter the class are selected, Ms. Mendoza might have excluded those who rode on the bus (if it ran late that day). A common practice is random selection: each subject is selected at random from a convenient pool of subjects. Subjects might be randomly selected, for example, from those students taking introductory psychology classes who want to participate in a study.
An alternative method is called stratification. Like rock walls, groups of people are composed of segments or layers. When there are certain subgroup comparisons you want to make (male-female, rich-poor or tall-medium-short, for example), subjects are randomly selected from within the categories. First, categories of interest are selected. Then, subjects are randomly selected within each category.
However, the best way to pick a sample is random sampling. If everyone in the population of interest has an equal opportunity to be selected, the sample is unlikely to be biased in favor of any particular subgroup. Yet this is very seldom done, for very practical reasons. Although researchers want to draw conclusions about people in general, each person does not have an equal chance of being selected. People living in rural areas, as well as the disabled, the elderly, and the very young, are generally not included in studies. When interpreting research results, it is important to remember the limitations of sampling.
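The selection methods above can be sketched with Python's `random` module. The eight-person roster and the rural/urban strata are invented:

```python
import random

random.seed(1)  # reproducible toy example

# Invented population: (name, stratum) pairs.
population = [("Ana", "rural"), ("Ben", "urban"), ("Cai", "rural"),
              ("Dee", "urban"), ("Eli", "urban"), ("Fay", "rural"),
              ("Gus", "urban"), ("Haji", "rural")]

# Random sampling: every member has an equal chance of selection.
simple_sample = random.sample(population, 4)

# Stratified sampling: pick categories first, then sample within each.
def stratified_sample(pop, strata, per_stratum):
    picks = []
    for stratum in strata:
        members = [p for p in pop if p[1] == stratum]
        picks.extend(random.sample(members, per_stratum))
    return picks

strat_sample = stratified_sample(population, ["rural", "urban"], 2)
print(len(simple_sample), len(strat_sample))  # 4 4
```

Stratification guarantees the rural/urban split in the sample; simple random sampling does not.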
5. What Do The Numbers Mean?
Variables do not always use numbers in the same way. A high number on the back of a marathon runner doesn’t necessarily mean that person will run faster than one with a small number. Numbers can be nominal, ordinal, interval, or ratio.
Case studies don’t use numbers. And N=1 studies limit the use of numbers to counting. In contrast, most other approaches to research use numbers to measure and describe groups of people. But what meaning do the numbers have?
Obviously, variables do not always use numbers in the same way. You might want to find the average age of a group of people but it’s unlikely, for example, that you will want to calculate the average ID number. You know intuitively that averaging ID numbers, room numbers, or Social Security numbers isn’t very useful. Such numbers aren’t used for their numerical value but simply as names.
A number which substitutes for a name makes no mathematical assumptions. A marathon runner with a high number on his back doesn't necessarily run faster than one with a small number. The numbers are only used to tell the difference between contestants. Such numbers are at the lowest level of assumption, and are said to be at a nominal level of measurement.
There are four levels of assumptions which can be made about numbers. At the nominal scale, we assume that the numbers we obtain can be used to simply distinguish between entities. The numbers on the jerseys of football players, for example, help us to distinguish between players. It makes no sense to add these numbers together, or find their average; each number is used as a name (nom).
The exuberance of this runner is not from coming in 8th place. Although I'd be thrilled with that performance, his smile is for winning the race. The number 8 is just a nominal marker. It has no mathematical value.
In contrast, the second level of measurement, ordinal, makes two assumptions about its numbers. An ordinal scale distinguishes between members plus places them in order. Ranking children from tallest to shortest is an ordinal measurement. Winners of a race can be placed in order of 1, 2, and 3 (first, second, and third) but it would be silly to find the average of these numbers. An ordinal scale is like a footrace in a snowstorm: it can tell who came in first but it can't tell how far apart the runners are.
An interval scale includes both of the previous assumptions plus the assumption that the distances between numbers (intervals) are equal. The distance between a score of 8 and a score of 9 on a spelling test is the same as the distance between 3 and 4. Using an interval scale, we could tell the difference between players, find out who came in first, and determine by how much our spelling star won.
Notice that an interval scale assumes equal intervals. In the case of a test, equal intervals means that each item is equally difficult. When the steps are not equal, the scale is ordinal. Consequently, a lot of teacher-made tests look as though they are based on an interval scale but are in fact making ordinal measurements.
The final level of measurement is ratio. A ratio scale includes the previous three assumptions and adds an absolute zero. Because of their absolute zeros, ratio scales have a unique characteristic: they can be used to make ratio comparisons. We can say that a task took twice as long (a ratio of 2 to 1), or that an object weighs a third as much (a ratio of 1 to 3). Our judgments can be described in relation to each other. We can’t do that with nominal, ordinal or interval scales.
A 0 on a spelling test doesn’t mean that the person cannot spell anything at all, only that those selected words couldn’t be spelled. The zero is not absolute. Similarly, a 0 on a Fahrenheit thermometer doesn’t indicate a total lack of heat (if it did we couldn’t have minus degrees). In contrast, time, distance, and weight are all ratio scales. A 0 on these scales indicates the total absence of that factor.
There are two problems with ratio scales. First, ratio scales are very rare. We often use interval scales (e.g., intelligence scales, reading tests, personality inventories) or ordinal scales (e.g., rating scales), but do not often use ratio scales.
The second problem is that measurement levels often are ignored. It is common for executives, teachers and others to treat ordinal and interval data as if they were on a ratio scale. Rating scales (1 to 5, 1 to 7, 1 to 10) are ordinal in nature. This is important to understand because some people make the mistake of saying that Group A did twice as well as Group B in the last survey.
When our measurements do not meet the assumptions of a ratio scale, we cannot say that a person with an IQ of 140 is twice as smart as a person with an IQ of 70. Nor can we say that a person who scores 0 on our extroversion scale is not extroverted. These are interval scales.
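The four levels and their permissible operations can be summarized in a short sketch. All data below are invented; the point is which statistic is meaningful at each level:

```python
jerseys = [8, 23, 23, 7, 10]   # nominal: numbers used as names
finishes = [1, 2, 3]           # ordinal: order, but unknown spacing
temps_f = [32.0, 64.0, 96.0]   # interval: equal steps, no absolute zero
times_s = [120.0, 240.0]       # ratio: absolute zero exists

# Nominal: only counting (the mode) is meaningful; averaging is not.
mode = max(set(jerseys), key=jerseys.count)

# Ordinal: the median (middle rank) is meaningful; the mean is not.
median_finish = sorted(finishes)[len(finishes) // 2]

# Interval: differences are meaningful...
diff = temps_f[1] - temps_f[0]
# ...but ratios are not: 64 degrees F is not "twice as hot" as 32.

# Ratio: ratios are meaningful; the second task took twice as long.
ratio = times_s[1] / times_s[0]

print(mode, median_finish, diff, ratio)  # 23 2 32.0 2.0
```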
The way we measure determines the strength of the conclusions we draw. If we label horses as “jumpers” and “non-jumpers,” we have not made any assumptions about which is better, only that they are different. This is a nominal scale. Similarly, if we differentiate between managers and engineers at a nominal level, we make no assumptions concerning which status is best.
At an ordinal level, we could rate horses on their jumping ability or personnel on their sales ability. We could use a scale of 1 to 5, for example. Notice that we could use a two-level scale: thumbs up, thumbs down. The only difference between a two-level ordinal scale and the nominal scale mentioned above is in the assumptions. If we assume that jumpers are better to have than non-jumpers or that sellers are better than non-sellers, the underlying scale is not nominal but ordinal.
Prejudice is a good example of this. Distinguishing between Asians and Whites, north and south, or tall and short is a nominal description. Yet, if underlying our distinctions there is an assumption of one being better than another, we have moved to a different level of measurement.
It should be clear that the number of points on an ordinal scale is arbitrary. We are still at an ordinal level when rating on a 10-point scale, a 50-point scale, or an 87-point scale. It is the underlying assumptions which determine an item's level of measurement.
To move up to an interval scale in our horse testing, we could set up a course with obstructions for the horses to jump. Again, the number of hurdles included in the course is arbitrary and does not affect the level of its measurement. And, it should also be clear that a score of zero on our hypothetical course does not mean that a given horse cannot jump at all. We may have made all of the jumps too high for any horse to successfully clear.
If we measured how fast each horse ran the course, or how high each one jumped, the measurements would be on a ratio scale. Then, and only then, could we say that one horse jumped twice as high or ran half as fast.
In social science much of our data is ordinal. When we build a test, we usually don’t make each item of equal difficulty (one of the assumptions for an interval scale). Consequently, our measurements are more like rating scales than precision scientific instruments. Although some of our rating systems are quite complex, the data does not allow us to make fine distinctions between people. We can say one person is more generous, skilled or intelligent than another, but not by how much.
Just as horse-jumping courses usually are composed of jumps of varying difficulty, items on a test of sales ability differ in difficulty. We do it to save time. With a few items of increasing difficulty, we can distinguish between poor performance and great performance. Without thinking about it, though, we have shifted the underlying level of measurement to an ordinal scale.
This shift is not necessarily bad. It allows us to make gross distinctions with only a few items. But researchers should know which level of measurement they are using. Without such knowledge, they are relying on assumptions which might not be true. We should not fool ourselves into thinking that we are measuring with more precision than is actually present.
Clearly, every level of measurement can be useful. Our tests of increasing difficulty are valuable. We don’t have to measure everything as ratio data. We can use nominal, ordinal, interval and ratio data. All are useful. Levels of measurement are themselves nominal: one level is not better than another.
Summary
- Theories are composed of constructs.
- Models are composed of variables.
- Laws have accuracy beyond doubt.
- Principles have some predictability.
- Beliefs are personal opinions.
- Four basic types of variables: independent, dependent, intervening, and moderating.
- Four levels of measurement: nominal, ordinal, interval, and ratio.
- There are six criteria for evaluating theories: clear, useful, summarizes facts, simple (small number of assumptions), internally consistent, and testable hypotheses.