Early Version
Dr. Ken Tangen’s Statistics Safari: A 10-Day Guided Tour
Spring 2005
Statistics Safari © 2005 by Kenneth L. Tangen
“Statistics shouldn’t be hard or scary. It might not be as fun as a vacation, but it shouldn’t be as hard as climbing a mountain. It should be more of a guided tour.
“I’ve written this to be more of a guide book than a textbook. I’ve tried to pick out interesting sites, tell stories about them, and generally point you in the right direction. Naturally, the sites I choose will be rather idiosyncratic and incomplete, but they will give you a solid background in statistics and a good idea of areas you’d like to explore in more depth.
“I’ve organized this tour into 10 lessons you can explore at your own pace. Each “day” covers a major area of statistics and is composed of six parts: briefly, introduction, understand, remember, do and summary.” — Ken Tangen
Additional help: www.kentangen.com prof@kentangen.com
Contents
Introduction 5
Day 1: Measurement 10
Before Collecting Data
Day 2: Central Tendency 28
Describing A Group
Day 3: Dispersion 54
Measuring Diversity
Day 4: Z-Scores 78
Self-Comparisons
Day 5: Correlation 96
Comparing A Group To Itself
Day 6: Regression 116
Predicting The Future And The Past
Day 7: Probability 143
Comparing A Group To A Standard
Day 8: T-Tests 165
Comparing Two Groups
Day 9: 1-Way ANOVA 187
Comparing Three Or More Groups
Day 10: Advanced Designs 213
Testing For Interactions
Tables 230
Basic Facts Test 233
Formulas 235
introduction
You probably didn’t set out to be a number-cruncher. Most people don’t. Most are people-persons who spend a great deal of time learning about people, honing their communication skills, and developing intervention strategies. Many don’t like math, avoid working with numbers, and never want to do research for a living.
Yet people and numbers seem to go together. The more people there are, the more someone has to keep track of who they are, where they come from, and where they are headed. People, whether customers, students or patients, come with test scores, evaluations, and outcome measures. The more you work with people, the more data there is that has to be summarized, understood, interpreted and communicated.
Fortunately, measuring skills don’t have to be acquired, just honed. We come packaged at birth with tremendous processing ability. We do it all the time, and with very little effort. When we walk, our brains automatically calculate the slope and curve of the path. We scan for obstructions and estimate the amount of effort needed to climb a hill or jump a curb.
When we listen, we calculate frequency, distance, and quality of tone. We can quickly detect the difference between children who are playing and those who are fighting. When we meet people, our internal measuring systems automatically give us data. Before we even realize it, our predefined questions already have been asked and the data collected. Are they taller than us? Better dressed? Younger, older, or about the same? Would we like being friends; would they like us?
Measuring people is simply a matter of applying the principles we already use to a set of numbers. The only difference is that we must use a more indirect system. Instead of looking directly at a person, we look at numbers which describe that person.
Clearly, our indirect system is not exact. People are not numbers, nor can they be completely described with numbers. There are gaps in our knowledge, holes in our ability to understand. Our measurements are imprecise approximations and subject to errors in data collection, storage, retrieval and interpretation.
To counter the imperfections of our data, we generally restrict our focus to a group of people. Measurement is more like a telescope than a microscope. We do best with a general look. The more detail we try to see, the more errors we make. We’re good at broad long-distance generalizations, fairly good at medium levels of specificity, and terrible at close-up inspections.
If we were studying cars, we’d be very good at detecting patterns of traffic, fair at understanding a fleet of cars, and terrible at being auto mechanics. In other words, we would be very good at describing an apple orchard, fair at describing a basket of apples, and poor at understanding a single apple. We can clearly see the mountains on the horizon, we can drive without running into things, but our close-up vision is so poor that we can barely make out the numbers on our cell phone. We can….you get the idea.
Consequently, statistics is the study of groups of people. We look for patterns and trends. We compare one group to another. We interpret our findings as typically true for people in general.
For example, readers of this book generally will not become professional researchers. If you need research done, you’ll probably hire someone else to do the job. You’re not going to need to recall formulas at a moment’s notice but you may need to know where to find them.
Even if all you do is watch television, you’ll need to know how much faith to put in what researchers tell you.
In other words, studying statistics is a lot like school. You’ll be exposed to a lot of material, some of which will stick, most of which will be forgotten. In fact, you wouldn’t choose to put yourself through the pain of lectures and homework except for the fact that you’ll gain something from the experience that will serve you well the rest of your life.
I’ve tried to help make the process more efficient. I’ve prepared some worksheets, quizzes and handouts to help you master the basics of statistics. This collection is not intended as a full-featured textbook but as a supplement to the more traditional presentations. I’ve given lots of examples, but you can stop reading them when it suits you. When you think you understand a topic, take a quiz and check on your progress. See if you need more practice problems or if you’re ready to move on to the next topic.
I want you to know that I’m only going to be quasi-helpful. I have no intention of taking all of the pain out of activities that will help you. But I do want to do away with the busy work that can bury you. Just as a coach helps you focus your workout—to make sure that your muscle pains are not wasted—think of me as your statistics coach. I’m not going to lower the bar of quality because I want you to be proud of your success. I am going to provide you with enough help so that you can succeed, but the success will be all yours.
What’s Stats all about?
Statistics allow us to use numbers to describe what we see. In particular, we typically use statistics to describe a group. Individuals are necessary (they are needed to form a group) but the emphasis is on group data.
Indeed, one of the reasons for using statistics is to reduce a large pile of data down to as few numbers as possible. We are particularly happy when we can reduce group data to a single representative number. This reduction process both summarizes the information and makes it easier for us to communicate it to others. Descriptive statistics are used, then, to make communication simple, clear and easy.
When we describe a group, we organize the data and look for common patterns. A typical pattern in much of the data we collect is that most people have similar enough scores that we can describe the entire group with a single number. Although there is some dispersion (individual differences), it is common to find that most scores bunch in the middle of the variable. This “central tendency” allows us to describe a large group with only one or two numbers.
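To make the reduction idea concrete, here is a small sketch in Python (my own illustration, not part of the original text; the quiz scores are invented) showing a group reduced to one number for central tendency and one for dispersion:

```python
# Invented quiz scores for a group of ten people.
scores = [72, 75, 78, 80, 80, 81, 83, 85, 88, 90]

# One representative number (central tendency): the mean.
mean = sum(scores) / len(scores)

# A second number for dispersion: the range of the scores.
spread = max(scores) - min(scores)

print(mean)    # → 81.2
print(spread)  # → 18
```

Two numbers, 81.2 and 18, now stand in for the whole pile of data.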
In addition to description, we can use inferential statistics. We can use statistics to make estimates and predictions from which we infer causation. Inferential statistics tries to answer the basic question: “does the data look like this?”
“Data” of course is any collection of numbers. Whether we ask people’s opinions or measure how fast they run, we’re going to end up with numbers to crunch. That’s data.
The “this” in the “does the data look like this?” question can be a straight line, a mountain-shaped distribution, or a large difference that is unlikely to be due to chance. We are asking if the data looks like our model. If the data looks like our model, we use the model to explain the results. If the data doesn’t look like our model, we assume that our results are due to chance.
Chance is a relatively recent philosophical assumption. In ancient times, every detail of life was thought to be caused by something or someone: wizards, poltergeists, or the sun-, wind- or moon-god. In the story of Jonah and the Great Fish, the guilt of Jonah is determined by casting lots (throwing dice or picking the short straw). The premise was that there was no such thing as a random act.
In more modern times, the emphasis has shifted. The roll of dice is now seen as predictable by the laws of physics. The reason a 6 appears on a die is due to chance, not to the presence of a poltergeist.
In research, we begin with the assumption that whatever pattern we see is due to chance. The reason for this assumption is caution. It is too easy to find spurious relationships and patterns. Our minds are so creative we can see patterns in nearly every ambiguous situation. We see animals in ink blots, a Big Dipper in the stars, familiar faces in clouds, and magical kingdoms in puddles of water.
The desire to avoid such speculation has led to a cautious approach to causation. When we see differences, we begin with the premise that the differences we see are due to chance. We don’t infer that the pattern is due to something other than chance until the differences are so large that they can’t be ignored. And when we see a pattern, we wait until we’re sure the pattern is reliable.
In general, research is limited to 6 comparisons: three for individuals and three for groups. You can compare yourself to yourself, to a group, or to a standard. Self-comparison is simply a matter of counting and charting. Many people track their weight over time with this method. Usually no statistics are used; establishing a baseline and tracking performance is often enough. Comparing yourself to a group often involves grades and test scores. Self-to-group comparisons let you see where you are in relation to everyone else. Entrance test scores (SAT, GRE, LSAT) help describe your general level of achievement.
Comparing yourself to a standard shows your mastery level. Can you jump over a 2-foot high hurdle? How about an 8-foot high wall? Or can you pass the Basic Facts Test included in this course at a 90% criterion? Self-to-standard comparisons provide hallmarks of performance. Notice that in the three individual comparisons (self-self, self-group and self-standard) very little statistical knowledge is required. That’s because the focus of statistics is on groups, not individuals. Although groups can be compared to a standard, statistics come into play most often when a group is compared to itself or another group.
An introductory course in statistics begins by describing a single group. You will likely learn to graph all of the scores and look at the shape of the distribution. Then you’ll learn to identify a score which is representative of the group and try to measure how similar the scores are. Other descriptive tasks will include techniques for comparing an individual to a group.
Along the way, you’ll learn about probability, causation and logic. You’ll discuss theories, models, variables and the importance of careful observation. Some time will be spent on levels of measurement and experimental design.
Later in the course, you’ll learn about inferential statistics. In general, this aspect of statistics comes down to selecting the proper procedure, doing it step by step, and interpreting the results.
Selecting the Proper Procedure
Clearly, this is the most important thing to learn in statistics. A computer may be able to spit out the results faster than you can type the data in, but it is essential to know which procedure to tell the computer to use.
In addition to general information management and summary procedures, there are 4 specific techniques you’re likely to encounter: correlation, regression, independent t-test and 1-way ANOVA. Each is used for a different purpose and shouldn’t be confused with the others.
Correlations and regressions cannot be used to infer causation, but they are the appropriate tools to use when subjects cannot be randomly assigned to groups. Frequently, the most important topics must be studied with correlations and regressions.
Correlations indicate commonality and are used to measure reliability. Tests of personality, intelligence and achievement are of little value unless their results are reliable (consistent). Typically, a test is given once and then given a week or so later to the same people. In a reliable test, people who scored high the first time should score high the second time around. Similarly, the low scores of the first session should be positively correlated with the low scores of the second testing session.
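Here is a minimal sketch of the test-retest idea, with a hand-rolled Pearson correlation and invented scores for five people tested twice (the function and data are my own illustration, not from the original text):

```python
import math

def pearson_r(x, y):
    """Pearson correlation, built from nothing fancier than adding,
    subtracting, multiplying, dividing and one square root."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Invented scores: the same five people tested twice, a week apart.
first_session  = [88, 72, 95, 60, 80]
second_session = [85, 75, 92, 64, 78]

r = pearson_r(first_session, second_session)
# High scorers stayed high and low scorers stayed low,
# so r is close to +1 and the test looks reliable.
print(round(r, 2))  # → 0.99
```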
Regressions are appropriate for making predictions. Predicting future behavior based on past performance is possible if the past behavior is continually improving or declining. The problem with most events, like the stock market, is that past performance is not stable enough to make good future predictions.
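A least-squares regression sketch with invented data (my own illustration, predicting a hypothetical score from hours of practice) shows how a regression line turns past performance into a prediction:

```python
# Invented data: predicting a quiz score from hours of practice
# with a least-squares regression line (slope = cov(x, y) / var(x)).
hours  = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
         / sum((x - mean_x) ** 2 for x in hours))
intercept = mean_y - slope * mean_x

# Predicted score for someone who practices 6 hours.
predicted = intercept + slope * 6
print(round(predicted, 1))  # → 72.3
```

The prediction is only trustworthy because the invented trend is steady; as the text notes, unstable series like the stock market defeat this approach.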
Independent t-tests often are used to test the difference between two groups. When subjects are randomly assigned to new and old treatments of the flu, an independent t-test could be used to find out if one group did significantly better (or worse) than the other.
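A sketch of the flu example, with an equal-variance independent t statistic computed by hand (the recovery data are invented for illustration):

```python
import math

def independent_t(group1, group2):
    """Independent t-test (equal-variance form): the difference between
    two group means divided by the pooled standard error."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((x - m1) ** 2 for x in group1)
    ss2 = sum((x - m2) ** 2 for x in group2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se

# Invented data: days to recover under a new vs. an old flu treatment.
new_treatment = [4, 5, 4, 6, 5]
old_treatment = [7, 8, 6, 9, 7]

t = independent_t(new_treatment, old_treatment)
print(round(t, 2))  # → -4.11 (the new-treatment group recovered faster)
```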
A 1-Way ANOVA is used to compare more than two groups. When subjects have been randomly assigned to 3 or 4 groups (different drugs, for example), the impact of the drugs on memory could be tested with a 1-Way ANOVA.
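A sketch of the drug example, with the 1-Way ANOVA F ratio computed by hand for three invented groups (again my own illustration, not the author's data):

```python
def one_way_f(*groups):
    """1-Way ANOVA F ratio: variance between the group means
    divided by variance within the groups."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, means))

    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Invented memory scores under three different drugs.
drug_a = [10, 12, 11, 13]
drug_b = [14, 15, 13, 16]
drug_c = [9, 8, 10, 9]

f_ratio = one_way_f(drug_a, drug_b, drug_c)
print(round(f_ratio, 2))  # → 22.75 (a large F: the groups probably differ)
```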
Calculating Statistics
If you can follow a recipe, you can calculate statistics. The formulas are straightforward and relatively easy to follow. With practice, anyone can become proficient at calculating statistics with a hand calculator.
Although statistics is based on algebraic and mathematical concepts, the actual calculations are simply combinations of adding, subtracting, multiplying and dividing. You need to find the square and square root signs on your calculator, but advanced functions are not used. The first few times you calculate a statistic, the process is slow and requires careful attention to detail. But the more you practice the steps, the easier it becomes. At the beginning of the class, students are quiet while they concentrate on doing their calculations. But by the end of the term they can calculate a correlation and chat about what they did over the weekend at the same time.
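To show the recipe quality of these calculations, here is a standard deviation computed step by step (invented scores, my own illustration), using nothing beyond the four basic operations and one square root:

```python
import math

# A statistic as a recipe: the standard deviation of five scores.
scores = [6, 8, 5, 9, 7]

step1 = sum(scores) / len(scores)        # mean: 35 / 5 = 7.0
step2 = [x - step1 for x in scores]      # deviations from the mean
step3 = [d * d for d in step2]           # squared deviations
step4 = sum(step3) / (len(scores) - 1)   # variance: 10 / 4 = 2.5
step5 = math.sqrt(step4)                 # standard deviation

print(round(step5, 2))  # → 1.58
```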
Think of statistics as a language and you’ll do fine. As with any language, to be fluent requires practice, practice, and–of course–practice.
A Unique Design
Our “guided tour” is organized to cover the three types of content people can learn: facts, concepts and behaviors.
Facts are the details and minutia that make up much of what we read. Instead of clogging your mind with too many details, I’ve selected the essential components and summarized them for you. These “Basic Facts” are the details you should remember about statistics. There are no proofs, no lengthy descriptions and no formulas (you can always look them up when you need them). These are the bare facts, an outline of the structure of statistics.
There also is a conceptual presentation of statistics. It presents the essence of why research is important and how to conduct it. Each concept has multiple illustrations.
And there are things to do. At some point knowledge must be put into practice, and statistics is no exception. Doing calculations helps you understand the process. No one believes you’ll want to calculate a correlation by hand, but you gain a better understanding of the process by doing problem sets. Even more important is knowing which procedure to select. Our tour gives you practice at choosing the appropriate procedure, applying the formulas to a data set and interpreting the data.
Each “Day” Briefly
Each day starts with a quick reference of what’s in that chapter.
Introduction
The introductions are to give you the gist of the matter. Just as a general understanding of electricity, physics and history helps form a cultural literacy, a general understanding of how research is conducted can provide a backdrop for better understanding the world around us. This is a quick and easy explanation of what a statistics textbook and all of its complicated proofs are trying to say.
Understand
Calculating is less important than knowing how to approach a problem and which procedure to use. Computers can do the number crunching, but they can’t supply the decision making and interpretation. Each lesson covers one basic procedure or process and explains what it is and why you’d want to use it.
Concepts by their nature are easy to keep in mind; remembering facts and doing behaviors help form a deeper understanding of the concepts. For example, golf is an easy game….conceptually. “To play golf, take a stick and hit a ball into a hole.” It’s easy to remember, easy to understand, and easy to communicate. Playing golf, however, adds greatly to the understanding of its general principle and of related principles, such as energy, force, angle, and friction.
Similarly, playing statistics—actually doing problems—helps deepen the understanding of what that activity entails. The things you need to remember and the things you need to be able to do are aids to your understanding of the principles of statistics. It is not enough to tell you the rules. It is much better to also give you experiences that reveal their nuances.
Remember
No matter how conceptual the presentation, there is always a fair amount of factual trivia that must be mastered. I’ve tried to put all of the information you need to memorize from each lesson in one spot. These “Basic Facts” are things I want you to carry in your head–everything else you can look up when you need it.
Do
This course is more than concepts and facts. Each chapter has step-by-step instructions, practice problems to calculate and simulations (story problems of plausible and implausible situations).
Summary
Each chapter ends with a review, a quiz (on that chapter’s material) and a progress check (testing your cumulative knowledge). Answers, of course, are provided.
START THE TOUR!
BRIEFLY
Day 1: Measurement Before Collecting Data
A doctoral candidate (a composite of many graduate students I’ve met) wanted to hire me as a statistical consultant. I agreed to meet with her and discuss her dissertation project but soon wished I hadn’t.
“Here’s all my data,” she said proudly. She had stacks and stacks of papers piled on the table. “I’ve finished with data collection and I’m ready for statistics.”
“Great. What’s your hypothesis?”
“I don’t know. What’s a hypothesis?”
“What are you trying to find? What’s the purpose of your study?”
“To graduate,” she said, confused by the stupidity of my question.
“Aside from being forced to do it, what results did you expect?” I wasn’t quite ready to give up.
“That’s what you’re here to tell me. What procedure should I use?”
“For what? Committing suicide?”
“For the study. I’ve got all this information, what do I do with it now?”
“Throw yourself on the mercy of your committee and consider applying for law school.”
She had no idea what she hoped to learn from the study, and yet she had already collected the data. This is equivalent to arriving someplace and then deciding whether you should drive, fly or walk there.
She didn’t understand that the most important part of research occurs before data is collected. Statistics begins and ends with thinking. In the middle you might do some calculating but the essence of research is thinking.
As with any tour, you have to know where you are planning to go. Even if you’re going to explore the entire world, you have to pick a direction. Using a globe or a map to find a beginning region of interest would be a place to start. The same is true of research. You have to know in general both where you are and where you want to be. If you want, you can change course from cancer research to studying the mating habits of groundhogs, but you must choose a place to begin your search.
Our first lesson concerns what must be done before collecting data. This is pre-number crunching. No math is required! In general, here are 5 things to ask before collecting data:
DAY 1: Measurement
INTRODUCTION
1. What Are You Trying To Prove?
Good research begins with good ideas. Fortunately, when we are sitting in our armchairs and thinking of the way the world functions, we can generate lots of ideas. These constructs (ideas about the way life is) are purely mental abstractions of reality. They cannot be directly touched or measured. No one has seen a “self concept” or “personality trait.” They are only ideas.
We use our ideas to construct a theory. Theories are composed of constructs. They are collections of ideas; clusters of thought. Theories give us a framework for building our understanding. They inform our inquiries, determine our theoretical questions, and guide our selection of what and whom to study.
There are several criteria for evaluating theories. According to Morgan’s canon (a rule of thumb from the 19th-century researcher C. Lloyd Morgan), explanations should be as simple as possible. That is, given an equal choice, our preference is for the simplest explanation. From this general principle of simplicity, we also expect these collections to have the least amount of contradictions possible. A few contradictions are better than many; none at all is best. Put another way, theories should be internally consistent.
Another extension of our love for simplicity is that theories should have a small number of assumptions. Again, there is no absolute number that is acceptable; the smaller the better.
Naturally, we hope these simple explanations are clear and useful. Clarity is better than being vague, but what constitutes clarity is somewhat….vague. Sorry.
Similarly, usefulness is not defined in specific terms but serves as a guideline that reminds us that the hallmark of a theory is not pure truth. The theory that the earth was the center of the universe was useful in its day. It was not the complete truth; nor are our current theories completely true. But until we develop better theories, they are useful.
One of the ways theories are useful is that they summarize facts. Although there may be facts which cannot be explained, a theory should summarize as many facts as possible. In that sense, there should be evidence for and against theories. Indeed, theories provide a useful function of contrasting all of the evidence currently available in an area of knowledge.
Usefulness also implies that theories should produce testable hypotheses. It is not enough to have hypotheses; they must be testable. A good theory leads to the creation and testing of many models.
2. What’s It Like In Practice?
In order to test a theory, we convert it into a model. Theories are pure ideas and abstractions from reality. Like Plato’s world of ideas, theories live in their own world and do not necessarily correspond to reality. That’s why you hear the expression “It sounded good in theory.” Many theories sound good until they are tested for their performance in reality.
Models give us a practical way of testing theories. Models differ from theories in their nature, their scope and their use. By their nature, theories are composed of constructs (ideas). In contrast, models are composed of variables. That is, the basic element of a model is a factor upon which people vary.
After only a few seconds of being with me, you would be able to describe some of my characteristics. Your list might include my sense of humor, my gracefulness (or lack thereof), observations about how I am dressed (shoes tied, Dockers ironed, shirt casual but starched) or my grooming and appearance (hair standing on end, wild look in my eyes, etc.). You might guess my marital status, ethnic background, geographical upbringing and the number of languages I speak.
Whatever would be on your list, each item is a variable. It is something on which people vary. Not everyone has my level of musical ability, honesty and silliness. Some people are very musical, some very unmusical; most people, however, are in the middle. Some people are totally dishonest, some totally honest, most are in the middle.
In fact, we believe that in every variable we can measure, there is a middle area in which we will find most people. We believe that in general people are much alike. Each of us is different in the combination of variables (high on musical ability, low on visual acuity, medium on honesty, etc.). But the variables themselves, when everyone is measured, will show that most people are the same on that variable.
Characteristics that do not vary are called constants. In Einstein’s famous E = mc², c is a constant. But constants are unusual. Particularly in social science, our models are frequently, mostly, almost entirely composed of variables: factors on which people vary.
Models also differ from theories in their scope. Just as model bridges and model trains are smaller, scaled approximations of the real thing, theoretical models often test segments of a theory. Often it is impossible to test a complete model. When the underlying theory is too big, measures are unavailable or intervention is inappropriate, large models are routinely broken into smaller segments and tested separately.
Theories are used to guide research; models are used to test theories. One characteristic of variables is that they are measurable. Because theories are composed of constructs, they are untested theoretical realities. But models are built for the purpose of being tested.
We convert theories to models by operationally defining what we mean. Operational definitions are explanations of what exactly was done. An operational definition for intelligence, for example, could be the score on an intelligence test. Or we could ask people to rate how intelligent they are. Or we might measure brain activity, age or brain weight. Each could be a definition of intelligence. It is up to the experimenter to identify and define what intelligence is in that particular study.
The purpose of operational definitions is clear communication. We want people to know exactly what we do in an experiment so that they could replicate it. We use replication to build confidence in our beliefs. If you and I each do the same experiment in the same way and find the same results, we feel more confident that our findings are valid.
In essence we use models in two ways. We conduct both descriptive and inferential studies. Descriptive studies do not intentionally manipulate the environment; they simply see the world as it is. Our observations are guided by a theory, but we do not control what our subjects do or manipulate the environment to ascertain its effect on their behavior.
In contrast, inferential studies have a clear hypothesis and often restrict or control environ- mental factors. A hypothesis is part of that communication process. We specify what we hope to find, what we expect to find, so we can compare what we see with the model we are testing.
That’s an important point: we begin with a model and test it. We do not begin with observations and create a model. We begin with a model of a theory and test it. The reason is that it is easy to decide that we see relationships between events we observe when in fact none exist. This error in judgment is unacceptable; it is like hallucinating: seeing things that don’t exist.
To avoid that problem, our theories guide our studies. Theories determine which questions we ask, which variables we include, and how we operationally define factors. Even naturalistic observations are guided by theories. We are very careful. We don’t want to infer causation where none exists.
3. Who Is Predicting Whom?
Continuous-Discrete Variables
In general, we believe that most variables are continuous. People aren’t just smart and stupid; they vary on a continuous scale of intelligence. People are not just rich and poor; their earnings are better described by a continuous variable. Even drug abuse can be considered on a continuous scale (amount of drugs consumed).
Continuous data is a factor which can describe people on a large scale with small steps. But even when the underlying variable is continuous, data can appear discrete. Discrete data is a continuous variable chopped up into parts (high, medium, low; fast, slow).
How a question is worded can change the type of data you collect. “Years of school” is a continuous variable. The answers can range from 0 to 2.3 years to 12 years or more. However, the question “have you ever gone to school?” would result in noncontinuous (discrete) data. Similarly, “Are you employed?” produces discrete information, but the number of days worked is continuous.
A discrete variable with only two levels (e.g., yes, no) has its own name: dichotomous.
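A small sketch (invented data, my own illustration) of the same underlying variable recorded three ways, as continuous, discrete and dichotomous:

```python
# Invented data: the same underlying fact recorded three ways.
years_of_school = [0.0, 2.3, 9.0, 12.0, 16.5]      # continuous

def level(years):
    """Chop the continuous variable into discrete levels."""
    if years < 6:
        return "low"
    elif years < 12:
        return "medium"
    return "high"

levels = [level(y) for y in years_of_school]        # discrete
ever_attended = [y > 0 for y in years_of_school]    # dichotomous (yes/no)

print(levels)         # → ['low', 'low', 'medium', 'high', 'high']
print(ever_attended)  # → [False, True, True, True, True]
```

Notice how each recoding throws information away: the dichotomous version can no longer tell 2.3 years from 16.5.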
Independent-Dependent Variables
Traditionally, a differentiation is made between independent and dependent variables. It is a characterization based on locus of control. A dependent variable is an outcome. It depends on the performance of the subjects. In contrast, an independent variable is independent of the subjects’ control. It is something the researcher selects, manipulates or induces.
The distinction is clear in a traditional experiment: an independent variable is manipulated and a dependent variable is measured. Such a structure provides confidence in making inferences of causation. You stomp on a foot, the person says “ouch.” You don’t stomp on the foot and the person says nothing. The clear inference is that stomping on a foot causes a person to say “ouch.”
Notice that the independent variable is a discrete variable: stomp or not-stomp. It is not measured in continuous increments of pressure but is either there or absent. A variation of this theme is to select high, medium and low levels of an independent variable but, again, the independent variable is a discrete variable that is manipulated to see what impact it has on a continuous dependent variable.
In many areas of research, variables cannot be easily manipulated, if at all. It would be ridiculous and unethical to assign children to abusive and non-abusive environments to see what impact the independent variable (abuse) has on the dependent variable (self-esteem, for instance). Consequently, the independence of many “independent variables” is in question.
Also, the more complicated models of human behavior include many variables, each impacting and being impacted upon by others. These experimental designs do not lend themselves to the independent-dependent variable distinction. Consequently, there is much to recommend the replacement of independent-dependent variables with the designation of predictor-criterion.
Predictor-Criterion
As an alternative to the independent-dependent variable characterization, the predictor-criterion designation provides more flexibility and more accurately depicts the relationships between model components.
It is more flexible because it includes discrete and continuous variables. Although a discrete predictor (stomp or don’t stomp) is good, a continuous predictor would give more information about the amount of pressure needed before you said “ouch.”
When it is impossible to manipulate a situation (such as height, gender, or personality type), the term “independent” doesn’t aptly describe the variable. Predictors can be discrete (like a traditional independent variable) or continuous (like a correlation or regression). Consequently, a predictor can be an independent or a dependent variable.
The predictor-criterion distinction also is a better description of the relationship between the variables. When subjects cannot be randomly assigned to treatments, the independence of variables is in question. It is clearer to note that a particular variable is being used as a predictor of another.
This approach accommodates both traditional experimental designs and complex correlational and causal modeling designs. In addition to simple discrete predictor and continuous criterion, the same nomenclature can be used for continuous predictors, moderator variables (ones that influence only part of a model), intervening variables (variables stuck between a predictor and a criterion) and suppressor variables (variables that filter out noise).
It is important to note that in an actual research study, any variable can be a predictor or a criterion. Annual income, level of education, self-esteem, intelligence—any could be used as a predictor of another. And each could be a criterion. Since the choice is arbitrary, the choice of model components and the hypothesized interrelationships should be determined by the theory being studied.
Although a predictor can be an independent variable or a dependent variable, a criterion is an outcome measure, a dependent variable. Criteria depend on the performance of the subjects. Every measured variable is a criterion.
4. Who Are You Going To Study
In addition to deciding what to measure, how to ask the questions and which variables should be predictors and which criteria, the researcher must decide who to study.
Sometimes researchers want to study an entire population: the total number of subjects in a particular area of interest. As the focus of interest changes, the size of the population being studied changes. If you’re only interested in what happens to you, the population of interest is 1.
Although we think of population as the number of people in a city or country, in research, a population is any group of interest. It can be the number of people in a family, the number of dogs in a town, or the number of lights on a Christmas tree.
Sometimes the population of interest is too large to measure directly. It is usually not convenient to talk to all of the people in a county or inspect all of the paper clips made daily. When the population is too large, a sample is chosen.
A selected part of a larger group is called a sample. Any group can be thought of as both a collection of smaller groups (a population) and a sample of a larger group. The students in Ms. Mendoza’s class are a population to her, a sample of all the 4th graders in the school, a sample of all of the students in the school district, etc.
Obviously, how a sample is chosen determines how well it represents the population. If the first 10 children who enter the class are selected, Ms. Mendoza might have excluded those who rode on the bus (if it ran late that day).
The best way to pick a sample is random sampling. If everyone in the population of interest has an equal opportunity to be selected, the sample is unlikely to be biased in favor of any particular subgroup. For practical reasons this is very seldom done.
Although researchers might want to draw conclusions about the people in general, each person does not have an equal chance of being selected. People living in rural areas—and the disabled, elderly and very young—are generally not included in studies.
What is more practical is convenient sampling. Instead of selecting from a random pool that would include everyone in the world, subjects usually are selected from a convenient pool of people we can convince to be in a study. In many studies, subjects are selected from students taking introductory psychology classes who want extra credit and choose to participate in the study.
Whatever size our subject pool turns out to be, researchers must decide how to select from it. A common practice is random selection. With this method, each person in the subject pool has an equal chance of being selected. Being in the pool may not be random but how subjects are selected from the available pool is random.
An alternative method is called stratification. When there are certain subgroup comparisons you want to make (male-female, rich-poor or tall-medium-short, for example), subjects are randomly selected from within the categories. First, categories of interest are selected. Then, subjects are randomly selected within each category.
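The difference between random selection and stratification can be sketched in a few lines of Python. This is my illustration, not part of the original tour; the subject pool and the category labels are invented for the example.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# A hypothetical convenient pool: students who volunteered for extra credit.
pool = [
    {"name": "S1", "sex": "male"},   {"name": "S2", "sex": "female"},
    {"name": "S3", "sex": "male"},   {"name": "S4", "sex": "female"},
    {"name": "S5", "sex": "male"},   {"name": "S6", "sex": "female"},
    {"name": "S7", "sex": "male"},   {"name": "S8", "sex": "female"},
]

# Random selection: every member of the pool has an equal chance.
simple = random.sample(pool, 4)

# Stratification: pick the categories first, then randomly select
# within each category (here, 2 males and 2 females).
def stratified(pool, key, per_group):
    groups = {}
    for person in pool:
        groups.setdefault(person[key], []).append(person)
    chosen = []
    for members in groups.values():
        chosen.extend(random.sample(members, per_group))
    return chosen

strat = stratified(pool, "sex", 2)
print(len(simple), len(strat))
```

Note that random selection might, by chance, pick four males; stratification guarantees the male-female comparison can be made because each category is sampled separately.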
5. What Do The Numbers Mean
Case studies don’t use numbers. And N=1 studies limit the use of numbers to counting. In contrast, most other approaches to research use numbers to measure and describe groups of people. But what meaning do the numbers have?
Obviously, variables do not always use numbers in the same way. You might want to find the average age of a group of people but it’s unlikely, for example, that you will want to calculate the average ID number. You know intuitively that averaging ID numbers, room numbers, or Social Security numbers isn’t very useful. Such numbers aren’t used for their numerical value but simply used as names.
A number which substitutes for a name makes no mathematical assumptions. A marathon runner with a high number on his back doesn’t necessarily run faster than one with a small number. The numbers are only used to tell the difference between contestants. Such numbers are at the lowest level of assumption, and are said to be at a nominal level of measurement. It makes no sense to add these numbers together, or find their average; each number is used as a name (nom).
In contrast, the second level of measurement, ordinal, makes two assumptions about its numbers. An ordinal scale distinguishes between members plus places them in order. Ranking children from tallest to shortest is an ordinal measurement. Winners of a race can be placed in order of 1, 2, and 3 (first, second, and third) but it would be silly to find the average of these numbers. An ordinal scale is like a footrace in a snowstorm: it can tell who came in first but it can’t tell how far apart the runners are.
An interval scale includes both of the previous assumptions plus the assumption that the distances between numbers (intervals) are equal. The distance between a score of 8 and a score of 9 on a spelling test is the same distance apart as 3 and 4. Using an interval scale, we could tell the difference between players, find out who came in first, and determine by how much our spelling star won.
Notice that an interval scale assumes equal intervals. In the case of a test, equal intervals means that each item is equally difficult. When the steps are not equal, the scale is ordinal. Consequently, a lot of teacher-made tests look as though they are based on an interval scale but are in fact making ordinal measurements.
The final level of measurement is ratio. A ratio scale includes the previous three assumptions and adds an absolute zero. Because of their absolute zeros, ratio scales have a unique characteristic: they can be used to make ratio comparisons. We can say that a task took twice as long (a ratio of 2 to 1), or that an object weighs a third as much (a ratio of 1 to 3). Our judgments can be described in relation to each other. We can’t do that with nominal, ordinal or interval scales.
A 0 on a spelling test doesn’t mean that the person cannot spell anything at all, only that those selected words couldn’t be spelled. The zero is not absolute. Similarly, a 0 on a Fahrenheit thermometer doesn’t indicate a total lack of heat (if it did we couldn’t have minus degrees). In contrast, time, distance, and weight are all ratio scales. A 0 on these scales indicates the total absence of that factor.
There are two problems with ratio scales. First, ratio scales are very rare. We often use interval scales (e.g., intelligence scales, reading tests, personality inventories) or ordinal scales (e.g., rating scales), but do not often use ratio scales.
The second problem is that measurement levels often are ignored. It is common for executives, teachers and others to treat ordinal and interval data as if they were on a ratio scale. Rating scales (1 to 5, 1 to 7, 1 to 10) are ordinal in nature. This is important to understand because some people make the mistake of saying that Group A did twice as well as Group B in the last survey.
When our measurements do not meet the assumptions of a ratio scale, we cannot say that a person with an IQ of 140 is twice as smart as a person with an IQ of 70. Nor can we say that a person who scores 0 on our extroversion scale is not extraverted. These are interval scales.
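The temperature example can be checked with a little arithmetic. This sketch, which I’ve added for illustration, uses the standard Fahrenheit-to-Kelvin conversion to show why “twice as hot” only makes sense on a scale with an absolute zero:

```python
# Fahrenheit has no absolute zero, so ratio comparisons mislead.
# 40 F looks like "twice" 20 F, but converting both to Kelvin
# (which does have an absolute zero) reveals the true ratio.
def f_to_kelvin(f):
    return (f - 32) * 5 / 9 + 273.15

ratio_f = 40 / 20                            # apparent ratio, interval scale
ratio_k = f_to_kelvin(40) / f_to_kelvin(20)  # true ratio, ratio scale

print(round(ratio_f, 2))  # 2.0
print(round(ratio_k, 2))  # 1.04 -- nowhere near "twice as hot"
```

The same trap catches the IQ example: without an absolute zero, the ratio of 140 to 70 is an artifact of where the scale happens to start, not a statement about “twice as smart.”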
Interpreting Numbers
The way we measure determines the strength of the conclusions we draw. If we label horses as “jumpers” and “non-jumpers,” we have not made any assumptions about which is better, only that they are different. This is a nominal scale. Similarly, if we differentiate between managers and engineers at a nominal level, we make no assumptions concerning which status is best.
At an ordinal level, we could rate horses on their jumping ability or personnel on their sales ability. We could use a scale of 1 to 5, for example. Notice that we could use a two-level scale: thumbs up, thumbs down. The only difference between a two-level ordinal scale and the nominal scale mentioned above is in the assumptions. If we assume that jumpers are better to have than non-jumpers or that sellers are better than non-sellers, the underlying scale is not nominal but ordinal.
Prejudice is a good example of misusing measurement assumptions. Distinguishing between Asians and Whites, north and south, or tall and short is a nominal description. Yet, if underlying our distinctions there is an assumption of one being better than another, we have moved to an ordinal distinction.
It should be clear that the number of spots on an ordinal scale is arbitrary. We are still at an ordinal level when rating on a 10-point scale, a 50-point scale or an 87-point scale. It is the underlying assumptions which determine an item’s level of measurement.
To move up to an interval scale in our horse testing, we could set up a course with obstructions for the horses to jump. Again, the number of hurdles included in the course is arbitrary and does not affect the level of its measurement. And, it should also be clear that a score of zero on our hypothetical course does not mean that a given horse cannot jump at all. We may have made all of the jumps too high for any horse to successfully clear.
If we measured how fast each horse ran the course, or how high each one jumped, the measurements would be on a ratio scale. Then, and only then, could we say that one horse jumped twice as high or ran half as fast.
Most of our data is ordinal. When we build a test, we usually don’t make each item of equal difficulty (one of the assumptions for an interval scale). Consequently, our measurements are more like rating scales than precision scientific instruments. Although some of our rating systems are quite complex, the data does not allow us to make fine distinctions between people. We can say one person is more generous, skilled or intelligent than another, but not by how much.
Just as horse-jumping courses usually are composed of items with varying difficulty, items measuring sales ability differ in difficulty. We do this to save time. With a few items of increasing difficulty, we can distinguish between poor performance and great performance. Without thinking about it, though, we have shifted the underlying level of measurement to an ordinal scale.
This shift is not necessarily bad. It allows us to make gross distinctions with only a few items. But researchers should know which level of measurement they are using. Without such knowledge, they are relying on assumptions which might not be true. We should not fool ourselves into thinking that we are measuring with more precision than is actually present.
Clearly, every level of measurement can be useful. Our tests of increasing difficulty are valuable. We don’t have to measure everything as ratio data. We can use nominal, ordinal, interval and ratio data. All are useful. Levels of measurement are themselves nominal. One level is not better than another.
UNDERSTAND
Concepts are rules you carry in your head. They are easy to remember and apply to many situations. Here are 4 measurement concepts that will help you understand the research process. Each illustration describes a variation on the theme.
1. The process flows from the theory
Illustration 1: Linear flow. Think of research as a linear flow from theory to operational definition. This approach starts with a theory. You describe the theory, create a model, select a variable and define that variable in operational terms. It’s a process of:
theory-model-variable-definition
By using this approach, your theory protects you from making chance errors. A chance explanation might occur if one day you clapped your hands and the sun came up. The observation is descriptive, but starting with the observation and not a theory allows chance occurrences to carry greater weight than they should.
In contrast, if your theory is that you control the sun, the linear approach might lead you to test your theory by clapping one morning and not clapping the next. The tendency with observations alone is that they don’t lead to testing your notions; the linear model of research always leads to testing your assumptions.
Illustration 2: A circular flow. A different way of looking at the same process is to describe it as a circle. Start anywhere and continue on around the circle. By following the circle, watching the sun come up would lead to revising your theory about controlling the sun, which in turn would produce a new model to test, which would help create a new theory, and so on. The wheel reminds us that science is an on-going process.
Illustration 3: A webbed flow. In practice, research feels interactive. You get an idea, think of a procedure, find an interesting fact, argue over coffee. Ideas and methods for testing them seem to come together, in any order, or only after much mental fighting. Research is creative. After years of working with different models, thinking of DNA as a twisted spiral ladder might come as a sudden insight. You might start with a problem and search for its solution. Or you might find a new mathematical procedure such as factor analysis and look for ways to use that tool. In practice, the flow of ideas feels both organized and messy. All in all, it feels like this:
2. All variables are continuous (or can be measured continuously)
Illustration 1: Age is a continuous variable when measured in years, months, days or hours. But when we group people into categories (babies, teens, middle-aged, seniors) we treat it as a discrete variable. Groupings of ages (19-24, 25-39, etc.) also are discrete (everyone in the group receives the same score) but the underlying variable (age) is continuous, not categorical.
Illustration 2: The size of your cereal box can be a discrete variable (large, medium or small) or continuous (net weight).
Illustration 3: When depression is measured by a rating scale (very depressed, slightly depressed, not depressed), the result is a discrete variable. But we could ask the question as a continuous variable: number of statements of sadness the client makes. Similarly, categories of types of depression (dysthymia, major depression, unipolar, bipolar, etc.) would form a discrete variable but levels of brain chemicals would be a continuous variable.
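Illustration 1’s point, that the underlying variable stays continuous while our grouping makes it discrete, can be sketched in a few lines of Python. The category cutoffs below are invented for the illustration:

```python
# Age measured in years is continuous; grouping it makes it discrete.
# Everyone inside a bracket receives the same label (the same "score").
def age_group(age):
    if age < 13:
        return "child"
    elif age < 20:
        return "teen"
    elif age < 65:
        return "adult"
    else:
        return "senior"

ages = [8, 15, 33, 70]
print([age_group(a) for a in ages])  # ['child', 'teen', 'adult', 'senior']
```

Notice that the grouping throws information away: a 21-year-old and a 64-year-old receive the same label, even though the underlying ages differ by decades.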
3. Anything can be a predictor
Illustration 1: Age can predict happiness or happiness can predict age. You can study any potential relationship; many will turn out not to be significant.
Illustration 2: How many beach balls you have could be a predictor of wealth, love of the beach, love of small children or love of odd toys on sale. Unusual variables can be predictors too.
Illustration 3: How fast you drive, how slowly you turn, how many cars you own, where you bought your car, and the color, type, make and model of your car each can be a predictor. The options are endless.
4. Theoretical questions determine which are predictors
Illustration 1: Variables are used to test the model your theory hypothesizes. Without a theory, researchers would study an endless supply of silly variables. The reason no one studies the impact of sunshine on how many wheels a car has is that there is no theory which suggests that it matters. If, however, there was a theory which suggested that sunshine and car design were related, then studying the issue would be logical and appropriate.
Illustration 2: If your theory is that education helps people mature emotionally, your model has two variables: education and maturity. The variables were not picked at random; they were the result of a theoretical relationship.
Illustration 3: Freudian therapists might use displacement and transference as predictors but would not use bilateral stereophonic sound. Behaviorists might use positive reinforcement but would not use identification and ego strength as predictors. The theory determines which variables are selected.
REMEMBER
Facts are the details of who, what, where and when. Often there are so many facts we can’t remember them all and must look them up. I’ve tried to collect the facts for each day of the tour in one place. And I’ve organized them into basic facts (statements you should try to remember), formulas (statements in mathematical form) and terms (vocabulary). Underlined words are Tangenian expressions and might not be found in other books. Here are the facts about measurement:
Basic Facts:
Theories are composed of constructs. Models are composed of variables. Laws have accuracy beyond doubt. Principles have some predictability. Beliefs are personal opinions.
Four basic types of variables: independent, dependent, intervening, moderator. Four levels of measurement: nominal, ordinal, interval, and ratio.
Formulas:
There are no complicated formulas for measurement. All that is required is thinking.
Terms
constant vs variable
construct
convenient pool
continuous vs discrete
criteria vs predictors
dependent vs independent variable
descriptive vs inferential statistics
dichotomous
hypothesis
interval scale
intervening variable
levels of measurement
model vs theory
moderator variable
nominal scale
operational definition
ordinal scale
population vs sample
population of interest
random sampling
random selection
ratio
stratification
suppressor variable
testable hypothesis
DO
Step-by-Step
Research starts with an idea (a construct of a theory), converts that idea into measurable activities (operationally defining the variables), and collects the data. Naturally, the results of model testing impact the theoretical ideas and complete the research cycle. In other words, before you collect any data:
1. Come up with an idea. Ideas come from reading the literature (what other people have done), previous research you’ve done, and idiosyncratic circumstances (getting hit on the head with an apple).
2. State a hypothesis. The formal statement of a theory’s conclusion is a hypothesis.
3. Make a model. Convert your idea into something measurable.
4. Pick a population. Decide who to study, how many should be included, and where to find them.
5. Decide on names or numbers. Decide how the numbers you collect are to be interpreted (which level of measurement to use).
My Example
I come up with an idea while reading the newspaper: I see that the NBA basketball finals will be played tonight. I begin thinking about sports, the skill it takes, and how one would learn to become better at hook shots. In the past I’ve studied pupillary reactivity to emotional stimuli and begin formulating some ideas about how emotions and sports might be linked.
My hypothesis is that anger lowers the accuracy of a player’s shot.
My model is that pictures of people playing basketball will reveal strong emotions. I operationally define emotions as the ratings of 3 judges who are hired to study the photographs.
I choose young children as my population of interest because their faces are so expressive. If a relationship exists, it will show on children’s faces. An older group might not show their emotions so openly, even if they felt them.
Since judges will be rating the amount of emotion, my results will be measured on an ordinal scale.
YOUR EXAMPLE:
Idea
Hypothesis
Model
Population
Measurement scale:
Practice Problems
1. Operationally define AGE at the nominal, ordinal, interval and ratio levels:
2. Give an illustration of the following:
a. A theory:
b. A model:
Simulations
1. In a study which looks at the impact of television on imagination, which is the dependent variable:
a. television
b. imagination
2. You work for a pharmaceutical company which is testing a cure for baldness. Subjects are randomly assigned by you into 4 groups. Each group differs in the number of days they have taken the wonder drug you’ve created. At the end of each group’s treatment, the number of new hairs was counted.
What is the predictor (independent variable)?
What is the criterion (dependent variable)?
SUMMARY
Statistics begins with thinking, not calculating. We start with a theory, convert it to a model and decide who, when and how to test it. We decide which variables to measure, how to measure them, and how to collect the data.
Nominal
Used as a name. Makes no mathematical assumptions. 0, 12 and 1 have no preference.
Examples:
The # on a race car
Bank ID number
Airplane model #
Part numbers
The # on the side of your horse

Ordinal
Used to report rank or order. Assumes the numbers can be arranged in order. Allows descriptions of 1st, 2nd and 3rd place but steps need not be the same size. Winning a close race receives the same score as an easy win.
Examples:
Finish order in contest
College sports ranking
Rating scales
The finish order of your horse

Interval
Used to count conceptual characteristics (IQ, aggression, etc.). Assumes numbers indicate equal units. Allows distinctions to be made between difficult and easy races but does not allow “twice as much” comparisons. Zero does not mean lack of intelligence, etc.
Examples:
The # of test items passed
Temperature in Fahrenheit
Temperature in Celsius
The # of hurdles your horse jumps

Ratio
Used to measure physical characteristics. Assumes 0 is absolute (indicates lack of entity being measured). Allows 2:1, 3:2, “twice as much” and “half as much” comparisons. Zero means no time has elapsed or no distance has been traveled, etc.
Examples:
Distance, time and weight
Temperature in Kelvin
Miles per gallon
How fast your horse runs
Progress Check 1
1. Which of the following do we select, manipulate, impose or induce:
a. independent variable
b. dependent variable
c. moderator variable
d. suppressor variable
e. intervening variable
2. The type of relationships between model components is determined by our:
a. theoretical questions
b. empirical analysis
c. sampling error
d. statistical bias
e. control groups
3. In an actual research study, annual income could be:
a. a predictor
b. a criterion
c. type I error
d. either A or B
e. A, B or C
4. Studies which do not intentionally manipulate the environment are said to be:
a. experimental
b. inferential
c. descriptive
d. consequential
e. inconsequential
5. An ordinal scale meets which of the following assumptions:
a. ordering in magnitude
b. equal units
c. A and B
d. A and B plus other assumptions
e. none of the above
6. Selected parts of a larger group are called:
a. samples
b. percents
c. percentiles
d. population analyses
e. frequency distributions
7. A test which only orders traits in magnitude is best described as:
a. cardinal
b. interval
c. nominal
d. ratio
e. ordinal
8. The number of obstacles a horse jumps over would be a(n) ___________ measurement.
a. cardinal
b. interval
c. nominal
d. ratio
e. ordinal
9. Models are composed of:
a. variables
b. percentiles
c. hodos
d. confounds
e. central-limit theorems
10. The process of converting theories to testable models includes:
a. confounds
b. random assignment
c. random selection
d. operational definitions
e. an analysis of variance
Identify the independent and dependent variables in the following scenarios:
1. As a coach, you are interested in how well your athletes have learned to run. You randomly assigned your athletes into 2 training teams. Team A trained the old fashioned way. Team B used computer-assisted training. After training, you measured the number of minutes needed to run around the track once.
What is the predictor (independent variable)?
What is the criterion (dependent variable)?
2. You work for a pharmaceutical company which is testing a cure for baldness. Subjects are randomly assigned by you into 4 groups. Each group differs in the number of days they have taken the wonder drug you’ve created. At the end of each group’s treatment, the number of new hairs was counted.
What is the predictor (independent variable)?
What is the criterion (dependent variable)?
3. Complete the following:
a. Theories are composed of:
b. Models are composed of:
c. Laws:
d. Principles:
e. Beliefs:
4. List six criteria for evaluating theories:
a.
b.
c.
d.
e.
f.
5. List 4 levels of measurement:
a.
b.
c.
d.
Progress Check 1 Answers
Basic Facts
1. Complete the following:
a. Theories are composed of: constructs
b. Models are composed of: variables
c. Laws: accuracy beyond doubt
d. Principles: some predictability
e. Beliefs: personal opinions
2. List six criteria for evaluating theories:
a. Clear
b. Useful
c. Summarizes facts
d. Small number of assumptions
e. Internally consistent
f. Testable hypotheses
3. List 4 levels of measurement:
a. nominal
b. ordinal
c. interval
d. ratio
Multiple Choice
A; A; D; C; A; A; E; B; A; D
Application
1. The predictor was Training and the criterion was Running Time (in minutes).
2. The predictor was Drug Level (number of days treated) and the criterion was the number of New Hairs.
Day 2: Central Tendency
Describing a group
BRIEFLY
Your tour has landed you on a desert island and two great illustrations of descriptive statistics come to mind: one for social science and one for natural science.
On one hand, you think like a social scientist. Encountering a new species no one has ever seen before–a cross between a Martian and a turtle–you wonder how you are going to describe this historic finding to those back home.
First you think of Freud (there’s something Freudian in that but you let it go). What would Sigmund Freud do in a situation like this? Remembering he was the father of psychoanalysis, you improvise a couch and have the Martles (Turians?) lie down and free associate whatever comes into their minds. Freud’s case study approach appeals to you because it doesn’t use any numbers. But to improve your skills, you’ll want to consider the observer effect and other information related to naturalistic observations, case studies and self-reports.
Second, you recall Skinner developed his learning theory studying one subject at a time. So you select one individual and begin a single-subject or N = 1 study. You soon realize that having only one individual to study doesn’t mean there is a lack of numbers generated. So you want to learn how to organize and plot the data.
The third approach you choose is to study the entire group. There are too many Martles to meet every one of them, so you need to find a group representative. Since the goal is description of the entire group, you don’t choose the tallest or the shortest. You look for a typical, most common or middle-most representative. So you’ll want to know more about the mean, median and mode, finding the center of a group and why it matters.
On the other hand, the natural scientist in you wants to explore the island itself. It is quickly clear that the island is basically a mountain. Like the pile of data that sits on your desk back home: there is one large heap. It has a peak in the middle and symmetrically gets smaller and smaller on each side.
Realizing that I like giving multiple illustrations of the same point, your suspicion is that finding the center of a mountain of data is the same as finding a group representative. Let’s see if I accomplish that by the end of this chapter.
The other realization I hope you achieve is that the goal of statistics (as with most students) is to get the most information with the least amount of numbers.
INTRODUCTION
1. Case Studies
Before trying to change situations, it’s often helpful to observe them. This process of naturalistic observation tries to describe the way things are. There is no attempt to change things. Indeed, there is a concerted effort to report the current conditions.
One problem with the process is that it is easy to impact a situation simply by observing it. Often the presence of an observer causes subjects to change their behavior. If someone is watching you walk, do you walk the same way? What happens if people laugh at your jokes or if they read over your shoulder? This observer effect comes from being aware of being watched.
The obvious solution is to make sure that the subjects don’t know that they are being observed. But it is not always possible to remain unobserved. Any group can be naturally observed but ethical and practical considerations limit where, when and how the study is conducted.
For one subject or a small group, clinicians use case studies. Typically, after each counseling session, therapists write notes on the progress made, goals established and key points to remember. These case notes are often expanded and presented (with identifying information removed) as a case history or case study. Sigmund Freud’s theories are based on his clinical observations and case studies. Although Freud was a gifted biological researcher, he is best known for his theoretical and clinical interpretations. Like Freud, most clinicians rely on words, not numbers.
Although case studies tend to personalize observations, they are potentially biased. The validity of the case study relies completely on the interpretations of the clinician making the study. Consequently, findings can vary widely from clinician to clinician. At their best, case studies give enough detail that the reader gets a good feel of the circumstances and the events reported.
Self-reports are case studies which are autobiographical in nature. Typically, the client counts the number of times an event occurs, takes a personal inventory of motives or describes an event in great detail. Self-report checklists, diaries and graphs are widely used by teachers, counselors and parents to track educational and behavioral progress.
2. N = 1 Studies
The experimental version of a case study can be found in studying an individual subject. These “N = 1 studies” don’t use lots of clinical subjects; there is only one subject.
The German learning theorist Ebbinghaus was the first person to use experiments to study memory. He made long lists of nonsense words and learned them himself. Using himself as his only subject, Ebbinghaus charted the decline of memory over time.
The American learning theorist B.F. Skinner was another who relied on studying one subject in depth. Skinner avoided complex statistical analyses in favor of counting the number of times a behavior occurred. His a-theoretical approach (no theory) made few assumptions and used replication (re-running studies) to ensure the accuracy of his conclusions. By checking the results of one study as part of conducting a new study, Skinner could show his findings were robust (applied in a lot of situations).
In contrast to case studies, N=1 studies use experimental controls. For example, Skinner created an experimental apparatus to hold his subject and control the amount of light, heat and other environmental factors. The apparatus (colloquially called a “Skinner box” was home for the pigeon, chicken or mouse used in the experiment. Each animal was used repeatedly and each experiment generated a tremendous amount of data about a single subject.
– 29 –
– 30 –
3. Describing A Group Organize The Data
When you study a group people, one thing becomes quickly apparent: there are a lot of numbers to handle. And the larger the group and the more complex the study, the more data there is.
Like a messy room, data must be collected and sorted until it is easy to handle. Conse- quently, numbers often are put into rows and columns. This arrangement, formally called a data matrix or data table, is a popular way to organize sales, financial and tax data. Anyone who has used a spreadsheet on a computer is familiar with this row-column organization. Every row is an entity, a person. Every column represents a measurement of some attribute.
For the sake of illustration, let’s say that you have eight clients. In order to better under- stand them, you have measured their age, level of education, their artistic ability and job performance. To protect their identity, ID numbers have be used. A data matrix which profiles this group might look like this:
ID# Perform Artistic
1 5 1 2 5 9 3 4 6 4 3 5 5 3 5 6 3 7 7 2 4 8 2 6
Ed. Age
0 17 12 18 14 23 16 21 15 33 18 42 12 55 14 26
DAY 2: Central Tendency
In this example, the first column contains the ID number of each person. The rest of the columns contain a performance rating, the number of items correct on a test of artistic ability, the number of years in school, and each person’s age (in years). Each column is a different characteristic, each row a different person.
It is important to note that once data is put into an organized form, it cannot be changed on whim. Some changes can be made but there are limits. For example, it is permissible to move an entire row or column, because their order is arbitrary. The third column can be moved between the first and second ones without disturbing the structure. Row 7 can be relocated between rows 3 and 4. But the information within a column without its affecting the other information.
If the order of items in one column changes, the other columns must be changed. Since each column describes only one characteristic of an individual, all of an individual’s scores must be kept together. In other words, you are free to move an entire person, but you cannot mix-up the component parts.
Regardless of the type of variable, a data matrix lets you scan all of the data at once. People are very good at pattern recognition, better than any computer ever built. We do it fast and effortlessly. Look at the variables listed. Is there important information which should be in- cluded here but is absent? A data matrix makes it easy to see if you remembered to include everything which interested you.
DAY 2: Central Tendency
Plot The Data
– 31 –
Some variables can be summarized with a pie chart. Expressed in percentages, each “piece” of the pie represents a segment of that variable. Simple information can be presented quickly with pie charts and they are widely used to summarize information. Researchers use them to
show the results of surveys and opinion polls. Businesses use
pie charts to show sales by region or sales by product line.
A bar graph is another categorical graph. The categories are listed from left to right and the values in each category determine the height of each bar. Like pie charts, bar graphs often display percent-
ages.
For more complex information a histogram is can be used. Pie
charts show frequencies (how much or how many) but for cumu- lative non-categorical information a histogram is a better choice. A histogram is a grouped frequency distribution. This imita- tion of a skyline of a city not only shows frequencies (height on the bars but also shows the relationship between group
members. Scores are grouped into sections and the sections are arranged from low-
est to highest (left to right).
If this was a graph of age, it would be clear that there are
lots of children, lots of elderly, but few in-betweeners. Of course,
the price of this clarity is some loss of detail. People who are
62, 63, and 71 are grouped together as if there was no differ-
ence between them. Children who are 4, 5, and 8 1/2 also are treated as equals.
Frequency Distributions
Obviously, the shortest description of a group is how many people are in it. So we don’t have to write it out each time, this “number” is usually referred to as N. A group with 10 people would then have a N of 10. N = 123 would describe a group of 123 scores. Unfortu- nately, N is not very graphic.
A better description of what a group looks like can be found by plotting its scores. This graphic representation of the scores, called a frequency distribution, gives an overview of all of the scores at once. It shows how frequently the same score occurred.
Each score is listed left to right, from lowest to highest. If no one had the same score, it would be represented by a straight line drawn from left to right.
If everyone had the same score, it would be represented by a vertical straight line.
If more than one person has the same score, the line looks more like a mountain or a bell. The number of people who have each score is indicated by the height of the figure. A fre- quency distribution is also often called a “normal curve.”
– 32 –
The Normal Curve
One of the assumptions of science is that variables are normally distributed. That is, there is a pattern to frequency distributions in general. This pattern is consistent no matter what is being measured. Whether you measure mechanical ability, knowledge of world history or tap dancing, when you arrange the scores from low to high they will look like this:
The normal “bell-shaped” curve is a symmetrical distribution. If folded in half the left side looks the same as the right side. Notice that there are more scores in the middle than anywhere else. There are a few people at each extreme, but most are lumped together.
This particular mountain-view has no special significance. It a matter of convention; a habit developed over the years. A group of people from the side might look like they are standing in line but from the top, you’d see that they are standing in a circle. Arranging people in order of height would look like a staircase from the side and like a straight line from the top. Similarly, our normal curve looks like a mountain from the side. In an aerial view, a normal curve could be drawn more like a pair of lips.
DAY 2: Central Tendency
How it is drawn isn’t as important as the principle underlying this diagram. Most people are in the middle, with a few on either end. Some of the cars manufactured are extremely reliable, some are lemons. But most are in the middle. Some people are very administratively-, sales-, or musically-talented, some are very untalented, and a whole lot of people are in the middle. Scores on a test of talent, then, could be described as a normal curve.
Skewed Distributions
When a frequency distribution doesn’t look like a bell, an alarm should ring. If your people are all the same, you need to look at the data in more detail. There should be differences between individuals. Frequency distributions are reality tests. They remind us that people are generally alike but individuals differ. Most people have the same or similar scores but there will be individual differences. Some people will score low, some high.
DAY 2: Central Tendency
– 33 –
Plots, of course, look different each time data is changed. There is some variation. Al- though two distributions seldom look exactly alike, they usually look like a bell. When the scores don’t look like a bell, the distribution is said to be skewed. A skewed distribution has an irregular, asymmetrical shape.
One cause of a skewed distribution is the presence of an outlying score (an unusually high or an unusually low score). An unusually high score impacts the distribution by pulling the high end even farther to the right. This high outlier gives the distribution a long tail on the positive (right) side. The result is a positively skewed distribution.
In a negatively skewed distribution, the distribution’s tail is pulled toward the negative direc- tion (toward the left side).
Skewed distributions also occur when a restricted sample is selected from a normal popu- lation. Even though the total distribution is normal, taking only a part makes it appear skewed. For example, if we only take the top part of a normal curve, the result would look like a positively skewed distribution:
If we select people who are exceptionally gifted in sales ability, the distribution of their scores would be positively skewed. The hard part is to decide whether the skewed distribution is a false description (due to our pre-selection process) or if it accurately reflects the factor being measured.
Here is another frequency distribution:
This double-humped camel is a bimodal distribution. Since the height of a distribution is a measure of frequency, this particular irregular shape indicates that a low score and a high score were both more common than the rest of the scores.
Bimodal distributions often are caused by selecting widely-varied groups. This pattern indicate, for example, that you serve young and old clients, and relatively few middle-aged people.
Find A Group Representative
Frequency distributions and graphs give us an overview of the data. But often we want a simple description, a more personalized description. We want to describe the entire group with as few scores as possible.
– 34 –
DAY 2: Central Tendency
Sometimes we select a few individuals and highlight their accomplishments. We might choose outstanding teachers, top athletes or best public speakers. Finding an appropriate group representative is a little like a scene in science fiction books or old movies. When the heroes land on Mars they say, “Take us to your leader.” Similarly, when confronted with a whole pile of data, we try to find one special score from the entire group.
But there are limits to this approach. The “best” are not good descriptions of what is “typi- cal.” When most are followers, finding a leader will not describe the group. Leaders are good group representatives only when most of those in the group are leaders.
And there are legal and ethical problems in putting someone’s picture on the wall and calling them “Mental Patient of the Month.” What we need is a numerical description to represent a group of scores. We need to find one score, or make up a hypothetical score, which reflects the nature of the group.
Part of the reason we look for a representative is that we believe that people are more alike than different. On any variable we measure, most people are in the middle of that distribution. If scores were toys, they would not be spread out evenly across a table top. There would not be a line of uniform depth.
They would be not be placed in nice neat stacks or columns. There are not 2 kinds of people: those in Stack A and those in Stack B.
People scores are best described as a heap. There is a pile, a hill or a mountain of scores, and most scores are somewhere in the middle of that pile. There are some scores at each end of the distribution but most scores are in the middle.
That’s good news. It means we can describe an entire group of scores with only one or two scores. Because humans are mostly alike on any given variable, we often can describe an entire group of us with a single number.
Most people have about the same amount of musical ability; some are very musical, some are very unmusical, but most are in the middle. The same is true for creativity, aggression, vocabulary and mechanical ability. Consequently, to best summarize a group of scores, we try to find the middle of that distribution, regardless of what it measures.
There are three ways to find the center of a group: mean, median, and mode. Why three? Because using statistics is like peeking at treasure through tiny, dirty windows. What we see, and how clearly, is determined by which window we use. When the view from all of the windows agree, we can be more certain that what we see is not illusion. When we see different things from different windows, we are less certain of our observations.
The three major measurements of central tendency are the mean, median and mode. If creatures from Mars came and asked to meet one person most representative of Earth’s people, the mean representative would be average person. The median person would be the one stand- ing in the middle of the group and the mode would be the most popular (most common score).
DAY 2: Central Tendency
Mean
– 35 –
The first measure of central tendency, the mean, is more commonly called the average. It is symbolized by a line over the top of variable name; if X is a variable, the mean of X is called
bar-X and is symbolized as X with a bar over it:
The mean represents the hypothetical, average, typical person. It represents the hypotheti- cal middle point that balances the entire distribution. That’s why we end up with 2.4 children or 3.1 cars; they are hypothetical estimates of the middle.
Calculating the mean
To calculate the mean:
4 Sum (add) all of the scores in a variable
4 Divide that sum by the number of scores in the distribution.
Calculate the mean of these numbers: 7
6 5 5 5 4 3
The sum of the variable called X is 35. That is:
= 35
N (number of scores) is 7.
The mean of these scores is calculated by dividing 35 by 7. So, the mean of these scores is 5. That is:
=5
Impact of outlying scores on the mean
Notice that changing an outlying score changes the mean but the median and mode remain the same.
XY 7 700 66 55 55 55 44 33
The mean of variable X is 5. But the mean of the Y is 104. Obviously, means are very sensitive to all of the scores in a distribution. Means are very democratic; every score has a say.
– 36 –
Median
In contrast to the mean, the second method of finding the center is not affected by outlying scores. The median ignores the other scores. It doesn’t care about their values, only which one is in the middle. Some people think of the median as the team maker, equally dividing the group into two teams with an equal number of scores on each side. Or the barrier running down the middle a busy street dividing it into equal halves going opposite directions.
When the scores are arranged from low to high, the median is the hypothetical middle- most score. That is, it is the point where 50% of the distribution is above and 50% is below. Since it is a hypothetical point, a median can result in such odd things as 3.2 people or 4.7 buildings, just like a mean.
Calculating a median
Finding the median in a distribution of integers is relatively easy. When there is an odd number of scores: it is the one left over when counting in from either end. When there are an even number of scores, the median is whatever the middle two scores are (if they are the same) or the halfway point between the middle-most two scores when they differ from each other.
Medians are most often used when distributions are skewed. Indeed, when data is pre- sented in medians, ask about the means. If they are quite different, the distribution is highly skewed, and the sample may not be as representative as you would like.
To calculate the median, arrange the scores in order of magnitude from high to low or from low to high (it doesn’t matter which one you choose). Select the score in the middle.
In the following numbers, the median is 7:
9 8 7 4 2
What if there’s no middle score
The median is the hypothetical middle-most score. If there is no actual middle-most score, the median is the average of the middle two scores of a distribution. So, the median for these numbers is 9:
27 17 15 11
7 5 3 1
Impact of outlying scores on the median
The median is not impacted by outlying scores. It is affected by adding or subtracting a score but not from changing an end score to a larger number. Notice that the median goes unchanged when the surrounding scores are changed.
DAY 2: Central Tendency
DAY 2: Central Tendency
The median for these scores is 5: 7
6 5 5 5 4 3
The median of these scores also is 5: 70
6 5 5 5 4 3
Remember, unlike means, medians are not affected by outlying scores. The median is sim- ply the middle-most point, regardless of who surrounds it
Median: middle of the distribution
The median is the middle of the distribution, not the middle of the raw scores. The median of the following numbers is 7, not 11:
14 12 4 11 3 7 5
First the scores must be put in order, then the middle-most score found. So, the previous scores would make the following distribution:
14 12 11
7 5 4 3
And the median of this distribution is 7.
Notice that the median of the following data is also 7:
164 5 7 222 2
– 37 –
– 38 –
Mode
In addition to means and medians, there is a third way to find a good representative. This third indicator of central tendency is the mode. To people, this score represents the most popular person. To a computer, it is the score most frequently found.
A distribution can have more than one score that appears more often than the rest. In cases where are two modes, the distribution is said to be bimodal. Anything over two modes is called multi-modal.
Calculating the mode
There are two ways to calculate this popularity. First, the mode may be found by sorting the scores and selecting the one most frequently given. The mode of this distribution is 5:
11 9 5 5 5 2
Second, and more practical in a distribution of many scores, the mode is the highest point on a frequency distribution. If a frequency distribution is accurately drawn, both approaches will yield the same result.
When we make a distribution, the scores are arranged from left to right, with the lowest scores on the left and the highest scores on the right. When everyone has the same score, the distribution is a straight horizontal line. When more than one person has the same score, the scores are stacked vertically. Consequently, a distribution where everyone had the same score would be represented by a straight vertical line
Mode: impact of outlying scores
The mode of the following distribution is 5: 5
5 3 4 5 6 7
Note, the mode is not impacted by outlying scores. The mode of the following also is 5: 700
5 5 3 4 5 6 7
DAY 2: Central Tendency
DAY 2: Central Tendency
Central tendency: why calculate it
Lots of books on statistics tell you how to calculate a mean. This is the only one which explains why. You’ll find mathematical presentations, calculator steps and lots of practice problems. But not one of those books explains why anyone would want to calculate a mean.
The reason we calculate central tendency is that chance has a pattern. If you take a bucket of blocks and dump it on the table, they will fall out in a random pattern. The blocks will not stack themselves in neat rows and columns. They will not form a picture or arrange them- selves in a circle. Yet there is a pattern.
We believe our universe is orderly. Even chance has a pattern. It’s not a pattern of arrange- ment; it’s a pattern of disarrangement. It’s a heap, a lump, a mound. A cluster with a center.
We calculate central tendency because everything we measure has a tendency to have a center. People are organized into those who clap and those who don’t. They vary on their ability to clap (jump, dance, sing, surf, type, run….). Yet in their variety there is a tendency for the variable to have a center.
Random variables look like mountains, not stair-stepped plateaus or Stonehenge geomet- ric shapes. People are pretty much alike…regardless of what we measure. There is a common- ality to us. There is a center.
We are not lost in the flat desert of data but are faced with a symmetrical mountain. So, it’s only natural to figure out how tall the mountain (mode), how balanced (mean) and where the midpoint (median) is.
Central tendency: which one to use
As you can see, it matters which number we select to represent a group. Choosing a mean when the curve is skewed gives a false impression. Although sometimes the terms, mean and median, are used as if they were interchangeable, the distinction is an important one.
When the measures of central tendency do not agree, the conclusions drawn from them can be quire different. Our median data may show that we are profitable while the means indicate that we are losing money.
In general, when one is reported, the other should also be given. Having both the means and the medians gives you a better understanding of the data. If they agree, or are close, you can feel confident that either is a good representative of the entire group. When they differ, take a look at the whole distribution.
When a presenter uses means to describe sales, ask about the median values. When medi- ans are used, ask about means. Both measures of central tendency should be close at hand.
– 39 –
– 40 –
UNDERSTAND
Concepts are rules you carry in your head. They are easy to remember and apply to many situations. There are 5 concepts related to your initial description that I want to highlight.
1. Get a feeling of what it’s like
Illustration 1: Research is path but not always a straight one. Sometimes we don’t have really clear ideas about the what to study or how to ask the right questions. Exploratory research can be helpful too. Find an area of interest and see what’s out there. Many researchers will run a little pilot study before
devoting all of their time and effort on an area of interest.
Illustration 2: One way to get a feeling for what it’s like is to have subjects talk
while they do. Verbal protocols are descriptions subjects give about how they are solving a problem. Although not usually objective enough data for making strong predictions, these talk-it-out sessions can show areas where instructions are needed, tasks need to be make easier or other controls need to be added.
2. Not all research uses numbers
Illustration 1: Metaresearch, for example, is a special kind of research that uses other people’s numbers. Sometimes it is helpful to summarize the state of science in a particular area. These meta-analyses look at all of the studies which have been done in a particular fields (such as learning, memory, etc.) and show how many studies found significant results, how many found no relationship, and where both sides should be applauded and criticized.
Illustration 2: Clinical case studies don’t use numbers. Patients are described in terms of their symptoms, unique characteristics and their response to treatment alternatives. The focus is on the symptoms.
Illustration 3: Historical analyses rely on notes, interviews with acquaintances, and the reading of diaries, books, correspondence and historical documents. Psycho- analysis of Abraham Lincoln, for example, would use words, not
numbers, to describe his relationship with his mother and the
importance of breast feeding and toilet training.
3. The emphasis is on group data
Illustration 1: Although clinicians are interested in how each patient does, in gen- eral research relies on group data. Since we are looking for general principles which apply to all people, there should be consistency in the results. Everyone might not act the same way but general principles assumes general patterns of behavior.
Illustration 2: A new drug is thought to be effective if it helps a large group of people, even though it has no impact on your heath.
Illustration 3: Similarly, elections are described by group results, not how an indi- vidual votes.
DAY 2: Central Tendency
DAY 2: Central Tendency
4. One subject is sometimes enough
Illustration 1: Most scientists only study one planet. Illustration 2: Some anthropologists only study one
– 41 –
society or tribe
2. Central Tendency
Illustration 3: Piaget built his
theory of development based on observing one family–his.
Illustration 1: A frequency distribution is like a pile of junk. It does not lie flat and organized. The more you pile on, the more it begins to look like a hill.
Illustration 2: A frequency distribution is like a camel: it has a hump…a center.
Illustration 3: A frequency distribution is like an ant hill: it has its highest spot at the center..
Illustration 4: A frequency distribution is like a hut. It is not scattered across the ground but built so that its highest spot is at the center..
Illustration 5: A frequency distribution is not like a flat desert. It is like a mountain on the horizon.
Illustration 6: A frequency distribution is not like a desert. It is like a sand dune.
– 42 –
REMEMBER
Facts are the details of who, what, where and when. Often there are so many facts we can’t remember them all and must look them up. I’ve tried to collect the facts for each day of the tour in one place. Here are the facts about central tendency.
Basic Facts:
There are 3 measures of central tendency: mean, median and mode.
Formulas:
There are no complicated formulas for central tendency. The mean is what’s commonly called an average. It’s so simple to calculate, you don’t really need a formula. To calculate a mean, simply add up the scores and divide by the number of scores. The symbol for the mean is X (or whatever letter you’re using to specify the variable of interest) with a
bar over it:
For the median, find the middle-most score; if there are 2 different scores in the middle, find the average of those two scores. For the mode, find the score that appears most often; find the highest point of a frequency distribution.
Terms:
a theoretical
average
bell-shaped curve bimodal distribution case notes; case study central tendency
data matrix
frequency distribution histogram
interval scale
mean
median
mode
N
N = 1 experiment
naturalistic observation negatively skewed distribution nominal scale
normal curve
observer effect
ordinal scale
outlying score
positively skewed distribution ratio scale
replication
self-report
skewed distribution
Skinner box
X
ΣX
DAY 2: Central Tendency
DAY 2: Central Tendency
DO
Step-by-Step: Mean
1. Sum (add) all of the scores in a variable
2. Divide that sum by the number of scores in the distribution.
Median
1. Arrange the scores in order of magnitude (from high to low or from low to high) 2. Select the score in the middle
3. If there are two middle scores, add them and divide by 2.
Mode
1. Select the most frequent score
Practice Problems:
Item 1
For the following data, calculate the measures of central tendency:
X
11
7
9
6
3
10
7
8
5
2
7
N=
ΣX = mean = median = mode =
– 43 –
Now that we’ve covered the facts and concepts of z-scores, it’s time to put what we know into practice. This section includes Step-by-Step instructions, practice problems, simulations (word problems), quiz and a progress check.
– 44 –
DAY 2: Central Tendency
Item 2
Calculate the measures of central tendency:
X
11 7 9 6 2
N=
ΣX = mean = median = mode =
Item 3
Calculate the measures of central tendency: 10
16 11 6 6 16 3
N=
ΣX = mean = median = mode =
Item 4
Calculate the measures of central tendency: 24
N=
ΣX = mean = median = mode =
2
5 11 3 2 2
Item 5
Calculate the measures of central tendency: 800
17 4 2 13
ΣX = mean = median = mode =
DAY 2: Central Tendency
– 45 –
– 46 –
DAY 2: Central Tendency
Item 6
Calculate the measures of central tendency: X
6
6 11 5 6 2
N= ΣX = mean =
Item 7
4 5 7 8
11 5
mean = median = mode =
Calculate the measures of central tendency: 2
Item 8
Calculate the measures of central tendency: 10
12 8 10 8 10 8 3
N= mean = median =
DAY 2: Central Tendency
– 47 –
Item 9
Calculate the measures of central tendency: 8
mean = median =
4 2 9 6 4
Item 10
Calculate the measures of central tendency: 101
13 14 7 7
N=
ΣX = mean = median =
– 48 –
Simulations: Item 1
As a realtor, you wonder how much money you typically make per sale ($ in thousands):
6 19 2 6 4 6
Calculate the following: mean =
median =
mode =
Is the data skewed or normally distributed?
Which measure of central tendency is the best representative for this data?
Item 2
As a home buyer, you wonder how much money you’ll have to spend fixing up your “new” house. Calculate the appropriate measure(s) of central tendency for these numbers ($ in thou- sands your neighbors spent on their places):
2 1 2 6 2
26
Which measure of central tendency is the best representative for this data?
Why is this the best representative for this data? What did you calculate the value to be?
DAY 2: Central Tendency
DAY 2: Central Tendency
SUMMARY
Review
Our goal is to find a way to summarize a large group of numbers. One part of that process is to find a group’s representative. We want one number that will tell us about the entire group. There are 3 basic choices: mean, median and mode. The mean is hypothetical average person. The median is the middle-most person. The mode is the most popular person. Frequency Distributions can be normal, positively skewed, nega-
tively skewed, constant, flat or bimodal.
– 49 –
Mean Median Mode
Item A
11 6 6 6 7 6 5 1
Item B
2 3 2 5 2 1
16
Mean Median Mode
Mean Median Mode
Item C
9 8 8 8 7 6 5 8 1 6
Item D
X
5 5 5 5 5 5
N6 SX 30 mean 5 median 5 mode 5
Item E
X
2 4 5 7
11
N
SX mean median mode
Item F
9 8 8 8 6 2 5 2 1 2
Mean Median Mode
– 50 –
1.Which of the following is defined as the middle- most score:
a. mean
b. median
c. mode
d. standard deviation
e. frequencydistribution
2.Which of the following is defined as the-most common (popular) score:
a. mean
b. median
c. mode
d. standard deviation
e. frequencydistribution
3.When a curve is skewed, which of the following is the middle-most score:
a. mean
b. median
c. mode
d. standard deviation
e. frequencydistribution
4.Which of the following is most affected by out- lying scores:
a. mean
b. median
c. mode
d. BandC
e. A, B and C are equally affected
5.When a distribution is positively skewed, the mean is __________ the median:
a. lower than
b. equal to
c. higherthan
d. twice as large as e. twice as small as
Progress Check 2
For the next five items, consider the following num- bers:
76
69
68
67
67
67
66
65
58
6.What is the median of these numbers: a. 18
b. 33 c. 58 d. 67 e. 76
7.What is the mode of these numbers: a. 18
b. 33 c. 58 d. 67 e. 76
8.What is the range of these numbers: a. 18
b. 33 c. 58 d. 67 e. 76
9. This distribution is best described as: a. normal
b. positivelyskewed c. negativelyskewed d. bimodal
e. extensive
10. Without using a calculator, what is the mean of the above data (hint: see item #9):
a. 47.5 b. 64.3 c. 67 d. 68.4 e. 68.7
Progress Check 2
1. Complete the following:
a. Theories are composed of:
b. Models are composed of:
c. Laws:
d. Principles: e. Beliefs:
2. List 4 levels of measurement: a.
b. c.. d.
3. Find the mean for these scores:
X
22 24 10 17
3 Mean ______
4. Find the mean, median and mode for these scores:
X
4 4 5 6 4 1
Mean ______ Median ______ Mode ______
– 51 –
– 52 –
Progress Check 2
5. Find the mean, median and mode for these scores:
X
4 4 9 8
13 4
Mean ______
Median ______
Mode ______
Which is the best measure of central tendency for this group of scores?
6. Find the mean, median and mode for these scores:
X
14 5 5 3 5 2 1
Mean ______
Median ______
Mode ______
Which is the best measure of central tendency for this group of scores?
Progress Check 2
Answers
Practice Problems Item #1
N= 11,ΣX=75,mean=6.8182(rounditto6.82),median=7,mode=7 Item #2
N=5,ΣX=35,mean=7,median=7,mode=nomode(notenoughscores; as- sume it takes 3 of a kind to be a mode)
Item #3
N = 7, ΣX = 68, mean = 9.71, median = 10, mode = none
Item #4
N = 7, ΣX = 49, mean = 7, median = 3, mode = 2
Item #5
ΣX = 836, mean = 167.20, median = 13, mode = none
Item #6
N = 6, ΣX = 36, mean = 6, median = 6, mode = 6
Item #7
Mean = 6, median = 5, mode = none
Item #8
N = 8, mean = 8.63, median = 9, mode = 10
Item #9
Mean = 5.5, median = 5
Item #10
Mean = 28.40, median = 13
Simulations
Item #1 Mean = 7.17, median = 6, mode = 6. Positively skewed. Median.
Item #2 Median (median = 2; mode = 2 also; mean = 6.50). Negatively skewed.
Multiple Choice:
b, c, b, a, c, d, d, a, a, c
Progress Check:
1. Complete the following:
a. Theories are composed of: constructs
b. Models are composed of: variables
c. Laws: accuracy beyond doubt
d. Principles: some predictability
e. Beliefs: personal opinions
2. List 4 levels of measurement: nominal, ordinal, interval, ratio
3. Mean = 15.20
4. Mean = 4, median = 4, mode = 4
5. Mean = 7, median = 6, mode = 4. The median is the best measure of central tendency
for this group of scores because the distribution is positively skewed. The mean would overestimate the center of the distribution. The mode could also be used but it is less useful than the median.
6. Mean = 5, median = 5, mode = 5. The median is best because…. Ask a friend (or look online).
Day 3: Dispersion
Measuring Diversity
BRIEFLY
Still investigating the lives of Murtles, it is clear to you that not all of them are alike. There is diversity…variability…heterogeneity…dispersion. There is similarity between individuals but they are not all identical. Consequently, in order to describe the entire group, some measure of variability must also be included. There is a central tendency to the group but there is variety too.
Descriptive statistics is really a group sport. Although some research is done on individuals, most research studies groups. By studying a group, the focus is on what we have in common. A group that has a clear center point can be described by its mean, median and/or mode. Each is a possible representative of the group.
The usefulness of a central representative is influenced not only by whether the group has a center but also by how much the scores vary from each other. If everyone has the same score, any of the measures of central tendency can fully represent the group. However, if the scores vary greatly from each other, central tendency is less representative.
Means do not tell how diverse the scores are. A very homogeneous distribution (very similar scores) and a very heterogeneous distribution (widely varied scores) can have the same mean. But the more varied the scores, the farther they are from the middle, the more difficult it is to summarize a distribution.
Today we tour the following topics:
What is dispersion?
5 ways to measure dispersion
1. Range
2. Mean Absolute Deviation (absolute mean variance)
3. Sum of Squares
4. Variance
5. Standard deviation
Areas under the curve
1. Percentages
2. Percentiles
3. Quartiles
4. Stanines
INTRODUCTION
What is dispersion?
Dispersion is a measure of how different scores are. It is an inverse measure of cohesiveness. If everyone has the same score, there is no dispersion. The more people who have different scores, the higher the dispersion. If everyone has a different score, dispersion is at its maximum.
Dispersion is calculated by measuring how far every score is away from the mean. If all of the scores are close to the mean, dispersion is low. The more scores differ from the mean, the more dispersion there is. So a group of scores that are tightly clustered around the mean have a low amount of dispersion. A large amount of dispersion indicates the scores are more widely distributed.
When almost no one has the same score, the frequency distribution will be quite wide. There will be more width than height. The mean, median and mode are in the center but there is little agreement among the scores. A distribution with lots of dispersion has little consensus. It is heterogeneous.
When almost everyone has the same score, the frequency distribution will be quite narrow. In a homogeneous group, there is more height than width. In this case, the mean, median and mode are excellent representatives of the entire distribution because almost all of the scores are at or near the center of the distribution.
If the center of a distribution were a statement of truth, dispersion would be a measure of error. When dispersion is high, the errors are balanced on each side of the argument but there is little agreement among them. When dispersion is low, the error is balanced but huddled around the mean.
5 Measures of Dispersion

Range

[Figures: distributions showing large, medium, and small amounts of dispersion]
Like all measures of dispersion, the range of scores gets larger when the dis- tribution of scores is more heterogeneous (dissimilar). The more homogeneous (similar) the scores, the smaller the range.
Range is easy to calculate. It is the highest score minus the lowest score. If the largest score is 12 and the smallest score is 10, range equals 2. If the highest score is 11 and the lowest score is 3, the range equals 8.
Although easy to calculate, range is not terribly helpful for describing a distribution. Without knowing what is being measured, a range of 12 is ambiguous. If we were measuring the number of points each basketball player made during a game, a range of 12 would not be surprising. But if we were measuring the number of goals each hockey player made during a game, a range of 12 would be very unusual.
Range is a good way to check for input errors. If you were inputting scores from a 10-point quiz, a range of 72 would alert you to an input error. The maximum possible score on a 10-point quiz is 10 and the lowest possible score is 0, so the range should not be more than 10.
Consider these scores:
15
5
4
4
7
2
The high score is 15 and the low score is 2, so the range of these numbers is 13.
Here are some more scores:
121
77
44
155
32
6
The high score is 155 and the low score is 6, so the range of these numbers is 149.
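The two range examples above can be checked with a few lines of Python. This is just an illustrative sketch; the function name `score_range` is my own, not from the book.

```python
def score_range(scores):
    """Range: the highest score minus the lowest score."""
    return max(scores) - min(scores)

# The two worked examples from this section
print(score_range([15, 5, 4, 4, 7, 2]))        # 13
print(score_range([121, 77, 44, 155, 32, 6]))  # 149
```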
Mean Absolute Deviation (MAD)
Like all measures of dispersion, mean absolute deviation (MAD) gets larger when the distribution of scores is more heterogeneous (dissimilar). The more homogeneous (similar) the scores, the smaller the MAD.
Let’s break MAD down into its component parts from right to left. The D stands for deviation. MAD is a measure of variation from the mean. To calculate MAD, the mean is subtracted from each score.
In the first column is a variable we’ll call X. The mean of this variable is 5. So 5 (column 2) is deducted from each score and the result forms column 3. Since the result is a measure of deviation from the mean, the third column is labeled d (little d).
X    mean    d
7    5       2
5    5       0
5    5       0
5    5       0
3    5      -2
Mean deviation sounds like it should be the mean of those little d’s (column 3). We would simply sum the column and divide by the number of scores. But there is a problem. When the little d’s are added up, they total zero (2+0+0+0-2 = 0).
But this is to be expected. We started at the mean, which is the balance point of the variable, and measured deviations from it. Since the mean is the center point of the distribution, deviations from it will always add up to 0. So we have two choices. We can take the absolute value of the deviations (which leads us to MAD) or we can square them (as we’ll do in Sum of Squares below).
The A of MAD stands for “absolute value” and the M stands for “mean.” When we take the absolute value of a number we ignore the sign (positive or negative) of the number. By ignoring the sign, the magnitude of the deviation is added and the result is no longer 0. In the above example, ignoring the positive and negative signs results in a sum of 4 (2+0+0+0+2) and a mean absolute deviation (average of the little d’s) of .80 (4 divided by 5).
So MAD (sometimes called mean variance) is the average of the absolute values of the deviations from the mean. The mean is subtracted from each raw score and the resulting little d’s are averaged (ignoring whether they are positive or negative).
As you can see MAD is a bit more complicated to calculate than range but more useful as a measure of dispersion. MAD is tied to the mean, gives a quick way to describe dispersion from the mean, and is useful when describing skewed distributions.
The downside is that mean variance doesn’t describe the underlying distribution. A mean variance of 7 is larger than a mean variance of 1.2, but otherwise difficult to interpret.
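The MAD steps above (subtract the mean, take absolute values, average) can be sketched in Python. The function name `mad` is my own shorthand, not from the book.

```python
def mad(scores):
    """Mean absolute deviation: the average of the absolute deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum(abs(x - mean) for x in scores) / len(scores)

# The worked example: X = 7, 5, 5, 5, 3 with a mean of 5
print(mad([7, 5, 5, 5, 3]))  # 0.8
```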
Sum of Squares
Like all measures of dispersion, Sum of Squares (SS) gets larger when the distribution of scores is more dissimilar (heterogeneous). The more homogeneous (similar) the scores, the smaller the SS.
Deviation Method
Conceptually, Sum of Squares is an extension of mean variance. Instead of taking the absolute values of the deviations, we square the critters. It doesn’t hurt them. It just gets rid of the negatives. For example:
X    mean    d    d2
7    5       2    4
5    5       0    0
5    5       0    0
5    5       0    0
3    5      -2    4
Squaring the little d’s produces the 4th column. Then just add up (sum) the squared deviations. In this example, the sum of the squared deviations (SS) is 8.
This “deviation method” of calculating Sum of Squares is used to illustrate that it is a measure of dispersion from the mean. After using this method several times, it should be clear that the deviations are deviating from the mean (since you have to subtract the mean from each score). It also becomes clear that they are squared deviations because you have to square each of the deviations.
Once this concept is clear, you’ll be ready to know the secret: there is an easier way to calculate Sum of Squares.
Raw Score Method
The problem with the deviation method is clear when the mean is not an integer. When the mean is 5, it’s not hard to subtract it from every score. When the mean is 5.387, it is difficult to know how many places to carry out each of the sub-answers. It’s not impossible to do; it’s just a pain.
It seems like some mathematician with nothing better to do must have come up with an easier way to calculate Sum of Squares. And, in truth, they did.
The raw score method only uses the raw scores. There are no deviations to calculate. No rounding in the middle of the problem. We just use the X values raw, unprocessed and in the order in which they come. There is no fancy setup.
All we’re going to do is add some numbers (we call it “summing”), square some numbers (multiply a number by itself) and divide. In fact there’s nothing more difficult than that in the entire book. Oh, later on we get to push the square root sign but that’s more fun than anything else. It usually has its own button on the calculator; it’s easy.
I think in visual terms. Here’s a map of how I envision calculating Sum of Squares. To me it says add up the first column. I like to call that number Fred. Then square each of the numbers in the second column and add them up. I call this number Ralph. Then square Fred, divide by the number of people in the study and call it Jack. Take Ralph and subtract Jack from it and you have SS (Sum of Squares). Now that makes perfect sense to me. I can see me going down the first column adding as I go. I drop the answer on the first green dot (Fred). The sideways lines remind me to square and then add, resulting in Ralph (the second green dot). The x and slash beside Fred remind me to square him and divide by N (number of X scores). And Fred becomes Jack. Then bring back Ralph, subtract Jack, and the result is SS.
As you might guess, not everyone is as gifted at following my maps as I am. My notes always make sense to me but seldom to others. I think of it as the curse of genius.
Some people like math formulas. They think it’s easier to communicate how to calculate with Greek letters and funny squiggles. It lacks the grace of my lines and circles but for those traditionalists, here is the formula we use to calculate Sum of Squares:
SS = ΣX² − (ΣX)² / N
For those who hate math: relax. It’s not as scary as it looks (I could have put a longer, scarier version here just to impress you but I’d rather impress you with the simplicity of statistics than with its difficulty). In fact, nothing we are going to calculate is going to be any more difficult than this.
Let’s go step by step through the process. Assume this is the distribution at issue:
X
11
7
3
4
5
8
First, each number is squared:
X     X2
11    121
7     49
3     9
4     16
5     25
8     64
Second, we sum each column. That gives us 38 for the sum of X and 284 for the sum of X2.
Third, square the sum of the first column (38) and divide it by the number of people in the study (N), which is 6. If you’re keeping track, 38 times itself = 1444. Then we divide 1444 by 6 (which equals 240.67).
Fourth, subtract what we just calculated (240.67) from the sum of second column (284). That is, 284 – 240.67 equals 43.33.
The Sum of Squares for this distribution is 43.33.
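The four raw-score steps can be followed line by line in a short Python sketch (my own helper function, not part of the book).

```python
def sum_of_squares(scores):
    """Raw score method: SS = sum of X squared, minus (sum of X) squared over N."""
    n = len(scores)
    sum_x = sum(scores)                  # sum of the first column (38)
    sum_x2 = sum(x * x for x in scores)  # sum of the squared column (284)
    return sum_x2 - sum_x ** 2 / n       # 284 - 1444/6

print(round(sum_of_squares([11, 7, 3, 4, 5, 8]), 2))  # 43.33
```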
Both Ways At Once
Let’s compare the two methods. They will produce the same results but you’ll see the raw score method is much easier to calculate. For the deviation method, calculate the mean of X. When you add up the X scores, you get 38. And 38 divided by 6 = 6.33.
Subtract the mean from the first X score, put the result in the second column (d). Then square d and put it in the next column (d2). Do it for each X score. And add up the d-squares (d2). The Sum of Squares is 43.33.
X     d        d2
11    4.67     21.78
7     0.67     0.44
3    -3.33     11.11
4    -2.33     5.43
5    -1.33     1.78
8     1.67     2.79
_________________________________________________
SS             43.33
For the raw score method:
Square each X.
Sum each column.
Square the sum of the first column and divide by N.
Subtract the result from the sum of the second column.
The Sum of Squares is 43.33.
X     X2
11    121
7     49
3     9
4     16
5     25
8     64
________________________
Sum   38    284
N     6
SS    43.33
Variance
The fourth measure of dispersion is variance. Like the other measures of dispersion, the larger the variance, the more distributed the scores are. The smaller the variance, the more homogeneous the scores.
Variance of a population is always SS divided by N. This is true whether it is a large population or a small one. Regardless of how many scores are in the population, variance is Sum of Squares divided by N. Using the numbers from our previous example where the SS was 43.33 and N equaled 6, variance would be 43.33 divided by 6 (which equals 7.22).
Variance of a large sample (N is 30 or more) is also calculated by Sum of Squares divided by N. If there are 40 or 400 in the sample, variance is SS divided by N.
However, if a sample is less than 30, it is easy to underestimate the variance of the population. Consequently, it is common practice to adjust the formula for a small sample. If N is less than 30, variance is SS divided by N-1. Using N-1 instead of N results in a slightly larger estimate of variance and mitigates against the problem of using a small sample. If the above example was not a population but in fact was a small sample, variance would be 43.33 divided by 5, which gives us 8.67.
Obviously, you can’t determine whether a group of numbers is a sample or a population just by looking at them. Since a sample is a selected group of scores from a larger group, you must know who you are studying in order to distinguish between a sample and a population. Your family can be a sample of a larger group (all families living in a particular region) or a population (if you don’t want to generalize beyond your family). The 50 states can be a population (the United States) or a sample of the regional sections of the world.
So to calculate variance, use SS/N for populations (large and small) and for large samples. For small samples, use Sum of Squares divided by N-1. Since most research tends to use large samples of subjects, Sum of Squares divided by N is the most widely used measure of variance.
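The decision rule above (SS/N in every case except a small sample, which uses SS/N-1) can be sketched as a small Python function; the `small_sample` flag is my own naming, not the book’s.

```python
def variance(scores, small_sample=False):
    """Variance: SS / N for populations and large samples; SS / (N - 1) for small samples."""
    n = len(scores)
    ss = sum(x * x for x in scores) - sum(scores) ** 2 / n  # raw score Sum of Squares
    return ss / (n - 1 if small_sample else n)

data = [11, 7, 3, 4, 5, 8]
print(round(variance(data), 2))                     # 7.22 (treated as a population)
print(round(variance(data, small_sample=True), 2))  # 8.67 (treated as a small sample)
```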
Standard Deviation
Like the other four measures of dispersion, the standard deviation gets smaller as the scores get more homogeneous and larger the more heterogeneous they become. A small standard deviation indicates the scores are quite similar to the mean; a large standard deviation says the scores vary from the mean.
This measure of dispersion is calculated by taking the square root of variance. Regardless of whether you used N or N-1 to calculate variance, standard deviation is the square root of variance. To calculate standard deviation, just push the button on your calculator with the square root symbol (√) on it.
If variance is 7.22, the standard deviation is 2.69. If variance is 8.67, the standard deviation equals 2.94.
Technically, the square root of a population variance is called sigma and the square root of a sample variance is called the standard deviation. As a general rule, population parameters use Greek symbols and sample statistics use English letters. Since we tend to use large samples, we’ll focus on the standard deviation.
The standard deviation is “standard” in the sense that it takes steps of equal distance from the mean. Think of it as standing at the mean and taking 3 steps in one direction. It doesn’t matter if you step toward the high end or the low end. It only takes three steps to get from the mean to the end of a distribution. If you start at the mean and go toward the positive end,
you’re there in 3 steps. And it’s 3 steps from the mean to the lowest end of the distribution. So the entire distribution is comprised of 6 steps (3 positive steps and 3 negative steps).
Each of these steps is equal in distance but accounts for a different amount of people. The normal curve is like a mountain. If you’re standing on top of the mountain, your first step is always your largest. In a frequency distribution of a normally distributed variable, your first step accounts for the most people. Because most scores are close to the mean, most scores fall within plus or minus one standard deviation from the mean.
In fact, that’s our definition of normal. Normal is being close to the mean. Normal musical ability is scoring at the mean plus or minus one standard deviation. Normal basketball throw- ing is at the mean, plus or minus one standard deviation.
In a normally distributed variable, the percentages are consistent, regardless of what is being measured. Starting from the mean, the first step accounts for just over 34% of the scores. The next step has 14% and the last step has 2%. Since normal frequency distributions are symmetrical, the percentages work on either side of the mean. So the entire distribution looks like this:
The majority of scores fall at the mean, plus and minus one standard deviation. Normal, then, includes 68% of the scores. There are 68% of the scores from one standard deviation below the mean to one standard deviation above the mean.
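The 34-14-2 step percentages come straight from the normal curve, and they can be confirmed with Python’s standard library (the error function gives the normal cumulative proportion). This is a check I added, not part of the book.

```python
import math

def normal_cdf(z):
    """Proportion of a normal distribution falling below z standard deviations."""
    return (1 + math.erf(z / math.sqrt(2))) / 2

within_1sd = normal_cdf(1) - normal_cdf(-1)  # the mean plus or minus one step
step_1 = normal_cdf(1) - normal_cdf(0)       # first step from the mean
step_2 = normal_cdf(2) - normal_cdf(1)       # second step
step_3 = 1 - normal_cdf(2)                   # everything beyond the second step
print(round(within_1sd * 100))  # 68
print(round(step_1 * 100))      # 34
print(round(step_2 * 100))      # 14
print(round(step_3 * 100))      # 2
```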
Areas under the curve
The nice thing about percents is that they can be from anywhere. You can select the top 1%, the bottom 6%, or the middle 3%. Also, you can randomly select 10% or make sure that the sample is representative of the entire distribution.
Conversely, the problem with percents is that they are not location specific. A sample of 3% can be from anywhere in the distribution.
In contrast, percentiles are location specific. Percentiles are cumulative percentages. Starting from the left of the distribution, percentiles get larger as more and more scores are included. A percentile is the percentage of scores below your score.
If you’re at the 50th percentile, 50% of the scores are below you. If you’re at the 81st percentile, 81% of the scores are below you. If you’re at the 4th percentile, 4% of the scores are below you.
Interestingly, your score doesn’t take up any room. If you’re at the 50th percentile, 50% are below you and 50% of the scores are above you. A percentile is a hypothetical point and doesn’t deduct anything by its presence. Eighty percent of the scores are below the 80th percentile and 20 percent of the scores are above it.
The nice thing about percentiles is that they are location specific. The 50th percentile is always at the median, the 80th percentile is always above it. Although 2 percent can be from anywhere, the 2nd percentile is always the bottom 2 percent of the distribution. There can be no confusion of a percentile’s location.
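Since a percentile is the percentage of scores below your score, it is easy to sketch. The helper name is mine, and note that real statistical packages offer several slightly different percentile conventions.

```python
def percentile_rank(scores, score):
    """Percentage of the scores in the list falling below the given score."""
    below = sum(1 for x in scores if x < score)
    return 100 * below / len(scores)

scores = [58, 65, 66, 67, 67, 67, 68, 69, 76]
print(round(percentile_rank(scores, 67), 1))  # 33.3 (3 of 9 scores are below 67)
print(round(percentile_rank(scores, 76), 1))  # 88.9 (8 of 9 scores are below 76)
```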
The problem with percentiles is their interpretation. They sound like equal steps (65, 66, 67, etc.) but the variable being measured is not shaped that way. Percentiles make it sound like the variable is shaped like this:
In this example, scores are evenly distributed across the variable. But that’s inconsistent with what we know about people. Most people fall in the middle of the distribution, not in equal amounts along its width.
Percentiles give a false impression about the middle of a distribution. Instead of being evenly spaced along a distribution, most percentiles are clustered around the mean. The percentiles in the middle of a distribution are bunched closely together. Although it sounds much better to be at the 60th percentile than the 40th percentile, both positions are virtually at the mean.
Percentiles sound like they are evenly distributed but both the 16th percentile and the 84th percentile are only one standard deviation away from the mean. One standard deviation below the mean is the 16th percentile; one standard deviation above the mean is the 84th percentile. Although most scores are in the middle of a distribution, percentiles are too easily interpreted as being evenly distributed.
Percentiles are not alone in this problem of interpretation; they are merely the most popular. Years ago, the government came up with a measure called the stanine, which divides a distribution into 9 parts. During WWII, the army needed to categorize people quickly on a number of skills (pilot, navigator, shooting, etc.). The psychologists who were doing their patriotic duty by designing the tests took a normal bell-shaped curve and drew 8 lines through it. The result was nine regions. Nine standard regions. Standard-nine regions. Stanines.
The prime feature of the stanine was that region 5 (the one with 4 regions on each side of it) held 20%. The next regions (4 and 6) had 17%, followed by regions 3 and 7 (12%), regions 2 and 8 (7%), and ending with regions 1 and 9 (4%). So choosing people with stanine scores of 4, 5 and 6 would capture 54% of the population.
It is easy to misinterpret stanines. Stanines are ordinal scores and not very helpful in comparing distributions. They were derived from the normal distribution but don’t accurately reflect its characteristics. As you can see, stanines and standard deviations from the mean don’t line up.
There is no theoretical basis for stanines. Back when computers were fed data on paper cards, the maximum number of holes that could be punched in a paper card without its tearing apart was nine. So we have the sta-nine. If 12 holes could have fit, we would have had the sta-twelve. Stanines were created out of convenience, not as a way of reflecting a normally distributed variable.
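Using the standard stanine percentages (4-7-12-17-20-17-12-7-4 from region 1 through region 9), a percentile can be mapped to its stanine region with a running total. This sketch and its function name are mine, not the book’s.

```python
# Standard percentage of scores in stanine regions 1 through 9
STANINE_PERCENTS = [4, 7, 12, 17, 20, 17, 12, 7, 4]

def stanine(percentile):
    """Convert a percentile (0-100) to its stanine region (1-9)."""
    cumulative = 0
    for region, pct in enumerate(STANINE_PERCENTS, start=1):
        cumulative += pct
        if percentile <= cumulative:
            return region
    return 9

print(stanine(50))  # 5 (the middle region)
print(stanine(2))   # 1 (the bottom of the distribution)
```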
Another approach has been the use of quartiles. Quartiles divide a distribution into 4 parts. Take the median of a distribution. You now have 2 halves. Take the median of each half. You now have 4 quartiles.
Quartiles are composed of a median and two cutoff points. Q1 is the bottom 25 percent; it’s at the 25th percentile. Q2 is at the median, Q3 has the next 25% (and is at the 75th percentile),
and Q4 (which you never hear about) is at the extreme end of the distribution. Since Q3 is at the 75th percentile and Q1 is at the 25th percentile, 50% of the scores are between Q1 and Q3. This area is called the interquartile range.
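Python’s standard library can locate the quartile cutoffs directly. Note that different packages use slightly different interpolation conventions, so hand calculations may differ a little; this example uses the default behavior of `statistics.quantiles`.

```python
import statistics

scores = [58, 65, 66, 67, 67, 67, 68, 69, 76]

# quantiles with n=4 returns the three cut points: Q1, Q2 (the median), Q3
q1, q2, q3 = statistics.quantiles(scores, n=4)
print(q1, q2, q3)                       # 65.5 67.0 68.5
print("interquartile range:", q3 - q1)  # 50% of the scores lie in this span
```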
Like percentiles and stanines, quartiles don’t reflect actual differences between scores. And they don’t remind us that most people are in the middle of a distribution, with less and less on either side. They have a lot in common with school grades based on set percentages. Percentiles, stanines and quartiles are a lot like grades where an A is 90%, a B is 80% and a C is 70%. They work fine when we want everyone to pass a set standard. A criterion is where everyone is required to spell 100 words correctly, do 20 push-ups or identify 50 state capitals. In these cases, individual performance is judged against a standard, not against other people. With a criterion, all who can spell Mississippi are rewarded equally. There is no comparison between an individual and a group.
Grades, percentiles, stanines and quartiles are not good measures for person-to-group comparisons. They work for comparing individuals to a set standard (able to multiply fractions) but don’t give a good measure of whether most people can do that skill. Grades, in particular, are difficult to interpret because they are used both as criterion measures and group measures. Sometimes tests are said to be “graded on the curve.” That is, the group sets the standard, not the teacher. The “curve” in question is a normal distribution and C is the middle 50% or the middle 68%.
In reality, however, grades are better used to measure criterion performance. In actual practice, grades don’t reflect the middle of the distribution; a C in a course does not necessarily represent the middle of a grade distribution. Many schools suffer from “grade inflation,” where most students are getting C+’s or B’s. In many graduate schools, the average grade is a B+. The problem with grades is that it is impossible to tell whether a grade reflects a criterion-based or a group-based decision.
What we need is a measure that allows us to show where a score is located. What we need is a way to precisely place a score in a distribution. What we need is a way to compare an individual with a group. What we need is a z-score (our topic for tomorrow).
UNDERSTAND
Concepts are rules you carry in your head. They are easy to remember and apply to many situations. There are 5 concepts related to dispersion that I want to highlight.
1. The 2-14-34 Rule
Illustration 1: When you’re standing on a mountain, your first step is the largest, so the old saying goes. In statistics, a normal distribution is like a mountain. If you stand at the peak (where the mean, median and mode are), your first step is the largest. From the mean to −1 standard deviation accounts for 34% of the scores. Similarly, from the mean to +1 standard deviation holds 34%. The next segments hold 14% and the segments furthest from the mean hold 2%.
Illustration 2: A frequency distribution is like a triangle. Each side holds 50% in 3 sections: 2% at each end, 14% when getting closer to the mean, and 34% for the section closest to the mean.
Illustration 3: A frequency distribution is like a staircase with 3 steps going up and 3 steps going down. The steps are of unequal size. The first (lowest) steps each hold 2% of the scores. The next steps each hold 14%. The top steps can handle 34% each (or 68% if they are used together).
2. Percents are from anywhere; percentiles start at the bottom
Illustration 1: Percents can be taken from across the distribution. They can be little clusters of blobs or tiny dots systematically selected from the entire distribution. Percents can be the top 10%, the bottom 7% or the middle 12%.
The 5th percentile is a specific location, approximately where the cross is on this distribution. 5% can come from anywhere, including being selected from the orange circles spread across the distribution.
Illustration 2: Percentiles and percents are like the difference between penguins and penguin suits (tuxedos). They sound similar but they mean two different things. Penguin suits (an old name for tuxedos) can be found anywhere around the world. Penguins are native only to one region. Percentages can be taken from anywhere in a distribution. Percentiles are location specific.
3. Grades, percentiles, stanines and quartiles are not good measures for person-to-group comparisons
Illustration 1: Everyone in the class could get the same score (if it was a very smart class).
Illustration 2: Percentiles around the mean are all crowded together because that’s where most people are. Out at the ends, percentiles are few and far between.
REMEMBER
Facts are the details of who, what, where and when. Here are the facts I’ve collected for dispersion. Notice that there are 5 measures of dispersion: range, mean absolute deviation (MAD; mean variance), Sum of Squares, variance and standard deviation. Grades, percentiles, stanines and quartiles are not good measures for person-to-group comparisons.
Formulas:
There are two methods of calculating Sum of Squares: deviation and raw score. The deviation method (which is done to illustrate the concept of dispersion) subtracts the mean from every score, squares the resulting deviations and sums them. When the data includes decimal numbers, the deviation method is very prone to rounding error. The preferred method is the raw score method. It uses this formula:
SS = ΣX² − (ΣX)² / N
Terms:
Tangen’s 2-14-34 rule
area under the curve
d & d2
degrees of freedom (df)
deviation method
dispersion
frequency distribution
interquartile range
MAD (mean absolute deviation)
N
N-1
percent
percentile
proportion
quartile
range
raw score method
standard deviation
stanine
Sum of Squares (SS)
variance
DO
Step By Step Instructions
1 Range
Like all measures of dispersion, the range of scores gets larger when the distribution of scores is more heterogeneous (dissimilar). The more homogeneous (similar) the scores, the smaller the range.
Range is easy to calculate. It is the highest score minus the lowest score. If the highest score is 11 and the lowest score is 3, the range equals 8.
2 MAD
The mean absolute deviation (MAD) is a bit more complicated to calculate than range but more useful as a measure of dispersion. As the name suggests, MAD is a measure of variation from the mean.
First, find the mean. In the first column is a variable we’ll call X; the mean of this variable is 5. Second, subtract the mean from each X. So 5 (column 2)
is deducted from each score and the result forms column 3. Since the result is a measure of deviation from the mean, the third column is labeled d (little d).
X    Mean    d
7    5       2
5    5       0
5    5       0
5    5       0
3    5      -2
Mean variance sounds like it should be the mean of those little d’s (column 3). We would simply sum the column and divide by the number of scores. But there is a problem. When the little d’s are added up, they total zero (2+0+0+0-2=0).
But this is to be expected. We started at the mean, which is the balance point of the variable, and measured deviations from it. Since the mean is the center point of the distribution, deviations from it will always add up to 0. So we have two choices. We can take the absolute value of the deviations (which leads us to mean variance) or we can square them (as we’ll do in Sum of Squares below).
Third, take the absolute value of the deviations. That is, ignore the sign (positive or negative) of each number.
Fourth, add up the absolute values. By ignoring the sign, the magnitude of the deviation is added and the result is no longer 0. In the above example, the absolute values of deviations are 2+0+0+0+2 and the sum is 4.
Fifth, divide the sum by the number of scores. In the present example that would be 4 divided by 5, giving a mean variance (average of the little d’s) of .80.
3 Sum of Squares
a. Deviation method
Conceptually, Sum of Squares is an extension of absolute mean variance. As in mean variance, we subtract the mean from each score to make a column of deviations.
Then, instead of taking the absolute values of the deviations, we square the little critters. For example:
X    Mean    d    d-squared
7    5       2    4
5    5       0    0
5    5       0    0
5    5       0    0
3    5      -2    4
Finally, we add up the squared deviations. The sum of the squared deviations is called Sum of Squares.
In this example, the sum of the squared deviations is 8. The Sum of Squares (SS for short) is the sum of the squared deviations. Like all measures of dispersion, the larger the number, the more dispersed the distribution of raw scores. The smaller the SS, the less dispersed the scores are.
DAY 3: Dispersion
This “deviation method” of calculating Sum of Squares illustrates that it is a measure of dispersion from the mean. After using this method several times, it should be clear that the Sum of Squares is the sum of the squared deviations from the mean.
Once this concept is clear, you’ll be ready to know the secret: there is an easier way to calculate Sum of Squares.
b. Raw score method
The problem with the deviation method is clear when the mean is not an integer. When the mean is 5, it's not hard to subtract it from every score. When the mean is 5.387, it is difficult to know how many places to carry out each of the sub-answers. It's not impossible to do; it's just a pain.
It seems like some mathematician with nothing better to do must have come up with an easier way to calculate Sum of Squares. And, in truth, there is an easier way.
The raw score method only uses the raw scores; there are no deviations to calculate. Here is the formula we use:

SS = ΣX² - (ΣX)² / N
Here’s a step by step description of the process. Assume this is the distribution at issue:
X
11
7
3
4
5
8
First, each number is squared:
X     X²
11    121
7     49
3     9
4     16
5     25
8     64
Second, sum (Σ) each column:

X     X²
11    121
7     49
3     9
4     16
5     25
8     64
_______________
Σ     38    284

Third, plug the sums into the formula: SS = 284 - (38² / 6) = 284 - 240.67 = 43.33.
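The raw score method is easy to check in Python as well (a sketch of mine, not part of the original tour):

```python
# Sum of Squares by the raw score method: SS = ΣX² - (ΣX)² / N
scores = [11, 7, 3, 4, 5, 8]
n = len(scores)
sum_x = sum(scores)                  # ΣX  = 38
sum_x2 = sum(x * x for x in scores)  # ΣX² = 284
ss = sum_x2 - sum_x ** 2 / n         # 284 - 1444/6 = 43.33
```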
4 Variance
In 3 out of 4 possibilities, variance is SS divided by N:
Use SS/N for large populations.
Use SS/N for small populations.
Use SS/N for large samples (30 or higher).
If N is less than 30, variance is SS divided by N-1.
Using N-1 instead of N results in a slightly larger estimate of variance and mitigates the problem of using a small sample. If the above example (SS = 43.33) was a population, variance would be 43.33 divided by 6, which gives us 7.22. If it was a small sample, variance would be 43.33 divided by 5, which gives us 8.67.
5 Standard Deviation
This measure of dispersion is calculated by taking the square-root of variance.
Regardless of whether you used N or N-1 to calculate variance, standard deviation is the square-root of variance. If variance is 7.22, the standard deviation is 2.69. If variance is 8.67, the standard deviation equals 2.94.
Technically, the square-root of a population variance is called “sigma” and the square-root of a sample variance is called “standard deviation.” As a general rule, population parameters use Greek symbols and sample statistics use English letters. Since we tend to use samples, we'll call it standard deviation regardless of whether our data is from a population or sample.
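Here is the whole chain, from Sum of Squares to standard deviation, as a Python sketch (again my own code, not the tour's):

```python
import math

# From Sum of Squares to variance and standard deviation.
# Divide SS by N for a population; by N-1 for a small sample.
scores = [11, 7, 3, 4, 5, 8]
n = len(scores)
ss = sum(x * x for x in scores) - sum(scores) ** 2 / n  # 43.33

pop_variance = ss / n                       # 7.22
sample_variance = ss / (n - 1)              # 8.67
pop_stdev = math.sqrt(pop_variance)         # 2.69
sample_stdev = math.sqrt(sample_variance)   # 2.94
```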
Practice Problems
Practice Item 1
Calculate the range of the following scores:
19
15
8
5
5
2
High Score _____ Low Score _____ Range _____
Practice Item 2
Assume this is a population and calculate the range, SS, variance and stdev of the following scores:
11
9
8
7
5
7
High Score _____
Low Score _____
Range _____
ΣX _____
ΣX² _____
(ΣX)² _____
N _____
SS _____
variance _____
stdev _____
Practice Item 3
Calculate the variance and stdev of the following population:
5
9
7
18
4
2
ΣX _____
ΣX² _____
(ΣX)² _____
N _____
SS _____
variance _____
stdev _____
Practice Item 4
Calculate the SS, variance and stdev of the following population:
77
18
9
6
10
60
4
SS _____
variance _____
stdev _____
Practice Item 5
Calculate the range, SS, variance and stdev of the following sample:
12
3
9
5
5
5
7
11
14
-1
High Score _____
Low Score _____
Range _____
SS _____
variance _____
stdev _____
Simulations

Simulation 1
As a manufacturer, you want the widgets you produce to all be identical in length. Any variation is considered error. If your criterion is plus and minus 1 standard deviation from the mean, what would be “acceptable” widget length? You are a small manufacturer; here is a sample of widgets you've made:
2 18 9 4 51 5 2
ΣX = _____
SS = _____
stdev _____
“Acceptable” widget length is from _____ to _____ inches long.
Simulation 2
As a train conductor, you want to know how much variation occurs in amount of time it takes to collect tickets. Here is a sample of the times.
7 8 4 5 7 3 8
mean _____
median _____
SS _____
variance (remember it's a sample) _____
standard dev _____
Simulation 3
You want to know how much variation occurs in the amount of time it takes to walk your dog. Here is a sample of the times.
8 6 2 4 4
12
SS _____
variance (remember it’s a sample) _____ stdev _____
Simulation 4
As a pilot, you want to know the average number of airline employees “deadheading” with you on your daily flight to Paris. Between what limits does it fluctuate 68% of the time? Here is a sample of your data:
10 13 12 12
8 6
Calculate only the following statistics:
mean _____
median _____
SS _____
variance _____
stdev _____
68% of the time, the number of deadheaders is between _____ and _____.
SUMMARY
To finish off our discussion of dispersion, there is a review, quiz, progress check, and chapter answers.
Review
If everyone has the same score, there is no dispersion from the mean. If everyone has a different score, dispersion is at its maximum but there is no commonality in the scores. In a normal distribution, there are both repeated scores (height) and dispersion (width).
Percentiles, quartiles and stanines imply that distributions look like plateaus. Scores are assumed to be spread out evenly, like lines on a ruler. People are nicely organized in equal-sized containers.
SS, variance and standard deviation imply that distributions look like a mountain. Scores are assumed to be clustered in the middle; people are more alike than different. People are mostly together at the bottom of the bowl with a few sticking to the sides.
You can describe an entire distribution as 3 steps (standard deviations) to the left and 3 steps to the right of the mean. The percentages go 2, 14, 34, 34, 14, and 2. This is believed to be true of all normally distributed variables, regardless of what they measure.
Progress Check 3
1. The more variability in a distribution, the larger its:
a. mean
b. median
c. mode
d. standard deviation
e. all of the above
2. Sum of Squares is a measure of:
a. dispersion
b. central tendency
c. covariance
d. interpolation
e. extrapolation
3. Which of the following is the average of the squared deviations:
a. mean
b. mean absolute deviation (MAD)
c. sum of squares
d. standard deviation
e. variance
4. From one standard deviation above to one standard deviation below the mean accounts for what percent of the scores:
a. 17%
b. 25%
c. 34%
d. 68%
e. 95%

5. Which of the following is the square root of variance:
a. mean
b. median
c. mode
d. mean variance
e. standard deviation

6. To calculate the variance of a population, Sum of Squares is divided by:
a. N
b. N-1
c. N-2
d. N-3
e. mean

7. Which of the following is the best measure of the fluctuation of a company's stock price:
a. mean
b. median
c. mode
d. two modes
e. standard deviation

8. When looking for a homogeneous group of people, we should choose the one with:
a. the largest mean
b. the smallest mean
c. the largest standard deviation
d. the smallest standard deviation
e. the mode in the middle

9. A score of 50 on a test with a mean of 100 would be considered normal if the standard deviation was:
a. 1
b. 5
c. 10
d. 20
e. none of the above

10. A score of 50 on a test with a mean of 55 would be considered quite unusual if the standard deviation was:
a. 1
b. 5
c. 10
d. 20
e. none of the above
Progress Check
1. Complete the following:
a. Theories are composed of:
b. Models are composed of:
c. Laws:
d. Principles:
e. Beliefs:

2. List 5 measures of dispersion:
a.
b.
c.
d.
e.

3. List 2 ways of calculating standard deviations:
a.
b.
Answers
Practice Items

Item 1
High score = 19
Low score = 2
Range = 17
Item 2
High score = 11
Low score = 5
Range = 6
ΣX = 47
ΣX² = 389
(ΣX)² = 47² = 2209
N = 6
SS = 20.83
Variance = 3.47
Stdev = 1.86
Item 3
ΣX = 45
ΣX² = 499
(ΣX)² = 45² = 2025
N = 6
SS = 161.50
population variance = 26.92
standard deviation = 5.19
Item 4
SS = 5048
variance = 721.14
stdev = 26.85

Item 5
High score = 14
Low score = -1
Range = 15
ΣX = 70
ΣX² = 676
SS = 186
sample variance = 20.67
sample standard deviation = 4.55
Simulations

Item 1
mean = 13
SS = 1872
stdev = 17.66
“Acceptable” widget length is from -4.66 to 30.66 inches long.
Item 2
The train conductor takes an average of 6 minutes to collect the tickets, with a median of 7 minutes. The SS = 24. Because it's a sample, SS is divided by N-1. So variance = 4 and the standard deviation = 2.
Item 3
On average it takes 6 minutes to walk your dog. SS = 64, sample variance = 12.8, and stdev = 3.58.
Item 4
As a pilot, you have an average of 10.17
employees “deadheading” with you on your daily flight to Paris. The median = 11, SS = 36.83, sample variance (N-1) = 7.37, and stdev = 2.71. 68% of the time, the number of deadheaders is between 7.46 and 12.88.
Multiple Choice
d, a, e, d, e, a, e, d, e, a
Progress Check
1. Complete the following:
a. Theories are composed of: constructs
b. Models are composed of: variables
c. Laws: accuracy beyond doubt
d. Principles: some predictability
e. Beliefs: personal opinions

2. Five measures of dispersion:
a. range
b. mean absolute deviation (MAD)
c. Sum of Squares (SS)
d. variance
e. standard deviation

3. List 2 ways of calculating standard deviations:
a. deviation method
b. raw score method
Day 4: Z-Scores (Self-Comparisons & z-scores)

BRIEFLY
You have learned about operational definition and levels of measurement.
You’ve learned 3 measures of central tendency (mean, median and mode). You know that means work great for normal distributions but medians are better for skewed distributions.
You’ve learned 5 measures of dispersion (range, MAD, Sum of Squares, variance and standard deviation). You know that dispersion is a measure of heterogeneity. A large SS indicates that almost everyone has a different score and a small SS (or range, etc.) shows that the scores are quite homogeneous. You have learned everything there is to know about describing a single group of scores (one variable).
Now we’ll explore the individual. Although each single score represents a different indi- vidual, by itself one score doesn’t reveal much. A score of 95 doesn’t indicate anything by itself. It needs context to be properly interpreted. Scoring 95 on a test with 100 items is quite different than 95 out of 12,000. It’s the context that allows us to make comparisons.
There are only 3 things you can compare yourself to:
1. You can compare you to yourself. Acting as your own control, you might track your weight, exercise habits, driving skills, piano playing, etc. It doesn't matter what others are doing, only how well you are doing compared to your previous experience.
2. You can compare you to a standard. You can use pass-fail comparisons for this approach. This is the statistical equivalent of a To-Do list. Just count the number of tasks achieved.
3. You can compare you to a group. The mean and standard deviation provide the context to locate a score in relation to the rest of the distribution. Using the mean as a starting point, a score can be expressed in terms of the direction and number of standard deviations it is away from the mean. This process is the statistical equivalent of “Where’s Waldo?”
INTRODUCTION
Self to Self
Most self-to-self comparisons don’t require any statistics. Usually we simply count and chart our changes. When we want to change our exercising pattern, we might count the num- ber of laps we run or the number of push-ups we do. All in all, it’s quite simple.
First, we begin with a baseline. Before making any changes, it’s a good idea to see what the current level of performance is. We might keep a diary, a sign-in sheet or a chart, and write in every day how much exercise we did.
Second, we make a change in one aspect of the situation. In our exercise example, we might decide to reward ourselves for every lap around the track. Rewarding ourselves with cookies might not be a good health choice but it doesn’t matter to the research design.
From a design point of view, all that matters is that we make a change; we change from Condition A to Condition B. And we continue to chart the same behavior. Hopefully, we see an increase in the number of laps we run.
Third, we need to make sure that it's not just time that is making the difference. To do that we switch back from Condition B to Condition A, and continue to chart our progress. If the reward was responsible for the change in behavior, performance will drop back to its original level.
Fourth, once we’ve eliminated time as an alternative explanation, we return to Condition B.
This ABAB design allows us to test the influence of a situational change on performance. We continuously monitor performance and systematically change the circumstances.
No complex statistical analyses are required.
Self to a Standard
Comparing yourself to a standard is the second type of self-comparison. It also doesn't require much number crunching.
Typically, we set standards by specifying the behaviors which must be visible and the tasks which need to be accomplished. We might require students to memorize a list of words, collect 5 kinds of leaves or identify 32 world rivers. At their best, standards should use 3 principles:
Clear, not vague. Obviously, it is easier to make a comparison to a standard if the criteria are clearly specified. “Clean up your room” is more vague than “Hang up your clothes.” Similarly, “You never help out” is more vague than “Take out the garbage.”
Increase, not decrease. Usually, it is better to increase behavior than to decrease it. Consequently, “Eat more carrots” is better than “Eat less junk.” Similarly, “Be friendly” is better than “Don’t be rude.”
Reward, not punish. Giving a reward for good behavior is usually better than punishing bad behavior. Like pulling weeds, punishment stops a behavior but doesn’t replace it with anything else. Planting good behaviors is a better plan.
To evaluate whether people meet a standard, a checklist is created. Every accomplished task is checked off the list. Mark all that apply: you've had mumps or not. Performance is evaluated on a yes-no basis. The amount of time it takes to mow the back yard is not important, only whether or not the task was completed.
Self to standard comparisons look a lot like To Do lists. In spelling tests, medical profiles, and the Ten Commandments, no complex statistical processes are required.
Self to Group
Comparing yourself to a group is the third type of self-comparison. It is a matter of judging your behavior by comparing it to what everyone else did. In contrast to comparing yourself to a standard (did you jump off the roof into the pool; yes or no), self-group comparisons allow “everyone was doing it” explanations. The important thing is whether you did what the group did.
A common self-group comparison is grade equivalents. If you are reading at a 2nd grade level, your performance is equivalent to that group.
Another way of comparing an individual to a group is with percentiles, which are cumulative percentages. You can take the top 5 percent, the middle 23%, the bottom 9% or 2% selected evenly from across the entire distribution. But percentiles accumulate.
A percentile is the percentage of scores below you. At the 26th percentile, there are 26% of the scores below you. At the 2nd percentile, 2% of the scores are below you and 98% of the scores are above you.
Each percentile is a specific point on the distribution; it never varies. In a normal distribution, 50% of the scores are below the mean and 50% of the scores are above the mean. Consequently, the mean is always at the 50th percentile. At the 35th percentile, you are always below the mean, never in the top group. It is a set point.
An even better indicator of score location is called a z-score. Like percentile, z scores have a set location. A z score of -1.0 is always in the same spot: one standard deviation below the mean.
Z scores are the number of standard deviations you are away from the mean. Positive z scores are above the mean and negative z scores are below the mean. If z = 0, you’re at the mean.
A z score tells you the number of steps you’ve taken from the mean and which way you’ve headed (positive or negative). The standard deviation tells you how big those steps are.
z score
This is a measure that indicates the distance an individual score is from the mean of a distribution. If a score is at the mean, it has a z-score of 0. Scores above the mean are positive and scores that are located below the mean are negative.
In practical terms, z-scores range from -3 to +3. A z of -3 indicates that the raw score is 3 standard deviations below the mean (at the extreme left end of the distribution). A z of 3 indicates that the raw score is at the extreme right end of the distribution.
Since z-scores are expressed in units of standard deviation, they are independent of the variable being measured. A z-score of -1.5 is one and a half standard deviations below the mean, regardless of the variable. If z = .5, the score is located at one half standard deviation above the mean.
Composed of two parts, the z-score has both magnitude and sign. The magnitude can be interpreted as the number of standard deviations the raw score is away from the mean. The sign indicates whether the score is above the mean (+) or below the mean (-). To calculate the z-score, subtract the mean from the raw score and divide that answer by the standard deviation of the distribution. In formal terms, the formula is:

z = (X - X̄) / s
Using this formula, we can find z for any raw score, assuming we know the mean and standard deviation of the distribution. What is the z-score for a raw score of 115, a mean of 100 and a standard deviation of 10? First, we find the difference between the score and the mean, which in this case would be 115-100 = 15. The result is divided by the standard deviation (15 divided by 10 = 1.5). With a z score of 1.5, we know that the raw score of 115 is one and a half standard deviations above the mean for the distribution being studied.
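The formula is short enough to try in Python (my own sketch, not part of the tour):

```python
# z = (X - mean) / standard deviation
def z_score(x, mean, stdev):
    return (x - mean) / stdev

z_score(115, 100, 10)   # 1.5
z_score(104, 110, 12)   # -0.5
```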
There are 5 primary applications of z-scores.
5 z-Score Applications
1. Description
First, z-scores can be used for describing the location of an individual score. What is the z-score for a raw score of 104, a mean of 110 and a standard deviation of 12? 104-110 equals -6; -6 divided by 12 equals -.5. The raw score of 104 is one-half a standard deviation below the mean.
Second, raw scores can be evaluated in relation to some set z-score standard; a cutoff score. For example, all of the scores above a cutoff z-score of 1.65 could be accepted. In this case, z-scores provide a convenient way of describing a frequency distribution regardless of what variable is being measured.
Each z score’s location in a distribution is associated with an area under the curve. A z of 0 is at the 50th percentile and indicates that 50% of the scores are below that point. A z score of 2 is associated with the 98th percentile. If we wanted to select the top 2% of the individuals taking a musical ability test, we would want those who had a z score of 2 or higher. Z scores allow us to compare an individual to a standard regardless of whether the test had a mean of 60 or 124.
Most statistics textbooks have a table that shows the percentage of scores at any given point of a normal distribution. You can begin with a z score and find an area or begin with an area and find the corresponding z score. Areas are listed as decimals: .5000 instead of 50%. In order to save space, tables list only positive values. The tables also assume you know that 50% of the scores fall below the mean and 50% above the mean. The table usually has 3 columns: the z score, the area between the mean and z, and the area beyond z.
The area between the mean and z is the percentage of scores located between z and the mean. A z of 0 has an area between the mean and z of 0, and an area beyond (the area toward the end of the distribution) of .5000. Although the table lists no negatives, notice that a z score of -0 would also have an area beyond (toward the low end of the distribution) of .5000.
A z score of .1, for example, has an area between the mean and z of .0398. That is, 3.98% of the scores fall within this area. And the third column shows that the area beyond (toward the positive end of the distribution) is .4602. If z is -.1, the area from the mean down to that point would account for 3.98% of the scores and the area beyond (toward the negative end of the distribution) would be .4602.
Areas under the curve can be combined. For example, to calculate the percentile of a z of .1, the area between the mean and z (.0398) is added to the area below z (which you know to be .5000). So the total percentage of scores below a z of .1 is 53.98 (that is, .0398 plus .5000). A z score of .1 is at the 53.98th percentile.
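If a printed table isn't handy, the area below any z can be computed from the normal curve itself. Here is a Python sketch using the standard library's error function (the function name is mine):

```python
import math

# Area below any z (its percentile, as a decimal), computed
# from the standard normal curve instead of a printed table.
def area_below(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

area_below(0.1)   # about .5398, the 53.98th percentile
area_below(0.0)   # .5000, the mean is the 50th percentile
```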
Third, an entire variable can be converted to z-scores. This process of converting raw scores to z-scores is called standardizing and the resulting distribution of z-scores is a normalized or standardized distribution. A standardized test, then, is one whose scores have been converted from raw scores to z-scores. The resultant distribution always has a mean of 0 and a standard deviation of 1.
Standardizing a distribution gets rid of the rough edges of reality. If you’ve created a nifty new test of artistic sensitivity, the mean might be 123.73 and the standard deviation might be 23.2391. Interpreting these results and communicating them to others would be easier if the distribution was smooth and conformed exactly to the shape of a normal distribution. Converting each score on your artistic sensitivity test to a z score converts the raw distribution’s bumps and nicks into a smooth normal distribution with a mean of 0 and a standard deviation of 1. Z scores make life prettier.
Fourth, once converted to a standardized distribution, the variable can be linearly transformed to have any mean and standard deviation desired. By reversing the process, z-scores are converted back to raw scores by multiplying each by the desired standard deviation and adding the desired mean. Most intelligence tests have a mean of 100 and a standard deviation of 15 or 16. But these numbers didn’t magically appear. The original data looked as wobbly as your test of artistic sensitivity. The original distribution was converted to z scores and then the entire distribution was shifted.
To change a normal distribution (a distribution of z scores) to a new distribution, simply multiply by the standard deviation you want and add the mean you want. It’s easy to take a normalized distribution and convert it to a distribution with a mean of 100 and a standard deviation of 20. Begin with the z scores and multiply by 20. A z of 0 (at the mean) is still 0, a z of 1 is 20 and a z of -1 is -20. Now add 100 to each, and the mean becomes 100 and the z of 1 is now 120. The z of -1 becomes 80, because 100 plus -20 equals 80. The resulting distribution will have a mean of 100 and a standard deviation of 20.
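That shift is a one-liner in Python (a sketch of mine, using the same numbers):

```python
# Linear transformation: new score = z * desired stdev + desired mean
z_scores = [-1, 0, 1]
new_mean, new_stdev = 100, 20
new_scores = [z * new_stdev + new_mean for z in z_scores]  # [80, 100, 120]
```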
Fifth, two distributions with different means and standard deviations can be converted to z-scores and compared. Comparing distributions is possible after each distribution is converted into z’s. The conversion process allows previously incomparable variables to be compared. If a child comes to your school but her old school used a different math ability test, you can estimate her score on your school’s test by converting both to z scores.
If her score was 65 on a test with a mean of 50 and a standard deviation of 10, her z score was 1.5 on the old test (65-50 divided by 10 equals 1.5). If your school’s test has a mean of 80 and a standard deviation of 20, you can estimate her score on your test as being 1.5 standard deviations above the mean; a score of 110 on your test.
UNDERSTAND
Concepts are rules you carry in your head. They are easy to remember and apply to many situations. There are 3 concepts related to z-scores and comparisons that I want to highlight.
1. Compare to Self
Illustration 1: When trying to fix a problem, it’s a good idea to get an idea of its current status. Teachers, counselors and parents often start with a baseline of the behavior in question.
Illustration 2: When I weigh myself, I usually say something about how the current number relates to previous measurements. If I’ve gained 2 pounds or lost 10 pounds, it is always comparing myself to myself.
Illustration 3: I had a friend whose business was losing money. He was sinking into bankruptcy. When things improved slightly, he noted that he was still sinking but slower than before.
Illustration 4: Self-comparison is a company saying that they sold more than they did last year.
2. Compare to Standard
Illustration 1: Many people use the 10 Commandments of Moses or the 5 Pillars of Islam as standards. They measure personal performance against moral absolutes.
Illustration 2: When doctors use a checklist to diagnose a condition, they are comparing their patient to a standard.
Illustration 3: When a police officer pulls you over for speeding, saying “I drove slower than I usually do” isn’t much of a defense. Traffic speed is usually compared to a standard: the posted speed limit.
Illustration 4: Comparing to a standard is when a company says that they are selling enough to break even.
3. Compare to Group
Illustration 1: Baserates are for self-comparison, checklists for standards, and normal curves for comparing groups.
Illustration 2: For your horse to win, it only has to run faster than the other horses in the race.
Illustration 3: If you enter a talent contest, your singing would be compared to all the other contestants.
Illustration 4: Comparing to a group is when a company says that they are selling better than other companies.
REMEMBER
Facts are the details of who, what, where and when. Here are the facts I’ve collected for z-scores.
Basic Facts
A z-score indicates how many standard deviations away a score is from the mean.
Z-scores can be used to find an individual, standardize a distribution or set a cutoff score.
Formulas
z = (X - X̄) / s
Terms
baseline
checklist
cutoff score
grade equivalent
linear transformation
magnitude
normalized distribution
percentile
sign
standardized distribution
standardized score
z-score
DO
Step-by-Step

Calculate z-scores
In formal terms, here is how to calculate a z-score:

z = (X - X̄) / s
Using this formula, we can find z for any raw score, assuming we know the mean and standard deviation of the distribution. For example, here’s how to find the z-score for a raw score of 115, a mean of 100 and a standard deviation of 10:
First, find the difference between the score and the mean, which in this case would be 115-100 = 15.
Second, divide the result by the standard deviation (15 divided by 10 = 1.5).
With a z score of 1.5, we know that the raw score of 115 is one and a half standard deviations above the mean for the distribution being studied.
In practical terms, z-scores range from -3 to +3. Composed of two parts, the z-score has both magnitude and sign. The magnitude can be interpreted as the number of standard deviations the raw score is away from the mean. The sign indicates whether the score is above the mean (+) or below the mean (-).

Now that we’ve covered the facts and concepts of z-scores, it’s time to put what we know into practice. The rest of this section includes Step-by-Step instructions, practice problems, and simulations (word problems).
Use z-scores as cutoff scores
A paper company has discovered that short trees make better typing paper. In their plant, they want to set the equipment to automatically reject any trees taller than 1 standard deviation above the mean. Which trees from the following population would be accepted and which rejected?
14
12
11
10
10
10
9
8
6
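One way to work the tree problem by computer is sketched below in Python (my own code and variable names):

```python
import math

# Reject any tree taller than 1 standard deviation above the mean.
heights = [14, 12, 11, 10, 10, 10, 9, 8, 6]  # the population of trees
n = len(heights)
mean = sum(heights) / n                                   # 10
ss = sum(x * x for x in heights) - sum(heights) ** 2 / n  # 42
stdev = math.sqrt(ss / n)           # population, so SS / N; about 2.16
cutoff = mean + stdev               # about 12.16

accepted = [x for x in heights if x <= cutoff]
rejected = [x for x in heights if x > cutoff]  # only the tallest tree (14)
```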
Create a normalized distribution
Frequency distributions based on raw scores often have means and standard deviations that are inconvenient or difficult to handle. To convert a frequency distribution to a more convenient format, raw scores are converted to z-scores and a distribution of z-scores is created. This process is called “standardizing” or “normalizing” a distribution.
Here is a distribution of raw scores:
10 8 7 7 7 6 4
Simply convert each raw score to a z-score. In the raw scores above, the mean is 7 and the standard deviation is 1.69. So 10-7 is divided by 1.69 and the first z score is 1.77.
When all of the raw scores have been converted, there will be a z-score for each raw score:
Raw    z
10     1.77
8      .59
7      0
7      0
7      0
6      -.59
4      -1.77
This column of z-scores is a distribution of z-scores. The mean will be 0 and the standard deviation will equal 1. And the distribution is now called a “standardized” or “normalized” distribution.
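The same conversion, sketched in Python (my own code, using the raw scores above):

```python
import math

# Standardize a distribution: every raw score becomes a z-score.
raw = [10, 8, 7, 7, 7, 6, 4]
n = len(raw)
mean = sum(raw) / n                                                   # 7
stdev = math.sqrt((sum(x * x for x in raw) - sum(raw) ** 2 / n) / n)  # 1.69
zs = [(x - mean) / stdev for x in raw]
# rounded: 1.77, .59, 0, 0, 0, -.59, -1.77
```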
Make a linear transformation

Once a distribution has been “standardized” or “normalized,” its mean and standard deviation can be changed to any desired value. Here is a set of raw scores and its “standardized” distribution:

Raw    z
10     1.77
8      .59
7      0
7      0
7      0
6      -.59
4      -1.77

First, select the standard deviation you want and multiply each z-score by it. Assuming the new standard deviation was 10, here is how the data would look:

Raw    z        z * new stdev
10     1.77     17.7
8      .59      5.9
7      0        0
7      0        0
7      0        0
6      -.59     -5.9
4      -1.77    -17.7

Second, add the desired new mean to each number. Assuming you wanted the new mean to be 50, here is how the data would look:

Raw    z        z * stdev    + new mean
10     1.77     17.7         67.7
8      .59      5.9          55.9
7      0        0            50.0
7      0        0            50.0
7      0        0            50.0
6      -.59     -5.9         44.1
4      -1.77    -17.7        32.3

The new distribution now has a mean of 50 and a standard deviation of 10.

Compare two variables

Compare Blake’s score on the reading test from her old school to the test used at her new school.

Old Test    New Test
3           22
5           27
7           11
3           31
11          17
13          14
7           21
7           17

Blake’s score on the old test was 8; what would her likely score be on the new test?
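Blake's comparison can be sketched in Python as well (my own code; each test's column is treated as a population):

```python
import math

# Estimate Blake's new-test score from her old-test z-score.
def mean_and_stdev(scores):
    n = len(scores)
    m = sum(scores) / n
    ss = sum(x * x for x in scores) - sum(scores) ** 2 / n
    return m, math.sqrt(ss / n)   # treating the scores as a population

old_test = [3, 5, 7, 3, 11, 13, 7, 7]
new_test = [22, 27, 11, 31, 17, 14, 21, 17]

old_mean, old_stdev = mean_and_stdev(old_test)  # 7 and about 3.32
new_mean, new_stdev = mean_and_stdev(new_test)  # 20 and about 6.22
z = (8 - old_mean) / old_stdev                  # Blake's z, about .30
estimate = new_mean + z * new_stdev             # about 21.9
```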
Practice Problems
Item 1
Calculate the z-score, assuming: X = 10
mean = 50 stdev = 10
z score = _____
Item 2
Calculate the z-score, assuming: X = 80
mean = 50
stdev = 15
z score = _____
Item 3
Calculate the z-score, assuming: X = 112
mean = 100 stdev = 15
z score = _____
Item 4
Calculate the raw score, assuming: z = 1.5
mean = 115 stdev = 10 X= _____
Item 5
Calculate the raw score, assuming: z = 2.10
mean = 100 stdev = 20 X = _____
Item 6
Calculate the raw score, assuming: z = -1.37
mean = 100 stdev = 20 X =_____
Solve for z:
1. What is the z-score for a raw score of 115, a mean of 100 and a standard deviation of 10?
2. What is the z-score for a raw score of 104, a mean of 110 and a standard deviation of 12?
3. What is the z-score for a raw score of 109, a mean of 100 and a standard deviation of 6?
4. What is the z-score for a raw score of 24, a mean of 56 and a standard deviation of 12?
Now solve for X, instead of z:
5. What is the raw score when z = 2, the mean is 100 and the standard deviation is 15?
6. What is the raw score when z = 1.3, the mean is 94 and the standard deviation is 12?
7. What is the raw score when z = -.82, the mean is 90 and the standard deviation is 11?
Simulations
As a broker, you are interested in the consistency of XYZ Corp.’s stock price. Your client only wants you to buy this stock when it is unusually low. Consider the following data (assume it is a sample):
6 6 4 6 5 3 3 2 7
What is the sum of Price: _______
What is the median of Price: _______
What is the mean of Price: _______
What is the mode of Price: _______
What is the SS of Price: _______
What is the variance of Price: _______
What is the stdev of Price: _______
What price is at the 84th percentile? _______
What is the price of the stock when z = 3? _______
What is the price of the stock when z = -1.3? _______
What is the price when the stock is at the 95th percentile? _______
What is the price when the stock is at the 5th percentile? _______
SUMMARY
A z-score indicates how many steps a person is from the mean. A raw score below the mean corresponds to a negative z score; a score which is above the mean would have a positive z. The standard deviation indicates how big each step is. Approximately 68% of the scores lie within one standard deviation of the mean. That is, a majority of the distribution is from z = -1 to z = +1.
There are 5 primary applications of z-scores: a. locating an individual score
b. using z as a standard. Individual raw scores are converted to z-scores and compared to a set standard. Two common standards are z = 1.65, which represents a 1-tailed area of 95%, and z = + 1.96 or – 1.96 (between which is a 2-tailed area of 95%).
c. standardizing a distribution and smoothing its data.
d. making linear transformations of variables; converting the mean and standard deviation to numbers that are easier to remember or handle.
e. comparing 2 raw score distributions with different means and standard deviations.
Progress Check 4
1. A z-score of +2 on a test with a mean of 100 and a standard deviation of 10 would equal a score of:
a. 80 b. 95 c. 100 d. 102 e. 120
2. Which of the following is calculated by subtracting the mean from the score and dividing by the standard deviation:
a. intelligencequotient b. z-score
c. t score
d. stanine
e. percentile
3. What area under a normal curve is associated with a z-score (1-tailed) of 1.52:
a. .063 b. .437 c. .737 d. .747 e. .937
4. How much area is beyond a z-score of .61:
a. .027 b. .229 c. .271 d. .374 e. .427
5. What percentage of scores are between a z-score of -1.3 and a z of +1.65:
a. 52% b. 64% c. 68% d. 72% e. 85%
6. A z-score of -1 on a test with a mean of 90 and a standard deviation of 10 would equal a score of:
a. 80 b. 95 c. 100 d. 102 e. 120
7. What area under a normal curve is associated with a z-score (1-tailed) of .52:
a. .0636 b. .4375 c. .6985 d. .8475 e. .9375
8. If z = -2.61, how much area is “beyond:”
a. .0045 b. .1745 c. .4570 d. .5045 e. .9045
9. What percentage of scores are between a z-score of -1.04 and a z of +.45:
a. 52% b. 64% c. 68% d. 72% e. 86%
10. A z-score of -1.5 on a test with a mean of 100 and a standard deviation of 10 would equal a score of:
a. 85 b. 95 c. 100 d. 105 e. 120
Progress Check 4
11. List six criteria for evaluating theories:
a. _____ b. _____ c. _____ d. _____ e. _____ f. _____
12. List 4 levels of measurement:
a. _____ b. _____ c. _____ d. _____
13. Calculate the following using this population data:
X
12 8 7 7 7 5 22
Mean ______ Median ______ Mode ______ SS ______ Variance ______ Stdev ______
14. Convert the raw score distribution above to a standardized score distribution.
Answers
Item 1
X = 10, mean = 50, stdev = 10, z = – 4.00
Item 2
X = 80, mean = 50, stdev = 15, z = 2.00
Item 3
X = 112, mean = 100, stdev = 15, z = .80
Item 4
z = 1.5, mean = 115, stdev = 10, X = 130
Item 5
z = 2.10, mean = 100, stdev = 20, X = 142
Item 6
z = -1.37, mean = 100, stdev = 20, X = 72.60
Solve for z:
1. X = 115, mean = 100, stdev = 10: z = 1.50
2. X = 104, mean = 110, stdev = 12: z = -0.50
3. X = 109, mean = 100, stdev = 6: z = 1.50
4. X = 24, mean = 56, stdev = 12: z = -2.67

Now solve for X, instead of z:
5. When z = 2, the mean is 100 and the standard deviation is 15, X = 130
6. When z = 1.3, the mean is 94 and the standard deviation is 12, X = 109.60
7. When z = -.82, the mean is 90 and the standard deviation is 11, X = 80.98

Simulation
What is the sum of Price: 36
What is the mean of Price: 4.50
What is the median of Price: 5
What is the mode of Price: 6
What is the SS of Price: 20
What is the variance of Price: 2.88 (sample)
What is the stdev of Price: 1.69 (sample)
Price at the 84th percentile: [1 * 1.69] + 4.50 = 6.19
Price when z = 3: [3 * 1.69] + 4.50 = 9.57
Price when z = -1.3: [-1.3 * 1.69] + 4.50 = 2.30
Price at the 95th percentile: [1.65 * 1.69] + 4.50 = 7.29
Price at the 5th percentile: [-1.65 * 1.69] + 4.50 = 1.71
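The percentile questions in this simulation can also be answered directly with Python's standard library. Note that the book's table rounds the 95th-percentile z to 1.65, while `inv_cdf` uses z ≈ 1.645, so the last decimal can differ slightly:

```python
from statistics import NormalDist

# Sample statistics from the simulation: mean = 4.50, stdev = 1.69
price = NormalDist(mu=4.50, sigma=1.69)

print(round(price.inv_cdf(0.95), 2))  # 95th percentile: 7.28 (book: 7.29 with z = 1.65)
print(round(price.inv_cdf(0.05), 2))  # 5th percentile: 1.72 (book: 1.71 with z = -1.65)
```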
Progress Check 4
Multiple Choice
e, b, e, c, e, a, c, a, a, a
Progress Check:
11. Six criteria for evaluating theories: clear, useful, small number of assumptions, summarizes facts, internally consistent, testable hypotheses.
12. Four levels of measurement: nominal, ordinal, interval, ratio.
13. Mean 7; Median 7; Mode 7; SS 200; variance 28.57; stdev 5.35.
14. Standardized scores: 0.070, 0.000, -0.105, -0.105, -0.105, -0.185, 0.420.
BRIEFLY
Day 5: Correlation
Comparing a group to itself
When it comes to an individual variable, you have seen all there is to see. You know which level of measurement to use, how to measure central tendency (mean, median and mode), how to measure dispersion (range, MAD, Sum of Squares, variance and stdev) and how to use a z-score to compare an individual to a group. Having conquered single-variable models, it’s now time to explore something more complex.
Correlation is the first 2-variable model we’ll consider. Both variables (designated X and Y) are measures obtained from the same subjects. These are pairs of observations on one person. Twice as many scores but the same number of people. Each column will represent a different variable. Each row will signify one person.
We are still only observing. We are not twisting the tail of an elephant to see what happens. We’re watching the group and reporting the findings.
Correlations can be used in two ways. First, we can compare the same people on two characteristics. A correlation between eye color and hair color would be such a study. The aim would be to describe the relationship between the two variables for the subjects (people). Two measures of different characteristics might reveal the strength of relationship between the two characteristics.
Second, correlations can be like two snapshots of the same people at different times. A test-retest study would try to describe how reliable a measure is. If you take the temperature of a group of people and re-take it 10 minutes later, you expect a reliable thermometer to provide consistent results. If the test-retest correlation is low, the chances are you have a faulty thermometer.
There are 6 sites to see on this day of our tour:
Correlations
Scatterplots
Causality
Reliability & Validity
Types of correlations
Significance
DAY 5: Correlation
INTRODUCTION
Scatter-plots
To use this simple and yet powerful method of description, we must collect two pieces of information on every person. These are paired observations. They can’t be separated. If we are measuring height and weight, it’s not fair to use one person’s height and another person’s weight. The data pairs must remain linked. That means that you can’t reorganize one variable (from highest to lowest, for example) without reorganizing the other variable. The pairs must stay together.
Once the data is collected, X is plotted against Y. Make two columns of numbers (X and Y), begin at the top and plot each pair of numbers. Go across the X value and up the Y value. If in the first pair of data, X is 5 and Y is 2, go across 5 and up 2 and put a dot. One dot for each pair of numbers results in a graph of scattered dots.
Scatterplots tend to have 1 of 3 patterns. First, the dots might look as if scattered by chance. In this case, dots are everywhere and no particular trend is obvious.

Second, the dots might look like they form a positive trend (from lower left to upper right). This pattern emerges when the low scores of X are paired with low scores on Y. Notice that when X is getting larger, Y usually gets larger too.

Third, the low scores of X might tend to have partners that are high on Y. This trend (from upper left to lower right) indicates a negative pattern. In a negative trend (also called an inverse relationship), when X gets bigger, Y tends to get smaller.

[Scatter plots illustrating a positive trend and a negative trend]

Now that you’ve mastered one variable, let’s add another. Everything up to now has been based on observing one dependent variable (one criterion). All we have done is observe; we haven’t manipulated, stapled or mutilated anything; just observed.

With correlations we are going to continue that practice, we’re only observing, but we’re going to look at two variables and see how they are related to each other. When one variable changes, we want to know what happens to the other variable. In a perfect correlation, the two variables will move together. When there is no correlation, the variables will act independently of each other.
Basically, a correlation is a mathematical representation of a scatterplot. Two observations are acquired on each subject and the relationship between those two variables is examined. You can compare a group to itself using a correlation. Here are 5 things to note:
2 parts
A correlation has both sign and magnitude. The sign (+ or -) tells you the direction of the relationship. If one variable is getting larger (2, 4, 5, 7, 9) and the other variable is headed in the same direction (2, 3, 6, 8, 11), the correlation’s sign is positive. In a negative correlation, while the first variable is getting larger (2, 4, 5, 7, 9), the second variable is getting smaller (11, 8, 6, 3, 2).
[Scatter plot]
The magnitude of a correlation is found in the size of the number. Correlation coefficients can’t be bigger than 1. If someone says they found a correlation of 2.48, they did something wrong in the calculation. Since the sign can be positive or negative, a correlation must be between -1 and +1.
The closer the coefficient is to 1 (either + or -), the stronger the relationship. Weak correla- tions (such as .13 or -.08) are close to zero. Strong correlations (such as .78 or -.89) are close to 1. Consequently, a coefficient of -.92 is a very strong correlation. And +.25 indicates a fairly weak positive correlation.
Magnitude is how close the coefficient is to 1; sign is whether the relationship is positive (headed the same way) or negative (inverse).
Causality
Correlations don’t prove causation. A strong correlation is a necessary indicator of causation but it is not sufficient. When a cause-effect relationship exists, there will be a strong correlation between the variables. But a strong correlation does not mean that variable A causes variable B.
In correlations, A can cause B. Or, just as likely, B can cause A. Or, just as likely, something else (call it C) causes both A and B to occur. For a simple example, let’s assume that we know nothing about science. But we do notice that when the sun comes up, it gets warm outside. From a statistical point of view, we can’t tell which causes which. Perhaps the sun coming up makes it get warm. But it is as likely that when it gets warm the sun comes up. Or the sun and warmth are caused by something else: a dragon (pulling the sun behind it) flies across the sky blowing its hot breath on the earth (making it warm).
You might laugh at this illustration but think how shocked you’d be if tomorrow it got warm and the sun didn’t come up!
It is, of course, perfectly OK to infer causation from correlational data. But we must remember that these inferences are not proofs; they are leaps of faith. Leaping is allowed but we must clearly indicate that it is an assumption, not a fact.
Reliability & Validity
Although correlations can’t prove cause and effect, they are very useful for measuring reliability and validity. Reliability means that you get the same results every time you use a test. If you’re measuring the temperature of liquid and get a reading of 97 degrees, you would expect a reliable thermometer to yield the same result a few seconds later. If your thermometer gives different readings of the same source over a short period of time, it is unreliable and you would throw it away.
We expect many things in our lives to be reliable. When you flip on a light switch, you expect the light to come on. When you get on an elevator and push the “down” button, you don’t expect the elevator to go sideways. If you twice measure the length of a table, a reliable tape measure will yield the same result. Even if your measuring skill is poor, you expect the results to be close (not 36 inches and then 4 inches). You expect the same results every time. Reliability, then, is the correlation between two observations of the same event. Test reliability is determined by giving the test once and then giving the same test to the same people 2 weeks later. With this test-retest method, you would expect a high positive correlation between the first time the test was given and the second time.
A test with a test-retest reliability of .90 (which many intelligence tests have) is highly reliable. A correlation of .45 shows a moderate amount of reliability, and a coefficient close to
zero indicates the test is unreliable. Obviously, a negative test-retest reliability coefficient would indicate something was wrong. People who got high scores the first time should be getting high scores the second time, if the test is reliable.
There are 3 basic types of reliability correlations. A test-retest coefficient is obtained by giving and re-giving the test. A “split half” correlation is found by correlating the total score for the first half with the total score for the second half for each subject. A parallel forms correlation shows the reliability of two tests with similar items.
Correlations also can be used to measure validity. Although a reliable test is good, it is possible to be reliably (consistently) wrong. Validity is the correlation between a test and an external criterion. If you create a test of musical ability, you expect that musicians will score high on the test and those judged by experts to be unmusical to score low on the test. The correlation between the test score and the experts’ rating is a measure of validity.
Validity is whether a test measures what it says it measures; reliability is whether a test is consistent. Clearly, reliability is necessary but not sufficient for a test to be valid.
Types of Correlation
2 Types of Relationship
Correlations can test both linear and monotonic relationships. In a monotonic relationship, the 2 variables move in the same direction but at different rates. Ordinal data is monotonic; both variables move from 1st place to 2nd place but in different sized steps. If one variable goes from 6 to 8 and the other variable moves from 6 to 17, the relationship is monotonic but not linear.
In a linear relationship, two variables move at the same rate. That is, a linear relationship forms a straight line, while a monotonic zigzags in one direction.
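One way to see the distinction is to compute the Pearson r on the raw scores and then on their ranks. Ranking both variables is the idea behind the Spearman coefficient, which this tour doesn't stop at; the helper functions below are mine, a sketch rather than the book's method:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r via the Sum of Squares formulas used in this chapter."""
    n = len(xs)
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return ss_xy / sqrt(ss_x * ss_y)

def ranks(xs):
    """1st place, 2nd place, ... (these examples have no ties)."""
    order = sorted(xs)
    return [order.index(x) + 1 for x in xs]

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]  # moves with x, but in different-sized steps

print(pearson_r(x, y))                # about 0.984: strong, but not perfectly linear
print(pearson_r(ranks(x), ranks(y)))  # 1.0: perfectly monotonic
```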
3 Kinds of Correlations
In general, correlations can be classified by the type of data they use. In a Pearson r, two continuous variables are used. For example, a correlation between age (in years) and height (in inches) would involve two continuous variables.
In contrast, a phi (pronounced “figh”, as in fee, phi, foe, fum) correlation uses two discrete variables. If age and height were measured as discrete variables (high, medium and low), the correlation would be a phi correlation.
A point-biserial correlation uses one continuous and one discrete variable. A correlation between age (in years) and whether you voted in the last election (yes-no) would be a point- biserial correlation.
Significance
It is possible to test a correlation coefficient for significance. A significant correlation means the relationship is not likely to be due to chance. It doesn’t mean that X causes Y. It doesn’t mean that Y causes X; or that another variable causes both X and Y. Although a correlation cannot prove which causes what, r can be tested to see if it is likely to be due to chance.
First, determine the degrees of freedom for the study. The degrees of freedom (df) for a correlation are N-2. If there are 7 people (pairs of scores), the df = 5. If there are 14 people, df = 12.
Second, enter the statistical table “Critical Values of the Pearson r” with the appropriate df. Let’s assume there were 10 people in the study (10 pairs of scores). That would mean the degrees of freedom for this study equals 8.
Go down the df column to eight, and you’ll see that, in order to be significant with this few people, a Pearson r has to have a magnitude of .632 or larger.
Notice that the table ignores the sign of the correlation. A negative correlation of -.632 or larger (closer to -1) would also be significant.
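The table lookup can be sketched in Python. The two critical values below are the only ones this chapter quotes (df = 4 and df = 8, both at the .05 alpha level, 2-tailed); a real table would list every df:

```python
# Critical values of the Pearson r, .05 alpha (2-tailed), df = N - 2.
# Only the two entries mentioned in this chapter are included here.
CRITICAL_R = {4: 0.811, 8: 0.632}

def is_significant(r, n_pairs):
    """Significant when |r| meets or beats the table value for df = N - 2.
    The sign of r is ignored, just as the table ignores it."""
    df = n_pairs - 2
    return abs(r) >= CRITICAL_R[df]

print(is_significant(0.70, 10))   # df = 8: .70 >= .632 -> True
print(is_significant(-0.70, 10))  # sign is ignored -> True
print(is_significant(0.70, 6))    # df = 4: .70 < .811 -> False
```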
Evaluate r-squared
A correlation can’t prove that A causes B; it could be that B causes A…or that C causes both A & B. The coefficient of determination is an indication of the amount of relationship between the two variables. It gives the percentage of variance that is accounted for by the relationship between the two variables.
To calculate the coefficient of determination, simply take the Pearson r and square it. So, .89 squared = .79. In this example, 79% of the variance can be explained by the relationship between the two variables. Using a Venn diagram, it is possible to see the relationship be- tween the two variables. It is the area of overlap:
To calculate the amount of variance that is NOT explained by the relationship (called the coefficient of non-determination), subtract r-squared from 1. In our example, 1-r2 = .21. That is, 21% of the variance is unaccounted for.
UNDERSTAND
Comparing a group to itself
Illustration 1: Using a correlation in a test-retest study is like using a mirror twice. Look once, look twice and compare the results. You can compare one image of the group to another image of the same group.
Illustration 2: Using a correlation in a test-retest study is like a family reunion. You see how much they look like they did last year.
Correlation
Illustration 1: A correlation is like a toothache: it’s a hint. Your toothache might be caused by a cavity. The symptom is a hint but not proof. Similarly, correlations hint at relationships but don’t reveal cause and effect.
Illustration 2: A correlation is like gossip. A couple having dinner together might indi- cate that he invited her, that she invited him, or that they both are guests of someone else.
Significance
Illustration 1: Significance is like clarity. Without clarity, you can’t be sure what you see. But when the window is clean and clear, you are sure you are seeing something. You still can’t tell whether what you see is cause or consequence or chance. The view might be of truth or fiction but the window is clean.
Coefficient of determination
Illustration 1: Measuring correlations is like measuring house size. Instead of square footage, correlations have squared area of relationship. The larger the coefficient of determination (r2), the more relationship between the two dependent variables. If r2 = 1.00, 100% of the variation of one variable can be explained by the variation of the other variable. When 100% of the house’s square footage is accounted for, there is nothing else to hunt. We completely understand how much influence one variable has on an- other.
Coefficient of non-determination
Illustration 1: Correlations also include measuring the part not explained by the rela- tionship between variables. The coefficient of non-determination (what we don’t under- stand) is our percentage of stupidity. When 1-r2 = 1.00, we are 100% stupid. Fortunately, the higher the coefficient of determination, the smaller the coefficient of non-determi- nation. When r2 is high, 1-r2 is low.
REMEMBER
Facts are the details of who, what, where and when. Here are the facts I’ve collected for correlations.
Basic Facts
There are 3 major types of correlations: Pearson (2 continuous variables), phi (2 discrete variables) and point biserial (1 continuous and 1 discrete variable).
Formulas:
There are two methods of calculating a Sum of Squares: the deviation method and the raw score method. Using raw scores, the Pearson r is:

r = [ΣXY − (ΣX)(ΣY)/N] / √{[ΣX² − (ΣX)²/N] · [ΣY² − (ΣY)²/N]}

or, more simply, in Sum of Squares form:

r = SSxy / √(SSx · SSy)

Terms:
coefficient of determination
coefficient of nondetermination
correlation
df
linear
magnitude
monotonic
negative correlation
parallel forms
Pearson r
phi
point-biserial
positive correlation
r
r²
reliability
scatterplot
sign
significance
split half
test-retest
validity
DO
Step-by-Step
1. Calculate SSx
Like all measures of dispersion, Sum of Squares (SS) gets larger when the distribution of scores is more heterogeneous (dissimilar). The more homogeneous (similar) the scores, the smaller the SS. To calculate the SSx, we use this formula: SSx = ΣX² − (ΣX)²/N
2. Calculate SSy
To calculate the SSy, we use the same procedure as for SSx: First, square each raw score and then add them up.
Second, add each raw score and then square the result.
Third, divide Step 2 by the number of scores in the distribution.
Finally, subtract Step 3 from Step 1.
3. Make a new variable: XY
We create a new variable by multiplying every X by its Y partner. So this:
X   Y
11  10
7   9
3   2
4   6
5   5
8   8

becomes this:

X   Y   XY
11  10  110
7   9   63
3   2   6
4   6   24
5   5   25
8   8   64
4. Calculate SSxy
This is an unusual Sum of Squares. All the other SSs are measuring dispersion from a mean, so they can’t be smaller than 0 (everyone is at the mean). But since we’ve created this variable, the SSxy can be either positive or negative.
First, sum the XY’s. In our example, the Sum of XY = 292.
Second, multiply the Sum of X by the Sum of Y. That is, 38 times 40 = 1520.
Third, divide the result of Step 2 by N (the number of scores). So, 1520 divided by 6 = 253.33
Now that we’ve covered the facts and concepts of correlation, it’s time to put what we know into practice. This section includes Step-by-Step instructions, practice problems, and simulations (word problems).
Fourth, subtract the result of Step 3 from the result of Step 1. And 292 minus 253.33 = 38.67. So the SSxy = 38.67.
5. Find r (correlation coefficient)
Here is the formula we use:
r = SSxy / √(SSx · SSy)
First, multiply the SSx times the SSy. In our example, that is 43.33 times 43.33, which equals 1877.49.
Second, take the square root of Step 1. The result in this case, of course, is 43.33.
Third, divide the SSxy by the result of Step 2. So, 38.67 divided by 43.33 = .89. The Pearson r = .89.
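Steps 1 through 5 can be collected into one short Python sketch, using the chapter's example data (the variable names are mine):

```python
from math import sqrt

x = [11, 7, 3, 4, 5, 8]
y = [10, 9, 2, 6, 5, 8]
n = len(x)

ss_x = sum(v * v for v in x) - sum(x) ** 2 / n        # Step 1: SSx
ss_y = sum(v * v for v in y) - sum(y) ** 2 / n        # Step 2: SSy
xy = [a * b for a, b in zip(x, y)]                    # Step 3: new variable XY
ss_xy = sum(xy) - sum(x) * sum(y) / n                 # Step 4: SSxy
r = ss_xy / sqrt(ss_x * ss_y)                         # Step 5: Pearson r

print(round(ss_xy, 2))  # 38.67
print(round(r, 2))      # 0.89
```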
6. Test significance of r
Although a correlation cannot prove which causes what, r can be tested to see if it is likely to be due to chance.
First, determine the degrees of freedom for the study. The degrees of freedom (df) for a correlation are N-2. If there are 9 people (pairs of scores), the df = 7. If there are 22 people, df = 20. Since there are 6 people in the current example, df = 4.
Second, enter the table at df = 4. Go down the column to 4 and across to the .05 alpha level (2-tailed). In our table (Critical Values of the Pearson r), only the .05 alpha values are listed (we’re trying to make this easy!).
Third, find the critical value the table lists. In this example, the critical value = .811.
Fourth, compare the r you calculated with the table. If your r is larger, you win: the correlation is significant and the relationship is not likely to be due to chance. Since the coefficient you calculated (.89) is larger than the critical value (.811), the correlation in our example is significant.
7. Evaluate r-squared
A correlation can’t prove that A causes B; it could be that B causes A…or that C causes both A & B.
The coefficient of determination is an indication of the amount of relationship between the two variables. It gives the percentage of variance that is accounted for by the rela- tionship between the two variables.
To calculate the coefficient of determination, simply take the Pearson r and square it. So, .89 squared = .79. In this example, 79% of the variance can be explained by the relationship between the two variables.
Using a Venn diagram, it is possible to see the relationship between the two variables. It is the area of overlap:
DAY 5: Correlation
To calculate the amount of variance that is NOT explained by the relationship (called the coefficient of non-determination), subtract r-squared from 1. In our example, 1-r2 = .21. That is, 21% of the variance is unaccounted for.
An example
We create a new variable by multiplying every X by its Y partner. So this:
X   Y
2   17
13  3
10  4
3   18
2   19
12  11

becomes this:

X   Y   XY
2   17  34
13  3   39
10  4   40
3   18  54
2   19  38
12  11  132
Then, we sum each column. The sum of X = 42, the sum of Y = 72, and the sum of XY is 337.
Calculate the SS for X (136) and the SS of Y (256).
And calculate the SS of XY. Multiply the sum of X by the sum of Y (42 * 72 = 3024). Now divide the result by N (the number of pairs of scores = 6); 3024/6 = 504. Subtract the result from the Sum of XYs (337 − 504 = −167).
Notice the SSxy is negative. It’s OK. The SSxy can be negative. It is the only Sum of Squares that can be negative. The SSx or the SSy are measures of dispersion from the variable’s mean. But we created the XY variable; it’s not a real variable when it comes to dispersion. The sign of SSxy indicates the direction of the relationship between X and Y.
So we have a negative SSxy because X and Y have an inverse relationship. Look at the original data: when X is small (2), Y is large (17). When X is large (13), Y is small (3). It is a consistent but inverse relationship. It’s like pushing the yoke down and the plane going up.
Let’s finish off the calculation of the Pearson r. Multiply the SSx by the SSy (136 * 256 = 34816). Take the square root of that number (sqrt of 34816 = 186.59). Divide the SSxy by the result (−167/186.59 = −.895). Rounding to 2 decimal places, the Pearson r for this data set equals −.90. It is a strong, negative correlation.
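The same recipe re-checks this inverse example, along with the coefficients of determination and non-determination described above (a sketch; variable names are mine):

```python
from math import sqrt

x = [2, 13, 10, 3, 2, 12]
y = [17, 3, 4, 18, 19, 11]
n = len(x)

ss_x = sum(v * v for v in x) - sum(x) ** 2 / n
ss_y = sum(v * v for v in y) - sum(y) ** 2 / n
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
r = ss_xy / sqrt(ss_x * ss_y)

print(round(ss_xy, 2))     # -167.0: negative SSxy, inverse relationship
print(round(r, 2))         # -0.9: strong, negative
print(round(r * r, 2))     # 0.8: coefficient of determination (variance explained)
print(round(1 - r * r, 2)) # 0.2: coefficient of non-determination (unexplained)
```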
Practice Problems

Item 1
What is the correlation between these two variables:

X   Y
11  11
5   3
7   8
6   7
2   3
14  13

SSx = _____
SSy = _____
SSxy = _____
r = _____
r2 = _____

Item 2
What is the correlation between these two variables:

Laughter  Statistics
11        19
3         6
5         6
15        12
7         8
6         7
9         14

SSx = _____
SSy = _____
SSxy = _____
r = _____
Item 3
How much variance is accounted for between these two variables:

Strength  Speed
10        3
9         7
6         9
3         11
1         15
7         6

SSx = _____
SSy = _____
SSxy = _____
r = _____
r2 = _____
Item 4
What is the correlation between these two variables:

Strength  Peace
2         2
6         5
7         8
5         19
9         12
9         8
4         5

r = _____
r2 = _____
Item 5
How much variance is NOT accounted for between these two variables:
Strength Anxiety
23 97 34 11 7 12
r = _____
r2 = _____
1-r2 = _____

Item 6
Is there a significant relationship between the number of petals on a flower and how much the elephant plant weighs:
Petals  Weight
18      1
15      7
8       11
5       6
4       19
2       22

r = _____
r2 = _____

Item 7
What percent of variance do love and peace have in common:

Love  Peace
12    1
22    3
8     0
18    1
2     11
9     4
2     5
7     23

r = _____
r2 = _____
Simulation 1
As a political pollster, you wonder if there is a relationship between what people paid for their car and their annual salary (in $10,000). You measure everyone in the country (it’s a very small country) and here is the resulting data:
Salary  Car Cost
12      25
14      17
11      8
7       6
3       4
What is the sum of Salary: _______
What is the SS of Salary: _______
The variance of Salary is: _______
What is the mean of Car: _______
What is the SS of Car Cost: _______
Since you are interested in commonality, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculation?
a= b= r= t= F=
Calculate the coefficient of determination for this data?
Simulation 2
– 109 –
As a researcher, you are interested in the relationship between depression and sugar. You have measured several patients at the local hospital on each variable, and now hope to find how related these two variables are.
Depression Blood Sugar
7           3
8           3
11          2
6           8
7           7
3           12
1           14
What is the sum of Depression: _______
What is the SS of Depression: _______
The stdev of Depression is: _______
What is the median of Sugar: _______
What is the SS of Sugar: _______
Since you are interested in commonality, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculation?
a= b= r= t= F=
How much joint variance is accounted for?
Simulation 3
As a governor, you are interested in how similar the patterns of serving in the state legislature and being rich are. You have measured each representative on both, and now hope to find how related these two variables are. The numbers below represent the number of years in the legislature and net worth (in billions) of each member.
Years Net Worth
5      2
10     14
8      5
10     16
5      13
3      8
1      5
What is the SS of Years: _______
What is the variance of Years: _______
What is the SS of Net Worth: _______
What is the stdev of Net Worth: _______
What is the SSxy: _______
Since you are interested in commonality, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculation?
a= b= r= t= F=
What are the degrees of freedom (df) for this study?
What is the critical value for this statistic?
Is there a significant relationship between Years & Net Worth at the .05 alpha level?
SUMMARY
To finish off our discussion of correlation, there is a review, quiz, progress check, and chapter answers.
Review
To measure the strength of relationship between two variables, it would be best to use a correlation.
A correlation can only be between -1 and +1.
The closer the correlation coefficient is to 1 (either + or -), the stronger the relationship. The sign indicates the direction of relationship.
The coefficient of determination is calculated by squaring r. The coefficient of determination shows how much area the two variables share; the percentage of variance explained (accounted for).
The coefficient of nondetermination is calculated by subtracting the coefficient of determina- tion from 1. The coefficient of nondetermination shows how much the two variables don’t share; the percentage of unexplained variance.
To calculate the correlation between two continuous variables, the Pearson product-moment coefficient is used. To calculate the correlation between two discrete variables, the phi coefficient is used. To calculate the correlation between one discrete and one continuous variable, the point biserial coefficient is used.
Correlations are primarily a measure of consistency, reliability, and repeatability.
Correlations are based on two paired-observations of the same subjects.
A cause-effect relationship has a strong correlation but a strong correlation doesn’t guarantee a cause-effect relationship. In a correlation, A can cause B or B can cause A or both A and B can be caused by another variable. Inferences of cause-effect based on correlations are dangerous. A correlation shows that a relationship is not likely to be due to chance but it cannot indicate which variable was cause and which effect.
Test-retest coefficients are correlations.
In order to make good predictions between two variables, a strong correlation is necessary.
Progress Check 5
1. Which of the following gives the correlation between two discrete variables:
a. phi
b. Pearson r
c. least squares criterion
d. point biserial
e. confidence level
2. Which of the following correlation coefficients shows the greatest amount of relationship:
a. .23
b. .45
c. .56
d. .71
e. -.89
3. A correlation between two variables:
a. proves A causes B
b. proves B causes A
c. proves C causes A
d. proves C causes B
e. none of the above
4. Which of the following best describes this correlation:

[Scatter plot titled TEST RELIABILITY]

a. strong, positive
b. strong, negative
c. weak, positive
d. weak, negative
e. no correlation

5. Which indicates the percentage of explained variance:
a. correlation coefficient
b. normalized coefficient
c. standardized coefficient
d. coefficient of determination
e. coefficient of nondetermination

6. Which of the following is the correlation between two continuous variables:
a. phi
b. Pearson r
c. point-biserial
d. 1-Way ANOVA
e. normalized distribution

7. Correlations are used as measures of:
a. reliability
b. discreteness
c. continuation
d. dispersion
e. central tendency

8. Correlations assume that:
a. subjects are randomly assigned
b. confounds are obvious
c. 2 independent variables are equal
d. the null hypothesis is false
e. none of the above

9. Correlations have both sign and:
a. power
b. magnitude
c. independence
d. 1 independent variable
e. 2 independent variables

10. Correlations are numerical expressions of:
a. confounds
b. thoughts
c. dispersion
d. scatter plots
e. central tendency
Progress Check
1. List five measures of dispersion:
a. _______
b. _______
c. _______
d. _______
e. _______
2. List three types of correlation and the kind of variables with which they are used:
a. _______
b. _______
c. _______
3. Calculate the following characteristics of people's closets:

Hats  Gloves
15 13 10 5 75 75 22 43

The sum of Hats: _______
SS of Hats: _______
Mean of Gloves: _______
Population variance of Gloves: _______
SSxy: _______
r = _______
df = _______
The critical value for r = _______
Is there a significant relationship between Hats & Gloves at the .05 alpha level? _______
Answers
Practice Problems
Item 1
SSx = 93.50
SSy = 83.50
SSxy = 84.40
r = .956
r2 = .915
Item 2
SSx = 98
SSy = 145.43
SSxy = 85
r = .712
r2 = .507
Item 3
SSx = 60
SSy = 87.50
SSxy = -69
r = -.95
r2 = .907 (round to .91)
Item 4
SSx = 40
SSy = 189.71
SSxy = 31
r = .356 (round to .36)
r2 = .127 (round to .13)
Item 5
SSx = 98
SSy = 145.43
SSxy = 85
r = .712
r2 = .507
1-r2 = .493
Item 6
SSx = 207.33
SSy = 326
SSxy = -211
r = -.81
r2 = .66
1-r2 = .34
Item 7
SSx = 354
SSy = 414
SSxy = -155
r = -.40
r2 = .16
1-r2 = .84
Simulations
Sim 1
sum of X = 38
SSx = 139.33
variance of X = 23.22
mean of Y = 7
SSy = 140
r = .90
r2 = .81 (coefficient of determination)
Sim 2
sum of X = 43
SSx = 64.86
stdev of X = 3.29
median of Y = 7
SSxy = -87
r = -.94
r2 = .88 (coefficient of determination; shared variance accounted for)
Sim 3
SSx = 72
population variance of X = 10.29
SSy = 172
population stdev of Y = 4.96
SSxy = 66
r = .59
df = 7-2 = 5
critical value = .755
not significant at .05 alpha level
Multiple Choice
a, e, e, c, d, b, a, e, b, d
Progress Check
1. List five measures of dispersion:
a. Range
b. Mean Absolute Deviation
c. Sum of Squares
d. Variance
e. Standard Deviation
2. List three types of correlation and the kind of variables with which they are used:
a. Pearson r: 2 continuous variables
b. Phi: 2 discrete variables
c. Point biserial: 1 continuous and 1 discrete variable
3. Calculate the following characteristics of people's closets:
The sum of Hats = 36
SS of Hats = 38
Mean of Gloves = 5.50
The population variance of Gloves = 12.58
SSy = 75.50
SSxy = 83.50
r = .30
df = 4 (6 people minus 2 equals 4)
The critical value for r = .81
Is there a significant relationship between Hats & Gloves at the .05 alpha level? No.
Day 6: Regression
Predicting the Future & Past
BRIEFLY
When there is a strong correlation between two variables, you can make accurate predictions from one to the other. If sales and time are highly correlated, you can predict what sales will be in the future…or in the past. You can enhance the sharpness of an image by predicting what greater detail would look like (filling in the spaces between the dots with predicted values).
Regressions allow us to compare data to a straight line and to make predictions about the future and the past. Here are 4 things to consider:
Regression
Components
Why regress?
How accurate are our predictions?
INTRODUCTION
Linear Regression
An extension of the correlation, a regression allows you to see how much the data you collect looks like a straight line. Obviously, if your data is cyclical, a straight line won't represent it very well. But if there is a positive or negative trend, a straight line is a good model.
If the data approximates a straight line, you can then use that information to predict what will happen in the future. Predicting the future assumes, of course, that conditions remain the same. The stock market is hard to predict because it keeps changing: up and down, slowly up, quickly down. It's too erratic to predict its future.
If you roll a bowling ball down a lane and measure the angle it is traveling, you can predict where the ball will hit when it reaches the pins. The size, temperature and shape of the bowling lane are assumed to remain constant for the entire trip, so a linear model would work well with this data. If you use the same ball on a grass lane which has dips and bulges, the conditions are not constant enough to accurately predict its path.
Predicting the future also assumes that the relationship between the two variables is strong. A weak correlation will produce a poor line of prediction. Only strong (positive or negative) correlations will produce accurate predictions.
Linear regression can also predict the past. Carbon dating of a relic assumes that carbon has always burned at the same rate. If we discovered that for some odd reason, carbon burned very quickly 3000 years ago (or very slowly), our predictions of the age of our relic would be substantially off.
Components
A regression is composed of three primary characteristics. Any two of these three can be used to draw a regression line:
First, the regression line always goes through the point where the mean of X and the mean of Y meet. This is reasonable since the best prediction of a variable (knowing nothing else about it) is its mean. Since the mean is a good measure of central tendency (where everyone is hanging out), it is a good measure to use.
If you’re asked to guess someone’s weight or height (without seeing them), the best guess is the mean of each of those variables. Even if you’re wrong, you will typically be less wrong at the mean than any other guess because most of the values in a distribution are at or near the mean.
Second, a regression line has slope. For every change in X, slope will indicate the change in Y. If the correlation between X and Y is perfect, slope will be 1; every time X gets larger by 1, Y will get larger by 1. Slope indicates the rate of change in Y, given a change of 1 in X.
Third, a regression line has a Y intercept: the place where the regression line crosses the Y axis.
Why regression?
We typically use a least-squares criterion to decide where the line should be drawn. That is, the line goes through the data in such a way as to minimize the dispersion from that line.
Regression means to go back to something. We can regress to our childhood or regress out of a building (leave the way we came in). Or regress back to the line of prediction. Instead of looking at the underlying data points, we use the line we've created to make predictions. Instead of relying on real data, we regress to our prediction line.
In a regression, we have moved from the reality of data to a hypothetical straight line. That line of prediction is a representation of what reality might be like. It is not so much that we apply the model to the data; it's more that we collect the data and ask if it looks like this model (linear), that model (circular or cyclic) or that model (chance).
Accurate predictions
There are two major determinants of a prediction’s accuracy: (a) the amount of variance the predictor shares with the criterion and (b) the amount of dispersion in the criterion.
Taking them in order, if the correlation between the two variables is not strong, it is very difficult to predict from one to the other. In a strong positive correlation, you know that when X is low, Y is low. Knowing where one variable is makes it easy to find the general location of the other variable.
A good measure of predictability, therefore, is the coefficient of determination (calculated by squaring r). R-squared (r2) indicates how much the two variables have in common. If r2 is close to 1, there is a lot of overlap between the variables and it becomes quite easy to predict one from the other.
Even when the correlation is perfect, however, predictions are limited by the amount of dispersion in the criterion. Think of it this way: if everyone has the same score (or nearly so), it is easy to predict that score, particularly if the variable is correlated with another variable. But if everyone has a different score (lots of dispersion from the mean), guessing the correct value is difficult.
The standard error of estimate (see) takes both of these factors into consideration and produces a standard deviation of error around the prediction line. A prediction is presented as plus or minus its see.
The true score of a prediction will be within 1 standard error of estimate of the regression line 68% of the time. If the predicted score is 15 (just to pick a number), we're 68% sure that the real score is 15 plus or minus 3 (or whatever the see is).
Similarly, we're 96% sure that the real score falls within two standard deviations of the regression line (15 plus or minus 6). And we're 99.7% sure that the real score falls within 3 see of the prediction (15 plus or minus 9).
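The 68/96/99.7 bands can be generated mechanically. Here is a small Python sketch using the text's example numbers (a predicted score of 15 and a see of 3):

```python
# Confidence bands around a predicted score: prediction +/- k standard
# errors of estimate, for k = 1, 2, 3 (roughly 68%, 96%, 99.7%).
predicted, see = 15.0, 3.0
bands = {k: (predicted - k * see, predicted + k * see) for k in (1, 2, 3)}
# bands[1] is the ~68% band, bands[2] the ~96% band, bands[3] the ~99.7% band
```

With these numbers the bands come out to (12, 18), (9, 21) and (6, 24), matching the plus-or-minus arithmetic in the text.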
Regression
Strong relationships allow us to predict behavior. We may not understand causation but with a strong correlation we can make predictions. We can predict into the future, into the past or fill in the gaps between time periods.
Regression tries to draw a single line through a scatterplot and use that line to make predictions. Obviously, the more a scatterplot approximates a straight line, the easier it is to draw a line through it. Consequently, regression is only possible with strong correlations, either positive or negative. Correlations of medium to low magnitudes do not lend themselves to regression.
In regression, there is a directionality to the prediction. X may predict Y, or the other way around, but the relationship is not bidirectional. It makes a difference which variable is the predictor and which the criterion. Perspective changes the predictions.
The choice of which variable should be the predictor is based on theory. It is our theoretical questions that determine whether X or Y is the predictor. We use sunset to predict that it will be dark at night, and not the other way around, because of our theory of how the sun functions. If one day the sun goes down and it doesn’t get dark, we’ll have to dramatically readjust our theory.
To draw a line through a scatterplot of data, we must know where the line begins or a point through which it will go. We also need to know the angle of the line. These two things (a starting point and slope) are essential to drawing a line. The formula for a straight line puts it this way:
Y’ = a + bX
In this formula, "a" is the intercept (the point where the regression line crosses the Y axis), b is the slope (rate of increase) of the line, and X is any raw score. Given these elements, individual points (called Y-primes) can be predicted.
Calculating a regression line begins by finding the slope of the line. Here’s the formula for the slope of a line:
b = SSxy / SSx
Notice that it's a lot like the formula for correlation. The SSxy is on the top and the SSx is on the bottom. What's missing is the SSy, which isn't needed because slope measures change relative to the predictor variable. Let's use this data:
X   Y
2   3
4   5
6   6
8   6
10  8
12  8
The SSx for this data is 70 and the SSxy is 34. To find b, divide 34 by 70: 34/70 = .486, so b = .486. In other words, the line of regression moves in a positive direction (in this example b is positive, not negative). To plot the line, we would move over 1 and up .49 units all along the line. All we need is a starting point.
A common point of interest is where the regression line intercepts the Y axis. This intercept is a good visual starting point for drawing a line and can be found if we know the slope of the line and the means of each variable. Here is the formula for intercept:
a = Ȳ - bX̄
Notice that the formula includes slope (which we just calculated) and the mean of each variable. The reason the two means are in this formula is that regression lines always go through the point where the two means meet. This makes sense when you realize that the best predictor of a variable, if you knew nothing else about it, is the mean. Since the mean is our best measure of central tendency and most scores are in the center of a distribution, the mean has the least amount of prediction error.
We calculated the slope to be .486. The mean of Y is 6 and the mean of X is 7. So intercept is 6 – (.486 times 7). Working it out, .486 times 7 is 3.40. Then 6 – 3.40 equals 2.60. So the intercept of the Y axis (called “a”) equals 2.60.
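The slope and intercept arithmetic can be checked in a few lines of Python, using the values stated above (SSxy = 34, SSx = 70, mean of X = 7, mean of Y = 6):

```python
# Slope and intercept from the chapter's worked values.
ss_xy, ss_x = 34.0, 70.0
mean_x, mean_y = 7.0, 6.0
b = ss_xy / ss_x             # slope: change in Y per unit change in X
a = mean_y - b * mean_x      # intercept: where the line crosses the Y axis
```

As in the text, b works out to about .486 and a to 2.60.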
To plot the line, we would go up the Y axis to 2.60 and use that for our starting point. Then we could plot the coordinates of the line by going to the right 1 and up .49, over 1 and up .49, etc.
We don't have to plot the regression line in order to use it. Having calculated the characteristics the line would have, we can use that information to predict values of Y from values of X.
Using the formula for a straight line, we simply plug in an X value with the slope and intercept we've calculated. The X values we plug in need not be known X values. We can extend the line beyond the current data and estimate what Y would be at that point. This extrapolation (estimating beyond the current data set) is very useful for projecting into the future. Budget and sales predictions that are based on previous data are extrapolated into the future.
It is also possible to extrapolate into the past. Carbon dating of archeological artifacts is done by extrapolating the decay rate of carbon into the past. Predicting points between known data values is called interpolation. It is possible to estimate midpoints in annual data by interpolation.
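Extrapolation and interpolation are both just the straight-line formula evaluated at new X values. A short Python sketch using the chapter's line (b = .486, a = 2.60):

```python
# Y' = a + bX, evaluated beyond the data (extrapolation) and between
# known points (interpolation), with the slope and intercept from the text.
a, b = 2.60, 0.486

def predict(x):
    return a + b * x

future = predict(20)    # X = 20 lies beyond the data: extrapolation
between = predict(5)    # X = 5 falls between observed values: interpolation
```

The same formula serves both uses; only the X value changes.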
Regression assumes that the data is consistent. Future performance is based on past performance, assuming that conditions do not change. Extrapolating sales for next year based on previous years can't take into account a sudden drop in the market or being bought out by a competitor. Interpolating the midyear score on a math achievement test assumes that the learning is linear and doesn't happen in spurts.
Clearly, regression works best when conditions are constant and less well in real life. Reality has too many bumps to make precise predictions about health, wealth and happiness. But if you're planning a trip into space and want to know where your ship will be in 3 years (assuming it isn't hit by an asteroid), regression is the right tool.
Regression also assumes that errors of prediction are (a) consistent along the entire regression line and (b) normally distributed at any given point along that line. A regression assumes that there is an equal amount of error along the line; error isn't small at one point and larger at another. When a poll says the President's popularity is 38% plus or minus 3, it assumes that those 3 points of error are consistent; the error isn't 3 points at 38% but 5 points at 50%; it's always 3 points. Similarly, a regression assumes that predicting congressional budgets isn't more prone to error at 8 months than it is at 6 months. Error is assumed to be consistent along the line of regression.
At any point along the regression line, prediction forms a normal distribution. The true value of the President's popularity is not known; it is estimated to be 38%, but there is some error in that estimate.
However, it is assumed that the true value is likely to be close to the regression line, and less and less likely the farther away one gets from that line. Prediction values are assumed to follow a normal distribution, becoming less and less accurate. The regression line is assumed to be the mean of the distribution, and a standard deviation of prediction can be found.
Consequently, predictions along the regression line are made with a standard deviation in mind. The President is at 38%, plus or minus 3; sales will be 4 million next year, plus or minus 2; you'll have to pay $100 more in income tax next year, plus or minus 4. The prediction is stated and the standard deviation follows it.
This standard deviation, called the standard error of estimate, should be interpreted in the same way other standard deviations are used. As a standard deviation, 68% of the scores fall within one standard deviation of the mean, 96% within two standard deviations of the
mean, and virtually all of the scores are within three standard deviations of the mean. With regressions, we’re saying that the true score (the real thing when it happens) is likely to fall within one standard deviation of the predicted value.
In other words, we're 68% sure that sales will be between 2 and 6 million next year (4 million plus or minus one standard deviation). We're 96% sure that sales will be between 0 and 8 million (within 2 standard deviations). And we're 99.7% sure that sales next year will be between losing 2 million and making 10 million! Of course, this assumes that nothing changes in the meantime.
As you can see, our margins of error can make our predictions sound silly. We want life to be easy to predict but often it’s not. We can make accurate predictions when there is very little error along the regression line. And our predictions become quite vague when the standard error of estimate is large.
There are two things that impact our estimates the most. Both can be seen in the formula for the standard error of estimate:

see = √( SSy (1 - r2) / (N - 2) )
Prediction errors go up (a) when the standard deviation of Y gets larger and/or (b) when the correlation between the two variables gets closer to zero. First, error increases when Y's standard deviation increases. That is, it is easier to predict Y when its scores are more homogeneous and harder to predict Y when its scores are more dispersed. It makes sense that if everyone has the same score (or close to it), predicting Y is easy. The challenge is to predict Y when its values are dispersed, and the standard error of estimate reflects this difficulty.
Second, the weaker the correlation between the two variables, the harder it is to make accurate predictions. As r gets smaller, the coefficient of nondetermination gets larger. The less variance that can be attributed to the relationship between the two variables, the harder it is to predict values.
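Both determinants can be demonstrated in a short sketch. This assumes the common sample form of the standard error of estimate, see = √(SSy(1 - r²)/(N - 2)); the numbers are invented for illustration.

```python
# How the two determinants move the standard error of estimate:
# more dispersion in Y raises it, and a weaker r raises it.
def see(ss_y, r, n):
    return (ss_y * (1 - r ** 2) / (n - 2)) ** 0.5

base = see(100, 0.9, 12)          # moderate dispersion, strong correlation
more_spread = see(400, 0.9, 12)   # larger SSy -> larger prediction error
weaker_r = see(100, 0.3, 12)      # r nearer zero -> larger prediction error
```

Holding everything else fixed, increasing SSy or shrinking r both inflate the error term, just as the two paragraphs above describe.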
Analysis of Regression
Linear regression can be used to test the significance of a relationship. An analysis of the regression variance can indicate whether the impact of X on Y is likely to be due to chance. The procedure is called an F test and compares the variances of the two variables as follows:
First, the total SS of the regression is partitioned into component parts. The Sums of Squares are divided into the portion that is accounted for by the regression and the amount not yet explained (error). The total SS equals the SSy, assuming X is predicting Y.
Working through a problem might help. Consider these numbers:
X   Y
2   4
5   7
3   9
6   8
11  10
12  10
In this example, the SSx is 85.5, the SSy is 26 and the SSxy is 36. The correlation between the two variables equals .76.
In an Analysis of Regression, the SStotal equals the SSy; in this example it equals 26. To partition this into the portion explained by the regression, multiply SStotal by r-squared. In this example, it is 26 times .58, which equals 15.20.
The SSerror is the SStotal times the coefficient of nondetermination (1-r2); in this case that would be .42 times 26 = 10.80. Of course, you also could subtract the SSregression from the SStotal. Either way will work.
So far the summary table would look like this:

              Sum of Squares   df   mean squares
SSregression       15.20
SSerror            10.80
SStotal            26
We’ve partitioned the Sum of Squares into the portion explained by the regression (15.20) and the portion that is due to error (10.80). In this context, anything that isn’t explained by the regression line is considered error.
Second, we select the appropriate degrees of freedom. Degrees of freedom are the number of items that are free to vary, assuming a fixed mean. Usually, we start with a few scores (2, 3, 7) and try to find the mean (4); we begin with the numbers being fixed and the mean free to be calculated. But degrees of freedom requires backward thinking. We start with the mean and try to figure out the numbers used. If we begin knowing that the mean is 4, it seems like the three numbers in our data set could be any three numbers. But not quite.
Let’s assume the first number is free to vary; it can be as large or small as you wish. And let’s assume that the second number is free to vary; again it is entirely your choice. But having selected these two numbers, there is only one number that can be used in combination with the other two numbers to produce a mean of 4.
If the first number was 6 and the second number was 2, the third number has to be 4 in order for the mean of the data set to be 4. If the first number was 100 and the second number was 2, the third number has to be some big negative number. It might take me a bit of time to figure out that it’s -90, but the third number has got to be that number. It is not free to vary.
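The trick is easy to verify: multiply the mean by the number of scores to get the required total, then subtract the numbers already chosen.

```python
# With a fixed mean, all numbers but the last are free to vary.
# Using the text's example: mean of 4 across three numbers, with
# 100 and 2 already chosen, the third number is forced.
mean, n = 4, 3
first, second = 100, 2
third = mean * n - (first + second)   # forced to -90; it is not free to vary
```

Whatever the first two numbers are, the third is locked in, which is why three scores with a known mean carry only two degrees of freedom.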
Degrees of freedom is like a parlor trick that starts with a known mean and allows all the numbers to vary except one (or sometimes a few). Each Sum of Squares in our summary table has its own degrees of freedom. The degrees of freedom for SStotal is N-1. Since there are six scores, the total amount of degrees of freedom is 5.
The df (degrees of freedom) for the SSregression is the number of variables minus 1. We’ll call it k-1 (kolumns?, groups with a k?). There are two variables in the regression, so SSregression df is 2-1 = 1.
The df for SSerror is the number of people (N) minus the number of groups (k); so N-k is 6-2 = 4. Notice that the degrees of freedom, like the Sum of Squares, add up. You can check your work by adding the regression portion and the error portion; they should add up to their respective totals.
Our summary table now looks like this:

              Sum of Squares   df   mean squares
SSregression       15.20        1
SSerror            10.80        4
SStotal            26           5
Mean squares is another name for variance. As you will recall, variance for a single variable is SS divided by N. Here, each SS is divided by its degrees of freedom. The results are shown here:
              Sum of Squares   df   mean squares
SSregression       15.20        1       15.20
SSerror            10.80        4        2.71
SStotal            26           5        5.20
Of course, the mean squares won't add up like the other columns because we divided by different amounts. But the resulting variance terms are appropriate for their respective portions.
Third, the mean squares are compared by dividing the mean squares of regression by the mean squares of error. The result is called F. In this example, F = 5.59. In order for the value we calculated to be deemed significant, it must be larger than a standard value for that size of data set.
Fourth, we compare the F we calculated to the F table at the back of nearly any statistics book. To find the right value, we select column 1 (the same value as the df for SSregression). And to find the correct row, we go down to the row labeled 4 (the same value as the df for SSerror). In this case the book value is 7.71. Our F value was 5.59, so we lose.
Well, lose isn't really the right word, but that's always how it seems to me. I think of it as trying to beat the book value: if our F is larger than the book's, we win. If our F is smaller than the book's, we lose.
The proper explanation is that F indicates the likelihood that what we see is not due to chance. If our F is smaller than the book value, what we see is likely to be due to chance. If our F is larger than the book value, the relationship between variables is likely to be due to something other than chance.
The F test doesn't tell us what causes what, only whether it is a likely occurrence or not. In this example, there is no significant impact of X on Y. Any apparent causal relationship can be explained by chance.
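The Analysis of Regression above can be reproduced end to end in Python. The X and Y values below are read from the worked example; they match the stated SSx = 85.5, SSy = 26 and SSxy = 36, and yield an F near the text's 5.59.

```python
# Analysis of Regression: partition SSy into regression and error,
# divide by degrees of freedom, and form the F ratio.
x = [2, 5, 3, 6, 11, 12]
y = [4, 7, 9, 8, 10, 10]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
ss_x = sum((v - mx) ** 2 for v in x)
ss_y = sum((v - my) ** 2 for v in y)
ss_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
r2 = ss_xy ** 2 / (ss_x * ss_y)      # coefficient of determination
ss_reg = ss_y * r2                   # portion explained by the regression
ss_err = ss_y - ss_reg               # unexplained (error) portion
df_reg, df_err = 2 - 1, n - 2        # k - 1 and N - k
f = (ss_reg / df_reg) / (ss_err / df_err)
```

Since this F (about 5.59) is smaller than the tabled 7.71 for 1 and 4 degrees of freedom, the result is not significant at the .05 level, matching the text's conclusion.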
The F test, and other tests of significance, start with the assumption that the relationships we find are due to chance. This presumption is maintained until we are highly confident that it is not true. We want to be very careful that we don't see relationships that don't exist (the statistical equivalent of being psychotic).
We're not as upset about not seeing relationships that do exist (being blind) because we figure that if we replicate the findings enough we will eventually discover their true nature. We may not be fond of being blind (called Type II error) but we hate the idea of being psychotic (called Type I error).
How much error is acceptable?
In social science we typically are willing to be wrong 5% of the time on F tests.
This level of error would be totally unacceptable in other areas. Cars, trains, planes and elevators would be considered totally unsafe if 5% of them crashed every day. It might seem odd that we'd complain if 1 out of 20 donuts was rotten but be fine when 1 out of 20 experiments finds results that aren't true. But we expect more from our donuts than we do from people.
Our measures in social science are so vague that 5% error seems about right. Manufacturing tolerances can be measured to .0001 but measuring people on interval scales is still merely
getting a general idea of what they are like. Rating scales and spelling tests will never be as accurate as measures of inanimate objects.
Fortunately, we don't allow chance to be on our side, and there are only two types of decision errors we can make. We set up our studies so that chance is our primary explanation. Unless there is substantial evidence to the contrary, we assume that the relationships between variables are due to chance.
We phrase our hypotheses in terms of null findings. We assume there is no relationship between variables, no difference between groups, and no significant impact of one variable on another. Unless we have evidence to the contrary, we accept the null hypothesis. We need a significant amount of evidence for us to reject the null hypothesis and to say that there is a significant relationship between variables.
This approach is equivalent to the presumed innocence of a criminal suspect. People are assumed to be innocent until proved guilty. And when they are not proved guilty, they are not innocent; they are "not guilty." Similarly, the lack of a significant finding doesn't mean that a causal relationship doesn't exist, only that we didn't find it. This is particularly true of the F test, which is a 1-tailed test of significance. We are testing whether the F ratio of explained to unexplained variance is significantly different from what we would expect by chance. If F is not significant, we can make no negative causal statement (i.e., that X does not cause Y). If F is not significant, the decision is that the relationship is likely to be due to chance, not that it is due to chance.
When we inappropriately reject the null (find people guilty who aren't), we are making a Type I decision error. Notice that it isn't the data that is incorrect; it is the decision that is in error. The responsibility for the decision lies with the researcher, not the data.
If we accept the null when we shouldn't (find people innocent who aren't), we are making a Type II decision error. Just as in the judicial system, we dislike making Type I errors, but we believe that it is better to err on the side of caution.
In an F test, we set the alpha level (the amount of error we are willing to accept) to .05. That is, we are willing to be wrong 5 times out of 100. If we selected a .10, .15 or .20 alpha level, we would be choosing to allow more Type I error. If we selected a .01 alpha level, we would have less Type I error. Most researchers choose an alpha level of .05; it is the standard level of acceptable error.
Type II error has less to do with where we set the alpha level than with the quality of the test itself. Usually we make Type II errors because our instruments aren't refined enough to detect significant changes in variables. We don't find significant results because our tests use ordinal information more than we would like to admit.
UNDERSTAND
Concepts are rules you carry in your head. They are easy to remember and apply to many situations. There are 3 concepts related to regression.
1. Line of regression
Illustration 1: Regression is a border between countries. It is an imaginary line we use to make decisions.
Illustration 2: Regression is a line on a map. You can draw a straight line from Los Angeles to New York, but you have to fly in a curve (the earth is round, remember; no fair tunneling through the middle of it). The line is useful
for making decisions but is an approximation of reality, not reality itself.
Illustration 3: Regression is like the imaginary line we use to describe where a wall has been built. It is not the wall itself but is a representation of the wall.
2. Consistent errors
Illustration 1: The errors around the regression line are not thick at one end and thin at the other. They are consistent all along the regression line.
Illustration 2: The regression line is like a boat with outriggers on each side. There is a center, evenly balanced on each side.
3. Three standard deviations of error
Illustration 1: The errors around the regression line are like a rod through a tunnel of normal curves. Each point on the line has a normal distribution of error surrounding it. The predicted value (the dot on the line) is our best guess. We are 68% sure that the real score will be at the dot, plus or minus one standard error of estimate. We are 96% sure that the real score will be within two standard errors of estimate. And we're 99.7% sure that the real score will be within 3 standard errors of estimate.
Illustration 2: The regression line is like a series of guesses surrounded by confidence bands. We're fairly confident that the real score is within one standard error of estimate (SEE) of our predicted score. We're quite sure it will be within 2 SEE, and we're really-really sure that it will be within 3 SEE (plus and minus, of course).
REMEMBER
Facts are the details of who, what, where and when. Here are the facts I've collected for regressions.
Basic Facts
There are 6 things associated with a linear regression: intercept, slope, interpolation, extrapolation, the least squares criterion, and the standard error of estimate.
Formulas:
Y′ = a + bX
b = SSxy / SSx
a = Ȳ - bX̄
see = √( SSy (1 - r2) / (N - 2) )
Terms:
a
b
regression line
extrapolation
intercept
interpolation
regression
slope
standard error of estimate (see)
straight line
time series
X predicting Y
Y predicting X
DO
Step-by-Step
1. Calculate SSx
2. Calculate SSy
3. Calculate SSxy
4. Find the slope (b)
In order to make a prediction based on a regression line, the slope of the line must be calculated. To find the slope, divide the SSxy by the SS of the predictor.
Assuming that X is predicting Y, it is the SSxy divided by the SSx. In our example, it is 38.67 divided by 43.33. Consequently, the slope = .89.
b = SSxy / SSx
Now that we've covered the facts and concepts of regression, it's time to put what we know into practice. This section includes Step-by-Step instructions, practice problems, simulations (word problems), a quiz and a progress check.
5. Calculate the mean of X and the mean of Y
6. Find the Y intercept (a)
In addition to slope, it is necessary to calculate the Y intercept (the value at which the regression line crosses the Y axis).
First, multiple the mean of X by the slope of the regression line. In our example, X’s mean = 6.33 and b (the slope) = .89. The result is 5.63.
Second, subtract the result of Step 1 from the mean of Y. In our example, Y’s mean is 6.67. So, the Y intercept (a) is 6.67 minus 5.63 = 1.04.
6. Make a prediction (Y′)

The predicted value (called Y-prime) is the result of this formula:

Y′ = a + bX

Having already calculated a and b, it is possible to generate a predicted value for any given value of X.
For example, if X = 14, the predicted value (based on the regression line) would be 1.04 + (.89 times 14). That is, Y’ =13.51.
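The slope, intercept, and prediction steps above can be checked with a few lines of Python (my sketch, not the book’s; the SS values and means are the ones from the example in the text). The small differences from the book’s 1.04 and 13.51 come from the book reusing the slope after rounding it to .89.

```python
# Slope, intercept, and prediction for the text's example:
# SSxy = 38.67, SSx = 43.33, mean of X = 6.33, mean of Y = 6.67.

ss_xy = 38.67
ss_x = 43.33
mean_x = 6.33
mean_y = 6.67

b = ss_xy / ss_x          # slope: SSxy divided by the SS of the predictor
a = mean_y - b * mean_x   # intercept: mean of Y minus slope times mean of X

def predict(x):
    """Y' = a + bX: the point on the regression line at X."""
    return a + b * x

print(round(b, 2))            # → 0.89
print(round(a, 2))            # → 1.02 (the book's 1.04 uses the pre-rounded slope)
print(round(predict(14), 2))  # close to the book's 13.51
```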
7. Find the standard error of estimate

Having made a prediction, it is possible to estimate how accurate that prediction is by calculating the standard deviation around the prediction (called the standard error of estimate).
First, calculate the Pearson r. Having already done this previously in our example, we know r = .89.
Second, calculate the standard deviation of Y. In our example, the SSy = 43.33 and N = 6, so the variance of Y = 7.22. Since a standard deviation is the square-root of variance, the standard deviation of Y = 2.69.
Third, plug the numbers into the formula for sest or see (the standard error of estimate):

sest = sy √((1 – r2) N / (N – 2))
The result is 1.49. Our estimate of 13.51 is plus or minus 1.49. That is, 68% of the time, the true value will be between 12.02 and 15.00. And 96% of the time, the true value will be between 10.53 and 16.49. And 99.7% of the time, the true value will be between 9.04 and 17.98.
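For readers who want to check this with a computer, here is a Python sketch (an illustration; the book itself uses no code). It computes the standard error of estimate as √(SSy(1 – r²)/(N – 2)), the form that reproduces the worked answers in this chapter; any tiny gap from the book’s 1.49 is rounding.

```python
import math

# Standard error of estimate for the worked example:
# SSy = 43.33, SSxy = 38.67, SSx = 43.33, N = 6.
ss_y = 43.33
r = 38.67 / math.sqrt(43.33 * 43.33)   # Pearson r, about .89
n = 6

see = math.sqrt(ss_y * (1 - r ** 2) / (n - 2))
print(round(see, 2))                   # about 1.49 (1.48-1.50 depending on rounding)

# 68% band around the predicted score of 13.51:
y_prime = 13.51
print(round(y_prime - see, 2), round(y_prime + see, 2))
```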
example
Let’s try this example:
X Y
2 5
4 7
6 6
9 8
12 14
15 11
15 12
First, calculate the mean of X and the mean of Y. Each mean = 9.
Second, calculate the slope of the line (called b). To find the slope, divide the SSxy by the SS of the predictor. That is: b = SSxy / SSx.

The SSx is 164; the SSy is 68; and the SSxy is 92. So the slope (b) = .56.
Third, calculate the Y intercept (called a). The formula is: a = (mean of Y) – b × (mean of X). So a = 9 – (.561 × 9) = 3.95.
Fourth, make a prediction. Use the formula for a straight line: Y′ = a + bX.

Let’s assume that the X value is 8. We would predict that Y (which we’ll call Y-prime so we know it’s a prediction) equals 8.44.
Fifth, estimate the accuracy of the prediction. Don’t worry, there’s a formula for that too. Here it is:

sest = sy √((1 – r2) N / (N – 2))
The standard deviation of Y (sy) is 3.12 and r = .87, so the standard error of estimate is 1.81. This means that we’re 68% sure the real score will be 8.44 plus or minus 1.81 (that is between 6.63 and 10.25).
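The whole example can be verified end to end in Python (again my sketch, not the book’s). The seven (X, Y) pairs below are the ones consistent with the example’s stated sums (both means = 9, SSx = 164, SSy = 68, SSxy = 92); the standard error is computed as √(SSy(1 – r²)/(N – 2)), an assumed form that reproduces the book’s 1.81.

```python
import math

# The example's seven pairs (they reproduce the stated sums).
xs = [2, 4, 6, 9, 12, 15, 15]
ys = [5, 7, 6, 8, 14, 11, 12]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n
ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_y = sum((y - mean_y) ** 2 for y in ys)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

b = ss_xy / ss_x                    # slope
a = mean_y - b * mean_x             # intercept
y_prime = a + b * 8                 # prediction at X = 8
r = ss_xy / math.sqrt(ss_x * ss_y)  # Pearson r
see = math.sqrt(ss_y * (1 - r ** 2) / (n - 2))

print(round(b, 2), round(a, 2), round(y_prime, 2), round(see, 2))
# → 0.56 3.95 8.44 1.81
```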
Practice Problems

Item 1

Petals Weight
8 1
5 6
2 7
1 8
9 2
Should weight be X or Y?
a = _____
b = _____
If there are 6 petals, how much will an elephant plant weigh (in pounds): _____
Item 2
Trucks Dolls
12 11
8 7
6 5
2 1
4 3
9 9
14 15
Should trucks be X or Y?
a = _____
b = _____
r = _____
If there are 9 dolls, how many trucks will a child have: _____
Item 3
Ducks Cows
3 11
6 6
4 1
15 2
11 10
5 7
a = _____
b = _____
If there are 22 ducks, how many cows will the farmer own: _____
Item 4
Doctors Nurses
1 13
11 5
22 3
11 1
16 18
17 8
a = _____
b = _____
If there are 5 nurses, how many doctors will a hospital have: _____
Item 5
Profs Classes
4 3
4 8
6 2
8 13
7 8
7 9
a = _____
b = _____
If there are 7 professors, how many classes will be taught: _____
Item 6

Candles Flowers
22 2 84 3 11
12 4 16 6 18

a = _____
b = _____
standard error of estimate (SEE) = _____
If there are 150 candles, how many flowers will a wedding have: _____
Item 7

Year Tuition
1990 1
1991 3
1992 9
1993 12
1994 16

[Hint: use the last digit of the year or convert Years to your own integers]
a = _____
b = _____
How much will tuition (in thousands) be in 1997: _____
Simulation 1
As a director of marketing, you are interested in how well your sales did over the year. In particular, you are curious whether last year’s revenue is a good predictor of this year’s. The numbers below are the number of dollars (in millions).
Last Year This Year
8 6
13 14
5 4
8 9
11 12
7 11
9 8
2 7

What is the SumX of Last Year: ________
What is the range of Last Year: _______
What is the SS of Last Year: _______
What is the SS of This Year: ________
What is the SSxy: ________
Since you are interested in how well one test acts as a linear predictor of another, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculations?
a = _____
b = _____
r = _____
t = _____
F = _____
If a client’s last year revenue was 6 million, what would you predict this year’s revenue to be?
Simulation 2
Having encountered a new civilization, you use your knowledge of population statistics to make predictions about wealth (number of grey rocks) and housing (hut size).
Rocks Hut Size
5 11
8 13
2 7
11 9
3 4
What is the SS for Rocks: ________
What is the standard deviation of Rocks: _______
What is the range of Hut Size: _______
What is the SS for Hut Size: ________
What is the SSxy: ________
Since you are interested in how rocks predicted hut size, calculate the following:
a=
b=
If you have 7 rocks, how big is your hut likely to be?
Since you also are interested in how hut size predicts rocks, calculate the following:
a = _____
b = _____
If your hut is 17 feet high, how many rocks would you likely have?
Simulation 3
As a politician, you are interested in how family income predicts political contributions. Assume this is a population.
Giving Income
9 19
13 17
18 22
5 7
8 4
2 4
1 2
What is the SS for Income: ________
What’s the pop. stdev. of Income: _______
What is the range of Income: _______
What is the SS for Giving: ________
What is the SSxy: ________
Since you are interested in how well one rating acts as a linear predictor of another, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculations?
a = _____
b = _____
r = _____
t = _____
F = _____
If a family’s income is 4, what would you predict giving to be?
Simulation 4
As a director of counseling, you are interested in how well your therapy works. In particular, you are curious whether the number of counseling sessions is a good predictor of happiness (smiles per hour).
Sessions Happiness
12 4
3 7
9 2
4 11
7 9
5 5
6 9
2 7
What is the SS for Counseling: ________
What is the pop. stdev. of Counseling: _______
What is the mean of Happiness: _______
What is the SS for Happiness: ________
What is the SSxy: ________
Since you are interested in how well one variable acts as a linear predictor of another, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculations?
a = _____
b = _____
r = _____
t = _____
F = _____
If a client has 10 sessions with you, what would you predict their happiness to be?
SUMMARY
To finish off our discussion of regression, there is a review, quiz, progress check, and chapter answers.
Review
The variable with the smallest standard deviation is the easiest to predict.
Without knowing anything else about a variable, the best predictor of it is its mean.
The angle of a regression line is called the slope. Slope is calculated by dividing the SSxy by the SSx.
The point where the regression line crosses the criterion axis is called the intercept. Predicting the future based on past experience is best done with a regression.
Predicting scores between known values is called interpolation. Predicting scores be- yond known values is called extrapolation.
Regression works best when a relationship is strong and linear.
Regression works best when the correlation is strong.
The error around a line of prediction is consistent along the whole line.
The error around a line of prediction can be estimated with the standard error of esti- mate.
Plus or minus one SEE accounts for 68% of the prediction errors.
A regression is based on paired-observations on the same subjects. Pre- and Post-test performance is best analyzed by using a regression.
1. Regressions assume that:
a. Student’s t = 0
b. trends are linear
c. the variables are discrete
d. subjects are randomly assigned
e. errors vary along the regression line
2. Regressions are best used for measuring:
a. validity
b. dispersion
c. central tendency
d. distance from a mean
e. differences between means
3. The angle of a regression line is called:
a. intercept
b. intersect
c. slope
d. shift
e. the point where X and Y meet
4. Scores which are projected beyond their samples are said to be:
a. interpolated
b. extrapolated
c. innovated
d. interpreted
e. unsubstantiated
5. Scores which predict between samples are said to be:
a. interpolated
b. extrapolated
c. innovated
d. interpreted
e. unsubstantiated

6. The standard error of estimate is best understood as:
a. Sum of Squares
b. variance
c. standard deviation
d. mean
e. median
7. Linear regressions use the:
a. central limit theorem
b. standardized score theorem
c. peripheral limit theorem
d. least squares criterion
e. maximum difference analysis
8. Linear regressions use the formula for a:
a. circle
b. curved path
c. straight line
d. correlated t-test
e. independent t-test
9. The line of regression always goes through the point where:
a. the most error occurs
b. the two means intersect
c. the X and Y axes intersect
d. the slope is balanced
e. the slope is standardized
10. As a director of marketing, you are interested in how well your sales did over the year. In particular, you are curious whether last year’s revenue is a good predictor of this year’s. Since you are interested in how well one test acts as a linear predictor of another, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Progress Check
1. List three types of correlation and the kind of variables with which they are used:
a.
b.
c.
2. Calculate the following using this data:
X Y
2 2
3 4
7 8
9 7
11 12
14 10
SSx = _____
SSy = _____
SSxy = _____
a = _____
b = _____
r = _____
Progress Check 6
3. As a doctor, you are interested in how similar your patients’ patterns of walking and swimming are. You have measured their performance on each task, and now hope to find how related these two variables are. The numbers below represent the number of hours per week spent by the same patients in each activity.
Walking Swimming
16 3
11 4
10 4
10 7
5 3
2 12
What is the sum of Walking: __________
What is the mean of Walking: __________
What is the SS for Walking: __________
Since you are interested in communality, which of the following tests should you perform:
a. multiple regression
b. regression
c. correlation
d. t-test
e. ANOVA
Perform the comparison you selected in the item above. What was the result of your calcula- tion (select only the appropriate ones):
a = _____
b = _____
r = _____
t = _____
F = _____
How many degrees of freedom in this study?
What is the critical value for this statistic?
Is there a significant relationship between these variables at the .05 alpha level?
Calculate the coefficient of determination: ________
4. As a chocolate seller, you are interested in how price affects sales. Here is the information from some of the most recent months:
Price Sales
4 4
5 3
3 7
2 2
9 8
11 12
What is the SS for Price: ________
What is the variance of Price: _______
What is the range of Price: _______
What is the SS for Sales: ________
What is the SSxy: ________
Since you are interested in how well one rating acts as a linear predictor of another, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculations?
a = _____
b = _____
r = _____
t = _____
F = _____
If price equals 15, what would you predict sales to be?
Answers
Practice Problems

Item 1
Petals (X) is predicting Weight (Y) SSx = 50
SSy = 38.80
SSxy = -42
a = 9
b = -.84
If Petals is 6, predict weight to be 3.96
Item 2
Dolls is X, Trucks is Y
SSx = 118
SSy = 139.43
SSxy = 126
a = 1.42
b = 1.07
If Dolls is 9, predict Trucks to be 9.55
Item 3
SSx = 109.33
SSy = 82.33
SSxy = – 23.33
a = 7.73
b = – .21
If have 22 ducks, predict cows = 3.04
Item 4
SSxy = – 55
a = 15.12
b = – .26
If 5 nurses, predict doctors to be 13.79
Item 5
SSxy = 21
a=-1.83
b = 1.50
If 7 professors, predict 8.67 classes
Item 6
SSxy = -119
a = 11.09
b = – .41
If 22 candles, predicts 1.98 flowers
Item 7
SSx = 10
SSy = 154.80
SSxy = 39
a= .40
b = 3.90
Predict tuition to be 27.70 in 1997
Simulations

Simulation 1
ΣX = 63
range of X = 11
SSx = 80.88
SSy = 76.88
SSxy = 57.88
Regression
a = 3.24
b = .72
If last year = 6, this year = 7.53
Simulation 2
SS of Rocks = 54.80
Population stdev of Rocks = 3.31
Range of Huts = 9
SS of Huts = 48.80
Rocks predicts huts:
a = 5.75
b = .53
If 7 rocks, 9.43 huts
Huts predicts rocks:
a = .61
b = .59
If hut 17 feet high, 10.64 rocks
Simulation 3
SS of Income = 415.43
Population stdev of Income = 7.70
Range of Income = 20
SS of Giving = 220
SSxy= 265
If family income = 4, giving = 3.72
Simulation 4
SS of Counseling = 76
Population stdev of Counseling = 3.08
Mean of Happiness = 6.75
SS of Happiness = 61.5
SSxy = – 37
If 10 sessions, happiness = 4.80
Multiple Choice
1. b  2. a  3. c  4. b  5. a  6. c  7. d  8. c  9. b  10. d
Progress Check
1. List three types of correlation and the kind of variables with which they are used:
a. phi: 2 discrete variables
b. Pearson r: 2 continuous variables
c. point-biserial: 1 continuous & 1 discrete variable
2. Calculate the following using this data:
SSx = 107.33
SSy = 68.83
SSxy = 77.33
a = 1.67
b = .72
r = .90
3. Walking & Swimming
Sum of Walking: 54
Mean of Walking: 9
SS for Walking: 120
Correlation
r = -.65
degrees of freedom 4
Critical value .811
No significant relationship at the .05 alpha level
Coefficient of determination: .43
4. Chocolate Price & Sales
SS of Price: 63.33
variance of Price: 12.67 (sample variance for “some” of the months)
range of Price: 9
SS for Sales: 70
SSxy: 56
regression
Price is X, Sales is Y
a = .99
b = .88
If price equals 15, what would you predict sales to be?
14.25
Day 7: Probability

Comparing a group to a standard

BRIEFLY

Philosophically we live in a world of likelihoods. We know that it rains in the Sahara but that it doesn’t occur frequently. We know that snow in January is likely in New York and unlikely in Tahiti. We know that the sun rises in the east…usually. Notice that we have no proof that the sun will rise in the east tomorrow. It’s just that we think it is highly likely. We think it is very highly likely because we don’t know of any time that it hasn’t occurred that way. Imagine all of the theories we’d have to change if we woke up tomorrow and the sun was rising in the west.

In social science, business and real life, we base many of our decisions on probabilities. We believe elevators will probably take us to the correct floor, cars will probably fly and our job will probably be there tomorrow. Notice we’re not saying that our prediction forces something to happen. We are simply making guesses, predictions and estimations of the likelihood of events. In its simplest form, the probability of This AND That occurring is multiplied. And the probability of This OR That occurring is added.

A group can be compared to a written standard (criterion referenced testing) or a graphic standard (a significant regression line). To compare a group to a standard, use:

Criterion Referenced Testing
Analysis of Regression

INTRODUCTION
Criterion-referenced
Comparing a group to a standard is an extension of comparing yourself to a standard. Typically, we set standards by specifying the behaviors which must be visible and the tasks which need to be accomplished. And, as with self-standard comparisons, standards should use 3 principles:
Clear, not vague.
Increase, not decrease.
Reward, not punish.
To evaluate whether an individual meets a standard, a checklist is created. Every accomplished task is checked off the list. Similarly, to evaluate whether a group meets a standard, the criteria for performance are specified (can add 4 3-digit numbers, can run a mile in 5 minutes, etc.). Performance can be on an individual basis (all members who perform the tasks are rewarded) or the group can be graded as a unit (the whole group wins or loses). In either case, comparing a group to a standard of written criteria requires very little number crunching.
Analysis of regression
The second way to compare a group to a standard is to use a straight line. In this approach, data is compared to a model. The question is “Does the data look like this?”
“This” can be a simple or complex model. Nearly all group statistics are trying to match data to a model. The model differs from procedure to procedure but the underlying premise is the same.
Let’s take a simple model and extend our understanding of the linear regression to model testing. You already know how to calculate a correlation and a regression, so let’s expand on that knowledge.
A correlation is a measure of commonality; what two variables have in common. For the Pearson r, we plotted two continuous variables and looked at the scatterplot of the data. We could see if the trend was generally positive, negative, or had no linear pattern.
A regression is used for predicting. We plot a regression line through the data as best we can, using the line to make predictions. We can predict into the future (extrapolation) or fill in between data points (interpolation).
An analysis of regression looks at the pattern of data and compares it to the regression line drawn through it. It asks how much the data looks like a straight line.

This is a yes-no comparison. We start with the premise that the data doesn’t look like a straight line. We assume that there is no pattern. When we see small variations from a chance pattern, we still don’t accept the model of a straight line. We only change our minds when the pattern is so strong that it is significant.

Our test of significance is actually a ratio of knowledge. We measure what we understand and what we don’t understand and compare them. When our level of understanding is so strong that the ratio of understanding to non-understanding passes a standard, we accept that the data is indeed linear.
Significance
We use an F test to measure how well the data fits our linear model. The F test (named after its author, R.A. Fisher) is the ratio of understood variance (called regression mean squares or mean squares regression) to non-understood variance (called error mean squares or mean squares error).
After you calculate the F, you compare it to the critical value in a table of Critical Values of F. There are several pages of critical values to choose from because the shape of F distribu- tion changes as the number of subjects in the study decreases. To find the right critical value, go across the degrees of freedom for regression (df regression) and down the df error.
Typically the standard we use for significance only allows 5% error. That is, significance is granted only when we will be right 95% of the time and wrong 5% of the time. In other words, our alpha level is set at .05 (the amount of error we are willing to accept). Setting the criterion at .05 alpha indicates that we want to be wrong no more than 5% of the time. Being wrong in this context means to see a significant relationship where none exists.
Two points should be made: (a) 5% is a lot of error and (b) seeing things that don’t exist is not good. Five percent of the population of the US is over 50 million people; that’s a lot of error. If elevators failed 5% of the time, no one would ride them. If OPEC trims production by 5%, it cuts 1.5 million barrels a day. About 230 million people use the internet, roughly 5% of the world’s population.
We use a relatively low standard of 5% because our numbers are fuzzy. Social science research is like watching a tennis game through blurry glasses which haven’t been washed in months. We have some understanding of what is going on—better than if we hadn’t attended the match—but no easy way to summarize the experience.
Second, seeing things that don’t exist is dangerous. In statistics, it is the equivalent of hallucination. We want to see the relationships that exist and not see additional ones that live only in our heads. Decisions which produce conclusions of relationship that don’t exist are called Type I errors.
If Type I error is statistical hallucination, Type II error is statistical blindness. It is NOT seeing relationships when they do exist. Not being able to see well is the pits (I can tell you from personal experience) but it’s not as bad as hallucinating. So we put most of our focus on limiting Type I error.
We pick an alpha level (how much Type I error we are willing to accept) and look up its respective critical value. If the F we calculate is smaller than the critical value, we assume the pattern we see is due to chance. And we continue to assume that it is caused by chance until it is so clear, so distinct, so accurate and it can’t be ignored. We only accept patterns that are significantly different from chance.
When the F we calculate is larger than the critical value, we are 95% sure that the pattern we see is not caused by chance. By setting the alpha level at .05, we have set the amount of Type I decision error at 5%.
Getting from SS to ms
The F test is a ratio of variance (understood/not understood). To find the variance, we begin by partitioning the Sum of Squares (SS) of the regression into explained and unexplained components. Explained variance is simply the SSy multiplied by r2 (the coefficient of determination). The result is the SSregression (the understood portion of the regression).

The unexplained, not yet understood portion of the regression is found by multiplying the SSy by 1-r2 (the coefficient of nondetermination). The result is the SSerror (the non-understood portion of the regression).

To get from Sum of Squares to variance, we divide each SS by its respective degrees of freedom. The resulting variance terms are called mean squares (a reminder that variance is the average of the squared deviations from a distribution’s mean).
The degrees of freedom (df) for Regression is k-1 (columns minus one). Since a simple linear regression has only 2 columns, the df for an Analysis of Regression always equals 1. The df for Error is N-k (number of people minus the number of columns). And the df for Total = N-1.
If it seems like it’s getting hard to keep track of all this, there is good news. An Analysis of Regression uses a summary table that organizes all of the important information. Simply fill in the blanks of the table and the hard part is done.
Example.
In order to calculate an Analysis of Regression for this data,
X Y
11 1
4 2
8 8
2 12
7 11
16 2
We fill in the blanks for the Analysis of Regression’s summary table:
SS df ms
Regression ____ ____ ____
Error ____ ____ ____
Total ____ ____ ____
Let’s start with the degrees of freedom. Since this is a simple linear regression, we know that df regression = 1. Two columns minus 1 = 1. We know that df error is equal to N-k (the number of people minus the number of columns); so 6-2 = 4. The total degrees of freedom is equal to N-1; 6-1 = 5.
We know that SStotal equals SSy. In this example, the SSy = 122. We partition this into SSregression and SSerror by multiplying the SStotal by r2 and 1-r2, respectively. Total Sum of Squares (122) times r2 (.337) equals 41.14. So 122 is partitioned into 41.14 (explained dispersion) and 80.86 (unexplained dispersion).
With this in mind, let’s update the summary table with what we know:
SS df ms
Regression 41.14 1 ____
Error 80.86 4 ____
Total 122.00 5 ____
Variance (which in a F-test is given the special designation of mean squares) is calculated by dividing the SS term by its respective degrees of freedom. Updating the summary table gives us:
SS df ms
Regression 41.14 1 41.14
Error 80.86 4 20.21
Total 122.00 5 24.40
testing F
The F statistic is the mean squares of Regression divided by the mean squares of Error. Use the mean squares from the summary table:

SS df ms
Regression 41.14 1 41.14
Error 80.86 4 20.21
Total 122.00 5 24.40

So F = 41.14/20.21 = 2.04
We test the significance of this F by comparing it to the critical value in the F Table. We enter the table by going across to the dfregression (1) and down the dferror (in this case it’s 4). So the critical value = 7.71. In order to be significant, the F we calculated would have to be equal to or larger than 7.71. Since it isn’t, the pattern we see is likely to be due to chance.
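The summary-table arithmetic can be reproduced with a short Python sketch (mine, not the book’s). Only SSy, r2, and N are needed; the critical value 7.71 is the one read from the F table at 1 and 4 degrees of freedom, .05 alpha.

```python
# Summary-table arithmetic for the example:
# SSy = 122, r2 = .337, N = 6 people, k = 2 columns (simple linear regression).
ss_total = 122.0
r_sq = 0.337
n, k = 6, 2

ss_reg = ss_total * r_sq        # explained, about 41.11 (book: 41.14)
ss_err = ss_total * (1 - r_sq)  # unexplained, about 80.89
df_reg, df_err, df_total = k - 1, n - k, n - 1   # 1, 4, 5

ms_reg = ss_reg / df_reg        # mean squares = SS / df
ms_err = ss_err / df_err
f = ms_reg / ms_err             # about 2.03 (book: 2.04, from rounded SS)

critical = 7.71                 # F table, df = (1, 4), alpha = .05
print(f >= critical)            # → False: not significant
```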
UNDERSTAND
Sampling with and without replacement
Gambler’s fallacy
Establish clear criteria
Criterion referenced testing
Explained & unexplained variance
Degrees of freedom
df is how much scores are free to vary.
We adjust df to avoid underestimating population variance
REMEMBER
Basic Facts
Theories are composed of constructs; models are composed of variables. Laws have probabilities whose accuracy is beyond doubt. Principles have some predictability but the probability of beliefs is a matter of personal opinion.
Formulas:
An Analysis of Regression uses the formulas for correlation and a summary table which uses the following formulas:
Terms:
dfregression = k-1
dferror = N-k
dftotal = N-1
mean squares = SS/df
F = mean squaresregression / mean squareserror
alpha level
Analysis of Regression checklist
criterion referenced testing
critical values of F
df
F
k
k-1
mean squares
mean squareserror
mean squaresregression
N-1
N-k
partitioning
regression mean squares
Type I error
Type II error
DO

Now that we’ve covered the facts and concepts of probability, it’s time to put what we know into practice. This section includes Step-by-Step instructions, practice problems, simulations (word problems), a quiz and a progress check.

Step-by-Step

X Y
22 15
5 3
4.50 6
8 10
5.50 1

1. Calculate the Pearson r

SSx = 218.50
SSy = 126
SSxy = 142.50
r = .86
r2 = .74

2. Make a summary table

In order to test the significance of a linear regression, start by creating the following table:
SS df ms
Regression ____ ____ ____
Error ____ ____ ____
Total ____ ____ ____
3. Partition SSy

In order to test the significance of a linear regression, the SSy is partitioned into two parts: the amount due to the regression and the amount due to chance.

First, enter the total SS (Sum of Squares) for the table, which is the SSy.

Second, multiply the SSy by r2.
Since the coefficient of determination (r2) is a measure of variance accounted for, multiplying the SSy by it gives the portion of SS due to the regression.

Third, multiply the SSy by 1-r2.
The coefficient of nondetermination (1-r2) times SSy indicates the portion of SS due to error (unaccounted-for variance; what we don’t know). Updating the table, we now can see:
SS df ms
Regression 92.93 ____ ____
Error 33.07 ____ ____
Total 126 ____ ____
4. Find degrees of freedom
First, enter the degrees of freedom (df) for Regression, which is k-1 (columns minus one).
Since a simple linear regression has only 2 columns, the df for regression = 1.
Second, enter the df for Error, which is N-k (number of people minus the number of columns). In our example, N = 5, so dferror = 3.
Third, enter the df for Total, which N-1 (number of people minus one). So, dftotal = 4.
Updating the table, we now can see:
SS df ms
Regression 92.93 1 ____
Error 33.07 3 ____
Total 126 4 ____
In order to check the accuracy of your calculation, simply add the dfs together; Regression + Error should equal Total.
5. Find mean squares
Mean squares is another name for variance. And since SS divided by df equals variance,
divide each SS by its respective df. Updating the table, we now can see:
SS df ms
Regression 92.93 1 92.93
Error 33.07 3 11.02
Total 126 4 31.50
6. Calculate F
Our test of significance is called an F-test. It is a ratio of the variance we understand to the variance we don’t understand.

SS df ms
Regression 92.93 1 92.93
Error 33.07 3 11.02
Total 126 4 31.50

Simply take the mean squares for Regression (msregression) and divide it by the mserror. The result is F. So, F = 8.43.
7. Find the critical value
The Critical Values of the F Distribution table is actually a series of distributions. To enter the table, go across to the row whose number matches the degrees of freedom for Regression (dfregression). And go down the dferror.
In our example, go across to 1 and down to 3. The critical value (the value you have to beat) = 10.13 (at the .05 alpha level).
8. Test significance
To test the significance of a linear regression, compare the F you calculated with the critical value found in the table. If your F is larger than the one in the book, you win: the regression is significant and the relationship is not likely to be due to chance.
In our example, we calculated F to be 8.43. The critical value in the F table was 10.13, so F is not significant. And the relationship between the two variables is likely to be due to chance.
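The whole Step-by-Step section can be verified in one pass with Python (a sketch of mine; the five data pairs below are the ones consistent with the stated SSx = 218.50, SSy = 126, and SSxy = 142.50).

```python
import math

xs = [22, 5, 4.5, 8, 5.5]
ys = [15, 3, 6, 10, 1]
n, k = len(xs), 2

mx, my = sum(xs) / n, sum(ys) / n
ss_x = sum((x - mx) ** 2 for x in xs)                      # 218.50
ss_y = sum((y - my) ** 2 for y in ys)                      # 126
ss_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # 142.50

r_sq = (ss_xy / math.sqrt(ss_x * ss_y)) ** 2               # about .74

ss_reg = ss_y * r_sq            # about 92.93
ss_err = ss_y * (1 - r_sq)      # about 33.07
ms_reg = ss_reg / (k - 1)
ms_err = ss_err / (n - k)
f = ms_reg / ms_err             # about 8.43

critical = 10.13                # F table, df = (1, 3), alpha = .05
print(round(f, 2), f >= critical)   # → 8.43 False: not significant
```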
Practice Problems

Item 1
Calculate an Analysis of Regression on the following data, where Reading is predicting Education:
Reading Education
15 11
7 8
2 5
9 4
3 4
11 9

What is the Pearson r: ___________
Perform the following calculations:
SS df ms
Regression ____ ____ ____
Error ____ ____ ____
TOTAL ____ ____ ____
What is the F for this test:
Item 2
Calculate an Analysis of Regression on the following data, where TV watching predicts Violence:
Television Violence
2 7
4 14
8 12
13 8
16 5
19 1
Perform the following calculations:
SS df ms
Regression ____ ____ ____
Error ____ ____ ____
TOTAL ____ ____ ____

What is the F for this test: ___________
What is the critical value for F: ___________
Is the F significant? ___________
Item 3

To discover if TV watching has a significant impact on people’s shyness, use the following information, perform an Analysis of Regression and complete the following summary table:

r = -.55
SSx = 20.22
SSy = 15
N = 22

SS df ms
Regression ____ ____ ____
Error ____ ____ ____
TOTAL ____ ____ ____

What is the F for this test: ___________
Item 4
To discover if learning statistics has a significant impact on people’s friendliness, use the following information, perform an Analysis of Regression and complete the following summary table:

r = .45
SSx = 17.33
SSy = 16.30
N = 18

SS df ms
Regression ____ ____ ____
Error ____ ____ ____
TOTAL ____ ____ ____

What is the F for this test: ___________
What is the critical value for F: ___________
Item 5
To discover if education has a significant impact on people’s income, use the following information, perform an Analysis of Regression and complete the following summary table:

r = .73
SSx = 50
SSy = 74
N = 12

SS df ms
Regression ____ ____ ____
Error ____ ____ ____
TOTAL ____ ____ ____

What is the F for this test: ___________
What is the critical value for F: ___________
Is the F significant? ___________
Item 6
To discover if music lessons have a significant impact on people’s intelligence, use the following information, perform an Analysis of Regression and complete the following summary table:

r = .12
SSx = 40
SSy = 42
N = 13

SS df ms
Regression ____ ____ ____
Error ____ ____ ____
TOTAL ____ ____ ____
What is the F for this test: ___________ Is the critical value for F: ___________ Is the F significant?
Simulations

Simulation 1
As president of associated students, you wonder how much influence alumni giving has on the amount of scholarship money distributed. The numbers below are the number of dollars (in thousands) for some of the last few years:
Alumni   Scholarships
13       12
 9        7
 6        5
 7        3
 4        7
 1        2
What is the SS for Alumni: ________
What is the variance of Alumni: ________
What is the range of Alumni: ________
What is the SS for Scholarships: ________
What is the SSxy: ________

Perform the following calculations:

           SS     df     ms
Regression ____   ____   ____
Error      ____   ____   ____
TOTAL      ____   ____

What is the F for this test: ________
Is it significant at .05 alpha? ________
Does alumni giving have a significant impact on scholarship money? ________
Simulation 2
As a Realtor, you wonder how much influence Age (in decades) has on selling prices of Houses (in $50,000s).
Age   House
10    11
 3     2
 5     5
 7     9
 9     8
 2     1
What is the SS for Age: _______
What is the sample variance of Age: _______
What is the range of Age: _______
What is the SS for House: _______
What is the SSxy: _______

Perform the following calculations:
           SS     df     ms
Regression ____   ____   ____
Error      ____   ____   ____
TOTAL      ____   ____

What is the F for this test: ________
Does age have a significant impact on house price? ________
SUMMARY
To finish off our discussion of probability, there is a review, quiz, progress check, and chapter answers.
Review
When X and Y are independent, the probability of X and Y occurring at the same time equals the probability of X times the probability of Y.
When X and Y are mutually exclusive (they can't both happen), the probability of X or Y occurring (either one) equals the probability of X plus the probability of Y.
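The two rules can be sketched in a few lines of Python, using the quiz's own numbers (lightning p = .3, chocolate ice cream p = .5). The function names are mine, not the book's, and the rules assume the simplest case: independent events for AND, mutually exclusive events for OR.

```python
def p_and(p_x: float, p_y: float) -> float:
    """P(X and Y) for independent events: multiply the probabilities."""
    return p_x * p_y

def p_or(p_x: float, p_y: float) -> float:
    """P(X or Y) for mutually exclusive events: add the probabilities."""
    return p_x + p_y

lightning, ice_cream = 0.3, 0.5
print(p_and(lightning, ice_cream))  # 0.15
print(p_or(lightning, ice_cream))   # 0.8
```

This is why quiz item 1 below works out to .15: the two events are multiplied, not added.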
Analysis of Regression tests the likelihood that the linear relationship between the two vari- ables is due to chance. A significant F indicates that X predicts Y well and that the relationship between the two variables is not likely to be due to chance.
Analysis of Regression does not prove cause and effect. X may cause Y or Y may cause X, or both could be caused by another variable.
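The summary tables in the practice items above all follow one recipe: the regression Sum of Squares is r² times SSy, the error SS is whatever is left over, and F is the ratio of the two mean squares. A minimal sketch (function and variable names are my own, not the book's):

```python
def analysis_of_regression(r: float, ss_y: float, n: int):
    """Build the Analysis of Regression summary values from r, SSy, and N."""
    ss_reg = r ** 2 * ss_y        # variance in Y accounted for by X
    ss_err = ss_y - ss_reg        # leftover (error) variance
    df_reg, df_err = 1, n - 2     # degrees of freedom
    ms_reg = ss_reg / df_reg
    ms_err = ss_err / df_err
    f = ms_reg / ms_err
    return ss_reg, ss_err, f

# Item 4's numbers: r = .45, SSy = 16.30, N = 18
ss_reg, ss_err, f = analysis_of_regression(0.45, 16.30, 18)
print(round(ss_reg, 2), round(ss_err, 2), round(f, 2))  # 3.3 13.0 4.06
```

Running it on Item 3's numbers (r = –.55, SSy = 15, N = 22) reproduces the answer key's F of 8.67 as well.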
1. If the probability of being hit by lightning is .3 (it's a very stormy night), and the probability of eating chocolate ice cream is .5, what is the probability of eating ice cream AND being hit by lightning?
a. .3
b. .5
c. .8
d. .15
e. .25
2. In the simplest case, the probability of either A or B occurring is calculated by:
a. adding the probabilities
b. subtracting the probabilities
c. multiplying the probabilities
d. dividing the probabilities
e. both A and C
3. If the alpha level is set at .05, the probability of making a Type I error is:
a. .01
b. .02
c. .05
d. .10
e. .20
4. The probability of passing this test item is:
a. o
b. p
c. q
d. r
e. extremely unlikely
5. A nondirectional test of significance is said to be:
a. no-tailed
b. one-tailed
c. two-tailed
d. three-tailed
e. four-tailed
Progress Check 7
6. The probability of rejecting the null when you shouldn't is called:
a. Type I error
b. Type II error
c. Type III error
d. Type O error
e. Not My Type error

7. The probability of accepting the null when you shouldn't is called:
a. Type I error
b. Type II error
c. Type III error
d. Type O error
e. Not My Type error
8. An Analysis of Regression tests a regression's:
a. reliability
b. validity
c. symmetry
d. clarity
e. goodness of fit

9. An Analysis of Regression uses a:
a. b test
b. t test
c. F test
d. z test
e. vocabulary test

10. A significant Analysis of Regression test indicates the relationship is:
a. likely to be due to skewed means
b. likely to be due to skewed medians
c. likely to be due to confounds
d. unlikely to be due to chance
e. biased
Progress Check
1. List five measures of dispersion:
a.
b.
c.
d.
e.

2. List six criteria for evaluating theories:
a.
b.
c.
d.
e.
f.

3. List six things associated with a linear regression:
a.
b.
c.
d.
e.
f.
4. You wonder if the amount of caffeine has a significant impact on the nervousness of public speakers. Use the following data to complete the summary table:
r = .81
SSx = 14
SSy = 33
N = 9

           SS     df     ms
Regression ____   ____   ____
Error      ____   ____   ____
TOTAL      ____   ____   ____

What is the F for this test: ___________
What is the critical value for F: ___________
Is F significant at .05 alpha: ___________
5. As a doctor, you are interested in how similar your patients' patterns of walking and dancing are. You have measured their performance on each task, and now hope to find how related these two variables are. The numbers below represent the number of hours per week spent by the same patients in each activity.
Walking   Dancing
12         2
 8         8
 9         7
 6         4
 7         9
 4         7
 2        12

What is the sum of Walking: __________
What is the SS for Dancing: __________
What is the SSxy: __________
Since you are interested in communality, which of the following tests should you perform:
a. multiple regression
b. regression
c. correlation
d. t-test
e. ANOVA
Perform the comparison you selected in the item above. What was the result of your calculation (select only the appropriate ones):
a =    b =    r =    t =    F =

How many degrees of freedom in this study? __________
What is the critical value for this statistic? __________
Is there a significant relationship between these variables at the .05 alpha level? __________
Calculate the coefficient of determination: ________
6. As a chocolate maker, you are interested in how sugar affects sales. You measure the amount of sugar (tablespoons) in each batch and the amount of chocolate sold (tablespoons).

Sugar   Sales
 3       4
 1       6
 7      11
 5       5
14       9

What is the SS for Sugar: ________
What is the SS for Sales: ________
What is the SSxy: ________
Since you are interested in how well one rating acts as a linear predictor of another, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculations?
a= b= r= t= F=
If you put 9 tablespoons of sugar in a batch, how much would you expect to sell?
Answers
Practice Problems
Item 1
Pearson r = .79

            SS      df    ms
Regression   26.73   1    26.73
Error        16.10   4     4.03
TOTAL        42.83   5     8.57

F = 6.64

Item 2
Pearson r = – .73

            SS      df    ms
Regression   59.35   1    59.35
Error        51.48   4    12.87
TOTAL       110.83   5    22.17

F = 4.61

Item 3

            SS      df    ms
Regression    4.54   1     4.54
Error        10.46  20      .52
TOTAL        15.00  21      .71

F = 8.67

Item 4

            SS      df    ms
Regression    3.30   1     3.30
Error        13.00  16      .81
TOTAL        16.30  17      .96

F = 4.06

Item 5

            SS      df    ms
Regression   39.43   1    39.43
Error        34.57  10     3.46
TOTAL        74.00  11

F = 11.41

Item 6

            SS      df    ms
Regression     .60   1      .60
Error        41.40  11     3.76
TOTAL        42.00  12

F = .16
Simulations
Simulation 1
What is the SS for Alumni: 85.33
Sample variance of Alumni: 17.07
Range of Alumni: 12
SS for Scholarships: 64
What is the SSxy: 60

            SS      df    ms
Regression   42.19   1    42.19
Error        21.81   4     5.45
TOTAL        64.00   5

F = 7.74
Simulation 2
What is the SS for Age: 52
The sample variance of Age: 10.40
What is the range of Age: 8
What is the SS for House: 80
What is the SSxy: 62

            SS      df    ms
Regression   73.92   1    73.92
Error         6.08   4     1.52
TOTAL        80.00   5

F = 48.70

Multiple Choice
d, a, c, b, c, a, b, e, c, d

Progress Check
1. List five measures of dispersion:
a. Range
b. Mean Absolute Deviation
c. Sum of Squares
d. Variance
e. Standard Deviation

2. List six criteria for evaluating theories:
a. Clear
b. Useful
c. Summarizes facts
d. Small number of assumptions
e. Internally consistent
f. Testable hypotheses
3. List six things associated with a linear regression:
a. Slope
b. Intercept
c. Interpolate
d. Extrapolate
e. Least squares criterion
f. Standard error of estimate
4. Analysis of Regression:

            SS      df    ms
Regression   21.65   1    21.65
Error        11.35   7     1.62
TOTAL        33.00   8

What is the F for this test: 13.35
What is the critical value for F: 5.59
Is the F significant? Yes

5. Correlation
What is the sum of Walking: 48
What is the SS for Dancing: 64
What is the SSxy: – 46
Pearson's r = – .71
How many degrees of freedom in this study? 5
What is the critical value for this statistic? .755
Is there a significant relationship between these variables at the .05 alpha level? No
Calculate the coefficient of determination: .51

6. Linear Regression
What is the SS for Sugar: 100
What is the SS for Sales: 34
What is the SSxy: 36
a = 4.84
b = .36
If you put 9 tablespoons of sugar in a batch, how much would you expect to sell? 8.08
BRIEFLY
Day 8: t-Tests
Comparing representatives from 2 groups
A t-test asks whether two means are significantly different. If the means, as representatives of the variables, are equal or close to equal, the assumption is that the differences seen are due to chance. If the means are significantly different, the assumption is that the differences are due to the impact of one variable on the other.
When subjects are randomly assigned to groups, the t-test is said to be independent. That is, it tests the impact of an independent variable on a dependent variable. The independent variable is dichotomous (yes/no; treatment/control; high/low) and the dependent variable is continuous. If significant, the independent t-test supports a strong inference of cause-effect.
When subjects are given both conditions (both means are measures of the same subjects at different times), the t-test is said to be dependent or correlated. Because it uses repeated measures, the correlated-t is often replaced by a regression (where the assumptions of covariance are more clearly stated).
The t-test is a quick and easy way to compare 2 groups. To compare two groups, use:
Independent t-test
Correlated t-test
INTRODUCTION
Independent t-test
The independent t-test assumes that one group of subjects has been randomly assigned to 2 groups. Each group contains the same number of subjects, has its own mean, and has its own standard deviation.
Conceptually, the t-test is an extension of the z-score. A z score compares the difference between a raw score and the mean of the group to the standard deviation of the group. The result is the number of standard deviations between the score and the group mean.
Similarly, a t-test compares the difference between 2 means to the standard deviation of the pooled variance. That is, one mean pretends to be a raw score and the other mean is the mean of the group. The difference between these means is divided by a standard deviation; it’s calculated a little funny but conceptually it’s equivalent to the standard deviation used in calculating a z score.
Like a z score, a t-test is evaluated by comparing the calculated value to a standard. In the case of a z score, the standard is the Areas Under the Normal Curve. Similarly, a t-test com- pares its calculated value to a table of critical values. When N is large (infinity, for example), the values in the two tables are identical.
For example, in a one-tailed test at .05 alpha, the critical region would be the top 5% of the distribution. The z-score would be the one where 5% was beyond the z and 45% was between the mean and z (there’s another 50% below the mean but the table doesn’t include them). The appropriate z-score for the location where there is 5% beyond is 1.65. In the Critical Values of t, the critical value at the bottom of the .05 alpha 1-tailed column is 1.65.
Similarly, in a two-tailed test at the .05 alpha, the critical region would be the bottom 2.5% and the top 2.5%. The z-score for the bottom 2.5% is -1.96 and the z-score for the top 2.5% is +1.96. In the Critical Values of t table, the critical value at the bottom of the .05 alpha 2-tailed column is 1.96.
When the t-test has an infinite number of subjects, its critical value is the same as a z-score. At infinity, t-tests could be evaluated by referring to the Areas Under the Normal Curve table. A t-test, however, usually has a small number of subjects. Consequently, the values are modi- fied to adjust for the small sample size.
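The claim that the bottom (infinite-df) row of the t table matches the normal-curve z values can be checked with Python's standard library; statistics.NormalDist gives the z cutoffs directly. The 1.65 and 1.96 of the text are these values rounded to two places.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, sd 1

one_tailed_05 = z.inv_cdf(0.95)    # cutoff leaving the top 5% in one tail
two_tailed_05 = z.inv_cdf(0.975)   # cutoff leaving 2.5% in the upper tail
                                   # (the other 2.5% sits in the lower tail)

print(round(one_tailed_05, 2))  # 1.64 (tables usually print 1.65)
print(round(two_tailed_05, 2))  # 1.96
```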
Significance
The t-test tells us if there is a significant difference between the means. It is as if two armies met and decided that each side would send a representative to battle it out. The repre- sentative would not be the best from each side but the average, typical member of their respec- tive groups. Similarly, by comparing the means, we are comparing the representatives of two groups. The entire cast is not involved, only a representative from each side.
We typically do a two-tailed test. That is, we want to know if Group 2 is significantly better than Group 1 AND if it is significantly worse. We want to know both things, so we start at the mean and assume that in order to be significantly different from chance, the t statistic has to be at either of the 2 tails. At .05 alpha (the amount of Type I error we are willing to accept), the critical region is split into two parts, one at each tail. Although the overall alpha level is 5%, there is only 2.5% at each tail.
In one-tailed tests, the entire 5% in a .05 alpha test would be at one end. That is, we would only want to know if Group 2 was significantly better than Group 1; we wouldn’t care if it was
worse. It doesn’t happen very often that our hypotheses are so finely honed that we are interested in only one end of the distribution. We generally conduct a 2-tailed test of significance. Consequently, the t statistic might be positive or negative, depending on which mean was put first. There is no theoretical reason why one mean should be placed first in a two-tailed test, so apart from identifying which group did better, the sign of the t-test can be ignored.
Consider the following data:

Group 1   Group 2
 8         5
 8         3
11         8
14         4
 8         7
 6         4
 4         2
To calculate an independent t-test, we need the mean and Sum of Squares for each variable. Then we use this formula:

t = (X̄ − Ȳ) / √[ (SSx + SSy) / (n(n − 1)) ]
Doing the calculations, we find:
SSx (the Sum of Squares of X) = 63.71
SSy (the Sum of Squares of Y) = 27.43
X̄ (the mean of X) = 8.43
Ȳ (the mean of Y) = 4.71
And n for each variable is 7, so the total number of subjects (N) is 14. The degrees of freedom is N -2, so the df for this study = 12.
Inserting the appropriate information into the appropriate slots in the formula, we calculate that t = 2.52
We compare this to the critical value of 2.18, which we got by looking up the .05 alpha level in the Critical Values of t table with 12 degrees of freedom. Since the value we calculated is larger than the tabled value, the t is significant. That is, there is a significant difference between the two groups and they are unlikely to be from the same population.
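The whole calculation can be sketched in Python. The raw scores below are the ones from the data table (they reproduce the SS and mean values just given), and the formula is the pooled-variance form described above; the function names are mine, not the book's.

```python
from math import sqrt

def sum_of_squares(scores):
    """SS: sum of squared deviations from the group mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

def independent_t(group_x, group_y):
    """Independent t for two equal-sized groups."""
    n = len(group_x)  # assumes len(group_x) == len(group_y)
    mean_diff = sum(group_x) / n - sum(group_y) / n
    pooled = (sum_of_squares(group_x) + sum_of_squares(group_y)) / (n * (n - 1))
    return mean_diff / sqrt(pooled)

group_1 = [8, 8, 11, 14, 8, 6, 4]
group_2 = [5, 3, 8, 4, 7, 4, 2]
print(round(independent_t(group_1, group_2), 2))  # 2.52
```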
Correlated t-test
Instead of randomly assigning subjects, a correlated t-test reuses them. The advantage is that each person acts as their own control group. Since no one is more like you than you, the control group can’t be more like the treatment group.
The second advantage is that a correlated t-test has more power (is able to use fewer people to conduct the study). An independent t-test has N-2 degrees of freedom. So if 20 people are randomly assigned to 2 groups, the study has 18 degrees of freedom. In a correlated t-test, if we use all 20 people, the study has 19 degrees of freedom.
The third advantage to correlated designs (also called within-subjects or repeated mea- sures designs) is cost. Reusing people is cheaper. If subjects are paid to participate, they are
paid for being in a study, regardless of how many trials it takes. Reusing people is also cheaper in time, materials and logistical effort. Once you have a willing subject, it’s hard to let them go.
The primary disadvantage of a correlated t-test is that it is impossible to tell if the effects of receiving one treatment will wear off before receiving the second condition. If people are testing 2 drugs, for example, will the first drug wear off before subjects are given the second drug?
A second problem with the pre- and post-test design often used with correlated t-tests is in its mathematical assumptions. Although the arguments are beyond the scope of this discus- sion, statisticians differ on the theoretical safety of using difference scores. Some worry that subtracting post-tests from pre-tests may add additional error to the process.
Consequently, a better way of testing correlated conditions is to use a correlation, a linear regression or an analysis of regression. Correlations test for relationship and can be used on ordinal and ratio data. Similarly, linear regression and analysis of regression make predictions and test for goodness of fit without relying on difference scores.
Correlated t-tests are sometimes called repeated-measures or within-subjects designs.
UNDERSTAND

Hypothesis testing

Estimation
REMEMBER
Basic Facts
List and describe nine applications of the General Linear Model for continuous and discrete variables:

Continuous Models compare:
a. Causal modeling: multiple measures of multiple factors
b. Multivariate analysis: multiple predictors; multiple criteria
c. Multiple regression: multiple predictors & single criterion
d. Regression: single predictor & single criterion
e. Correlation: 2 regression lines
f. Frequency distribution: 1 variable (predictor or criterion)

Discrete Models compare:
a. T-test: 2 means; 1 independent variable
b. One-way ANOVA: 2+ means; 1 independent variable
c. Factorial ANOVA: 2+ means; 2+ independent variables
Formulas

Terms:
1-tailed test, 2-tailed test, correlated t-test, critical value, df, estimation, hypothesis testing, independent t-test
DO
Step-by-Step
1. Consider this data set
In order to illustrate how to calculate an independent t-test, let’s agree to use this data:
Group1   Group2
 2       10
 3        9
 4       12
 8        7
 5        8
 5       13
 8       11
2. Calculate the independent t
A t-test seeks to discover if there is a significant difference between two means.
First, like a z-score, the t-test begins with subtraction. Subtract one mean from another; it
doesn’t matter which one you start with. The mean of Group1 (X1) is 5 and the mean of X2 (Group2) is 10. So, the difference between the means = 5 (or -5 if you put the means in the other order).
Second, add the SS for each group together. The SS1 is 32 and the SS2 is 28. So the sum of the two Sum of Squares = 60.
Third, multiply n (the number of people in one group) times n-1. Each group has 7 scores, so 7 times 6 = 42.
Fourth, divide Step 2 by Step 3. That is, 60 divided by 42 = 1.43
Fifth, square-root Step 4. The square-root of 1.43 = 1.20.
Sixth, divide Step 1 by Step 5. And 5 divided by 1.2 = 4.17. So t = 4.17
In formal terms, here is how to calculate a t-test:

t = (X̄₁ − X̄₂) / √[ (SS₁ + SS₂) / (n(n − 1)) ]
Although it looks complicated, this formula is quite similar to that of the z-score. The top half of the equation is simply the difference between two means. The bottom half is the stan- dard deviation: the square-root of a pooled variance (combining both groups into one).
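The six steps can be mirrored directly in code, using the same numbers (means of 5 and 10, SS of 32 and 28, n = 7 per group). One rounding wrinkle: the text rounds Step 5 to 1.20 and gets t = 4.17; carrying full precision gives about 4.18. Either way the conclusion is the same.

```python
from math import sqrt

mean_diff = 10 - 5                 # Step 1: difference between the means
ss_sum = 32 + 28                   # Step 2: add the two Sums of Squares
n_term = 7 * (7 - 1)               # Step 3: n times n - 1
pooled_variance = ss_sum / n_term  # Step 4: divide Step 2 by Step 3
pooled_sd = sqrt(pooled_variance)  # Step 5: square root of Step 4
t = mean_diff / pooled_sd          # Step 6: divide Step 1 by Step 5

print(round(t, 2))  # 4.18 (4.17 with the text's rounded intermediate steps)
```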
3. Find the critical value
The Critical Values of Student’s t table is a way to accommodate small sample sizes.
First, calculate N. That is, count the total number of subjects used in the study. Since there were 7 people in each group, combining both groups together equals 14.
Second, enter the table with N-2 degrees of freedom (df). So go down the table to 12 df and across to the .05 alpha level (2-tailed).
The critical value = 2.18. That’s the value you have to beat; if your t-test is larger than 2.18, there is a significant difference between the means.
Now that we’ve covered the facts and concepts of t-tests, it’s time to put what we know into practice. This section includes Step-by-Step instructions, practice problems, simulations (word problems), quiz and a progress check.
4. Test significance
Since the critical value for our example was 2.18 and the t we calculated was 4.17, t is significant and the differences between the two groups are unlikely to be due to chance.
5. Which group did best
To interpret the t-test, you must know what was being measured (called the dependent variable). If the dependent variable was errors on a test, you probably want the group that has the lowest mean. If the dependent variable was dollars earned, you probably want the group with the largest mean. It is impossible to interpret the results of an independent t-test without knowing what was being measured.
Practice Problems

Item 1
Calculate an independent t-test for the following data:
X1   X2
 6   12
 4    4
 2    7
 3   10
 9    5
 6    8
 5    3
Mean of group 1: _____
Mean of group 2: _____
Difference between the means: _____
SS of group 1: _____
SS of group 2: _____
n(n-1): _____
t = _____
How many degrees of freedom (df) are in this study: _____
What is the critical value for t (2-tailed, .05 alpha): _____
Is the t significant: _____
Item 2
Calculate an independent t-test for the following data:
X1   X2
15    3
11    5
 8    4
12    2
 7    6
Mean of group 1: _____
Mean of group 2: _____
SS of X1: _____
SS of X2: _____
t = _____

Item 3
Calculate an independent t-test for the following data:
X1   X2
 6    3
 7    5
 6    3
 8    2
 4    7
11    4
t = _____
How many degrees of freedom (df) are in this study: _____
What is the critical value for t (2-tailed, .05 alpha): _____
Is the t significant? _____
Item 4
Calculate an independent t-test for the following data:

A    B
 5    2
11    4
 9    2
 6    5
 2    3
t = _____
Is the t significant (.05 alpha)? _____
Item 5
Which students do significantly better in reading (words per minute):

High School   College
7              5
4              3
6              1
6             11
3              8
2              8
1             13
6              9

t = _____

Item 6
Which cars are the safest (accidents):

Foreign   Domestic
16         1
14         3
12         9
18        11
12         6
 7         1

t = _____

Item 7
Which dogs are the meanest (bites per minute):

Big   Little
1      6
4      3
3      9
8     11
2     16
7     11
5     22
3      7

t = _____
Simulations
1. As a coach, you are interested in how well each team has learned to run. You randomly assigned your players to two teams. Team A trained the old fashioned way. Team B is using computer-assisted training. The numbers in the columns below are the minutes needed to run around the track once.
Team A Team B 16 1
73 24 44 42 84 83
What is the median for Team A: __________
What is the mean for Team A: __________
What is the SS for Team A: __________
What is the mode for Team B: __________
What is the SS for Team B: __________
Since you are interested in which group did best, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculation?
a= b= r= t= F=
How many degrees of freedom are in this study?
Which training program will you suggest to use in the future?
2. As a merchant, you are interested in how well your coupon works. You randomly assigned your customers to two groups. Half receive 10% off their next purchase, and half receive a free movie ticket. You measure the number of dollars of profit per day. Which is the better promotion?
10% Off Free Movie 2 11
1 24 37 29
4 22 6 12 3 17 3 14
What is the mean for 10% Off: __________
What is the SS for 10% Off: __________
What is the mode for Movie: __________
What is the SS for Movie: __________
Since you are interested in which group did best, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculation?
a= b= r= t= F=
Which promotion is best for sales?
SUMMARY
To finish off our discussion of t-tests, there is a review, quiz, progress check, and chapter answers.
Review
There are two kinds of t-tests: independent and correlated.
Correlated t-tests use paired observations on one group of subjects. This is called a within- subjects design.
Independent t-tests use subjects who have been randomly assigned to two groups.
The degrees of freedom for a correlated t-test equals n-1.
The degrees of freedom for an independent t-test equals N-2.
T-tests are not used with more than two groups because of the likelihood of Type I error.
T-tests measure the differences between means.
T-tests are like z-scores.
The independent t-test pools the variance of the subgroups.
1. A null hypothesis says that two means are:
a. significantly different
b. slightly different
c. not significantly different
d. maybe different
e. you have hit the null on the head
2. A t-test compares:
a. two medians
b. two modes
c. two means
d. two variables
e. two standard deviations

3. A t-test is calculated like which of the following:
a. point estimation
b. sum of squares
c. degrees of freedom
d. z-scores
e. confidence levels
4. Rejecting the null when you should have accepted it is a:
a. Type I error
b. Type II error
c. Type III error
d. Typo error
e. Not My Type error

5. Compared to an alpha level of .20, an alpha level of .05 is less likely to have:
a. Type I error
b. Type II error
c. Type III error
d. Type O error
e. Not My Type error
Progress Check 8
6. When the null hypothesis is true, the expected value for an independent measures t statistic is:
a. 0
b. +1.0
c. -1.0
d. +1 or -1
e. none of the above
7. When t = 2.01 and N = 5000, it indicates that the means are significantly different (2-tailed) at the:
a. .20 alpha level
b. .05 alpha level
c. .01 alpha level
d. both A and B
e. A, B and C
8. How many dependent variables does a t-test have:
a. 1
b. 2
c. 3
d. N
e. N-1
9. When t = 4.3 and df = 60, it indicates that the means are significantly different (2-tailed) at the:
a. .20 alpha level
b. .05 alpha level
c. .01 alpha level
d. both A and B
e. A, B and C

10. What's the critical value for an independent t-test with 42 subjects:
a. .69
b. 1.86
c. 1.96
d. 2.02
e. 2.07
Basic Facts
List and describe nine applications of the General Linear Model for continuous and discrete variables:
Continuous Models compare:
a.
b.
c.
d.
e.
f.

Discrete Models compare:
a.
b.
c.
1. As a clinician, you are interested in the relationship between the amount of sleep your patients get and their sense of hope. You measure all of your patients on these two variables:
Sleep   Hope
17      21
12      12
10       9
 6       7
 4       3
 1       2
What is the sum of Sleep: _______
What is the SS of Sleep: _______
The variance of Sleep is: _______
What is the mean of Hope: _______
What is the SS of Hope: _______
Since you are interested in commonality, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculation?
a= b= r= t= F=
Calculate the coefficient of determination: _______
What percent of variance is unaccounted for: _______
2. As a director of sales, you are interested in how well your sales did over the year. In particular, you are curious whether last year’s revenue is a good predictor of this year’s. The numbers below are the number of dollars (in millions) for some of the months.
Last Year   This Year
 1          16
 3          11
 5           8
 7           5
 7           4
 7           2
12          -7

What is the SS for Last Year: ________
What is the variance of Last Yr: _______
What is the range of Last Year: _______
What is the SS for This Year: ________
What is the SSxy: ________
Since you are interested in how well one test acts as a linear predictor of another, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculations?
a= b= r= t= F=
If a client’s last year revenue was 14 million, what would you predict this year’s revenue to be?
3. As a coach, you are interested in how well each team has learned to run. You randomly assigned your players to two teams. Team A trained the old fashioned way. Team B is using computer-assisted training. The numbers in the columns below are the minutes needed to run around the track once (it’s a long track).
Team A   Team B
18        5
11        6
 7        4
 8        7
10       11

What is the median for Team A: __________
What is the mean for Team A: __________
What is the SS for Team A: __________
What is the mode for Team B: __________
What is the SS for Team B: __________
Since you are interested in which group did best, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculation?
a= b= r= t= F=
Which training program will you suggest to use in the future?
4. You want to know if the amount of advertising has a significant impact on the sale of your widgets. Use the following data to complete the summary table:
r = – .62
SSx = 21
SSy = 13
N = 18

         SS     df     ms
Between  ____   ____   ____
Within   ____   ____   ____
TOTAL    ____   ____

What is the criterion in this study: ___________
What is the F for this test: ___________
What is the critical value for F: ___________
Is F significant at .05 alpha: ___________
Answers

Practice Problems

Item 1
Mean of group 1: 5
Mean of group 2: 7
Difference between the means: 2
SS of group 1: 32
SS of group 2: 64
n(n-1): 42
t = 1.32
How many degrees of freedom (df) are in this study: 14 - 2 = 12
What is the critical value for t (2-tailed, .05 alpha): 2.18
Is the t significant: No

Item 2
Mean of group 1: 10.60
Mean of group 2: 4
SS of X1: 41.20
SS of X2: 10
t = 4.13

Item 3
t = 2.48
How many degrees of freedom (df) are in this study: 10
What is the critical value for t (2-tailed, .05 alpha): 2.23
Is the t significant? Yes

Item 4
t = 2.03
Is the t significant (.05 alpha)? No

Item 5
t = 1.77

Item 6
t = 3.45

Item 7
t = 2.83
Simulations

Simulation 1
What is the median for Team A: 7
What is the mean for Team A: 7.42
What is the SS for Team A: 199.71
What is the mode for Team B: 4
What is the SS for Team B: 8
This is a t-test
t = 2.24
How many degrees of freedom are in this study? 12
Which training program will you suggest to use in the future?
Team B is significantly better (use computer-assisted training)
Simulation 2
What is the mean for 10% Off: 3
What is the SS for 10% Off: 16
What is the mode for Movie: none
What is the SS for Movie: 258
This is a t-test
t = 5.20
Which promotion is best for sales? Free movie earned significantly more dollars
Multiple Choice
c, c, d, a, a, a, e, a, e, d
Basic Facts
1. List and describe nine applications of the General Linear Model for continuous and discrete variables:

Continuous Models compare:
a. Causal modeling: multiple measures of multiple factors
b. Multivariate analysis: multiple predictors; multiple criteria
c. Multiple regression: multiple predictors & single criterion
d. Regression: single predictor & single criterion
e. Correlation: 2 regression lines
f. Frequency distribution: 1 variable (predictor or criterion)

Discrete Models compare:
a. T-test: 2 means; 1 independent variable
b. One-way ANOVA: 2+ means; 1 independent variable
c. Factorial ANOVA: 2+ means; 2+ independent variables
Item 1
Correlation
What is the sum of Sleep: 50
What is the SS of Sleep: 169.33
The variance of Sleep is: 28.22 (population)
What is the mean of Hope: 9
What is the SS of Hope: 242
r = .97
Calculate the coefficient of determination: .95
What percent of variance is unaccounted for: .05

Item 2
Regression
What is the SS for Last Year: 74
What is the variance of Last Yr: 12.33 (sample)
What is the range of Last Year: 11
What is the SS for This Year: 317.71
What is the SSxy: – 152
a = 17.90
b = – 2.05
If X = 14 million, what would you predict this year's revenue to be: – 10.86 million

Item 3
t-test
What is the median for Team A: 10
What is the mean for Team A: 10.80
What is the SS for Team A: 74.80
What is the mode for Team B: none
What is the SS for Team B: 29.20
t = 1.84
What is the critical value for this study: 2.31
Which training program will you suggest to use in the future: Either; not significant
Item 4
Analysis of Regression

         SS     df    ms
Between    5     1     5
Within     8    16    .50
TOTAL     13    17

What is the criterion in this study: Sales
What is the F for this test: 9.99
What is the critical value for F: 4.49
Is F significant at .05 alpha: Yes
BRIEFLY
Day 9:
1-Way ANOVA
Comparing 3+ groups
When more than 2 groups are to be compared, multiple t-tests are conducted because of the increased likelihood of Type I error. Instead, before subgroup comparisons are made, the vari- ance of the entire design is analyzed. This pre-analysis is called an Analysis of Variance (ANOVA for short). Using the F-test (like an Analysis of Regression), an ANOVA makes a ratio of variance between the subgroups (due to the manipulation of the experimenter) to variance within the subgroups (due to chance).
Comparing 3 or more groups requires a pre-analysis of the data. To compare three or more groups, use:
Within-Subjects Designs
1-Way ANOVA
INTRODUCTION
Within-subjects
Sometimes we want to take repeated measures of the same people over time. These specialized studies are called within-subjects or repeated measures designs. Conceptually, they are extensions of the correlated t-test; the means are compared over time.
Like correlated t-tests, the advantages are that subjects act as their own controls, eliminating the difficulty of matching subjects on similar backgrounds, skills, experience, etc. Also, within-subject designs have more power (require fewer people to find a significant difference) and consequently are cheaper to run (assuming you're paying your subjects).
They also suffer from the same disadvantages. There is no way of knowing if the effects of trial one wear off before the subjects get trial two. The more trials in a study, the larger the potential problem. In a multi-trial study, the treatment conditions could be impossibly confounded.
A more detailed investigation of within-subject designs is beyond the scope of this discussion. For now, realize that it is possible, and sometimes desirable, to construct designs with repeated measures on the same subjects. But it is not a straightforward proposition and requires more than an elementary understanding of statistics.
1-way anova
It is called 1-way because there is one independent variable in this design. It is called an ANOVA because that's an acronym for ANalysis Of VAriance. A 1-way analysis of variance is a pre-test to prevent Type I error.
Although we try to control Type I error by setting our alpha level at a reasonable level of error (typically 5%) for one test, when we do several tests, we run into increased risk of seeing relationships that don’t exist. One t-test has a 5/100 chance of having Type I error. But mul- tiple t-tests on the same data set destroy the careful controls we set in place.
We can use a t-test to compare the means of two groups. But to compare 3, 4 or more groups, we’d have to do too many t-tests; so many that we’d risk finding a significant t-test when none existed. If there were 4 groups (A, B, C and D, we’ll call them), to compare each condition to another you’d have to make the following t-tests:
AB
AC
AD
BC
BD
CD
The chances are too good that we'll find one of those tests to look significant when it is not. What we need is a pre-analysis of the data to test the overall design; then, if the overall variance is significant, we can go back and conduct the t-tests.
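The scale of that risk is easy to sketch in a few lines of Python (my own illustration, not from the text; the familywise figure assumes the six tests are independent, which is only approximately true here):

```python
from math import comb

k = 4                       # number of groups: A, B, C and D
tests = comb(k, 2)          # pairwise t-tests needed: AB, AC, AD, BC, BD, CD
alpha = 0.05                # Type I error allowed on any single test

# Chance of at least one Type I error somewhere in the set,
# if the six tests were independent:
familywise = 1 - (1 - alpha) ** tests

print(tests)                 # 6
print(round(familywise, 2))  # 0.26
```

Even six tests push the real error rate from 5% to roughly 26%, which is why the ANOVA is run first.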
Theory of F
The premise of an ANOVA is to compare the amount of variance between the groups to the variance within the groups.
The variance within any given group is assumed to be due to chance (one subject had a good day, one was naturally better, one ran into a wall on the way out the door, etc.). There is no pattern to such variation; it is all determined by chance.
If no experimental conditions are imposed, it is assumed that the variance between the groups would also be due to chance. Since subjects are randomly assigned to the groups, there is no reason other than chance that one group would perform better than another.
After the independent variable is manipulated, the differences between the groups are due to chance and the independent variable. By dividing the between-group variance by the within-group variance, the chance parts should cancel each other out. The result should be a measure of the impact the independent variable had on the dependent variable. At least that's the theory behind the F test.
Significance
Yes, this is the same F test we used doing an Analysis of Regression. And it has the same summary table:
          SS     df     ms
Between   ____   ____   ____
Within    ____   ____   ____
Total     ____   ____   ____
Notice that the titles have changed. We now talk about Between Sum of Squares, not Regression SS. The F test (named after its author, R.A. Fisher) is the ratio of between-group variance (called between mean squares or msbetween) to within-group variance (called within mean squares or mswithin).
After you calculate the F, you compare it to the critical value in a table of Critical Values of F. There are several pages of critical values to choose from because the shape of the F distribution changes as the number of subjects in the study decreases. To find the right critical value, go across the table to the dfbetween column and down to the dfwithin row.
Typically the standard we use for significance only allows 5% error. That is, significance is granted only when we will be right 95% of the time, and choose wrong 5% of the time. In other words, our alpha level is set .05 (the amount of error we are willing to accept). Setting the criterion at .05 alpha indicates that we want to be wrong no more than 5% of the time. Being wrong in this context means to see a significant relationship where none exists.
Two points should be made: (a) 5% is a lot of error and (b) seeing things that don't exist is not good. Five percent of the population of the US is about 15 million people; that's a lot of error. If elevators failed 5% of the time, no one would ride them. If OPEC trims production by 5%, it cuts 1.5 million barrels a day. There are 230 million people using the internet, about 5% of the world's population.
We use a relatively low standard of 5% because our numbers are fuzzy. Social science research is like watching a tennis game through blurry glasses which haven’t been washed in months. We have some understanding of what is going on—better than if we hadn’t attended the match—but no easy way to summarize the experience.
Second, seeing things that don't exist is dangerous. In statistics, it is the equivalent of hallucination. We want to see the relationships that exist and not see additional ones that live only in our heads. Decisions which produce conclusions of relationships that don't exist are called Type I errors.
If Type I error is statistical hallucination, Type II error is statistical blindness. It is NOT seeing relationships when they do exist. Not being able to see well is the pits (I can tell you from personal experience) but it’s not as bad as hallucinating. So we put most of our focus on limiting Type I error.
Example
We pick an alpha level (how much Type I error we are willing to accept) and look up its respective critical value. If the F we calculate is smaller than the critical value, we assume the pattern we see is due to chance. And we continue to assume that it is caused by chance until it is so clear, so distinct, and so accurate that it can't be ignored. We only accept patterns that are significantly different from chance.
When the F we calculate is larger than the critical value, we are 95% sure that the pattern we see is not caused by chance. By setting the alpha level at .05, we have set the amount of Type I decision error at 5%.
Group 1   Group 2   Group 3
   6         6        13
   4         1        10
   5         3         6
   8         2         7
   2         8         9
In order to calculate an Analysis of Variance for this data, we fill in the blanks for the Analysis of Variance’s summary table:
          SS      df     ms
Between   _____   ____   ____
Within    _____   ____   ____
Total     _____   ____   ____
Let’s start with the degrees of freedom. Like an Analysis of Regression, an ANOVA has degrees of freedom for each row of the summary table. We know that dfbetween = k-1, where k is the number of columns (kolumns) or groups (kroups?). Three columns minus 1 = 2.
We know that dfwithin is equal to N-k (the number of people minus the number of columns); so 15-3 = 12. The total degrees of freedom is equal to N-1; 15-1 = 14.
With this in mind, let’s update the summary table with what we know:
          SS      df     ms
Between   _____    2     ____
Within    _____   12     ____
Total     _____   14     ____

Calculating SS
To calculate the Sum of Squares for this study, we begin by collecting some summary information. Find the sum of each group, the n of each group, the X-squares of each group (square each score and add them up), and the SS for each group. Then add across each row and put the total in a Totals column. Like this:
          X1      X2      X3     TOTALS
           6       6      13
           4       1      10
           5       3       6
           8       2       7
           2       8       9
Sum       25      20      45       90
Sum X²   145     114     435      694
n          5       5       5       15
SS     20.00   34.00   30.00       84
SSwithin: The SS for each group (20, 34 and 30, respectively) is the amount of dispersion WITHIN that group. So the sum of those SS gives us the Within SS (84 in this case). That is, SSwithin = SSx1 + SSx2 + SSx3….
SSbetween
The SSbetween is a bit more challenging to calculate. Here is the formula for it:
SSbetween = Σ[(ΣXgroup)² / n] − G² / N,  where G is the grand total of all scores and N is the total number of scores.
Impressive, huh?
Let me explain. Take the sum of each group and square it: 25², 20², 45².
Divide each by the number of scores in that group. NOT the number of columns, the number of scores:

(25² + 20² + 45²) / 5

Now square the total of all the raw scores (90) and divide it by N (the number of people in the study). And subtract this from the subgroup numbers:

(25² + 20² + 45²) / 5  minus  90² / 15

This is going to give us:

(625 + 400 + 2025) / 5  minus  8100 / 15

And this gives us:

3050 / 5  minus  540

That is:

610 minus 540 = 70. The SSbetween equals 70.
Getting to F
SStotal: To find the SStotal we use the Totals column and plug those numbers into the regular Sum of Squares formula. That is, we ignore which group the scores come from and treat them as if they were all in one group. So, 694 minus (90²)/15. Notice that the last part of the formula has already been calculated when we did the SSbetween. What we come to, then, is 694 minus 540, which equals 154. The SStotal = 154.
Finding F
Updating the summary table gives us:
          SS       df    ms
Between    70.00    2    35.00
Within     84.00   12     7.00
Total     154.00   14    11.00
F is mean squares between (msbetween) divided by mean squares within (mswithin). In this case, F = 35 / 7 = 5.00
To test the significance of this F, we look up the critical value for the test at 2 and 12 degrees of freedom. Using the F Table, we find that the critical value at .05 alpha is 3.88. Since the value we calculated is larger than the one in the book, F is significant.
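The whole worked example can be checked in a few lines of Python (a sketch of the raw-score formulas above, not part of the original text):

```python
groups = [
    [6, 4, 5, 8, 2],     # Group 1 (sum 25)
    [6, 1, 3, 2, 8],     # Group 2 (sum 20)
    [13, 10, 6, 7, 9],   # Group 3 (sum 45)
]

N = sum(len(g) for g in groups)   # 15 scores in all
G = sum(sum(g) for g in groups)   # grand total: 90

# SSbetween: each group total squared over its n, minus G squared over N
ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - G ** 2 / N

# SSwithin: the sum of each group's own SS
ss_within = sum(sum(x ** 2 for x in g) - sum(g) ** 2 / len(g) for g in groups)

df_between = len(groups) - 1      # k - 1 = 2
df_within = N - len(groups)       # N - k = 12

F = (ss_between / df_between) / (ss_within / df_within)
print(ss_between, ss_within, F)   # 70.0 84.0 5.0
```

Since 5.00 beats the critical value of 3.88, the F is significant, just as the hand calculation found.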
Interpretation
Since the F is significant, what do we do now?
Now, all of those t-tests we couldn’t do because we were afraid of Type I error are available for our calculating pleasure. So we do t-tests between:
AB AC BC
We might find that there is a significant difference between each group. Or we might find that there is not a significant difference between two of the groups but that there is a significant difference between them and the third group.
Also, which group did best depends on whether the numbers measure money (you want the higher means) or errors (you want the lower means).
Just think, if the F had not been significant, there would not be anything left to do. We would have stopped with the calculating of F and concluded that the differences we see are due to chance. How boring, huh?
UNDERSTAND
Degrees of freedom
Illustration 1: Degrees of freedom is like having a ball and chain. The mean is the ball, and the chain allows you to go only a limited number of steps (degrees of freedom). The chain length can vary from N to N-1 or N-2, etc.
Illustration 2: Degrees of freedom is like being in jail. Scores are free to go anywhere…as long as they stay within their degrees of freedom.
Illustration 3: Degrees of freedom is like being on a leash. The mean restricts only a bit (usually 1 or 2 degrees). All the other scores are free to vary.
REMEMBER
Basic Facts
A 1-way ANOVA compares 3+ means on one independent variable.
Formulas:
There are two methods of calculating Sum of Squares: deviation (based on each score's distance from the mean) and raw score. The raw-score formulas are:

SSbetween = Σ[(ΣXgroup)² / n] − G² / N

SSwithin = SSx1 + SSx2 + SSx3 + …

SStotal = ΣX² − (ΣX)² / N
Mean Square = SS/df
Terms:
1-Way ANOVA
ANOVA
confounded
controls
F
mean squares
mswithin
repeated measures design
SS within
SSbetween
SStotal
within-subjects design
msbetween
t
DO
Step By Step
1. Summarize the subgroups
First, find the n (number of scores) for each group.
Second, find the sum for each group.
Third, square each number in the first group and sum them. Then square and sum the scores in each of the other groups.
Fourth, find the Sum of Squares (SS) for each group.
Fifth, find the totals for n, sum, squares, and SS. So, N = 20 (the sum of each group's n's), the Sum of X = 117, and so on.
Updating our example, it would look like this:
         Group1   Group2   Group3   Group4   Totals
            1        6       12        5
            2        4        7        2
            4        9       15        6
            3        5        9        3
            2       11        7        4
n           5        5        5        5       20
Sum        12       35       50       20      117
Squares    34      279      548       90      951
SS       5.20    34.00    48.00    10.00    97.20
2. Find SSwithin
First, start by creating a summary table.
Second, write in the SSwithin. In the process of summarizing the groups, the SSwithin was already calculated: it is 97.2. That is, SSwithin (within the experiment) is the sum of the SS that is in (within) each group.
Updating our summary table, it now looks like this:
Now that we’ve covered the facts and concepts of the 1-way ANOVA, it’s time to put what we know into practice. This section includes Step-by-Step instructions, practice problems, simulations (word problems), a quiz and a progress check.
          SS      df    ms
Between   ____    ___   ____
Within    97.20   ___   ____
Total     ____    ___   ____
3. Find SSbetween
The formula for SSbetween looks more difficult than it is. Here’s the formula: SSbetween = Σ[(ΣXgroup)² / n] − G² / N.
Here’s what to do about it. First, start with the sum of each group (12, 35, 50, 20). Square each of them and add them together:
12² + 35² + 50² + 20²
So we get: 144 + 1225 + 2500 + 400 = 4269.
Second, divide Step 1 by n (the number of subjects in each group). NOTE: It is not the number of groups but the number of scores in each group. This gives us: 4269 divided by 5 = 853.8.
Third, take the Sum of X in the totals column (117) and square it, which equals 13689.
Fourth, divide Step 3 by the N in the totals column (20). That is, 13689 divided by 20 = 684.45.
Fifth, subtract Step 4 from Step 2. So, 853.8 minus 684.45 = 169.35. This is the SSbetween. Updating our summary table, it now looks like this:
          SS       df    ms
Between   169.35   ___   ____
Within     97.20   ___   ____
Total     ______   ___   ____

4. Find SStotal

The formula for SStotal is the same as any basic SS: SStotal = ΣX² − (ΣX)² / N. We use the information in the totals column and apply this formula.
First, note that the sum of X-squares = 951.
Second, take the sum of X’s (117) and square it. This equals 13689.
Third, divide Step 2 by 20 (big N), which equals 684.45.
Fourth, subtract Step 3 from Step 1. That is, 951 minus 684.45 = 266.55. This is the SStotal. Updating our summary table, it now looks like this:

          SS       df    ms
Between   169.35   ___   ____
Within     97.20   ___   ____
Total     266.55   ___   ____
To check the calculations, simply add SSbetween to SSwithin and see if they equal SStotal. It does, so we calculated everything correctly.
5. Find the degrees of freedom
First, enter the degrees of freedom (df) for Between, which is k-1 (columns minus one). Since our example has 4 columns, the df for Between = 3.
Second, enter the df for Within, which is N-k (number of people minus the number of columns). In our example, N = 20, so dfwithin = 16.
Third, enter the df for Total, which is N-1 (number of people minus one). So, dftotal = 19.
          SS       df    ms
Between   169.35    3    ____
Within     97.20   16    ____
Total     266.55   19    ____
6. Find F
First, calculate the appropriate mean squares. Since mean squares is another name for variance (and SS divided by df equals variance), divide each SS by its respective df. Updating the table, we now have:
          SS       df    ms
Between   169.35    3    56.45
Within     97.20   16     6.08
Total     266.55   19    14.03
Second, divide the mean squares of Between by the mean squares of Within. That is, 56.45 divided by 6.08 = 9.29. This ratio is called the F test, so F = 9.29.
7. Find the critical value
The Critical Values of the F Distribution table is actually a series of distributions. To enter the table, go across to the column whose number matches the degrees of freedom for Between (dfbetween), and go down to the row that matches the dfwithin.
In our example, go across to 3 and down to 16. The critical value (the value you have to beat) = 3.24 (at the .05 alpha level).
8. Decide what to do next
If F is not significant, there is nothing else to do. The differences between the groups are due to chance.
If F is significant, then t-tests are done: one between each pair of combinations (AB, AC, AD, BC, BD and CD).
To test for significance, the calculated value is compared to the F table. If the value you calculated is bigger than the value in the book, F is significant. In our example, we calculated F to be 9.29, which is bigger than the critical value of 3.24 we found at 3 and 16 degrees of freedom. So, F is significant and the t-tests are authorized.
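Steps 1 through 6 above can be folded into one reusable routine (a sketch; the function name is mine, not the book's):

```python
def one_way_anova(groups):
    """Return SSbetween, SSwithin, dfbetween and dfwithin for a list of groups."""
    k = len(groups)
    N = sum(len(g) for g in groups)
    G = sum(sum(g) for g in groups)                     # grand total
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - G ** 2 / N
    ss_within = sum(sum(x ** 2 for x in g) - sum(g) ** 2 / len(g) for g in groups)
    return ss_between, ss_within, k - 1, N - k

# The four-group example from the step-by-step section:
groups = [
    [1, 2, 4, 3, 2],      # Group 1
    [6, 4, 9, 5, 11],     # Group 2
    [12, 7, 15, 9, 7],    # Group 3
    [5, 2, 6, 3, 4],      # Group 4
]
ssb, ssw, dfb, dfw = one_way_anova(groups)
F = (ssb / dfb) / (ssw / dfw)
print(round(ssb, 2), round(ssw, 2), dfb, dfw, round(F, 2))  # 169.35 97.2 3 16 9.29
```

The 9.29 matches the hand calculation, and a table lookup at 3 and 16 degrees of freedom still decides significance.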
Practice Problems Item 1
Which grade has the most car accidents:
10th 11th 12th
 2    13     4
 9    17     8
 3    14     2
 1     9     1
 7     1     4
          SS     df    ms
Between   ____   ___   ____
Within    ____   ___   ____
Total     ____   ___   ____
Item 2
Which color of house is lived in the longest (in years)?
Blue Green Peach
 8    11     4
 7     9     8
 3     7     9
 1    18     2
 9    12     4
          SS     df    ms
Between   ____   ___   ____
Within    ____   ___   ____
Total     ____   ___   ____
Item 3
Which is the best pizza (most pepperoni):
PHome PMad Mo-Pizza
12     3     5
 2     3     5
 8     0     4
 1     1     5
 2     2     5
 9     4     5
          SS     df    ms
Between   ____   ___   ____
Within    ____   ___   ____
Total     ____   ___   ____
Item 4
Which toothpaste lasts more days?
Brand-X ToothE HappyUp
 3     1     9
 6     7     9
 5     3     8
 7     6     8
 2     5     9
          SS     df    ms
Between   ____   ___   ____
Within    ____   ___   ____
Total     ____   ___   ____
Item 5
Which shoe has the most stripes?
Brand-X MyTooth HappyUp
 8     1     2
 2     4     7
 8     4     1
 8     7     2
 2     2     2
          SS     df    ms
Between   ____   ___   ____
Within    ____   ___   ____
Total     ____   ___   ____

F =
What is the critical value for F =
Is F significant?

Item 6
Which region sells the most novels?
East West South
 1    11     8
 4     7     2
 4     2     8
 4     5     5
 3     3     3
          SS     df    ms
Between   ____   ___   ____
Within    ____   ___   ____
Total     ____   ___   ____

F =
What is the critical value for F =
Is F significant?
Simulations Item 7
You work for a bottled-water company which is concerned with the quality of its flavored product. The numbers below are the number of complaints received in a 1-hour period.
Lemon Orange Lime Root Beer
12     1    18     1
13     4     9     2
11     5    12     1
14     3     7     9
Since you are interested in which flavor did best, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
What is the independent variable in this design:
What is the dependent variable in this design:
Based on above data, complete the following summary table:
          SS     df    ms
Between   ____   ___   ____
Within    ____   ___   ____
Total     ____   ___   ____

F =
What is the critical value for F =
Is F significant?
What should be done next?
Item 8
Several samples are taken of each flavor of cough syrup. Which flavor tastes the best (number of complaints):
Peach   Strawberry   Melon   Lemonade
  4          2          4        1
  3          1          8        8
  7          7          7        3
  5          6          3        9
  2          2          7       12
Since you are interested in which flavor did best, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
What is the independent variable in this design:
What is the dependent variable in this design:
Based on above data, complete the following summary table:
          SS     df    ms
Between   ____   ___   ____
Within    ____   ___   ____
Total     ____   ___   ____

F =
What is the critical value for F =
Is F significant?
What should be done next?
SUMMARY
To finish off our discussion of the 1-way ANOVA, there is a review, quiz, progress check, and chapter answers.
Review
Although there are multiple groups, they vary on one independent variable.
The dependent variable is what the numbers measure. The numbers are dependent on the performance of the subjects.
The independent variable is what the experimenter manipulates. It is independent of the subjects’ performance.
The F is a ratio of the variance between the groups to the variation within the groups. The F test assumes that the variation within a group is due to ability and chance.
The F test assumes that the variation between groups is due to ability, chance, and manipula- tion of the independent variable.
The F test assumes that variance due to ability and chance (between and within subjects) will cancel each other out, so that what remains is a measurement of the effect of the independent variable on the dependent variable.
Mean Squares is a variance term.
Mean Squares equals Sum of Squares divided by its appropriate degrees of freedom (SS/df)
Between-Ss degrees of freedom equals the number of groups minus 1 (k-1).
Within-Ss degrees of freedom equals the total number of people minus the number of groups (N-k).
Total degrees of freedom equals the total number of people minus 1 (N-1).
ANOVA stands for ANalysis Of VAriance.
1. When more than two levels of one independent variable are to be compared, which of the follow- ing should be used:
a. regression
b. one-wayANOVA
c. t-test
d. factorial ANOVA
e. all of the above
2. The advantage of a repeated measures design is that it reduces the contribution of error variability due to:
a. mean of D
b. degrees of freedom
c. the effect of the treatment
d. individual differences
e. none of the above
3. The F test is best understood as a:
a. ratio
b. confound
c. standard deviation
d. decision error
e. prediction error
4. The total degrees of freedom for a 1-way ANOVA equal:
a. k
b. k-1
c. N
d. N-1
e. N-k
5. The df for SSbetween equals:
a. k
b. k-1
c. N
d. N-1
e. N-k
Progress Check 9
6. How many independent variables are in a 1-way ANOVA:
a. 1
b. 3
c. 6
d. 9
e. varies
7. How many dependent variables are in a 1-way ANOVA:
a. 1
b. 3
c. 6
d. 9
e. varies
8. For an F-test, which of the following goes on top (numerator):
a. mstotal
b. mswithin
c. msbetween
d. SSbetween
e. SSwithin
9. Which of the following should be used to compare three or more groups:
a. t
b. F
c. r
d. r2
e. z
10. Which of the following is an estimate of error:
a. SStotal
b. SSbetween
c. SSwithin
d. t
e. r
As a director of personnel, you are curious whether peer ratings are a good predictor of supervisor ratings. The numbers below are a sample of the company’s data.
Supervisor Peer
 6     2
 3     2
 2     4
 4     1
 5     2
 5     2
 6     3
 6     8
 6     7
What is the SS for Peer Ratings: ________
What is the variance of Peer Ratings: ________
What is the range of Supervisor Ratings: ________
What is the SS for Supervisor Ratings: ________
What is the SSxy: ________
Since you are interested in how well one rating acts as a linear predictor of another, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculations?
a =
b =
r =
t =
F =
If a peer rating was 11, what would you predict their supervisor rating to be?
As a teacher, you are interested in how reading and spelling are related. You have measured all of your students on each area and do not wish to generalize beyond your class. You hope to find how interrelated these two variables are. Below are the number correct on each test.
Reading Spelling
19     3
11    14
13    15
14    16
 6    13
 8    12
 8     5
What is the sum of Reading: _______
What is the SS of Spelling: _______
The variance of Reading is: _______
What is the mean of Spelling: _______
What is the SS of Reading: _______
Since you are interested in commonality, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculations?
a =
b =
r =
t =
F =
What is the critical value at .05 alpha:
Is this test significant:
As a manager, you randomly assigned your staff to two locations. The numbers in the columns below are dollars. Which is the significantly better location?
New York Los Angeles
8 11 2.4 9 63 49 49
5 12
3 14 9.2 7.2
What is the median for New York: __________
What is the mean for New York: __________
What is the SS for New York: __________
What is the mode for Los Angeles: __________
What is the SS for Los Angeles: __________
Since you are interested in which group did best, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculations?
a =
b =
r =
t =
F =
Which location is best for sales?
You work for a cookie company which is testing a new product. Each group is composed of independent subjects, randomly assigned to treatment by you. The four groups differ in the amount of sugar in the cookies. The numbers below are the number of cookies eaten in a 2-hour period.
Sugarless   LowSugar   Medium   High
    7           6         8      16
    3           6         8      12
    5           4         5      11
    3           3         5      15
    4           3         3      11
Since you are interested in which cookie did best, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
How many independent variables are in this design: __________
How many dependent variables are in this design: __________
Based on the above data, complete the following summary table:
          SS     df    ms
Between   ____   ___   ____
Within    ____   ___   ____
Total     ____   ___   ____

Calculate the F for this test:
What is the critical value:
Is the F significant?
What should be done next?
You wonder if the amount of salt has a significant impact on the rating of your ice cream’s quality. Use the following data to complete the summary table:
r= .44 SSx = 122 SSy = 40 N = 14
          SS     df    ms
Between   ____   ___   ____
Within    ____   ___   ____
Total     ____   ___   ____

What is the predictor in this study: ___________
What is the F for this test: ___________
What is the critical value for F: ___________
Is F significant at .05 alpha: ___________
Answers
Practice Problems
Item 1
F = 3.95
          SS       df    ms
Between   150.53    2    75.27
Within    228.80   12    19.07
Total     379.33   14    27.10

Item 2
F = 4.60
          SS       df    ms
Between   116.13    2    58.07
Within    151.60   12    12.63
Total     267.73   14    19.12

Item 3
F = 1.83
          SS       df    ms
Between    30.78    2    15.39
Within    126.33   15     8.42
Total     157.11   17     9.24

Item 4
F = 8.10
          SS       df    ms
Between    56.13    2    28.07
Within     41.60   12     3.47
Total      97.73   14     6.98

Item 5
F = 1.43
          SS       df    ms
Between    20.80    2    10.40
Within     87.20   12     7.27
Total     108.00   14     7.71

Item 6
F = 1.12
          SS       df    ms
Between    16.53    2     8.27
Within     88.80   12     7.40
Total     105.33   14     7.52
The critical value is 3.88
Simulations
Simulation 1
ANOVA
F = 9.67
Critical value = 3.49
Significant at .05 alpha
Do t-tests next

          SS       df    ms
Between   308.25    3    102.75
Within    127.50   12     10.63
Total     435.75   15     29.05

Simulation 2
ANOVA
F = 1.07
Critical value = 3.24
Not significant at .05 alpha
Do nothing more

          SS       df    ms
Between    28.95    3     9.65
Within    144.00   16     9.00
Total     172.95   19     9.10

Multiple Choice
b, d, a, d, b, a, a, c, b, c

Progress Check
Item 1
Regression
Peer (X) predicts Supervisor (Y)
What is the SS for Peer Ratings: 48.22
What is the variance of Peer Ratings: 6.03
What is the range of Supervisor Ratings: 4
What is the SS for Supervisor Ratings: 17.56
What is the SSxy: 9.89
a = 4.07
b = .205
If a peer rating was 11, what would you predict their supervisor rating to be: 6.33
Item 2
correlation
What is the sum of Reading: 79
What is the Sum of Squares of Reading: 119.43
What is the variance of Reading: 17.06 (population)
What is the mean of Spelling: 11.14
What is the SS of Spelling: 154.86
r = – .27
What is the critical value for r: .755
Is this test significant: No, not at .05 alpha (5 df)
Item 3
t-test
What is the median for New York: 4.50
What is the mean for New York: 5.20
What is the SS for New York: 40.08
What is the mode for Los Angeles: 9
What is the SS for Los Angeles: 70.64
t = 2.82
Critical value = 2.15 at 14 df; sig. at .05 alpha level
Which location is best for sales? Los Angeles
Item 4
1-Way ANOVA
How many independent variables are in this design: 1
How many dependent variables are in this design: 1

          SS       df    ms
Between   254.60    3    84.87
Within     61.20   16     3.83
Total     315.80   19

Calculate the F for this test: 22.19
What is the critical value: 3.24
Is the F significant? Yes
What should be done next? t-tests
Item 5
Analysis of Regression

          SS      df    ms
Between    7.74    1    7.74
Within    32.26   12    2.69
Total     40.00   13

What is the predictor in this study: Salt
What is the F for this test: 2.88
What is the critical value for F: 4.75
Is F significant at .05 alpha: No
BRIEFLY
Complex models build on the principles we already discussed. Although their calculation is beyond the scope of this discussion (that’s what computers are for), here is an introduction to procedures that use multiple predictors, multiple criteria and multivariate techniques to test interactions between model components.
There are four types of complex models I’d like to overview:
Factorial ANOVA Multiple regression Multivariate analysis Causal modeling
Day 10: Advanced Designs
Testing for interactions
INTRODUCTION
Factorial ANOVA
Until now, our models have been quite simple. One individual, one group, or one variable predicting another. We have explored the levels of measurement, the importance of theories and how to convert theoretical constructs into model variables. We have taken a single vari- able, plotted its frequency distribution and described its central tendency and dispersion. We have used percentiles and z-scores to describe the location of an individual score in relation to the group.
In addition to single variable models, we studied two variable models, such as correlations, regressions, t-tests and one-way ANOVAs. We have laid a thorough foundation of research methods, experimental design, and descriptive and inferential statistics.
Despite their simplicity, these procedures are very useful. You can use a correlation to measure the reliability and validity of a test, machine or system of management, training or production. You can use a linear regression to date a rare archaeological find, predict the winner of a race or analyze a trend in the stock market. You can use the t-test to test a new drug against a placebo or compare 2 training conditions. You can use the 1-way ANOVA to test several psychotherapies, compare levels of a drug or brands of computers.
Also, the procedures you’ve studied so far can be combined into more complex models. The most complex models have more variables but they are variations of the themes you’ve already encountered.
A factorial ANOVA is like combining 1-way ANOVAs together. The purpose of combining the designs is to test for interactions. A 1-way ANOVA can test to see if different levels of salt will influence compliments, but what happens if the soft drink is both salty and sweet?
Factorial designs
A factorial ANOVA tests the impact of 2 or more independent variables on one dependent variable. It tests the influence of many discrete variables on one continuous variable. It has multiple independent variables and one dependent variable.
A 1-way ANOVA model tests multiple levels of 1 independent variable. Let’s assume the question is whether stress affects how people work multiplication problems. Subjects are randomly assigned to a treatment level (high, medium and low, for example) of one independent variable (stress, for example). And their performance on one dependent variable (number of errors) is measured. If stress impacts performance, you would expect errors to increase with the level of stress. The variation between the cells is due to the treatment given. Variation within each cell is thought to be due to random chance.
A 2-way ANOVA has 2 independent variables. Here is a design which could look at gender
(male; female) and stress (low, medium and high):
It is called a 2 x 3 (“two by three”) factorial design. If each cell contained 10 subjects, there would be 60 subjects in the design.
A design for the amount of student debt (low, medium and high) and year in college (frosh, soph, junior and senior) would have 1 independent variable (debt) with 3 levels and 1 independent variable (year in school) with 4 levels. This is a 3x4 factorial design.
DAY 10: Advanced Designs
Notice that each number (3, 4, etc.) tells how many levels are in an independent variable. The number of numbers tells you how many independent variables there are. A 2×4 has 2 independent variables. A 3×7 has 2 independent variables (one with 3 levels and one with 7 levels). A 2x3x4 factorial design has 3 independent variables.
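To make the notation concrete, here is a quick Python sketch (the helper function and the numbers are mine, not the book’s) that reads a design the way this paragraph describes:

```python
# Reading factorial notation: each number is the levels of one
# independent variable; the count of numbers is the number of IVs.
def describe(design, per_cell=1):
    """design is a tuple like (2, 3) for a 2x3 factorial."""
    cells = 1
    for levels in design:
        cells *= levels
    return {"ivs": len(design), "cells": cells, "subjects": cells * per_cell}

print(describe((2, 3), per_cell=10))  # 2 IVs, 6 cells, 60 subjects
print(describe((2, 3, 4)))            # 3 IVs, 24 cells
```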
Factorial designs can do something 1-way ANOVAs can’t. Factorial designs can test the interaction between independent variables. Taking pills can be dangerous and driving can be dangerous; but it often is the interaction between variables that interests us the most.
Analyzing a 3×4 factorial design involves 3 steps: columns, rows and cells. The factorial ANOVA tests the columns of the design as if each column was a different group. Like a 1-way ANOVA, this main effect tests the columns as if the rows didn’t exist.
The second main effect (rows) is tested as if each row was a different group. It tests the rows as if the columns didn’t exist. Notice that each main effect is like doing a separate 1-way ANOVA on that variable.
The cells also are tested to see if one cell is significantly larger (or smaller) than the others. This is a test of the interaction and checks to see if a single cell is significantly different from the rest. If one cell is signifi- cantly higher or lower than the rest, then it is the result of a combination of the independent variables.
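The three tests can be sketched in a few lines of Python (the cell means below are invented for illustration):

```python
# Cell means for a 2x3 design: gender (rows) by stress (columns).
cells = [[4.0, 6.0, 9.0],    # male:   low, medium, high stress
         [5.0, 6.0, 14.0]]   # female: low, medium, high stress

# Main effect of stress: average each column as if the rows didn't exist.
col_means = [sum(col) / len(col) for col in zip(*cells)]
# Main effect of gender: average each row as if the columns didn't exist.
row_means = [sum(row) / len(row) for row in cells]

print(col_means)                         # [4.5, 6.0, 11.5]
print([round(m, 2) for m in row_means])  # [6.33, 8.33]
# The interaction test asks whether any single cell (here, female/high
# stress = 14) is higher or lower than the main effects alone predict.
```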
Multiple Regression
An extension of simple linear regression, multiple regression is based on observed data. In the case of multiple regression, two or more predictors are used to predict a single criterion.
Let’s assume that you have selected 3 continuous variables as predictors and 1 continuous variable as criterion. You might want to know if gender, stress and time of day impact typing performance.
Each predictor is tested against the criterion separately. If a single predictor appears to be primarily responsible for changes in the criterion, its influence is measured. Every combination of predictors is also tested, so both main effects and interactions can be evaluated. If this sounds like a factorial ANOVA, you’re absolutely correct.
You could think of Multiple Regression and ANOVA as siblings. Factorial ANOVAs use discrete variables; Multiple Regression uses continuous variables. If you were interested in using income as one of your predictors (independent variables), you could use discrete categories of income (high, medium and low) and test for significance with an ANOVA. If you wanted to measure income as a continuous variable (actual income earned), the procedure would be a Multiple Regression.
You also could think of Multiple Regression as the parent of ANOVA. Analysis of Variance is actually a specific example of Multiple Regression; it is the discrete-variable version. Analysis of Variance uses categorical predictors. Multiple Regression can use continuous or discrete predictors (in any combination); it is not restricted to discrete predictors.
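The sibling relationship can be sketched in Python (the data and the least-squares helper are invented for illustration): the same fitting machinery handles income whether it is coded discretely or measured continuously.

```python
# A minimal least-squares fit: returns intercept a and slope b.
def fit(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = ss_xy / ss_x
    return my - b * mx, b

happiness = [3, 4, 6, 7]
income_dollars = [20000, 30000, 50000, 60000]  # continuous predictor
income_coded = [0, 0, 1, 1]                    # discrete: low vs high income

print(fit(income_dollars, happiness))
print(fit(income_coded, happiness))  # (3.5, 3.0)
```

With the discrete coding, the slope is simply the difference between the low-income and high-income group means, which is exactly the difference an ANOVA on those categories would test.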
Both factorial ANOVA and Multiple Regression produce an F statistic that is compared to the Critical Values of F table. Significance is ascribed if the calculated value is larger than the standard given in the table.
Both procedures have only one outcome measure. There may be many predictors in a study, but there is only one criterion. You may select horse weight, jockey height, track condition, past winnings and phase of the moon as predictors of a horse race, but only one outcome measure is used. Factorial ANOVA and Multiple Regression are multiple-predictor, single-criterion procedures.
Multivariate Analysis
Sometimes called MANOVA (pronounced man-o-va), multivariate analysis is actually an extension of multiple regression. Like multiple regression, multivariate analysis has multiple predictors. In addition to multiple predictors, multivariate analysis allows multiple outcome measures.
Now it is possible to use gender, income and education as predictors of happiness AND health. You are no longer restricted to a single criterion. With multivariate analysis, the effects and interactions of multiple predictors can be examined, and their impact on multiple outcomes can be assessed.
The analysis of a complex multiple-predictor, multiple-criteria model is best left to a computer, but the underlying process is the calculation of correlations and linear regressions. As each variable is selected for the model, a decision is made whether it is a predictor or a criterion. Obviously, aside from the experimenter’s theory, the choice of predictor or criterion is arbitrary. In multivariate analysis, a variable such as annual income could be either a predictor or a criterion.
Complex Modeling
There are a number of statistical procedures at the high end of modeling. Relax! You don’t have to calculate them. I just want you to know about them.
In particular, I want to make the point that there is nothing scary about the complex models. They are involved and require lots of tedious calculations, but that’s why God gave us computers. Since we are blessed to have stupid but remarkably fast mechanical slaves, we should let them do the number crunching.
It is enough for us to know that a complex model, at its heart, is a big bundle of correlations and regressions. Complex models hypothesize directional and nondirectional relationships between variables. Each factor may be measured by multiple measures. Intelligence might be defined as the combination of 3 different intelligence tests, for example. Income might be a combination of salary plus benefits minus vacation. And education might be years in school, number of books read and number of library books checked out. The model, then, becomes the interaction of factors that are more abstract than single-variable measures.
Underlying the process, however, are principles and procedures you already know. Complex models might try to determine if one more predictor helps or hurts, but the model is evaluated just like a correlation: by the percentage of variance accounted for by the relationships.
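In Python terms (using an example r of .55):

```python
# A model is evaluated like a correlation: r squared is the percentage
# of variance accounted for (the coefficient of determination).
r = 0.55
print(round(r ** 2, 4))      # 0.3025 -> about 30% of variance accounted for
print(round(1 - r ** 2, 4))  # 0.6975 -> the coefficient of non-determination
```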
UNDERSTAND
The story is in the details. And different statistical procedures show different amounts of detail. So choose the statistical procedure that shows the amount of detail you want to see.
Illustration 1: A visual image should not be sketchy. It should have enough detail to convey meaning, even if it doesn’t look like reality.
Illustration 2: A case study should not be sketchy. It should describe the behavior and symptoms of the client and provide a thorough family history.
REMEMBER
Basic Facts
Review the Basic Facts test. You should now be able to recall and understand everything on it.
Formulas
Review the formulas. You don’t have to be able to recall the formulas from memory, but you should now be able to choose the correct formula for each procedure.
Terms
Review the terms. You don’t have to be able to list them from memory, but you should now be able to define and distinguish between each of them.
DO
SUMMARY
We began the course looking at a single variable. You discovered its central tendency (mean, median and mode) and its dispersion (range, MAD, SS, variance and standard deviation). You learned how to compare your score to the group by using percentiles and areas under the curve. With a z-score, you learned how many standard deviations a score is from its mean. You encountered the normal curve, bimodal distributions, and positively- and negatively-skewed distributions.
With correlation and regression, the single-variable model was expanded to 2 dependent variables. You observed the pattern of a scatterplot, learned to estimate the strength of a correlation, and made predictions (extrapolations and interpolations) based on the pattern of relationship. You also tested the significance of the 2-DV model with an Analysis of Regression.
The t-test was also a 2-variable model: 1 independent variable and 1 dependent variable. The independent variable was split into 2 conditions (treatment and control) and subjects were randomly assigned to each. Building on the t-test, you learned to extend the 2-variable model by splitting the IV into more parts for a 1-Way ANOVA.
In advanced models, the basic principles remain the same but more variables are added. In a 2-Way ANOVA, 2 independent variables are used to predict the performance of 1 dependent variable. Factorial ANOVAs get more complex (adding more and more IVs) but each has only 1 dependent variable.
Multiple regression uses continuous variables as predictors and mirrors the designs of factorial ANOVAs. In fact, factorial ANOVAs are simply a subset of multiple regression (MR) procedures. ANOVAs use discrete (high, medium, low) predictors; MR can use discrete or continuous predictors (or any combination of them).
In multivariate analysis (often called MANOVA), multiple predictors (like those in factorial ANOVA and in MR) are used to predict multiple outcomes.
Causal modeling (and all the other super-complex designs) uses several measures of each factor, and then the factors are treated like predictors and criteria. They are giant MANOVAs, where each factor is composed of 2 or 3 (or more) different measures.
Take the practice final at the end of this chapter. Don’t memorize the wording of the questions but focus on the process. You should now be able to choose the proper procedure for a given situation, accurately calculate the statistics, and interpret the results.
PROGRESS CHECK
1. Which of the following shows the interaction of 2 independent variables:
a. regression
b. one-way ANOVA
c. t-test
d. factorial ANOVA
e. all of the above

2. In a 2x3x4 factorial design, how many independent variables are there:
a. 1  b. 3  c. 6  d. 9  e. 24

3. How many dependent variables are in a 3×3 factorial design:
a. 1  b. 2  c. 3  d. 6  e. 9

4. In a factorial ANOVA, degrees of freedom for SStotal equals:
a. N
b. N minus 1
c. k
d. k minus 1
e. N minus k

5. Which would use multiple measures of a factor (such as different intelligence tests) as a criterion:
a. causal modeling
b. 1-way ANOVA
c. Analysis of Regression
d. 2-way ANOVA
e. univariate analysis

6. The type of relationships between model components is determined by our:
a. theoretical questions
b. empirical analysis
c. sampling error
d. statistical bias
e. control groups
7. In an independent measures t-test with n = 10 and N = 20, how many degrees of freedom are in the study:
a. 10  b. 11  c. 18  d. 19  e. 20

8. When a frequency distribution is positively skewed, the mean is:
a. higher than the median
b. lower than the median
c. same as the median
d. same as the mode
e. same as the mean deviation

9. If a manager randomly assigns phone calls to all 10 customer service representatives, which of the following should be used to test for significant differences in the speed of call handling:
a. multiple regression
b. regression
c. correlation
d. t-test
e. 1-way ANOVA

10. Which of the following represents the height of a frequency distribution:
a. mean
b. median
c. mode
d. standard deviation
e. frequency distribution

11. If a 1-way ANOVA is performed on 4 personality types, how many degrees of freedom are associated with the Between Sum of Squares:
a. 1
b. 3
c. 4
d. 14
e. can’t tell without knowing N
12. In a 2×3 factorial design (Gender and Wing Span of butterflies), a significant interaction would indicate that there is a significant difference between:
a. male and female butterflies
b. small- and large-winged butterflies
c. medium- and large-winged butterflies
d. small- and medium-sized butterflies
e. a single cell and the rest of the cells

13. Which is most affected by outlying scores:
a. mean
b. median
c. mode
d. B and C
e. A, B and C are equally affected

14. Rejecting the null when you should have accepted it is:
a. Type I error
b. Type II error
c. Type III error
d. Typo error
e. Not My Type error

15. How many dependent variables are in a 2x3x3 factorial design:
a. 1  b. 3  c. 6  d. 9  e. 18

16. A test which describes a restricted range of people will probably result in a distribution which is:
a. linear
b. curvilinear
c. skewed
d. normal
e. criterion-referenced

17. The more samples taken, the more normal the curve looks, according to the:
a. critical-limit hypothesis
b. null-limit hypothesis
c. maximum-limit theorem
d. unbiased-limit theorem
e. central-limit theorem
18. Which correlation coefficient shows the greatest amount of relationship:
a. -.23  b. .45  c. .56  d. .71  e. -.89

19. A nondirectional test of significance is said to be:
a. no-tailed  b. one-tailed  c. two-tailed  d. three-tailed  e. four-tailed

20. Which of the following is defined as a cumulative frequency divided by N (times 100):
a. frequency distribution
b. percentile
c. sample
d. percent
e. proportion

21. From one standard deviation above the mean to one standard deviation below the mean accounts for what percent of scores:
a. 17%  b. 25%  c. 34%  d. 68%  e. 95%

22. A correlation between a discrete and a continuous variable is called:
a. phi
b. Pearson r
c. least squares criterion
d. point biserial
e. confidence level

23. If 4 car colors are tested to see which lasts the longest in the sun, which of the following should be used:
a. correlation
b. regression
c. t-test
d. 1-way ANOVA
e. factorial ANOVA
Progress Check 10
24. This distribution is best described as:
a. normal
b. positively skewed
c. negatively skewed
d. bimodal
e. strangely familiar

25. A t-test compares:
a. two medians
b. two modes
c. two means
d. two variables
e. two standard deviations
26. To compare an individual to a group, use a:
a. t-test
b. z-score
c. correlation
d. regression
e. 1-way ANOVA

27. Scores projected between data points are:
a. interpolated
b. extrapolated
c. innovated
d. interpreted
e. insubstantiated

28. In an actual study, annual income could be:
a. a predictor
b. a criterion
c. an intervening variable
d. a modifier variable
e. all of the above

29. In the simplest case, the probability of A and B occurring is calculated by:
a. adding the probabilities
b. subtracting the probabilities
c. multiplying the probabilities
d. dividing the probabilities
e. both A and C

30. When the null hypothesis is true, the expected value for an independent measures t statistic is:
a. 0
b. +1.0
c. -1.0
d. +1 or -1
e. more than +1
31. How many df to test a Pearson r:
a. N  b. N-1  c. N-2  d. N-k  e. N-r

32. Which could be a point-biserial correlation:
a. -1.2  b. -.87  c. -3.0  d. 4.27  e. 11.90

33. Which shows a correlation’s strength:
a. magnitude  b. stanine  c. stature  d. skew  e. sign

34. Which is the sum of the squared deviations:
a. mean
b. mean variance
c. sum of squares
d. standard deviation
e. variance

35. Which should be used to measure reliability:
a. multiple regression
b. regression
c. correlation
d. t-test
e. ANOVA

36. A variable whose levels are described as “high,” “medium” and “low” is:
a. reliable
b. discrete
c. continuous
d. valid
e. logarithmic

37. A z-score of -1.5 on a test with a mean of 100 and a standard deviation of 10 would equal a score of:
a. 85  b. 100  c. 101.5  d. 115  e. 120
38. T-tests are calculated like:
a. point estimations
b. sum of squares
c. degrees of freedom
d. z-scores
e. confidence levels

39. Which of the following is an example of a grouped frequency distribution:
a. data matrix  b. correlation  c. regression  d. histogram  e. t-test

40. Which do we select, manipulate or induce:
a. independent variable
b. dependent variable
c. moderator variable
d. suppressor variable
e. intervening variable

41. Which should be used to make predictions:
a. z-score
b. correlation
c. regression
d. correlated t-test
e. independent t-test

42. Which is the coefficient of determination:
a. t  b. t²  c. r  d. r²  e. 1 - r²

43. The number on your race car is:
a. nominal  b. ordinal  c. interval  d. ratio  e. reliable

44. A repeated measures design reduces error variability due to:
a. mean of D
b. degrees of freedom
c. the effect of the treatment
d. individual differences
e. none of the above
45. To test the interaction of diet and exercise on the maze running behavior of rats, a researcher could use a:
a. t-test
b. correlation
c. regression
d. 1-Way ANOVA
e. 2x3 ANOVA

46. Which shows interaction of 2 independent variables:
a. regression
b. 1-Way ANOVA
c. t-test
d. factorial ANOVA
e. all of the above

47. To compare 2 dependent variables a researcher would use a:
a. t-test
b. z-score
c. correlation
d. frequency distribution
e. 1-way ANOVA

48. To predict future success in psychology from current GPA, a researcher would use a:
a. t-test
b. correlation
c. regression
d. 1-way ANOVA
e. 2x3 ANOVA

49. Which of the following cannot be an appropriate Sum of Squares of Memory:
a. 21.5  b. 1.27  c. 9.00  d. 129.2  e. -42

50. What percentage of scores are beyond a z of +1:
a. 14%  b. 16%  c. 34%  d. 48%  e. 67%
As a pigeon seller, you are interested in how weight affects sale price. Here is a sample of information from the last few months:
Weight  Price
2       12
4       10
7       8
9       10
13      5

What is the SS for Weight: ________
What is the variance of Weight: ________
What is the range of Weight: ________
What is the SS for Price: ________
What is the SSxy: ________
Since you are interested in how well one variable acts as a linear predictor of another, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculations?
a= b= r= t= F=
If weight equals 3, what would you predict price to be?
As a biologist, you are interested in the ability of pigeons to run and fly. You have measured their performance on each task, and now hope to find how related these two variables are. The numbers below represent the number of feet per activity.
Walk  Fly
15    11
8     9
4     7
11    10
2     3
What is the sum of Walk: __________
What is the mean of Walk: __________
What is the SS for Walk: __________
Since you are interested in communality, which of the following tests should you perform:
a. multiple regression
b. regression
c. correlation
d. t-test
e. ANOVA
Perform the comparison you selected in the item above. What was the result of your calculation (select only the appropriate ones):
a= b= r= t= F=
How many degrees of freedom in this study?
What is the critical value for this statistic?
Is there a significant relationship between these variables at the .05 alpha level?
What percentage of variance is shared by the two variables?
Calculate the coefficient of non-determination: ________
As a chef, you want to find the best way to serve pigeons. You catch ten of them and randomly assign them to fried and steamed. That evening you serve both in your cafe for 6 hours; which version sells best?
Fried  Steamed
5      8
6      4
13     6
7      3
9      4
What is the median for Fried: __________
What is the mean for Fried: __________
What is the SS for Fried: __________
What is the median for Steamed: __________
What is the mean for Steamed: __________
Since you are interested in which group did significantly better, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
Perform the comparison you selected in the item above. Select only the appropriate one(s). What was the result of your calculation?
a= b= r= t= F=
What is the critical value?
Is there a significant difference between the two groups?
As a regional manager, you’re interested in finding the fastest way to deliver sales info to the home office. After randomly assigning messages to 3 methods of communication, you measure the number of hours it takes for the data to be delivered. Here’s what you found:
Walking  Pigeon  Email
13       9       5
13       5       7
15       5       4
11       1       3
9        2       2
5        8       3
Since you are interested in which method did best, which of the following tests should you perform:
a. t-test
b. ANOVA
c. correlation
d. regression
e. multiple regression
How many independent variables are in this design:
How many dependent variables are in this design:
Based on the above data, complete the following summary table:

         SS     df     ms
Between  ____   ____   ____
Within   ____   ____   ____
TOTAL    ____   ____

What is the F for this test: __________
What is the critical value: __________
Is the F significant: __________
What should be done next: __________
You wonder if the amount of peanut butter has a significant impact on the rating of your new cookie’s popularity. Use the following data to complete the summary table:
r= .55 SSx = 60 SSy = 50 N = 14
         SS     df     ms
Between  ____   ____   ____
Within   ____   ____   ____
TOTAL    ____   ____
What is the predictor in this study:
What is the criterion in this study:
What is the F for this test: ___________
What is the critical value for F: ___________
Is F significant at .05 alpha: ___________
What is your conclusion:
Answers

Quiz
d, b, a, b, a, a, d, b, c, a
Progress Check
b, a, e, a, c, a, e, c, e, a
b, b, e, a, a, a, c, e, e, c
b, d, d, d, b, c, b, a, e, c
Progress Check 10
a, c, b, a, c, a, c, e, d, a
c, b, a, d, d, d, d, c, c, e

Item 1
Regression
What is the SS for Weight: 74
What is the variance of Weight: 14.8
What is the range of Weight: 11
What is the SS for Price: 28
What is the SSxy: -40
a = 12.78
b = -.54
If weight equals 3, what would you predict price to be? 11.16
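If you want to verify Item 1 with a computer, here is the same hand calculation as a Python sketch:

```python
# Item 1 check: regression of pigeon price on weight.
weight = [2, 4, 7, 9, 13]
price = [12, 10, 8, 10, 5]

n = len(weight)
mx, my = sum(weight) / n, sum(price) / n
ss_x = sum((x - mx) ** 2 for x in weight)                        # 74
ss_y = sum((y - my) ** 2 for y in price)                         # 28
ss_xy = sum((x - mx) * (y - my) for x, y in zip(weight, price))  # -40

b = ss_xy / ss_x             # about -.54
a = my - b * mx              # about 12.78
print(round(a + b * 3, 2))   # predicted price at weight 3: 11.16
```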
Item 2
Correlation
What is the sum of Walk: 40
What is the mean of Walk: 8
What is the SS for Walk: 110
r = .92
How many degrees of freedom in this study? 8
What is the critical value for this statistic? .63
Is there a significant relationship between these variables at the .05 alpha level? Yes
What percentage of variance is shared by the two variables? .85 (85%)
Calculate the coefficient of non-determination: .15
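A Python check of Item 2’s arithmetic:

```python
# Item 2 check: correlation between walking and flying distances.
walk = [15, 8, 4, 11, 2]
fly = [11, 9, 7, 10, 3]

n = len(walk)
mx, my = sum(walk) / n, sum(fly) / n
ss_x = sum((x - mx) ** 2 for x in walk)                      # 110
ss_y = sum((y - my) ** 2 for y in fly)                       # 40
ss_xy = sum((x - mx) * (y - my) for x, y in zip(walk, fly))  # 61

r = ss_xy / (ss_x * ss_y) ** 0.5
print(round(r, 2))       # 0.92
print(round(r * r, 2))   # 0.85 of the variance is shared
```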
Item 3
t-test
What is the median for Fried: 7
What is the mean for Fried: 8
What is the SS for Fried: 40
What is the median for Steamed: 4
What is the mean for Steamed: 5
What is the critical value: 2.31
t = 1.79
Is there a significant difference between the two groups? No
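A Python check of Item 3’s t-test:

```python
# Item 3 check: independent-measures t-test, fried vs steamed sales.
fried = [5, 6, 13, 7, 9]
steamed = [8, 4, 6, 3, 4]

def ss(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

n1, n2 = len(fried), len(steamed)
m1, m2 = sum(fried) / n1, sum(steamed) / n2              # means: 8 and 5
pooled_var = (ss(fried) + ss(steamed)) / (n1 + n2 - 2)   # (40 + 16) / 8 = 7
t = (m1 - m2) / (pooled_var / n1 + pooled_var / n2) ** 0.5
print(round(t, 2))   # 1.79, below the critical value of 2.31 at df = 8
```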
Item 4
1-Way Analysis of Variance
How many independent variables are in this design: 1 (method of communication)
How many dependent variables are in this design: 1 (hours)

         SS    df    ms
Between  172    2    86
Within   130   15    8.67
TOTAL    302   17

What is the F for this test: 9.92
What is the critical value: 3.68
Is the F significant? Yes
What should be done next: t-tests
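A Python check of Item 4’s summary table:

```python
# Item 4 check: 1-way ANOVA on delivery hours for three methods.
groups = {
    "walking": [13, 13, 15, 11, 9, 5],
    "pigeon":  [9, 5, 5, 1, 2, 8],
    "email":   [5, 7, 4, 3, 2, 3],
}

scores = [x for g in groups.values() for x in g]
grand = sum(scores) / len(scores)

ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                 for g in groups.values())                 # 172
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                for g in groups.values())                  # 130

df_b, df_w = len(groups) - 1, len(scores) - len(groups)    # 2 and 15
f = (ss_between / df_b) / (ss_within / df_w)
print(round(f, 2))   # 9.92, above the critical value of 3.68
```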
Item 5
Analysis of Regression
         SS     df    ms
Between  15.13   1    15.13
Within   34.87  12    2.91
TOTAL    50     13

What is the predictor in this study: Peanut butter
What is the criterion in this study: Popularity
What is the F for this test: 5.20
What is the critical value for F: 4.75
Is F significant at .05 alpha: Yes
What is your conclusion:
The amount of peanut butter has a significant impact on the popularity of your cookies. The data fits the model of a straight line.
Tables
Areas Under The Normal Curve
z     Between  Beyond   Percentile
0.00  0.0000   0.5000   0.5000
0.01  0.0040   0.4960   0.5040
0.02  0.0080   0.4920   0.5080
0.03  0.0120   0.4880   0.5120
0.04  0.0160   0.4840   0.5160
0.05  0.0199   0.4801   0.5199
0.06  0.0239   0.4761   0.5239
0.07  0.0279   0.4721   0.5279
0.08  0.0319   0.4681   0.5319
0.09  0.0359   0.4641   0.5359
0.10  0.0398   0.4602   0.5398
0.15  0.0596   0.4404   0.5596
0.20  0.0793   0.4207   0.5793
0.25  0.0987   0.4013   0.5987
0.30  0.1179   0.3821   0.6179
0.35  0.1368   0.3632   0.6368
0.40  0.1554   0.3446   0.6554
0.45  0.1736   0.3264   0.6736
0.50  0.1915   0.3085   0.6915
0.55  0.2088   0.2912   0.7088
0.60  0.2257   0.2743   0.7257
0.65  0.2422   0.2578   0.7422
0.70  0.2580   0.2420   0.7580
0.75  0.2734   0.2266   0.7734
0.80  0.2881   0.2119   0.7881
0.85  0.3023   0.1977   0.8023
0.90  0.3159   0.1841   0.8159
0.95  0.3289   0.1711   0.8289
1.00  0.3413   0.1587   0.8413
1.05  0.3531   0.1469   0.8531
1.10  0.3643   0.1357   0.8643
1.15  0.3749   0.1251   0.8749
1.20  0.3849   0.1151   0.8849
1.25  0.3944   0.1056   0.8944
1.30  0.4032   0.0968   0.9032
1.35  0.4115   0.0885   0.9115
1.40  0.4192   0.0808   0.9192
1.45  0.4265   0.0735   0.9265
1.50  0.4332   0.0668   0.9332
1.55  0.4394   0.0606   0.9394
1.60  0.4452   0.0548   0.9452
1.65  0.4505   0.0495   0.9505
1.70  0.4554   0.0446   0.9554
1.75  0.4599   0.0401   0.9599
1.80  0.4641   0.0359   0.9641
1.85  0.4678   0.0322   0.9678
Critical Values of Student’s t
2-tailed t-tests

df        .05 alpha
1         12.71
2         4.30
3         3.18
4         2.78
5         2.57
6         2.45
7         2.37
8         2.31
9         2.26
10        2.23
11        2.20
12        2.18
13        2.16
14        2.15
15        2.13
16        2.12
17        2.11
18        2.10
19        2.09
20        2.09
21        2.08
22        2.07
23        2.07
24        2.06
25        2.06
26        2.06
27        2.05
28        2.05
29        2.05
30        2.04
40        2.02
60        2.00
120       1.98
infinity  1.96

Critical Values of the Pearson r
2-tailed tests of r

df    .05 alpha
1     0.997
2     0.950
3     0.878
4     0.811
5     0.755
6     0.707
7     0.666
8     0.632
9     0.602
10    0.576
11    0.553
12    0.532
13    0.514
14    0.497
15    0.482
16    0.468
17    0.456
18    0.444
19    0.433
20    0.423
25    0.381
30    0.349
35    0.325
40    0.304
45    0.288
50    0.273
60    0.250
70    0.232
80    0.217
90    0.205
100   0.195
Critical Values of F
.05 alpha

df    1      2      3      4      5
1     161    200    216    225    230
2     18.51  19.00  19.16  19.25  19.30
3     10.13  9.55   9.28   9.12   9.01
4     7.71   6.94   6.59   6.39   6.26
5     6.61   5.79   5.41   5.19   5.05
6     5.99   5.14   4.76   4.53   4.39
7     5.59   4.74   4.35   4.12   3.97
8     5.32   4.46   4.07   3.84   3.69
9     5.12   4.26   3.86   3.63   3.48
10    4.96   4.10   3.71   3.48   3.33
11    4.84   3.98   3.59   3.36   3.20
12    4.75   3.88   3.49   3.26   3.11
13    4.67   3.80   3.41   3.18   3.02
14    4.60   3.74   3.34   3.11   2.96
15    4.54   3.68   3.29   3.06   2.90
16    4.49   3.63   3.24   3.01   2.85
17    4.45   3.59   3.20   2.96   2.81
18    4.41   3.55   3.16   2.93   2.77
19    4.38   3.52   3.13   2.90   2.74
20    4.35   3.49   3.10   2.87   2.71
Basic Facts Test
50 things you should keep in your head

1. List three measures of central tendency:
a. Mean
b. Median
c. Mode

2. List five measures of dispersion:
a. Sum of Squares
b. Variance
c. Standard Deviation
d. Mean Variance
e. Range

3. Complete the following:
a. Theories are composed of: Constructs
b. Models are composed of: Variables
c. Laws: Accuracy beyond doubt
d. Principles: Some predictability
e. Beliefs: Personal opinion

4. List six criteria for evaluating theories:
a. Clear
b. Useful
c. Summarize facts
d. Small number of assumptions
e. Internally consistent
f. Testable hypotheses

5. List four levels of measurement:
a. Nominal
b. Ordinal
c. Interval
d. Ratio

6. List three types of correlation and the kind of variables with which they are used:
a. phi: 2 discrete variables
b. Pearson r: 2 continuous variables
c. point-biserial: 1 continuous and 1 discrete variable

7. List two characteristics of a data matrix:
a. Columns are attributes
b. Rows are entities (subjects)

8. List six things associated with a linear regression:
a. Intercept
b. Slope
c. Interpolate
d. Extrapolate
e. Least squares criterion
f. Standard error of estimate

9. List four types of variables:
a. Independent
b. Dependent
c. Modifier
d. Intervening

10. List and describe nine applications of the General Linear Model to continuous and discrete variables:

Continuous models compare:
a. causal modeling: multiple measures of a factor
b. multivariate analysis: multiple predictors; multiple criteria
c. multiple regression: multiple predictors; single criterion
d. regression: single predictor; single criterion
e. correlation: two regressions
f. frequency distribution: one variable (predictor or criterion)

Discrete models compare:
a. t-test: 2 means; 1 independent variable
b. one-way ANOVA: 3 or more means; 1 independent variable
c. factorial ANOVA: 2+ means on 2+ independent variables
Formulas

DESCRIPTIVE

N (number of scores)
Σ (sum of)
mean
median
mode
range
mean variance
Sum of Squares: SS = ΣX² − (ΣX)²/N
variance = SS/df (population and sample)
standard deviation: σ (population), s (sample)

INFERENTIAL

z = (X − X̄)/s
t
F = Mean Square (Between) / Mean Square (Within)
Mean Square = SS/df
SSwithin = SS₁ + SS₂ + SS₃ + …
SStotal = ΣX₁² + ΣX₂² + ΣX₃² + … − (ΣX)²/N

RELATIONAL

scatterplot
r = [ΣXY − (ΣX)(ΣY)/N] / √[(ΣX² − (ΣX)²/N)(ΣY² − (ΣY)²/N)] = SSxy / √(SSx · SSy)

REGRESSION

Formula for a line: Y′ = a + bX
a = intercept
b = regression coefficient (slope) = SSxy/SSx
Standard error of estimate