Day 9: Pre-analysis
One-Way ANOVA is a pretest of variance performed before you do other analyses. When more than 2 groups are to be compared, running multiple t-tests would increase the likelihood of Type I error. Instead, before subgroup comparisons are made, the variance of the entire design is analyzed. This pre-analysis is called an Analysis of Variance (ANOVA for short). Using the F test (like an Analysis of Regression), an ANOVA makes a ratio of variance between the subgroups (due to the manipulation of the experimenter) to variance within the subgroups (due to chance).
Essentially, a 1-Way ANOVA is an overgrown t-test. A t-test compares two means. A 1-Way ANOVA lets you test the differences between more than two means. Like a t-test, there is only one independent variable (hence the “1-way”). It is an ANOVA because it analyzes the variance in the scores. The acronym ANOVA stands for ANalysis Of VAriance.
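If you'd rather see it than read it, here's a minimal sketch in Python using scipy's f_oneway (which performs exactly this test); the three groups of scores are made up for illustration.

```python
# A 1-way ANOVA on three hypothetical groups of scores.
from scipy import stats

group_a = [23, 25, 21, 27, 24]   # made-up scores for condition A
group_b = [31, 29, 33, 30, 28]   # condition B
group_c = [22, 24, 26, 23, 25]   # condition C

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p (below .05) means at least one mean differs beyond chance.
```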
In general, you can design experiments where people are re-used (within-subjects designs) or used only once (between-subjects designs). The difference is all about time.
Within-Subjects Designs
Sometimes we want to take repeated measures of the same people over time. These specialized studies are called within-subjects or repeated measures designs. Conceptually, they are extensions of the correlated t-test; the means are compared over time.
Like correlated t-tests, the advantages are that subjects act as their own controls, eliminating the difficulty of matching subjects on similar backgrounds, skills, experience, etc. Also, within-subjects designs have more power (require fewer people to find a significant difference) and consequently are cheaper to run (assuming you’re paying your subjects).
They also suffer from the same disadvantages. There is no way of knowing if the effects of trial one wear off before the subjects get trial two. The more trials in a study, the larger the potential problem. In a multi-trial study, the treatment conditions could be hopelessly confounded.
A more detailed investigation of within-subjects designs is beyond the scope of this discussion. For now, realize that it is possible, and sometimes desirable, to construct designs with repeated measures on the same subjects. But it is not a straightforward proposition and requires more than an elementary understanding of statistics. So we’re going to focus on between-subjects designs.
Between-Subjects Designs
In a between-subjects design, subjects are randomly assigned to groups. The groups vary along one independent variable. It doesn’t matter if you have 3 groups (high, medium and low) or ten groups or 100 groups…as long as they only vary on one dimension. Three types of cars is one independent variable (car type) with 3 groups. Ten types of ice cream can also be one independent variable: flavor.
Like an Analysis of Regression, an Analysis of Variance uses an F test. If the F is equal to or larger than the value in the standard table, the F is considered significant, and the results are unlikely to be due to chance.
1-Way
It is called 1-way because there is one independent variable in this design. It is called an ANOVA because that’s an acronym for ANalysis Of VAriance. A 1-way analysis of variance is a pretest to prevent Type I error.
Although we try to control Type I error by setting our alpha level at a reasonable level of error (typically 5%) for one test, when we do several tests, we run into increased risk of seeing relationships that don’t exist. One t-test has a 5/100 chance of having Type I error. But multiple t-tests on the same data set destroy the careful controls we set in place.
We can use a t-test to compare the means of two groups. But to compare 3, 4 or more groups, we’d have to do too many t-tests; so many that we’d risk finding a significant t-test when none existed. If there were 4 groups (A, B, C and D, we’ll call them), to compare each condition to another you’d have to make the following t-tests:
AB
AC
AD
BC
BD
CD
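That’s six separate tests, and the errors compound. Here’s a quick back-of-the-envelope check in Python (assuming, for simplicity, that the six tests are independent):

```python
# How badly does Type I error inflate across 6 t-tests at alpha = .05?
from math import comb

k = comb(4, 2)                           # 4 groups -> 6 pairwise comparisons
familywise_error = 1 - (1 - 0.05) ** k   # chance of at least one false alarm
print(k, round(familywise_error, 3))     # 6 0.265
```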
The chances are too good that one of those tests will look significant when it isn’t. What we need is a pre-analysis of the data to test the overall design, and then, if the overall variance is significant, we can go back and conduct the t-tests.
A Ratio
The premise of an ANOVA is to compare the amount of variance between the groups to the variance within the groups.
The variance within any given group is assumed to be due to chance (one subject had a good day, one was naturally better, one ran into a wall on the way out the door, etc.). There is no pattern to such variation; it is all determined by chance.
If no experimental conditions are imposed, it is assumed that the variance between the groups would also be due to chance. Since subjects are randomly assigned to the groups, there is no reason other than chance that one group would perform better than another.
After the independent variable is manipulated, the differences between the groups are due to chance plus the independent variable. By dividing the between-group variance by the within-group variance, the chance parts should cancel each other out. The result should be a measure of the impact the independent variable had on the dependent variable. At least that’s the theory behind the F test.
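Here’s a minimal sketch of that ratio computed by hand on made-up data, so the between and within pieces are visible; a real analysis would use a library, but the arithmetic is the same.

```python
# Build the F ratio from scratch: between-group variance over within-group.
groups = [
    [23, 25, 21, 27, 24],   # hypothetical condition A
    [31, 29, 33, 30, 28],   # condition B
    [22, 24, 26, 23, 25],   # condition C
]

scores = [x for g in groups for x in g]
grand_mean = sum(scores) / len(scores)

# Between SS: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within SS: how far each score sits from its own group mean
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

df_between = len(groups) - 1              # k - 1
df_within = len(scores) - len(groups)     # N - k

f_ratio = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```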
The F Test
Yes, this is the same F test we used doing an Analysis of Regression. And it has the same summary table.
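For reference, the generic layout of a 1-way ANOVA summary table, with k groups and N subjects in all:

| Source  | SS         | df    | MS                   | F                      |
|---------|------------|-------|----------------------|------------------------|
| Between | SS between | k - 1 | SS between / (k - 1) | MS between / MS within |
| Within  | SS within  | N - k | SS within / (N - k)  |                        |
| Total   | SS total   | N - 1 |                      |                        |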
Notice that the titles have changed. We now talk about Between Sum of Squares, not Regression SS. The F test (named in honor of R.A. Fisher) is the ratio of between-group variance (called between mean squares or mean squares between) to within-group variance (called within mean squares or mean squares within).
What To Do
After you calculate the F, you compare it to the critical value in a table of Critical Values of F. There are several pages of critical values to choose from because the shape of the F distribution changes as the number of subjects in the study decreases. To find the right critical value, go across the table for the degrees of freedom between groups (df between) and down for the df within.
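If you don’t have the book handy, here’s a minimal sketch that looks up the same critical value from scipy’s F distribution; the degrees of freedom are just an example.

```python
# Critical value of F at alpha = .05 for example degrees of freedom.
from scipy import stats

df_between, df_within = 2, 12   # e.g., 3 groups, 15 subjects in all
critical_f = stats.f.ppf(1 - 0.05, df_between, df_within)
print(f"Critical F(.05; {df_between}, {df_within}) = {critical_f:.2f}")  # ~3.89
```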
Simply compare the value you calculated for F to the one in the table. If your F is equal to or higher than the tabled value, you win: what you see is significantly different from chance. The table we most often use is the .05 alpha level because our numbers aren’t very precise, so we’re willing to accept 5% error in our decisions. In other words, our alpha level (the amount of error we are willing to accept) is set at .05. Setting the criterion at .05 alpha indicates that we want to be wrong no more than 5% of the time. Being wrong in this context means seeing a significant relationship where none exists.
5% Error
Two points should be made: (a) 5% is a lot of error and (b) seeing things that don’t exist is not good. Five percent of the population of the US is more than 16 million people; that’s a lot of error. If elevators failed 5% of the time, no one would ride them. If OPEC trims production by 5%, they cut 1.5 million barrels a day. About 230 million people use the internet, roughly 5% of the world’s population.
We use a relatively low standard of 5% because our numbers are fuzzy. Social science research is like watching a tennis game through blurry glasses which haven’t been washed in months. We have some understanding of what is going on (better than if we hadn’t attended the match) but no easy way to summarize the experience.
Second, seeing things that don’t exist is dangerous. In statistics, it is the equivalent of hallucination. We want to see the relationships that exist and not see additional ones that live only in our heads. Decisions that conclude a relationship exists when it doesn’t are called Type I errors.
If Type I error is statistical hallucination, Type II error is statistical blindness. It is NOT seeing relationships when they do exist. Not being able to see well is the pits (I can tell you from personal experience) but it’s not as bad as hallucinating. So we put most of our focus on limiting Type I error.
We pick an alpha level (how much Type I error we are willing to accept) and look up its respective critical value. If the F we calculate is smaller than the critical value, we assume the pattern we see is due to chance. And we continue to assume that it is caused by chance until it is so clear, so distinct, so accurate that it can’t be ignored. We only accept patterns that are significantly different from chance.
When the F we calculate is larger than the critical value, we are 95% sure that the pattern we see is not caused by chance. By setting the alpha level at .05, we have set the amount of Type I decision error at 5%.
Interpretation
If the F is significant, what do we do now?
Now, all of those t-tests we couldn’t do because we were afraid of Type I error are available for our calculating pleasure. So we do t-tests between the following pairs (see the sketch after the list):
AB
AC
BC
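Here’s a minimal sketch of those follow-up comparisons, reusing the made-up groups from earlier; scipy’s ttest_ind runs each pairwise t-test.

```python
# Follow-up pairwise t-tests after a significant F, on hypothetical data.
from itertools import combinations
from scipy import stats

groups = {
    "A": [23, 25, 21, 27, 24],   # made-up scores
    "B": [31, 29, 33, 30, 28],
    "C": [22, 24, 26, 23, 25],
}

for (name1, g1), (name2, g2) in combinations(groups.items(), 2):
    t_stat, p_value = stats.ttest_ind(g1, g2)
    print(f"{name1} vs {name2}: t = {t_stat:.2f}, p = {p_value:.4f}")
```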
We might find that there is a significant difference between each group. Or we might find that there is not a significant difference between two of the groups, but that there is a significant difference between them and the third group.
Also, which group did best depends on whether the numbers are money (you want the higher means) or errors (you want the lower means). Doing the t-tests between each combination of means will tell us which ones are significant, and which are likely to be due to chance.
Just think, if the F had not been significant, there would not be anything left to do. We would have stopped with calculating F and concluded that the differences we see are due to chance. How boring, huh? It’s a lot more fun to do lots of t-tests. Where’s my calculator?