Day 8: Testing 2 means
A t-test asks whether two means are significantly different. If the means, as representatives of two samples of the same variable, are equal or close to equal, the assumption is that the differences seen are due to chance. If the means are significantly different, the assumption is that the differences are due to the impact of an independent variable.
Independent t-tests are an extension of z scores. Instead of comparing a score to a mean, t-tests compare two means. Two means are assumed to come from the same population until they are so different that it is very unlikely they are the same. The question is: how different can you be and still be the same?
Assume that the t you calculated was a person. If that score is close to the mean of the t distribution, it is not significant; there are too many scores hanging around the mean to make it special. But if your calculated score is at one extreme of the distribution, this would be unusual (or in stats terms: “significant”), and your score would sit far out in one tail of the t distribution.
When subjects are randomly assigned to groups, the t-test is said to be independent. That is, it tests the impact of an independent variable on a dependent variable. The independent variable is dichotomous (yes/no; treatment/control; high/low) and the dependent variable is continuous. If significant, the independent t-test supports a strong inference of cause-effect.
When subjects are given both conditions (both means are measures of the same subjects at different times), the t-test is said to be dependent or correlated. Because it uses repeated measures, the correlated-t is often replaced by using a regression (where the assumptions of covariance are more clearly stated).
You know what it is to be independent. It means you are in control. In research, an independent test means that the experimenter is in control. Subjects get the treatment condition that the experimenter chooses. The choice is independent of what the subject does, thinks or feels.
One of the most common approaches is for experimenters to randomly assign subjects to treatment or control. Subjects don’t know, and don’t choose, which treatment they get.
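Random assignment is easy to sketch in code. Below is a minimal sketch (the 20 subject IDs and the fixed seed are invented, purely so the output is reproducible): the subjects are shuffled and split into treatment and control without regard to anything the subjects do, think or feel.

```python
# A sketch of random assignment: shuffle 20 hypothetical subjects and
# split them into treatment and control groups.
import numpy as np

rng = np.random.default_rng(seed=42)    # seed fixed only so the example repeats
subjects = np.arange(1, 21)             # made-up subject IDs 1..20
shuffled = rng.permutation(subjects)    # the shuffle is independent of the subjects
treatment, control = shuffled[:10], shuffled[10:]

print("Treatment group:", sorted(treatment))
print("Control group:  ", sorted(control))
```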
Independent t-test
The independent t-test assumes that a single pool of subjects has been randomly assigned to two groups. Each group contains the same number of subjects, has its own mean and has its own standard deviation.
Conceptually, the t-test is an extension of the z-score. A z score compares the difference between a raw score and the mean of the group to the standard deviation of the group. The result is the number of standard deviations between the score and the group mean.
Similarly, a t-test compares the difference between 2 means to the standard deviation of the pooled variance. That is, one mean pretends to be a raw score and the other mean is the mean of the group. The difference between these means is divided by a standard deviation; it’s calculated a little funny but conceptually it’s equivalent to the standard deviation used in calculating a z score.
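Here is a minimal sketch of that idea, using invented scores: take the difference between the two means, divide it by a standard-error term built from the pooled variance, and compare the hand calculation to SciPy's `ttest_ind` (which performs the same pooled-variance test).

```python
# A sketch of the independent t-test as an extension of the z-score:
# the difference between two means divided by a standard-error term
# built from the pooled variance. Scores are invented for illustration.
import numpy as np
from scipy import stats

group1 = np.array([12, 14, 11, 15, 13, 16, 12, 14])   # hypothetical scores
group2 = np.array([10, 11,  9, 12, 10, 13, 11, 10])

n1, n2 = len(group1), len(group2)
m1, m2 = group1.mean(), group2.mean()
s1, s2 = group1.var(ddof=1), group2.var(ddof=1)        # sample variances

pooled_var = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
se_diff = np.sqrt(pooled_var * (1 / n1 + 1 / n2))      # the "funny" standard deviation
t_by_hand = (m1 - m2) / se_diff

t_scipy, p_scipy = stats.ttest_ind(group1, group2)     # same pooled-variance test
print(t_by_hand, t_scipy, p_scipy)
```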
Like a z score, a t-test is evaluated by comparing the calculated value to a standard. In the case of a z score, the standard is the Area Under the Normal Curve. Similarly, a t-test compares its calculated value to a table of critical values. When N is large (infinity, for example), the values in the two tables are identical.
For example, in a one-tailed test at .02 alpha, the critical region would be the top 2% of the distribution. A more common standard is an alpha level of .05; here the critical region would be the top 5% of the distribution. The z-score would be the one where 5% was beyond the z and 45% was between the mean and z (there’s another 50% below the mean but the table doesn’t include them). The appropriate z-score for the location where there is 5% beyond is 1.65. In the Critical Values of Student’s t, the critical value at the bottom of the .05 alpha 1-tailed column is 1.65.
Similarly, in a two-tailed test at the .05 alpha, the critical region would be the bottom 2.5% and the top 2.5%. The z-score for the bottom 2.5% is -1.96 and the z-score for the top 2.5% is +1.96. In the Critical Values of Student’s t table, the critical value at the bottom of the .05 alpha 2-tailed column is 1.96.
When the t-test has an infinite number of subjects, its critical value is the same as a z-score. At infinity, t-tests could be evaluated by referring to the Areas Under the Normal Curve table. A t-test, however, usually has a small number of subjects. Consequently, the values are modified to adjust for the small sample size.
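One way to see that connection is to ask for both sets of critical values and watch them converge as the degrees of freedom grow. The sketch below uses SciPy; the alpha levels and df values are chosen only for illustration.

```python
# A sketch comparing z critical values with t critical values at
# increasing degrees of freedom.
from scipy import stats

print(stats.norm.ppf(0.95))      # one-tailed .05 alpha: about 1.65
print(stats.norm.ppf(0.975))     # two-tailed .05 alpha: about 1.96

for df in (5, 18, 30, 120, 100000):          # illustrative df values
    one_tail = stats.t.ppf(0.95, df)
    two_tail = stats.t.ppf(0.975, df)
    print(df, round(one_tail, 3), round(two_tail, 3))
```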
Significance
The t-test tells us if there is a significant difference between the means. It is as if two armies met and decided that each side would send a representative to battle it out. The representative would not be the best from each side but the average, typical member of their respective groups. Similarly, by comparing the means, we are comparing the representatives of two groups. The entire cast is not involved, only a representative from each side.
We typically do a two-tailed test. That is, we want to know if Group 2 is significantly better than Group 1 AND if it is significantly worse. We want to know both things, so we start at the mean and assume that in order to be significantly different from chance, the t statistic has to be at either of the 2 tails. At .05 alpha (the amount of Type I error we are willing to accept), the critical region is split into two parts, one at each tail. Although the overall alpha level is 5%, there is only 2.5% at each tail.
In one-tailed tests, the entire 5% in a .05 alpha test would be at one end. That is, we would only want to know if Group 2 was significantly better than Group 1; we wouldn’t care if it was worse. It doesn’t happen very often that our hypotheses are so finely honed that we are interested in only one end of the distribution. We generally conduct a 2-tailed test of significance. Consequently, the t statistic might be positive or negative, depending on which mean was put first. There is no theoretical reason why one mean should be placed first in a two-tailed test, so apart from identifying which group did better, the sign of the t-test can be ignored.
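The sketch below (invented scores again) shows both points: swapping the order of the groups only flips the sign of t, and a one-tailed p is half of the two-tailed p that `ttest_ind` reports, provided the difference falls in the predicted direction.

```python
# A sketch of the sign of t and the number of tails.
import numpy as np
from scipy import stats

group1 = np.array([12, 14, 11, 15, 13, 16, 12, 14])   # hypothetical data
group2 = np.array([10, 11,  9, 12, 10, 13, 11, 10])

t_ab, p_two = stats.ttest_ind(group1, group2)
t_ba, _ = stats.ttest_ind(group2, group1)

print(t_ab, t_ba)     # same magnitude, opposite sign
print(p_two)          # two-tailed p
print(p_two / 2)      # one-tailed p, if the difference is in the predicted direction
```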
Correlated t-test
Correlated t-tests are sometimes called repeated-measures or within-subjects designs.
Instead of randomly assigning subjects, some studies reuse people. The advantage is that each person acts as their own control group. Since no one is more like you than you, the control group couldn’t be more similar to the treatment group. The t-test for repeated measures designs is called a correlated t-test.
The second advantage is that a correlated t-test has more power (it can use fewer people to conduct the study). An independent t-test has N-2 degrees of freedom. So if 20 people are randomly assigned to 2 groups, the study has 18 degrees of freedom. In a correlated t-test, if we use all 20 people, the study has 19 degrees of freedom.
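A quick illustration of the degrees-of-freedom point, with invented scores: the same 20 people give 18 degrees of freedom when split into two independent groups of 10, but 19 degrees of freedom when each person is measured twice and analyzed with a correlated t-test.

```python
# A sketch of the df difference between independent and correlated designs.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)                        # invented data

# Independent design: 20 subjects randomly assigned, 10 per group
group1 = rng.normal(50, 10, size=10)
group2 = rng.normal(55, 10, size=10)
t_ind, p_ind = stats.ttest_ind(group1, group2)        # df = 10 + 10 - 2 = 18

# Correlated design: the same 20 subjects measured before and after
pre = rng.normal(50, 10, size=20)
post = pre + rng.normal(3, 5, size=20)
t_corr, p_corr = stats.ttest_rel(pre, post)           # df = 20 - 1 = 19

print(t_ind, p_ind)
print(t_corr, p_corr)
```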
The third advantage to correlated designs (also called within-subjects or repeated measures designs) is cost. Reusing people is cheaper. If subjects are paid to participate, they are paid for being in the study, regardless of how many trials it takes. Reusing people is also cheaper in time, materials and logistical effort. Once you have a willing subject, it’s hard to let them go.
The primary disadvantage of a correlated t-test is that it is impossible to tell if the effects of receiving one treatment will wear off before receiving the second condition. If people are testing 2 drugs, for example, will the first drug wear off before subjects are given the second drug?
A second problem with the pre- and post-test design often used with correlated t-tests is in its mathematical assumptions. Although the arguments are beyond the scope of this discussion, statisticians differ on the theoretical safety of using difference scores. Some worry that subtracting post-tests from pre-tests may add additional error to the process.
Consequently, a better way of testing correlated conditions is to use a correlation, a linear regression or an analysis of regression. Correlations test for a relationship and can be used on ordinal and ratio data. Similarly, linear regression and analysis of regression make predictions and test for goodness of fit without relying on difference scores.
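As a sketch of the regression alternative (invented pre- and post-test scores), the example below regresses the post-test on the pre-test with `scipy.stats.linregress` rather than testing difference scores; the slope, correlation and p-value describe the relationship directly.

```python
# A sketch of using linear regression instead of difference scores
# for a pre-test / post-test design.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)                    # invented data
pre = rng.normal(50, 10, size=20)
post = 5 + 0.9 * pre + rng.normal(0, 5, size=20)

result = stats.linregress(pre, post)              # slope, intercept, r, p, stderr
print(result.slope, result.intercept)
print(result.rvalue)                              # correlation between pre and post
print(result.pvalue)                              # significance of the linear relationship
```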
Using t-tests for hypothesis testing and estimation
Hypothesis testing is like venturing out onto a frozen lake. Your research hypothesis is that the lake is frozen, but you proceed as if it weren’t; you’re cautious until you’re sure the ice is thick enough to hold you. The null hypothesis (H0) is that the ice is not frozen (no difference from water). When you have tested the ice (jumping up and down on it or cutting a hole to measure its thickness), you then decide either to accept the null hypothesis (no difference from water) or to reject it and accept the alternative hypothesis (H1) that the lake is frozen and significantly different from water.
We use t-tests to make confidence estimations. When t is significant, we are saying that we are confident that our findings are true 95% of the time (assuming the alpha level is set at .05). Our confidence estimates are interval estimates of a distribution of t scores. A significant t says that its value falls in a restricted part of the distribution (the upper 5%, for example).
Estimation is like getting your car fixed. If you go to a repair shop and they estimate the cost to repair your car is $300, that’s a point estimate. An interval estimate would be a range of numbers. If the shop says it will cost between $200 and $400, that’s an interval estimate.
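In code, the two kinds of estimate look like this (invented scores; 95% confidence corresponds to a two-tailed .05 alpha): the sample mean is the point estimate, and the mean plus or minus the t critical value times the standard error gives the interval estimate.

```python
# A sketch of point versus interval estimation for a mean, using the
# t distribution and a hypothetical sample of scores.
import numpy as np
from scipy import stats

scores = np.array([12, 14, 11, 15, 13, 16, 12, 14])   # invented sample
n = len(scores)

point_estimate = scores.mean()                         # the "$300" style answer
se = scores.std(ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)                  # two-tailed .05 critical value

lower = point_estimate - t_crit * se                   # the "$200-$400" style answer
upper = point_estimate + t_crit * se
print(point_estimate)
print(lower, upper)
```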
One-Way ANOVA
Want to jump ahead?
- What is statistics?
- Ten Day Guided Tour
- How To Calculate Statistics
- Start At Square One
- Practice Items
- Resources
- Final Exam