Day 5: Relationship
This is the first 2-variable model we’ll consider. Both variables (designated X and Y) are measures obtained from the same subjects. Basically, a mathematical representation of a scatterplot, a correlation indicates whether the variables move together in the same direction (+ correlation), move in opposite directions (- correlation) or move separately (0 correlation). Correlations are widely used to measure reliability, validity and commonality.
When one wing goes up, does the other wing go up, go down, or stay the same? In a strong positive correlation, when one wing goes up, the other usually goes up too. In a strong negative correlation, when one wing goes up, the other usually goes down. In a weak correlation, either positive or negative, when one wing goes up, the other wing does whatever it wants.
With correlations we are only observing, but we’re going to look at two variables and see how they are related to each other. When one variable changes, we want to know what happens to the other variable. In a perfect correlation, the two variables will move together. When there is no correlation, the variables will act independently of each other.
To use this simple and yet powerful method of description, we must collect two pieces of information on every person. These are paired observations. They can’t be separated. If we are measuring height and weight, it’s not fair to use one person’s height and another person’s weight. The data pairs must remain linked. That means you can’t reorganize one variable (from highest to lowest, for example) without reorganizing the other variable the same way. The pairs must stay together.
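To make that concrete, here is a minimal Python sketch (the names and numbers are invented for illustration) showing the difference between reordering whole records and reordering each variable on its own:

```python
# Paired observations: each record belongs to one person, so height and
# weight must travel together. Names and numbers are invented.
people = [
    ("Ana", 64, 130),  # (name, height in inches, weight in pounds)
    ("Ben", 71, 185),
    ("Cal", 68, 150),
]

# Right: reorder whole records, so each height keeps its own weight.
by_height = sorted(people, key=lambda p: p[1], reverse=True)

# Wrong: sorting each column separately scrambles the pairs;
# Ana's height would end up matched with someone else's weight.
heights = sorted(p[1] for p in people)
weights = sorted(p[2] for p in people)

for name, height, weight in by_height:
    print(name, height, weight)
```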
Sign & Magnitude
A correlation has both sign and magnitude. The sign (+ or -) tells you the direction of the relationship. If one variable is getting larger (2, 4, 5, 7, 9) and the other variable is headed in the same direction (2, 3, 6, 8, 11), the correlation’s sign is positive. In a negative correlation, while the first variable is getting larger (2, 4, 5, 7, 9), the second variable is getting smaller (11, 8, 6, 3, 2).
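Here is a quick Python check of those two example sequences, using `statistics.correlation` from the standard library (Python 3.10 or later):

```python
from statistics import correlation  # standard library, Python 3.10+

x = [2, 4, 5, 7, 9]        # first variable, getting larger
y_same = [2, 3, 6, 8, 11]  # headed in the same direction
y_opp = [11, 8, 6, 3, 2]   # headed in the opposite direction

print(correlation(x, y_same))  # about +0.98: positive sign
print(correlation(x, y_opp))   # about -0.98: negative sign
```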
The magnitude of a correlation is found in the size of the number. Correlation coefficients can’t be bigger than 1. If someone says they found a correlation of 2.48, they did something wrong in the calculation. Since the sign can be positive or negative, a correlation must be between -1 and +1.
The closer the coefficient is to 1 (either + or -), the stronger the relationship. Weak correlations (such as .13 or -.08) are close to zero. Strong correlations (such as .78 or -.89) are close to 1. Consequently, a coefficient of -.92 is a very strong correlation. And +.25 indicates a fairly weak positive correlation.
Magnitude is how close the coefficient is to 1; sign is whether the relationship is positive (headed the same way) or negative (inverse).
Correlations don’t prove causation. A strong correlation is a necessary indicator of causation but it is not sufficient. When a cause-effect relationship exists, there will be a strong correlation between the variables. But a strong correlation does not mean that variable A causes variable B.
In correlations, A can cause B. Or, just as likely, B can cause A. Or, just as likely, something else (call it C) causes both A and B to occur.
For a simple example, let’s assume that we know nothing about science. But we do notice that when the sun comes up, it gets warm outside. From a statistical point of view, we can’t tell which causes which. Perhaps the sun coming up makes it get warm. But it is as likely that when it gets warm the sun comes up. Or the sun and warmth are caused by something else: a dragon (pulling the sun behind it) flies across the sky blowing its hot breath on the earth (making it warm).
You might laugh at this illustration but think how shocked you’d be if tomorrow it got warm and the sun didn’t come up!
It is, of course, perfectly OK to infer causation from correlational data. But we must remember that these inferences are not proofs; they are leaps of faith. Leaping is allowed, but we must clearly indicate that it is an assumption, not a fact.
Reliability & Validity
Although correlations can’t prove cause and effect, they are very useful for measuring reliability and validity. Reliability means that you get the same results every time you use a test. If you’re measuring the temperature of a liquid and get a reading of 97 degrees, you would expect a reliable thermometer to yield the same result a few seconds later. If your thermometer gives different readings of the same source over a short period of time, it is unreliable and you would throw it away.
We expect many things in our lives to be reliable. When you flip on a light switch, you expect the light to come on. When you get on an elevator and push the “down” button, you don’t expect the elevator to go sideways. If you twice measure the length of a table, a reliable tape measure will yield the same result. Even if your measuring skill is poor, you expect the results to be close (not 36 inches and then 4 inches). You expect the same results every time.
Reliability, then, is the correlation between two observations of the same event. Test reliability is determined by giving the test once and then giving the same test to the same people 2 weeks later. With this test-retest method, you would expect a high positive correlation between the first time the test was given and the second time.
A test with a test-retest reliability of .90 (which many intelligence tests have) is highly reliable. A correlation of .45 shows a moderate amount of reliability, and a coefficient close to zero indicates the test is unreliable. Obviously, a negative test-retest reliability coefficient would indicate something was wrong. People who got high scores the first time should be getting high scores the second time, if the test is reliable.
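As a concrete sketch, a test-retest coefficient could be computed like this in Python; the scores below are invented:

```python
from statistics import correlation  # Python 3.10+

# Hypothetical scores: each position is one person, tested twice,
# two weeks apart.
first_test = [85, 72, 90, 60, 78, 95, 66, 81]
retest = [88, 70, 92, 63, 75, 97, 64, 83]

r = correlation(first_test, retest)
print(f"test-retest reliability: {r:.2f}")  # a high positive r
```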
There are 3 basic types of reliability correlations. A test-retest coefficient is obtained by giving and re-giving the test. A “split half” correlation is found by correlating the total score for the first half with the total score for the second half for each subject. A parallel forms correlation shows the reliability of two tests with similar items.
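A split-half coefficient can be sketched the same way, again with invented item scores:

```python
from statistics import correlation  # Python 3.10+

# Invented item scores (1 = right, 0 = wrong) on a 6-item test.
# Each row is one subject; the first 3 items are the first half.
answers = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1],
]

first_half = [sum(row[:3]) for row in answers]   # totals: 3, 2, 1, 3, 1
second_half = [sum(row[3:]) for row in answers]  # totals: 2, 2, 1, 3, 1
print(correlation(first_half, second_half))      # about .90
```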
Correlations also can be used to measure validity. Although a reliable test is good, it is possible to be reliably (consistently) wrong. Validity is the correlation between a test and an external criterion. If you create a test of musical ability, you expect that musicians will score high on the test and that those judged by experts to be unmusical will score low on the test. The correlation between the test scores and the experts’ ratings is a measure of validity.
Validity is whether a test measures what it says it measures; reliability is whether a test is consistent. Clearly, reliability is necessary but not sufficient for a test to be valid.
Significance
It is possible to test a correlation coefficient for significance. A significant correlation means the relationship is not likely to be due to chance. It doesn’t mean that X causes Y. It doesn’t mean that Y causes X; or that another variable causes both X and Y. Although a correlation cannot prove which causes what, r can be tested to see if it is likely to be due to chance.
First, determine the degrees of freedom for the study. The degrees of freedom (df) for a correlation are N-2. If there are 7 people (pairs of scores), the df = 5. If there are 14 people, df = 12.
Second, enter the statistical table “Critical Values of the Pearson r” with the appropriate df. Let’s assume there were 10 people in the study (10 pairs of scores). That would mean the degrees of freedom for this study equals 8.
Go down the df column to 8, and you’ll see that with this few people, the magnitude of a Pearson r has to be .632 or larger to be significant.
Notice that the table ignores the sign of the correlation. A negative correlation of -.632 or larger (closer to -1) would also be significant.
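Putting the steps together, here is a minimal Python sketch. The data are hypothetical, and the .632 cutoff is the table entry for df = 8 described above:

```python
from statistics import correlation  # Python 3.10+

def test_r(x, y, critical_r):
    """Compare |r| to a critical value looked up in a table of
    critical values of the Pearson r at df = N - 2."""
    r = correlation(x, y)
    df = len(x) - 2
    return r, df, abs(r) >= critical_r

# Hypothetical study: 10 pairs of scores, so df = 10 - 2 = 8.
# The table entry for df = 8 is .632 (the table ignores sign).
x = [3, 5, 6, 8, 9, 11, 12, 14, 15, 17]
y = [2, 6, 5, 9, 8, 12, 11, 15, 14, 18]

r, df, significant = test_r(x, y, critical_r=0.632)
print(f"r = {r:.3f}, df = {df}, significant: {significant}")
```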
Evaluate r-squared
A correlation can’t prove that A causes B; it could be that B causes A…or that C causes both A & B. The coefficient of determination is an indication of the amount of relationship between the two variables. It gives the percentage of variance that is accounted for by the relationship between the two variables.
To calculate the coefficient of determination, simply take the Pearson r and square it. So, .89 squared = .79. In this example, 79% of the variance can be explained by the relationship between the two variables. In a Venn diagram of the two variables, that shared variance is the area where the two circles overlap.
To calculate the amount of variance that is NOT explained by the relationship (called the coefficient of non-determination), subtract r-squared from 1. In our example, 1 − r² = .21. That is, 21% of the variance is unaccounted for.
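The whole calculation is a couple of lines in Python:

```python
r = 0.89

r_squared = r ** 2           # coefficient of determination
unexplained = 1 - r_squared  # coefficient of non-determination

print(f"explained:   {r_squared:.0%}")    # 79%
print(f"unexplained: {unexplained:.0%}")  # 21%
```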