You need three things to calculate Pearson’s r (the correlation coefficient): the sum of squares for variable X, the sum of squares for variable Y, and a sum of squares you make yourself.
You already know how to calculate the SS of X and the SS of Y. They are just regular old sums of squares. If you need it, here’s a reminder: How to Calculate Sum of Squares (SS).
For the sake of simplicity, we’ll restrict ourselves to the Pearson r, the most commonly used type of correlation. A correlation is a ratio between togetherness and individuality. To calculate the Pearson r, three Sums of Squares are needed. The Pearson r is the ratio of SSxy to the square root of the product of SSx and SSy. Here is the formula:
$$r = \frac{SS_{xy}}{\sqrt{SS_x \cdot SS_y}}$$
For SSx, find the Sum of Squares of the X variable. Similarly, SSy is simply the Sum of Squares of Y. The SSxy, however, is a bit different. First, we have to make a new variable: XY. To do so, we multiply each X by its respective Y. Now we have 3 columns: X, Y and XY. Second, sum the XYs. Third, use this formula, where N is the number of pairs of scores:
$$SS_{xy} = \sum XY - \frac{(\sum X)(\sum Y)}{N}$$
Notice that this formula is a lot like the regular formula for Sum of Squares; it’s a variation on the theme. It’s the sum of the XYs, but we don’t have to square them (they’re already big enough). And we don’t square the Sum of X; we multiply the Sum of X and the Sum of Y together. Fourth, finish off the formula and the result is the Pearson r.
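The four steps can be sketched in Python. This is only an illustration of the computational formulas described here; the function names (`sum_of_squares`, `sum_of_cross_products`, `pearson_r`) are made up for this sketch and do not come from any library.

```python
import math

def sum_of_squares(values):
    """Computational SS: sum of squared scores minus (sum of scores)^2 / N."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n

def sum_of_cross_products(xs, ys):
    """SSxy: sum of the XY column minus (sum of X)(sum of Y) / N."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

def pearson_r(xs, ys):
    """Pearson r = SSxy / sqrt(SSx * SSy)."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of scores")
    ss_x = sum_of_squares(xs)
    ss_y = sum_of_squares(ys)
    ss_xy = sum_of_cross_products(xs, ys)
    return ss_xy / math.sqrt(ss_x * ss_y)

# A perfectly linear pair of variables (Y is always 2 * X)
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
```

The sketch assumes both variables actually vary; if every X (or every Y) is the same, SSx (or SSy) is zero and the division fails.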
EXAMPLE
We create a new variable by multiplying every X by its Y partner. So this:
X | Y |
2 | 17 |
13 | 3 |
10 | 4 |
3 | 18 |
2 | 19 |
12 | 11 |
becomes this:
X | Y | XY |
2 | 17 | 34 |
13 | 3 | 39 |
10 | 4 | 40 |
3 | 18 | 54 |
2 | 19 | 38 |
12 | 11 | 132 |
Then, we sum each column. The sum of X = 42, the sum of Y = 72, and the sum of XY is 337.
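If you would rather let a computer build the XY column, a few lines of Python will do it. The variable names (`xs`, `ys`, `xy`) are just labels chosen for this sketch.

```python
xs = [2, 13, 10, 3, 2, 12]
ys = [17, 3, 4, 18, 19, 11]

# Multiply every X by its Y partner to make the new XY column
xy = [x * y for x, y in zip(xs, ys)]

print(xy)                         # [34, 39, 40, 54, 38, 132]
print(sum(xs), sum(ys), sum(xy))  # 42 72 337
```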
Calculate the SS of X (136) and the SS of Y (256). Then calculate the SS of XY. Multiply the sum of X by the sum of Y (42 * 72 = 3024). Now divide the result by N (the number of pairs of scores = 6); 3024/6 = 504. Subtract that result from the Sum of XYs (337 - 504 = -167).
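Here is one way to check these three values in Python, using the same computational formulas. It is a sketch for verifying the arithmetic, with variable names invented for the purpose.

```python
xs = [2, 13, 10, 3, 2, 12]
ys = [17, 3, 4, 18, 19, 11]
n = len(xs)  # number of pairs of scores

# Regular sums of squares: sum of squared scores minus (sum)^2 / N
ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n  # 430 - 294
ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n  # 1120 - 864

# SSxy: sum of XY minus (sum of X)(sum of Y) / N
ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n  # 337 - 504

print(ss_x, ss_y, ss_xy)  # 136.0 256.0 -167.0
```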
Notice the SSxy is negative. It’s OK. The SSxy can be negative. It is the only Sum of Squares that can be negative. SSx and SSy are measures of each variable’s dispersion around its own mean. But we created the XY variable; it’s not a real variable when it comes to dispersion. The sign of SSxy indicates the direction of the relationship between X and Y. So we have a negative SSxy because X and Y have an inverse relationship.
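To see where the negative sign comes from, it can help to compute SSxy a second way: as the sum of each X's deviation from the X mean times its partner's deviation from the Y mean. This deviation form is not the formula used in this walkthrough, but it is algebraically equivalent and gives the same -167. The variable names are illustrative.

```python
xs = [2, 13, 10, 3, 2, 12]
ys = [17, 3, 4, 18, 19, 11]

mean_x = sum(xs) / len(xs)  # 7.0
mean_y = sum(ys) / len(ys)  # 12.0

# Each product is negative when X and Y sit on opposite sides of their means
products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]

print(products)       # [-25.0, -54.0, -24.0, -24.0, -35.0, -5.0]
print(sum(products))  # -167.0
```

Every pair has one score above its mean and the other below, so every product is negative.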
Look at the original data: when X is small (2), Y is large (17). When X is large (13), Y is small (3). It is a consistent but inverse relationship. It’s like pushing the yoke down and the plane going up.
Let’s finish off the calculation of the Pearson r. Multiply the SSx by the SSy (136 * 256 = 34816). Take the square root of that number (sqrt of 34816 = 186.59). Divide the SSxy by that square root (-167/186.59 = -.895). Rounding to 2 decimal places, the Pearson r for this data set equals -.90. It is a strong, negative correlation.
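The last step is easy to confirm in Python, starting from the three sums of squares found above. This is just a check on the arithmetic; the variable names are arbitrary.

```python
import math

ss_x, ss_y, ss_xy = 136, 256, -167

denominator = math.sqrt(ss_x * ss_y)  # square root of 34816
r = ss_xy / denominator

print(f"{denominator:.2f}")  # 186.59
print(f"{r:.3f}")            # -0.895
print(f"{r:.2f}")            # -0.90
```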