Statistics

March 30, 2023 by ktangen

Calc ANOR

F = msRegression ÷ msError

You need four things to calculate an ANOR (analysis of regression): the XY correlation coefficient (Pearson r), the number of columns (k), the number of people in the study (N), and the SS of Y (assuming X is predicting Y). From these, you can calculate an F value, which you compare to a critical value you look up in a table.

Here is a set of data to play with:

X     Y
11    1
4     2
8     8
2     12
7     11
16    2

When we are done, we have filled in a summary table like this one:

SS df ms

Regression _____ ____ ____

Error _____ ____ ____

Total _____ ____ ____

There are six people in the study, each measured on two variables (X and Y). The total degrees of freedom are N-1, so 6 minus 1 equals 5.

The df for regression is k-1 (columns minus 1). Two minus one equals 1.

The df for error is total minus regression, which is 4.
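The three degrees-of-freedom rules above can be sketched in a few lines of Python. The variable names are illustrative, not part of the original walkthrough.

```python
# Degrees of freedom for the ANOR summary table.
n_people = 6    # N: people in the study
k_columns = 2   # k: columns (X and Y)

df_total = n_people - 1                # N - 1
df_regression = k_columns - 1          # k - 1
df_error = df_total - df_regression    # total minus regression

print(df_regression, df_error, df_total)  # 1 4 5
```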

Here is where we are so far:

SS df ms

Regression ____ 1 ____

Error _____ 4 ____

Total ____ 5 ____

The SStotal equals the SS of Y. In this case it is 122.
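As a quick check, here is a minimal Python sketch of the SS of Y for the data above. It assumes the usual computational formula for a sum of squares (the sum of the squared scores minus the squared sum divided by N); the function name `sum_of_squares` is just an illustrative choice.

```python
def sum_of_squares(scores):
    """SS = sum of squared scores minus (sum of scores) squared over N."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n

y = [1, 2, 8, 12, 11, 2]
ss_total = sum_of_squares(y)
print(ss_total)  # 122.0
```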

The SSregression is calculated by multiplying SStotal by the coefficient of determination (the square of Pearson’s r). The correlation between X and Y in this case is -.58. Consequently, r-squared is .34 (.337 before rounding), and the SSregression is 41.14.

The SSerror is the SStotal multiplied by the coefficient of non-determination (1 – r-squared), which gives 80.86. You check your work by subtracting the SSregression from the SStotal: 122 – 41.14 = 80.86.
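The split of SStotal into regression and error can be sketched in Python as follows. This is a sketch under one assumption: r-squared is computed directly from the raw data as SSxy squared over (SSx times SSy), which is the same thing as squaring Pearson's r. The helper name `ss` is hypothetical.

```python
def ss(values):
    """Computational formula for a sum of squares."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n

x = [11, 4, 8, 2, 7, 16]
y = [1, 2, 8, 12, 11, 2]
n = len(x)

ss_x = ss(x)
ss_y = ss(y)  # this is SStotal
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

r_squared = ss_xy ** 2 / (ss_x * ss_y)    # coefficient of determination
ss_regression = ss_y * r_squared          # explained part of SStotal
ss_error = ss_y * (1 - r_squared)         # unexplained part of SStotal

print(round(r_squared, 2), round(ss_regression, 2), round(ss_error, 2))
# 0.34 41.14 80.86
```

Because the two pieces are built from r-squared and 1 minus r-squared, they always add back up to SStotal.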

Here is where we are:

SS df ms

Regression 41.14 1 ____

Error 80.86 4 ____

Total 122.00 5 ____

To complete the summary table, divide each SS by its appropriate df.

SS df ms

Regression 41.14 1 41.14

Error 80.86 4 20.21

Total 122.00 5 24.40

The F statistic is the mean squares of Regression divided by the mean squares of Error. So F = 41.14 / 20.21. When divided through, you get: F = 2.04.
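The last two steps (mean squares, then F) are plain division. A small Python sketch, using the SS and df values from the summary table as given inputs:

```python
# Values taken from the summary table.
ss_regression, ss_error = 41.14, 80.86
df_regression, df_error = 1, 4

ms_regression = ss_regression / df_regression  # mean squares = SS / df
ms_error = ss_error / df_error

f_value = ms_regression / ms_error
print(round(f_value, 2))  # 2.04
```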

We test the significance of this F by comparing it to the critical value in the F Table. We enter the table by going across to the dfregression (1) and down to the dferror (in this case it’s 4). At the .05 alpha level, the critical value = 7.71. In order to be significant, the F we calculated would have to be larger than 7.71. Since it isn’t, we cannot rule out that the pattern we see is due to chance.

March 30, 2023 by ktangen

Calc One-Way ANOVA

F = msBetween ÷ msWithin

You need to complete a calculation table, a summary table and an F score that you compare to the critical value in an F table. And you need to do a lot of basic math steps. No one really calculates one-way ANOVAs by hand, except for illustrative purposes. This is a task computers are good at. But walk through the steps a couple of times and you’ll gain a real feel for how the statistic works.

One independent variable can have several levels, such as high, medium and low. In this example there are four levels, still only one independent variable. Eventually, we will complete a summary table and test for F. But first we’ve got some calculating to do.

Here is the general approach.

1. Summarize the subgroups

First, find the n (number of scores) for each group.

Second, find the sum for each group.

Third, square each number in the first group and sum them. Then square and sum the scores in each of the other groups.

Fourth, find the Sum of Squares (SS) for each group.

Fifth, find the totals for n, sum, squares, and SS. So, N = 20 (the sum of each group’s n’s). The SumX = 117 and so on.

Updating our example, it would look like this:

            Group1   Group2   Group3   Group4    Total
               1        6       12        5
               2        4        7        2
               4        9       15        6
               3        5        9        3
               2       11        7        4

n              5        5        5        5       20
Sum           12       35       50       20      117
Squares       34      279      548       90      951
SS          5.20    34.00    48.00    10.00    97.20
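The calculation table can be reproduced with a short Python sketch. The dictionary layout and the `summarize` helper are illustrative choices, not part of the original procedure.

```python
groups = {
    "Group1": [1, 2, 4, 3, 2],
    "Group2": [6, 4, 9, 5, 11],
    "Group3": [12, 7, 15, 9, 7],
    "Group4": [5, 2, 6, 3, 4],
}

def summarize(scores):
    """Return n, sum, sum of squared scores, and SS for one group."""
    n = len(scores)
    total = sum(scores)
    squares = sum(x * x for x in scores)
    ss = squares - total ** 2 / n
    return n, total, squares, ss

rows = {name: summarize(scores) for name, scores in groups.items()}
for name, (n, total, squares, ss) in rows.items():
    print(name, n, total, squares, round(ss, 2))

# Totals column: add each statistic across the groups.
big_n = sum(row[0] for row in rows.values())
sum_x = sum(row[1] for row in rows.values())
sum_squares = sum(row[2] for row in rows.values())
ss_within = sum(row[3] for row in rows.values())
print("Total", big_n, sum_x, sum_squares, round(ss_within, 2))
```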

2. Find SSwithin

First, start by creating a summary table.

Second, write in the SSwithin. In the process of summarizing the groups, the SSwithin was calculated. It was 97.2. That is, SSwithin (within the experiment) is the sum of the SS that is in (within) each group.

Updating our summary table, it now looks like this:

SS df ms

Between ____ ___ ____

Within 97.20 ___ ____

Total ____ ___ ____

3. Find SSbetween

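The formula for SSbetween looks more difficult than it is. Here’s the formula, where ∑Xgroup is the sum of one group’s scores, n is the number of scores in each group, and N is the total number of scores:

SSbetween = [∑(∑Xgroup)²] ÷ n − (∑X)² ÷ N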

First, start with the sum of each group (12, 35, 50, 20). Square each of them and add them together:

12² + 35² + 50² + 20²

So we get: 144 + 1225 + 2500 + 400 = 4269.

Second, divide the result of Step 1 by n (the number of subjects in each group). NOTE: It is not the number of groups but the number of scores in each group. This gives us: 4269 divided by 5 = 853.8.

Third, take the Sum of X in the totals column (117) and square it, which equals 13689.

Fourth, divide Step3 by the N in the totals column (20). That is, 13689 divided by 20 = 684.45.

Fifth, subtract Step4 from Step2. So, 853.8 – 684.45 = 169.35. This is the SSbetween.
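The five steps translate almost line for line into Python. This sketch assumes, as in the example, that every group has the same number of scores; the `step` variable names simply mirror the numbered steps.

```python
group_sums = [12, 35, 50, 20]
n_per_group = 5   # scores in each group (not the number of groups)
big_n = 20        # total number of scores

step1 = sum(s ** 2 for s in group_sums)  # 144 + 1225 + 2500 + 400 = 4269
step2 = step1 / n_per_group              # 853.8
step3 = sum(group_sums) ** 2             # 117 squared = 13689
step4 = step3 / big_n                    # 684.45
ss_between = step2 - step4

print(round(ss_between, 2))  # 169.35
```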

Updating our summary table, it now looks like this:

SS df ms

Between 169.35 ___ ____

Within 97.20 ___ ____

Total ____ ___ ____

4. Find SStotal

The formula for SStotal is the same as any basic SS.

First, note that the sum of X-squares = 951.

Second, take the sum of X’s (117) and square it. This equals 13689.

Third, divide Step2 by 20 (big N), which equals 684.45.

Fourth, subtract Step3 from Step1. That is, 951 – 684.45 = 266.55. This is the SStotal.

Updating our summary table, it now looks like this:

SS df ms

Between 169.35 ___ ____

Within 97.20 ___ ____

Total 266.55 ___ ____

To check the calculations, simply add SSbetween to SSwithin and see if they equal SStotal: 169.35 + 97.20 = 266.55. They do, so we calculated everything correctly.
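Both the SStotal calculation and the additivity check can be sketched in Python, with the totals from the calculation table as inputs. The 0.005 tolerance is an assumption that allows for values rounded to two decimals.

```python
sum_squares = 951   # sum of all squared scores
sum_x = 117         # sum of all scores
big_n = 20          # total number of scores

ss_total = sum_squares - sum_x ** 2 / big_n
print(round(ss_total, 2))  # 266.55

# Check: the between and within pieces should add up to the total.
ss_between, ss_within = 169.35, 97.20
adds_up = abs((ss_between + ss_within) - ss_total) < 0.005
print(adds_up)  # True
```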

5. Find the degrees of freedom

First, enter the degrees of freedom (df) for Between, which is k-1 (columns minus one). Since our example has 4 columns, the df for Between = 3.

Second, enter the df for Within, which is N-k (number of people minus the number of columns). In our example, N = 20, so dfwithin = 16.

Third, enter the df for Total, which is N-1 (number of people minus one). So, dftotal = 19.

SS df ms

Between 169.35 3 ____

Within 97.20 16 ____

Total 266.55 19 ____
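The three df rules for the one-way ANOVA can be sketched in Python (variable names are illustrative). Note that dfbetween and dfwithin add up to dftotal, which is a handy check.

```python
k_groups = 4   # k: number of columns (groups)
big_n = 20     # N: total number of people

df_between = k_groups - 1     # k - 1
df_within = big_n - k_groups  # N - k
df_total = big_n - 1          # N - 1

print(df_between, df_within, df_total)  # 3 16 19
```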

6. Find F

First, calculate the appropriate mean squares. Since mean squares is another name for variance (and SS divided by df equals variance), divide each SS by its respective df. Updating the table, we now have:

SS df ms

Between 169.35 3 56.45

Within 97.20 16 6.08

Total 266.55 19 14.03

Second, divide the mean squares of Between by the mean squares of Within. That is, 56.45 divided by 6.075 (the unrounded value of 6.08) = 9.29. This ratio is called the F test, so F = 9.29.
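A short Python sketch of the mean squares and the F ratio, using the SS and df values from the summary table. The division is done on unrounded mean squares and only the final F is rounded.

```python
ss_between, ss_within = 169.35, 97.20
df_between, df_within = 3, 16

ms_between = ss_between / df_between  # 56.45
ms_within = ss_within / df_within     # 6.075 (6.08 when rounded)

f_value = ms_between / ms_within
print(round(f_value, 2))  # 9.29
```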

7. Find the critical value

The Critical Values of the F Distribution table is actually a series of distributions. To enter the table, go across to the column whose number matches the degrees of freedom for Between (dfbetween), then go down to the row for dfwithin.

In our example, go across to 3 and down to 16. The critical value (the value you have to beat) = 3.24 (at the .05 alpha level).

8. Decide what to do next

If F is not significant, there is nothing else to do. The differences between the groups are likely due to chance.

If F is significant, then t-tests are done: one for each pair of groups (AB, AC, AD, BC, BD and CD).

To test for significance, the calculated value is compared to the F table. If the value you calculated is bigger than the value in the book, F is significant. In our example, we calculated F to be 9.29, which is bigger than the critical value of 3.24 we found at 3 and 16 degrees of freedom. So, F is significant and the t-tests are authorized.
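The decision rule and the list of follow-up comparisons can be sketched in Python. The group labels A to D and the variable names are illustrative; the critical value is the one looked up in the F table above, not computed here.

```python
from itertools import combinations

f_value = 9.29         # calculated F
critical_value = 3.24  # from the F table at 3 and 16 df, alpha = .05

significant = f_value > critical_value
print(significant)  # True

if significant:
    # One t-test for every pair of groups.
    pairs = ["".join(pair) for pair in combinations("ABCD", 2)]
    print(pairs)  # ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']
```

With four groups there are six pairs, which is why six t-tests are listed.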

March 30, 2023 by ktangen

Calc Regression


Calc Correlation

You need three things to calculate Pearson’s r (correlation coefficient): sum of squares for variable X, sum of squares for variable Y, and a sum of squares you make yourself.

You already know how to calculate the SS of X and the SS of variable Y. They are just regular old sum of squares. If you need it, here’s a reminder: How Calculate Sum of Squares (SS).

For the sake of simplicity, we’ll restrict ourselves to the Pearson r, the most commonly used type of correlation. A correlation is a ratio between togetherness and individuality. To calculate the Pearson, three Sums of Squares are needed. The Pearson r is the ratio of SSxy to the square root of the product of SSx and SSy. Here is the formula:

r = SSxy ÷ √(SSx × SSy)

For SSx, find the Sum of Squares of the X variable. Similarly, SSy is simply the Sum of Squares of Y. The SSxy, however, is a bit different. First, we have to make a new variable: XY. To do so, we multiply each X by its respective Y. Now we have 3 columns: X, Y and XY. Second, sum the XYs. Third, use this formula:

SSxy = ∑XY − (∑X)(∑Y) ÷ N

Notice that this formula is a lot like the regular formula for Sum of Squares; it’s a variation on the theme. It’s the sum of the XYs but we don’t have to square them (they’re already big enough). And we don’t square the Sum of X; we multiply the Sum of X and the Sum of Y together. Fourth, finish off the formula and the result is the Pearson r.

EXAMPLE

We create a new variable by multiplying every X by its Y partner. So this:

X	Y
2	17
13	3
10	4
3	18
2	19
12	11

becomes this:

X	Y	XY
2	17	34
13	3	39
10	4	40
3	18	54
2	19	38
12	11	132

Then, we sum each column. The sum of X = 42, the sum of Y = 72, and the sum of XY is 337.

Calculate the SS for X (136) and the SS of Y (256). And calculate the SS of XY. Multiply the sum of X by the sum of Y (42 * 72 = 3024). Now divide the result by N (the number of pairs of scores = 6); 3024/6 = 504. Subtract the result from the Sum of XYs (337 - 504 = -167).
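The XY column and the SSxy can be checked with a few lines of Python. This is a sketch using the example data, with illustrative variable names.

```python
x = [2, 13, 10, 3, 2, 12]
y = [17, 3, 4, 18, 19, 11]
n = len(x)  # number of pairs of scores

xy = [a * b for a, b in zip(x, y)]  # the new XY column
print(xy)                           # [34, 39, 40, 54, 38, 132]
print(sum(x), sum(y), sum(xy))      # 42 72 337

ss_xy = sum(xy) - sum(x) * sum(y) / n
print(ss_xy)  # -167.0
```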

Notice the SSxy is negative. It’s OK. The SSxy can be negative. It is the only Sum of Squares that can be negative. The SSx or the SSy are measures of dispersion from the variable’s mean. But we created the XY variable; it’s not a real variable when it comes to dispersion. The sign of SSxy indicates the direction of the relationship between X and Y. So we have a negative SSxy because X and Y have an inverse relationship.

Look at the original data: when X is small (2), Y is large (17). When X is large (13), Y is small (3). It is a consistent but inverse relationship. It’s like pushing the yoke down and the plane going up.

Let’s finish off the calculation of the Pearson r. Multiply the SSx by the SSy (136 * 256 = 34816). Take the square root of that number (sqrt of 34816 = 186.59). Divide the SSxy by that square root (-167/186.59 = -.895). Rounding to 2 decimal places, the Pearson r for this data set equals -.90. It is a strong, negative correlation.
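Putting the whole calculation together, here is a minimal Python sketch of Pearson r built from the three sums of squares. The function name `pearson_r` is an illustrative choice; it assumes both lists are the same length and that neither variable is constant (otherwise the denominator is zero).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r = SSxy / sqrt(SSx * SSy), from raw paired scores."""
    n = len(x)
    ss_x = sum(a * a for a in x) - sum(x) ** 2 / n
    ss_y = sum(b * b for b in y) - sum(y) ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    return ss_xy / sqrt(ss_x * ss_y)

x = [2, 13, 10, 3, 2, 12]
y = [17, 3, 4, 18, 19, 11]
r = pearson_r(x, y)
print(round(r, 3))  # -0.895
```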

March 30, 2023 by ktangen

How To Calculate Central Tendency

Mean, Median & Mode

Calculate: Mean

To calculate the mean:

Sum (add) all of the scores in a variable
Divide that sum by the number of scores in the distribution.

You probably already know this formula. A mean is the average of scores. We like to use it because we don’t have to arrange the scores in any particular order before calculating it. We just add up all the numbers and divide by the number of scores. In statistical vocab we “sum” the numbers and divide the sum by N (the number of scores).

∑X/N

Calculate the mean of these numbers:

7
6
5
5
5
4
3

The sum of the variable called X is 35. That is:

∑X = 35

N (number of scores) is 7.

The mean of these scores is calculated by dividing 35 by 7. So, the mean of these scores is 5. That is:

∑X/N = 35/7 = 5
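In Python, the same sum-then-divide calculation looks like this (a minimal sketch with illustrative names):

```python
scores = [7, 6, 5, 5, 5, 4, 3]

sum_x = sum(scores)   # sum of X
n = len(scores)       # N, the number of scores
mean = sum_x / n

print(sum_x, n, mean)  # 35 7 5.0
```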

Calculate: Median

Finding the median in a distribution of integers is relatively easy. When there is an odd number of scores, it is the one left over when counting in from either end. When there is an even number of scores, the median is whatever the middle two scores are (if they are the same) or the halfway point between the middle-most two scores when they differ from each other.

Medians are most often used when distributions are skewed. Indeed, when data is presented in medians, ask about the means. If they are quite different, the distribution is highly skewed, and the sample may not be as representative as you would like.

A median requires that we put the numbers in order (from high to low, or low to high). The median is the score in the middle (if there is an odd number of scores) or the average of the two middle-most scores (if there is an even number of scores). That’s too much work, so we prefer the mean.

There is no easy formula for median. To calculate the median, arrange the scores in order of magnitude from high to low or from low to high (it doesn’t matter which one you choose). Select the score in the middle.

Take these numbers and arrange them from high to low:

2
9
4
7
8

Here they are arranged in a distribution:

9
8
7
4
2

Find the score in the middle. In the following numbers, the median is 7:

9
8
7
4
2
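The sort-then-pick-the-middle procedure can be sketched in Python. The `median` function here is a hand-written illustration that covers both the odd and the even case described earlier; the second call uses made-up scores to show the even case.

```python
def median(scores):
    """Middle score of the sorted list, or the average of the middle two."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # odd: the one in the middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: halfway point

print(median([2, 9, 4, 7, 8]))  # 7
print(median([2, 9, 4, 7]))     # 5.5
```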

Calculate: Mode

The mode is the most popular score (most common). If you plot a distribution, the mode will be the highest spot on the distribution. It will be the top of the mountain. If your mountain has more than one peak, the distribution will be bimodal (2 high spots) or multimodal (several high spots).

There are two ways to calculate this popularity.

First, the mode may be found by sorting the scores and selecting the one most frequently given.

The mode of this distribution is 5:

11
9
5
5
5
2

Second, and more practical in a distribution of many scores, the mode is the highest point on a frequency distribution. If a frequency distribution is accurately drawn, both approaches will yield the same result.

In this case, there is one person who scored 2, three who scored 5, one who scored 9, and one who scored 11. So the highest point of this graph (histogram), at the score of 5, is the mode.
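Both approaches (counting the scores and finding the tallest stack) can be sketched in Python with `collections.Counter`. The text histogram made of `#` characters is only an illustration of the frequency distribution, and returning a list of modes is a design choice that also covers bimodal and multimodal data.

```python
from collections import Counter

scores = [11, 9, 5, 5, 5, 2]
counts = Counter(scores)  # how many people got each score

highest = max(counts.values())
modes = sorted(score for score, count in counts.items() if count == highest)
print(modes)  # [5]

# A rough frequency distribution: one '#' per person at each score.
for score in sorted(counts):
    print(score, "#" * counts[score])
```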

When we make a distribution, the scores are arranged from left to right, with the lowest scores on the left and the highest scores on the right. When everyone has a different score, the distribution is a straight horizontal line. When more than one person has the same score, the scores are stacked vertically. Consequently, a distribution where everyone had the same score would be represented by a straight vertical line.
