March 29, 2023 by ktangen

One-Way ANOVA

Day 9: Pre-analysis

A one-way ANOVA is a pretest of variance done before other analyses. When more than two groups are to be compared, multiple t-tests are not run right away because of the increased likelihood of Type I error. Instead, before subgroup comparisons are made, the variance of the entire design is analyzed. This pre-analysis is called an Analysis of Variance (ANOVA for short). Using the F test (like an Analysis of Regression), an ANOVA forms a ratio of variance between the subgroups (due to the manipulation of the experimenter) to variance within the subgroups (due to chance).

1-way ANOVA: Boats are like levels of an independent variable; same category, different groups.

Essentially, a 1-Way ANOVA is an overgrown t-test. A t-test compares two means. A 1-Way ANOVA lets you test the differences between more than two means. Like a t-test, there is only one independent variable (hence the “1-way”). It is an ANOVA because it analyzes the variance in the scores. The acronym ANOVA stands for ANalysis Of VAriance.

In general, you can design experiments where people are re-used (within-subjects designs) or used only once (between-subjects designs). The difference is all about time.

 

Within-Subjects Designs

Sometimes we want to take repeated measures of the same people over time. These specialized studies are called within-subjects or repeated measures designs. Conceptually, they are extensions of the correlated t-test; the means are compared over time.

Like correlated t-tests, the advantages are that subjects act as their own controls, eliminating the difficulty of matching subjects on similar backgrounds, skills, experience, etc. Also, within-subjects designs have more power (require fewer people to find a significant difference) and consequently are cheaper to run (assuming you’re paying your subjects).

They also suffer from the same disadvantages. There is no way of knowing if the effects of trial one wear off before the subjects get trial two. The more trials in a study, the larger the potential problem. In a multi-trial study, the treatment conditions could be hopelessly confounded.

A more detailed investigation of within-subjects designs is beyond the scope of this discussion. For now, realize that it is possible, and sometimes desirable, to construct designs with repeated measures on the same subjects. But it is not a straightforward proposition and requires more than an elementary understanding of statistics. So we’re going to focus on between-subjects designs.

Between-Subjects Designs

In a between-subjects design, subjects are randomly assigned to groups. The groups vary along one independent variable. It doesn’t matter if you have 3 groups (high, medium and low), 10 groups or 100 groups, as long as they only vary on one dimension. Three types of cars is one independent variable (cars) with 3 groups. Ten types of ice cream can also be one independent variable: flavor.

Like an Analysis of Regression, an Analysis of Variance uses an F test. If F is equal to or larger than the value in the standard table, the F is considered significant, and the results are unlikely to be due to chance.

1-Way

It is called 1-way because there is one independent variable in this design. It is called an ANOVA because that’s an acronym for ANalysis Of VAriance. A 1-way analysis of variance is a pre-test to prevent Type I error.

Although we try to control Type I error by setting our alpha level at a reasonable level of error (typically 5%) for one test, when we do several tests, we run into increased risk of seeing relationships that don’t exist. One t-test has a 5/100 chance of having Type I error. But multiple t-tests on the same data set destroy the careful controls we set in place.

We can use a t-test to compare the means of two groups. But to compare 3, 4 or more groups, we’d have to do too many t-tests; so many that we’d risk finding a significant t-test when none existed. If there were 4 groups (A, B, C and D, we’ll call them), to compare each condition to another you’d have to make the following t-tests: AB, AC, AD, BC, BD and CD.

With six tests, the chances are too good that one of them will look significant but not be. What we need is a pre-analysis of the data to test the overall design; then we can go back, if the overall variance is significant, and conduct the t-tests.
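To see how fast the risk grows, here is a minimal Python sketch (the alpha level and group counts are just illustrative numbers) of the familywise chance of at least one false alarm as the number of pairwise t-tests climbs:

```python
from math import comb

alpha = 0.05  # Type I error allowed for a single test

for groups in (2, 3, 4, 5):
    tests = comb(groups, 2)                  # number of pairwise t-tests
    familywise = 1 - (1 - alpha) ** tests    # chance of at least one false alarm
    print(f"{groups} groups: {tests} t-tests, "
          f"familywise Type I error = {familywise:.2f}")
```

With 4 groups and 6 t-tests, the chance of at least one false alarm is already about .26, far above the 5% we intended.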

Comparisons

A Ratio

The premise of an ANOVA is to compare the amount of variance between the groups to the variance within the groups.

The variance within any given group is assumed to be due to chance (one subject had a good day, one was naturally better, one ran into a wall on the way out the door, etc.). There is no pattern to such variation; it is all determined by chance.

If no experimental conditions are imposed, it is assumed that the variance between the groups would also be due to chance. Since subjects are randomly assigned to the groups, there is no reason other than chance that one group would perform better than another.

After the independent variable is manipulated, the differences between the groups are due to chance and the independent variable. By dividing the between-groups variance by the within-groups variance, the chance parts should cancel each other out. The result should be a measure of the impact the independent variable had on the dependent variable. At least that’s the theory behind the F test.
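Here is a minimal sketch of that ratio, assuming NumPy is available; the three groups and their scores are made up purely for illustration:

```python
import numpy as np

# Hypothetical scores for three groups of five subjects each
groups = [np.array([3, 4, 5, 4, 4]),
          np.array([6, 7, 6, 8, 7]),
          np.array([9, 8, 10, 9, 9])]

k = len(groups)                       # number of groups
n = sum(len(g) for g in groups)       # total number of scores
grand_mean = np.concatenate(groups).mean()

# Between groups: how far each group mean is from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within groups: how far each score is from its own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)     # between mean squares
ms_within = ss_within / (n - k)       # within mean squares
F = ms_between / ms_within

print(f"F({k - 1}, {n - k}) = {F:.2f}")
```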

 

The F Test

F table
Yes, this is the same F test we used doing an Analysis of Regression. And it has the same summary table.

Notice that the titles have changed. We now talk about Between Sum of Squares, not Regression SS. The F test (named after its author, R.A. Fisher) is the ratio of between-group variance (called between mean squares or mean squares between) to within-group variance (called within mean squares or mean squares within).

What To Do

After you calculate the F, you compare it to the critical value in a table of Critical Values of F. There are several pages of critical values to choose from because the shape of the F distribution changes as the number of subjects in the study decreases. To find the right critical value, go across the table to the degrees of freedom between (df between) and down to the df within.

Simply compare the value you calculated for F to the one in the table. If your F is equal to or higher than the table value, you win: what you see is significantly different from chance. The table we most often use is the .05 alpha level because our numbers aren’t very precise, so we’re willing to accept 5% error in our decisions. In other words, our alpha level is set at .05 (the amount of error we are willing to accept). Setting the criterion at .05 alpha indicates that we want to be wrong no more than 5% of the time. Being wrong in this context means seeing a significant relationship where none exists.
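If you have SciPy handy, the table lookup can be reproduced from the F distribution itself; a small sketch (the degrees of freedom and the calculated F are hypothetical):

```python
from scipy.stats import f

alpha = 0.05
df_between, df_within = 2, 12          # example degrees of freedom

critical_F = f.ppf(1 - alpha, df_between, df_within)
calculated_F = 6.72                    # hypothetical value from a summary table

print(f"Critical F(.05; {df_between}, {df_within}) = {critical_F:.2f}")
print("Significant" if calculated_F >= critical_F else "Due to chance")
```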

 

5% Error

Two points should be made: (a) 5% is a lot of error and (b) seeing things that don’t exist is not good. Five percent of the population of the US is more than 16 million people; that’s a lot of error. If elevators failed 5% of the time, no one would ride them. If OPEC trims production by 5%, they cut 1.5 million barrels a day. And about 230 million people use the internet, roughly 5% of the world’s population.

We use a relatively low standard of 5% because our numbers are fuzzy. Social science research is like watching a tennis game through blurry glasses which haven’t been washed in months. We have some understanding of what is going on, better than if we hadn’t attended the match, but no easy way to summarize the experience.

Second, seeing things that don’t exist is dangerous. In statistics, it is the equivalent of hallucination. We want to see the relationships that exist and not see additional ones that live only in our heads. Decisions which produce conclusions of relationships that don’t exist are called Type I errors.

If Type I error is statistical hallucination, Type II error is statistical blindness. It is NOT seeing relationships when they do exist. Not being able to see well is the pits (I can tell you from personal experience) but it’s not as bad as hallucinating. So we put most of our focus on limiting Type I error.

We pick an alpha level (how much Type I error we are willing to accept) and look up its respective critical value. If the F we calculate is smaller than the critical value, we assume the pattern we see is due to chance. And we continue to assume that it is caused by chance until it is so clear, so distinct, so accurate that it can’t be ignored. We only accept patterns that are significantly different from chance.

When the F we calculate is larger than the critical value, we are 95% sure that the pattern we see is not caused by chance. By setting the alpha level at .05, we have set the amount of Type I decision error at 5%.

 

Interpretation

If the F is significant, what do we do now?

Now, all of those t-tests we couldn’t do because we were afraid of Type I error are available for our calculating pleasure. So we do t-tests between:

AB
AC
BC

We might find that there is a significant difference between each group. Or we might find that there is not a significant difference between two of the groups but that there is a significant difference between them and the third group.

Also, which group did best depends on whether the numbers represent money (you want the higher means) or errors (you want the lower means). Doing the t-tests between each combination of means will tell us which ones are significant, and which are likely to be due to chance.
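A sketch of those follow-up comparisons, assuming SciPy and three hypothetical groups A, B and C; run these only after a significant overall F:

```python
from itertools import combinations
from scipy.stats import ttest_ind

# Hypothetical scores for three groups
data = {"A": [3, 4, 5, 4, 4],
        "B": [6, 7, 6, 8, 7],
        "C": [6, 8, 7, 7, 8]}

for name1, name2 in combinations(data, 2):
    t, p = ttest_ind(data[name1], data[name2])
    verdict = "significant" if p < 0.05 else "likely chance"
    print(f"{name1} vs {name2}: t = {t:.2f}, p = {p:.3f} ({verdict})")
```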

Just think, if the F had not been significant, there would not be anything left to do. We would have stopped with the calculating of F and concluded that the differences we see are due to chance. How boring, huh? It’s a lot more fun to do lots of t-tests. Where’s my calculator?


March 29, 2023 by ktangen

Independent t-Test

Day 8: Testing 2 means

A t-test asks whether two means are significantly different. If the means, as representatives of two samples of the same variable, are equal or close to equal, the assumption is that the differences seen are due to chance. If the means are significantly different, the assumption is that the differences are due to the impact of an independent variable.


Independent t-tests are an extension of z scores. Instead of comparing a score to a mean, t-tests compare two means. Two means are thought to be from the same population until they are so different that they are very unlikely to be the same. The question is how different two means can be and still be considered the same.


Assume that the t you calculated was a person. If that score is close to the mean of the t distribution, it is not significant; there are too many scores hanging around the mean to make it special. But if your calculated score is at one extreme of the distribution, it would be unusual (or in stats terms: “significant”), falling far out in one of the tails of the t distribution.

When subjects are randomly assigned to groups, the t-test is said to be independent. That is, it tests the impact of an independent variable on a dependent variable. The independent variable is dichotomous (yes/no; treatment/control; high/low) and the dependent variable is continuous. If significant, the independent t-test supports a strong inference of cause-effect.

When subjects are given both conditions (both means are measures of the same subjects at different times), the t-test is said to be dependent or correlated. Because it uses repeated measures, the correlated-t is often replaced by using a regression (where the assumptions of covariance are more clearly stated).

You know what it is to be independent. It means you are in control. In research, an independent test means that the experimenter is in control. Subjects get the treatment condition that the experimenter chooses. The choice is independent of what the subject does, thinks or feels.

One of the most common approaches is for experimenters to randomly assign subjects to treatment or control. Subjects don’t know, and don’t choose, which treatment they get.

 


The independent t-test assumes that one pool of subjects has been randomly assigned to 2 groups. Each group contains the same number of subjects, has its own mean and has its own standard deviation.

Conceptually, the t-test is an extension of the z-score. A z score compares the difference between a raw score and the mean of the group to the standard deviation of the group. The result is the number of standard deviations between the score and the group mean.

Similarly, a t-test compares the difference between 2 means to the standard deviation of the pooled variance. That is, one mean pretends to be a raw score and the other mean is the mean of the group. The difference between these means is divided by a standard deviation; it’s calculated a little funny but conceptually it’s equivalent to the standard deviation used in calculating a z score.
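Here is a minimal sketch of that comparison, assuming NumPy and SciPy; the two groups are made up, and the hand computation is checked against scipy.stats.ttest_ind (which pools the variances by default):

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical scores for two randomly assigned groups
group1 = np.array([12, 14, 11, 15, 13, 12])
group2 = np.array([16, 18, 15, 17, 19, 16])
n1, n2 = len(group1), len(group2)

# Pooled variance: a weighted average of the two sample variances
pooled_var = (((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1))
              / (n1 + n2 - 2))
# Standard error of the difference between the two means
se_diff = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_by_hand = (group1.mean() - group2.mean()) / se_diff
t_scipy, p = ttest_ind(group1, group2)

print(f"t by hand = {t_by_hand:.3f}, t from SciPy = {t_scipy:.3f}, p = {p:.4f}")
```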

An independent t-test is the difference between two means divided by the standard error based on their pooled variance. Like a z score, a t-test is evaluated by comparing the calculated value to a standard. In the case of a z score, the standard is the Area Under the Normal Curve. Similarly, a t-test compares its calculated value to a table of critical values. When N is large (infinity, for example), the values in the two tables are identical.

For example, in a one-tailed test at .02 alpha, the critical region would be the top 2% of the distribution. A more common standard is an alpha level of .05; here the critical region would be the top 5% of the distribution. The z-score would be the one where 5% was beyond the z and 45% was between the mean and z (there’s another 50% below the mean but the table doesn’t include them). The appropriate z-score for the location where there is 5% beyond is 1.65. In the Critical Values of Student’s t, the critical value at the bottom of the .05 alpha 1-tailed column is 1.65.

Similarly, in a two-tailed test at the .05 alpha, the critical region would be the bottom 2.5% and the top 2.5%. The z-score for the bottom 2.5% is -1.96 and the z-score for the top 2.5% is +1.96. In the Critical Values of Student’s t table, the critical value at the bottom of the .05 alpha 2-tailed column is 1.96.

When the t-test has an infinite number of subjects, its critical value is the same as a z-score. At infinity, t-tests could be evaluated by referring to the Areas Under the Normal Curve table. A t-test, however, usually has a small number of subjects. Consequently, the values are modified to adjust for the small sample size.
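A quick way to check that claim, assuming SciPy: compare t critical values at increasing degrees of freedom to the z cutoffs of 1.65 and 1.96.

```python
from scipy.stats import norm, t

# z cutoffs at .05 alpha: about 1.65 one-tailed, 1.96 two-tailed
print(round(norm.ppf(0.95), 3), round(norm.ppf(0.975), 3))

for df in (10, 30, 120, 10000):
    one_tailed = t.ppf(0.95, df)      # .05 alpha, one-tailed
    two_tailed = t.ppf(0.975, df)     # .05 alpha, two-tailed
    print(f"df = {df:>5}: one-tailed {one_tailed:.3f}, two-tailed {two_tailed:.3f}")
```

As df grows, the t cutoffs shrink toward the z cutoffs; the small-sample adjustment matters most when there are few subjects.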

 

Significance

The t-test tells us if there is a significant difference between the means. It is as if two armies met and decided that each side would send a representative to battle it out. The representative would not be the best from each side but the average, typical member of their respective groups. Similarly, by comparing the means, we are comparing the representatives of two groups. The entire cast is not involved, only a representative from each side.

An independent t-test is compared to a critical region
We typically do a two-tailed test. That is, we want to know if Group 2 is significantly better than Group 1 AND if it is significantly worse. We want to know both things, so we start at the mean and assume that in order to be significantly different from chance, the t statistic has to be at either of the 2 tails. At .05 alpha (the amount of Type I error we are willing to accept), the critical region is split into two parts, one at each tail. Although the overall alpha level is 5%, there is only 2.5% at each tail.

In one-tailed tests, the entire 5% in a .05 alpha test would be at one end. That is, we would only want to know if Group 2 was significantly better than Group 1; we wouldn’t care if it was worse. It doesn’t happen very often that our hypotheses are so finely honed that we are interested in only one end of the distribution. We generally conduct a 2-tailed test of significance. Consequently, the t statistic might be positive or negative, depending on which mean was put first. There is no theoretical reason why one mean should be placed first in a two-tailed test, so apart from identifying which group did better, the sign of the t-test can be ignored.

 

Correlated t-test

Correlated t-tests are sometimes called repeated-measures or within-subjects designs. Instead of randomly assigning subjects, some studies reuse people. The advantage is that each person acts as their own control group. Since no one is more like you than you, the control group couldn’t be a better match for the treatment group. The t-test for repeated-measures designs is called a correlated t-test.

The second advantage is that a correlated t-test has more power (it can use fewer people to conduct the study). An independent t-test has N-2 degrees of freedom. So if 20 people are randomly assigned to 2 groups, the study has 18 degrees of freedom. In a correlated t-test, if we use all 20 people, the study has 19 degrees of freedom.

The third advantage to correlated designs (also called within-subjects or repeated measures designs) is cost. Reusing people is cheaper. If subjects are paid to participate, they are paid for being in the study, regardless of how many trials it takes. Reusing people is also cheaper in time, materials and logistical effort. Once you have a willing subject, it’s hard to let them go.

The primary disadvantage of a correlated t-test is that it is impossible to tell if the effects of receiving one treatment will wear off before receiving the second condition. If people are testing 2 drugs, for example, will the first drug wear off before subjects are given the second drug?

A second problem with the pre- and post-test design often used with correlated t-tests is in its mathematical assumptions. Although the arguments are beyond the scope of this discussion, statisticians differ on the theoretical safety of using difference scores. Some worry that subtracting post-tests from pre-tests may add additional error to the process.

Consequently, a better way of testing correlated conditions is to use a correlation, a linear regression or an analysis of regression. Correlations test for relationship and can be used on ordinal and ratio data. Similarly, linear regression and analysis of regression make predictions and test for goodness of fit without relying on difference scores.

 

Using t-tests for hypothesis testing and estimation

Hypothesis testing is like venturing out onto a frozen lake. The primary hypothesis is that the lake is frozen but you proceed as if it weren’t. You’re cautious until you’re sure the ice is thick enough to hold you. The H0 is that the ice is not frozen; this is your null hypothesis (no difference from water). When you have tested the ice (jumping up and down on it or cutting a hole in it to measure the thickness of the ice), you then decide to accept the null hypothesis (no difference from water) or reject that hypothesis and accept the H1 hypothesis that the lake is frozen and significantly different from water.

We use t-tests to make confidence estimations. When t is significant, we are saying that we are confident that our findings are true 95% of the time (assuming the alpha level is set at .05). Our confidence estimates are interval estimates of a distribution of t scores. A significant t says that its value falls in a restricted part of the distribution (the upper 5%, for example).

Estimation is like getting your car fixed. If you go to a repair shop and they estimate the cost to repair your car is $300, that’s a point estimate. An interval estimate would be a range of numbers. If the shop says it will cost between $200 and $400, that’s an interval estimate.


March 29, 2023 by ktangen

Probability

Day 7: Chance

Moving from describing events to predicting their likelihood involves probabilities, odds, and the use of Fisher’s F test. Probabilities compare the number of ways an event can occur to the total number of possible outcomes (4 aces out of 52 cards equals a probability of .077). Odds compare sides: 4 aces in a deck against 48 cards that aren’t aces (which equals odds of 1:12).

Probability

We base many of our decisions on probabilities. Is it likely to rain tomorrow? What is the probability of a new car breaking down? What are the chances our favorite team will win their next game?

We are in search of causation. We want to know if what we see is likely to be due to chance, or if we are seeing a pattern that has meaning. So we begin by calculating the likelihood of events occurring by chance. We calculate probabilities and odds.

Probabilities and odds are related but not identical. They are easy to tell apart because probabilities are stated as decimals and odds are stated as ratios. But the big difference between them is what they compare. Probabilities compare the likelihood of something occurring to the total number of possibilities. Odds compare the likelihood of something occurring to the likelihood of its not occurring.

If you roll your ordinary, friendly six-sided die with the numbers 1 through 6 (one on each side), the probability of getting a specific number is .167. This is calculated by taking how many correct answers there are (1), dividing by how many total possibilities there are (6), and expressing it in decimal form (.167). The odds of getting a specific number are how many correct answers (1), against how many incorrect answers (5). So the odds of rolling a 4 are 1:5, or 5:1 against you.

Let’s try another example. The odds of pulling a king out of a deck of cards are the number of possible correct answers (4), against the number of incorrect answers (48). So the odds are 4:48, which can be reduced to 1:12. The probability of pulling a king is 4 divided by 52, which equals .077. Probabilities are always decimals, and odds are always ratios.

To calculate the probability of two independent events occurring at the same time, we multiply the probabilities. If the probability of you eating ice cream is .30 (you really like ice cream) and the probability of your getting hit by a car is .50 (you sleep in the middle of the street), the probability that you’ll be eating ice cream when you get hit by a car is .15. Flipping a coin twice (2 independent events) is calculated by multiplying .5 times .5, so the probability of flipping 2 heads in a row is .25. Rolling snake eyes (ones) on a single roll of a pair of dice has a probability of .03 (.167 times .167).
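The multiplication rule, written out as a short Python sketch using the same toy examples:

```python
# Probability of two independent events both occurring: multiply them
p_ice_cream = 0.30
p_hit_by_car = 0.50
print(p_ice_cream * p_hit_by_car)   # 0.15

p_heads = 0.5
print(p_heads * p_heads)            # 0.25: two heads in a row

p_one = 1 / 6
print(round(p_one * p_one, 3))      # 0.028: snake eyes on a pair of dice
```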

A major question in research is whether or not the data looks like chance. Does the magic drug we created really cure the common cold, or is it a random pattern that just looks like the real thing?

To answer our question, we compare things we think aren’t due to chance to those which we believe are due to chance. We know people vary greatly in performance. We all have strengths and weaknesses. So we assume that people in a control group will vary because of chance, not because of what we did to them. But people in different treatment groups should vary because of the experiment, and not just because of chance. Later, we will use this procedure to compare differences between experimental groups to variation within each group. That is, we will compare between-subjects variance to error variance (within-subjects variance).

For the present, we can use the same test (Fisher’s F) to test the significance of a regression. Does the data we collected approximate a straight line? An Analysis of Regression (ANOR) tests whether data is linear. That is, it tests the fit of the data to a straight line. It, like regression, assumes the two variables being measured are both changing. It works well for testing two continuous variables (like age and height in children) but not so well when one of the variables no longer varies (like age and height in adults).

Analysis of Regression (ANOR) is an application of probability to linear regression. The ANOR uses an F test, which is a ratio of variances: the ratio of explained variance to unexplained variance. To find the likelihood that a regression can be explained by a straight line, the number derived from an F test is compared to a table of probabilities. If the value you calculated is bigger than (or equal to) the value in the book, the pattern you see in the data is unlikely to be due to chance.

Probability and scatter plot
When one thing causes another it tends to have a linear pattern. Someone stomps on your toes, you scream. The harder the stomp, the more you scream. If you graphed this pattern, you’d find that pressure and loudness looked somewhat like a diagonal line. It would only be somewhat straight because high pressure doesn’t always hurt more than light pressure, it just usually does.

In contrast, if there were no relationship between pressure and loudness, the pattern would be more like a circle. The relationship between chance variables has its own pattern: no consistent pattern. Chance tends to change. It might look exactly like a circle, then somewhat like a circle, somewhat like a straight line, and then back to looking like a circle. The trick with chance is inconsistency.

The reason we replicate studies is to show consistency. We know that the results of one study might look like causation but the next study will show opposite results. The results of a single study might be due to chance. So we conduct a series of studies in the area, publish the results, and let other researchers try to replicate our findings.

 

Null Hypothesis

Our other protection against chance is that we use a null hypothesis. We start with the hypothesis that what we see is going to be due to chance. And we discard that hypothesis only when there is a substantial reason to do so. We assume that a variable has no significant impact on another. Unless we have evidence to the contrary, we accept the null hypothesis. We need a significant amount of evidence for us to reject the null hypothesis, and to say that there is a significant relationship between variables.

Our approach is equivalent to presuming the innocence of a criminal suspect. People are assumed to be innocent until proved guilty. And if they are not proved guilty, they are not declared innocent; they are “not guilty.” Similarly, the lack of a significant finding in research doesn’t mean that a causal relationship doesn’t exist, only that we didn’t find it.

 

Alpha Level

We also limit the amount of error we are willing to accept. That is, we decide ahead of time how far away from being a circle (how close to being a straight line) the data has to be before we say it is significant. It’s good to set the standard beforehand. Otherwise, we’d be tempted to change the standard to fit the data. To avoid that problem, we typically only allow an error rate of 5%, which is the same cutoff we used for z scores and correlations.

We could use a higher standard, 3% error or 1% error, but we use a relatively low standard of 5% because our numbers are fuzzy. Social science research is like watching a tennis game through blurry glasses that haven’t been washed in months. We have some understanding of what is going on, better than if we hadn’t attended the match, but no easy way to summarize the experience. So 5% is a good level for us.

 

Decision Error

The kind of error we are limiting to 5% is decision error. We’re setting the point in the distribution beyond which we’re going to discard our null hypothesis (nothing is going on) and accept our alternative hypothesis (something is going on). If we set the standard too low, everything we test will look significant. Seeing a significant finding in random events is the equivalent of a statistical hallucination. We only want to see the relationships that exist and not see additional ones that live only in our heads. Decisions which produce conclusions of relationships that don’t exist are called Type I errors.

If Type I error is statistical hallucination, Type II error is statistical blindness. It is NOT seeing relationships when they do exist. Not being able to see well is the pits (I can tell you from personal experience) but it’s not as bad as hallucinating. So we put most of our focus on limiting Type I error.

We pick an alpha level (amount of Type I error) and look up its respective critical value in a table. If the value we calculate is smaller than the critical value, we assume the pattern we see is due to chance. And we continue to assume that it is caused by chance until it is so clear, so distinct, so significant that it can’t be ignored. We only accept patterns that are significantly different from chance.


March 29, 2023 by ktangen

Regression

Day 6: Prediction

With regression, you can make predictions. Accurate predictions require a strong correlation. When there is a strong correlation between two variables (positive or negative), you can make accurate predictions from one to the other. If sales and time are highly correlated, you can predict what sales will be in the future, or in the past. You can enhance the sharpness of an image by predicting what greater detail would look like (filling in the spaces between the dots with predicted values). Of course, the accuracy of your predictions depends on the strength of the correlation. Weak correlations produce lousy predictions.

Regression is like looking into the future or the past

An extension of the correlation, a regression allows you to compare how your data looks to a specific model: a straight line. Instead of using a normal curve (bell-shaped hump) as a standard, regression draws a straight line through the data. The more linear your data, the better it will fit the regression model. Once a line of regression is drawn, it can be used to make specific predictions. You can predict how many shoes people will buy based on how many hats they buy, assuming there is a strong correlation between the two variables.

Just as a correlation can be seen in a scatterplot, a regression can be represented graphically too. A regression would look like a single straight line drawn through as many points on the scatterplot as possible. If your data points all fit on a straight line (extremely unlikely), the relationship between the two variables would be perfectly linear.


Most likely, there will be a cluster or cloud of data points. If the scatterplot is all cloud and no trend, a regression line won’t help…you wouldn’t know where to draw it: all lines would be equally bad.

But if the scatterplot reveals a general trend, some lines will obviously be better than others. In essence, you try to draw a line that follows the trend but divides or balances the data points equally.

In a positive linear trend, the regression line will start in the bottom left part of the scatterplot and go toward the top right part of the figure. It won’t hit all of the data points but it will hit most or come close to them.

Regression line through data

You can use either variable as a predictor. The choice is yours. But the results most likely won’t be the same, unless the correlation between the two variables is perfect (either +1 or -1). So it matters which variable is selected as the predictor and which is characterized as the criterion (outcome variable).

Predicting also assumes that the relationship between the two variables is strong. A weak correlation will produce a poor line of prediction. Only strong (positive or negative) correlations will produce accurate predictions.

A regression allows you to see if the data looks like a straight line. Obviously, if your data is cyclical, a straight line won’t represent it very well. But if there is a positive or negative trend, a straight line is a good model. It is not so much that we apply the model to the data; it’s more that we collect the data and ask if it looks like this model (linear), that model (circular or cyclic) or that model (chance).

If the data approximates a straight line, you can then use that information to predict what will happen in the future. Predicting the future assumes, of course, that conditions remain the same. The stock market is hard to predict because it keeps changing: up and down, slowly up, quickly down. It’s too erratic to predict its future, particularly in the short run.

Formula for a straight line: Y′ = bX + a, where b is the slope and a is the Y intercept.

If you roll a bowling ball down a lane and measure the angle it is traveling, you can predict where the ball will hit when it reaches the pins. The size, temperature and shape of the bowling lane are assumed to remain constant for the entire trip, so a linear model would work well with this data. If you use the same ball on a grass lane which has dips and bulges, the conditions are not constant enough to accurately predict its path.


A regression is composed of three primary characteristics. Any two of these three can be used to draw a regression line: pivot point, slope and intercept.

Formula for slope

First, the regression line always goes through the point where the mean of X and the mean of Y meet. This is reasonable since the best prediction of a variable (knowing nothing else about it) is its mean. Since the mean is a good measure of central tendency (where everyone is hanging out), it is a good measure to use.

Second, a regression line has slope. For every change in X, slope will indicate the change in Y. If the correlation between X and Y is perfect, slope will be 1; every time X gets larger by 1, Y will get larger by 1. Slope indicates the rate of change in Y, given a change of 1 in X.

Third, a regression line has a Y intercept: the place where the regression line crosses the Y axis. Think of it as the intersection between the sloping regression line and the vertical axis.
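Here is a minimal sketch of those pieces, assuming NumPy; the paired data is invented, the slope uses the standard formula b = r(sY/sX), and the last line checks that the line passes through the point where the two means meet:

```python
import numpy as np

# Hypothetical paired scores
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)   # predictor
y = np.array([2, 3, 5, 4, 6, 7], dtype=float)   # criterion

r = np.corrcoef(x, y)[0, 1]                     # Pearson correlation
slope = r * (y.std(ddof=1) / x.std(ddof=1))     # b = r * (sY / sX)
intercept = y.mean() - slope * x.mean()         # forces the line through the means

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
# Pivot point: predicting from the mean of X returns the mean of Y
print(np.isclose(slope * x.mean() + intercept, y.mean()))   # True
```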

Regression means to go back to something. We can regress to our childhood; regress out of a building (leave the way we came in). Or regress back to the line of prediction. Instead of looking at the underlying data points, we use the line we’ve created to make predictions. Instead of relying on real data, we regress to our prediction line.

There are two major determinants of a prediction’s accuracy: (a) the amount of variance the predictor shares with the criterion and (b) the amount of dispersion in the criterion.

Taking them in order, if the correlation between the two variables is not strong, it is very difficult to predict from one to the other. In a strong positive correlation, you know that when X is low, Y is low. Knowing where one variable is makes it easy to estimate the general location of the other variable.

A good measure of predictability, therefore, is the coefficient of determination (calculated by squaring r). R-squared (r²) indicates how much the two variables have in common. If r² is close to 1, there is a lot of overlap between the variables and it becomes quite easy to predict one from the other.

Even when the correlation is perfect, however, predictions are limited by the amount of dispersion in the criterion. Think of it this way: if everyone has the same score (or nearly so), it is easy to predict that score, particularly if the variable is correlated with another variable. But if everyone has a different score (lots of dispersion from the mean), guessing the correct value is difficult.

The standard error of estimate (see) takes both of these factors into consideration and produces a standard deviation of error around the prediction line. A prediction is presented as plus or minus its see.

The true score of a prediction will be within 1 standard error of estimate of the regression line 68% of the time. If the predicted score is 15 (just to pick a number), we’re 68% sure that the real score is 15 plus or minus 3 (or whatever the see is).

Similarly, we’re about 95% sure that the real score falls within two standard errors of estimate of the regression line (15 plus or minus 6). And we’re 99.7% sure that the real score falls within 3 see of the prediction (15 plus or minus 9).
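A sketch of that calculation, assuming NumPy; the data is hypothetical, and the standard error of estimate is computed here as the standard deviation of the residuals around the fitted line (using N − 2 in the denominator, one common definition):

```python
import numpy as np

# Hypothetical paired scores
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2, 3, 5, 4, 6, 7], dtype=float)
n = len(x)

slope, intercept = np.polyfit(x, y, 1)      # least-squares regression line
predicted = slope * x + intercept
residuals = y - predicted

# Standard error of estimate: a standard deviation of errors around the line
see = np.sqrt((residuals ** 2).sum() / (n - 2))

new_x = 4.0
prediction = slope * new_x + intercept
print(f"Predicted Y for X = {new_x}: {prediction:.2f} plus or minus {see:.2f}")
```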


March 29, 2023 by ktangen

Correlation

Day 5: Relationship

This is the first 2-variable model we’ll consider. Both variables (designated X and Y) are measures obtained from the same subjects. Basically, a mathematical representation of a scatterplot, a correlation indicates whether the variables move together in the same direction (+ correlation), move in opposite directions (- correlation) or move separately (0 correlation). Correlations are widely used to measure reliability, validity and commonality.

Correlation is like asking whether the two wings of a bird flap together.

When one wing goes up, does the other wing go up, go down or stay the same? In a strong positive correlation, when one wing goes up, the other usually goes up too. In a strong negative correlation, when one wing goes up, the other usually goes down. In a weak correlation, either positive or negative, when one wing goes up, the other wing does whatever it wants.

With correlations we are only observing, but we’re going to look at two variables and see how they are related to each other. When one variable changes, we want to know what happens to the other variable. In a perfect correlation, the two variables will move together. When there is no correlation, the variables will act independently of each other.

To use this simple and yet powerful method of description, we must collect two pieces of information on every person. These are paired observations. They can’t be separated. If we are measuring height and weight, it’s not fair to use one person’s height and another person’s weight. The data pairs must remain linked. That means that you can’t reorganize one variable (from highest to lowest, for example) without reorganizing the other variable. The pairs must stay together.

Sign & Magnitude

A correlation has both sign and magnitude. The sign (+ or -) tells you the direction of the relationship. If one variable is getting larger (2, 4, 5, 7, 9) and the other variable is headed in the same direction (2, 3, 6, 8, 11), the correlation’s sign is positive. In a negative correlation, while the first variable is getting larger (2, 4, 5, 7, 9), the second variable is getting smaller (11, 8, 6, 3, 2).

Correlation is about direction of change, like arrows

The magnitude of a correlation is found in the size of the number. Correlation coefficients can’t be bigger than 1. If someone says they found a correlation of 2.48, they did something wrong in the calculation. Since the sign can be positive or negative, a correlation must be between -1 and +1.

The closer the coefficient is to 1 (either + or -), the stronger the relationship. Weak correlations (such as .13 or -.08) are close to zero. Strong correlations (such as .78 or -.89) are close to 1. Consequently, a coefficient of -.92 is a very strong correlation. And +.25 indicates a fairly weak positive correlation.

Magnitude is how close the coefficient is to 1; sign is whether the relationship is positive (headed the same way) or negative (inverse).

Correlations don’t prove causation. A strong correlation is a necessary indicator of causation but it is not sufficient. When a cause-effect relationship exists, there will be a strong correlation between the variables. But a strong correlation does not mean that variable A causes variable B.

In correlations, A can cause B. Or, just as likely, B can cause A. Or, just as likely, something else (call it C) causes both A and B to occur.

For a simple example, let’s assume that we know nothing about science. But we do notice that when the sun comes up, it gets warm outside. From a statistical point of view, we can’t tell which causes which. Perhaps the sun coming up makes it get warm. But it is as likely that when it gets warm the sun comes up. Or the sun and warmth are caused by something else: a dragon (pulling the sun behind it) flies across the sky blowing its hot breath on the earth (making it warm).

You might laugh at this illustration but think how shocked you’d be if tomorrow it got warm and the sun didn’t come up!

It is, of course, perfectly OK to infer causation from correlational data. But we must remember that these inferences are not proofs; they are leaps of faith. Leaping is allowed, but we must clearly indicate that it is an assumption, not a fact.

Reliability & Validity

Although correlations can’t prove cause and effect, they are very useful for measuring reliability and validity. Reliability means that you get the same results every time you use a test. If you’re measuring the temperature of a liquid and get a reading of 97 degrees, you would expect a reliable thermometer to yield the same result a few seconds later. If your thermometer gives different readings of the same source over a short period of time, it is unreliable and you would throw it away.

We expect many things in our lives to be reliable. When you flip on a light switch, you expect the light to come on. When you get on an elevator and push the “down” button, you don’t expect the elevator to go sideways. If you twice measure the length of a table, a reliable tape measure will yield the same result. Even if your measuring skill is poor, you expect the results to be close (not 36 inches and then 4 inches). You expect the same results every time.

Reliability, then, is the correlation between two observations of the same event. Test reliability is determined by giving the test once and then giving the same test to the same people 2 weeks later. With this test-retest method, you would expect a high positive correlation between the first time the test was given and the second time.

A test with a test-retest reliability of .90 (which many intelligence tests have) is highly reliable. A correlation of .45 shows a moderate amount of reliability, and a coefficient close to zero indicates the test is unreliable. Obviously, a negative test-retest reliability coefficient would indicate something was wrong. People who got high scores the first time should be getting high scores the second time, if the test is reliable.

There are 3 basic types of reliability correlations. A test-retest coefficient is obtained by giving and re-giving the test. A “split half” correlation is found by correlating the total score for the first half with the total score for the second half for each subject. A parallel forms correlation shows the reliability of two tests with similar items.

Correlations also can be used to measure validity. Although a reliable test is good, it is possible to be reliably (consistently) wrong. Validity is the correlation between a test and an external criterion. If you create a test of musical ability, you expect that musicians will score high on the test and that those judged by experts to be unmusical will score low on the test. The correlation between the test score and the experts’ rating is a measure of validity.

Validity is whether a test measures what it says it measures; reliability is whether a test is consistent. Clearly, reliability is necessary but not sufficient for a test to be valid.

Significance

It is possible to test a correlation coefficient for significance. A significant correlation means the relationship is not likely to be due to chance. It doesn’t mean that X causes Y. It doesn’t mean that Y causes X; or that another variable causes both X and Y. Although a correlation cannot prove which causes what, r can be tested to see if it is likely to be due to chance.

First, determine the degrees of freedom for the study. The degrees of freedom (df) for a correlation are N-2. If there are 7 people (pairs of scores), the df = 5. If there are 14 people, df = 12.

Second, enter the statistical table “Critical Values of the Pearson r” with the appropriate df. Let’s assume there were 10 people in the study (10 pairs of scores). That would mean the degrees of freedom for this study equals 8.

Go down the df column to eight, and you’ll see that with this few people, the magnitude of the coefficient has to be .632 or larger for a Pearson r to be significant.

Notice that the table ignores the sign of the correlation. A negative correlation of -.632 or larger (closer to -1) would also be significant.
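Assuming SciPy, the table value can be reproduced from the t distribution with N − 2 degrees of freedom; a quick sketch:

```python
from math import sqrt
from scipy.stats import t

alpha = 0.05
n = 10                                    # pairs of scores
df = n - 2

t_crit = t.ppf(1 - alpha / 2, df)         # two-tailed t cutoff at .05 alpha
r_crit = t_crit / sqrt(t_crit ** 2 + df)  # corresponding critical value of r

print(f"With {n} pairs (df = {df}), |r| must reach {r_crit:.3f}")  # about .632
```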

 

Evaluate r-squared

A correlation can’t prove that A causes B; it could be that B causes A…or that C causes both A & B. The coefficient of determination is an indication of the amount of relationship between the two variables. It gives the percentage of variance that is accounted for by the relationship between the two variables.

To calculate the coefficient of determination, simply take the Pearson r and square it. So, .89 squared = .79. In this example, 79% of the variance can be explained by the relationship between the two variables. Using a Venn diagram, it is possible to picture the relationship between the two variables as the area of overlap between two circles.

To calculate the amount of variance that is NOT explained by the relationship (called the coefficient of non-determination), subtract r-squared from 1. In our example, 1 − r² = .21. That is, 21% of the variance is unaccounted for.
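The same arithmetic as a tiny sketch:

```python
r = 0.89
r_squared = r ** 2              # coefficient of determination
unexplained = 1 - r_squared     # coefficient of non-determination

print(f"r squared = {r_squared:.2f}: "
      f"{r_squared:.0%} explained, {unexplained:.0%} unexplained")
```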


March 29, 2023 by ktangen

z Scores

Day 4: Where am I?

An entire distribution can often be reduced to a mean and standard deviation. A z-score uses that information to indicate the location of an individual score. Essentially, z-scores indicate how many standard deviations you are away from the mean. If z = 0, you’re at the mean. If z is positive, you’re above the mean; if negative, you’re below the mean. In practical terms, z scores can range from -3 to +3.

z scores: how you compare to others, like one red flower in a field of yellow flowers.

Composed of two parts, the z-score has both magnitude and sign. The magnitude can be interpreted as the number of standard deviations the raw score is away from the mean. The sign indicates whether the score is above the mean (+) or below the mean (-). To calculate the z-score, subtract the mean from the raw score and divide that answer by the standard deviation of the distribution. In formal terms, the formula is z = (X − M) / SD.

Using this formula, we can find z for any raw score, assuming we know the mean and standard deviation of the distribution. What is the z-score for a raw score of 110, a mean of 100 and a standard deviation of 10? First, we find the difference between the score and the mean, which in this case would be 110-100 = 10. The result is divided by the standard deviation (10 divided by 10 = 1). With a z score of 1, we know that the raw score of 110 is one standard deviation above the mean for this distribution being studied.
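The same calculation as a minimal Python sketch, using the worked example above and the one that follows:

```python
def z_score(raw, mean, sd):
    """Number of standard deviations a raw score is from the mean."""
    return (raw - mean) / sd

print(z_score(110, 100, 10))   # 1.0: one standard deviation above the mean
print(z_score(104, 110, 12))   # -0.5: half a standard deviation below the mean
```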


Applications

There are 5 primary applications of z-scores.

First, z-scores can be used for describing the location of an individual score. If your score is at the mean, your z score equals zero. If z = 1, you are one standard deviation above the mean. If z = -2, you are two standard deviations below the mean. If z = 1.27, your score is a bit more than one and a quarter standard deviations above the mean.

What is the z-score for a raw score of 104, a mean of 110 and a standard deviation of 12? 104-110 equals -6; -6 divided by 12 equals -.5. The raw score of 104 is one-half a standard deviation below the mean.

Second, raw scores can be evaluated in relation to some set z-score standard; a cutoff score. For example, all of the scores above a cutoff z-score of 1.65 could be accepted. In this case, z-scores provide a convenient way of describing a frequency distribution regardless of what variable is being measured.

Each z score’s location in a distribution is associated with an area under the curve. A z of 0 is at the 50th percentile and indicates that 50% of the scores are below that point. A z score of 2 is associated with the 98th percentile. If we wanted to select the top 2% of the individuals taking a musical ability test, we would want those who had a z score of 2 or higher. Z scores allow us to compare an individual to a standard regardless of whether the test had a mean of 60 or 124.

Most statistics textbooks have a table that shows the percentage of scores at any given point of a normal distribution. You can begin with a z score and find an area or begin with an area and find the corresponding z score. Areas are listed as decimals: .5000 instead of 50%. In order to save space, only positive values are shown. The tables also assume you know that 50% of the scores fall below the mean and 50% above the mean. The table usually has 3 columns: the z score, the area between the mean and z, and the area beyond z.

The area between the mean and z is the percentage of scores located between z and the mean. A z of 0 has an area between the mean and z of 0 and the area beyond (the area toward the end of the distribution) as .5000. Although there are no negatives, notice that a z score of -0 would also have an area beyond (toward the low end of the distribution) of .5000.

A z score of .1, for example, has an area between the mean and z of .0398. That is, 3.98% of the scores fall within this area. And the third column shows that the area beyond (toward the positive end of the distribution) is .4602. If the z is -.1, the area from the mean down to that point would account for 3.98% of the scores and the area beyond (toward the negative end of the distribution) would be .4602.

Areas under the curve can be combined. For example, to calculate the percentile of a z of .1, the area between the mean and z (.0398) is added to the area below z (which you know to be .5000). So the total percentage of scores below a z of .1 is 53.98 (that is, .0398 plus .5000). A z score of .1 is at the 53.98th percentile.
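Assuming SciPy, those table lookups can be reproduced from the normal distribution’s cumulative area; a small sketch for z = .1:

```python
from scipy.stats import norm

z = 0.1
area_below = norm.cdf(z)             # proportion of scores below z
area_mean_to_z = area_below - 0.5    # area between the mean and z
area_beyond = 1 - area_below         # area beyond z

print(f"Between mean and z: {area_mean_to_z:.4f}")    # about .0398
print(f"Beyond z:           {area_beyond:.4f}")       # about .4602
print(f"Percentile:         {area_below * 100:.2f}")  # about 53.98
```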

Third, an entire variable can be converted to z-scores. This process of converting raw scores to z-scores is called standardizing and the resulting distribution of z-scores is a normalized or standardized distribution. A standardized test, then, is one whose scores have been converted from raw scores to z-scores. The resultant distribution always has a mean of 0 and a standard deviation of 1.

Standardizing a distribution gets rid of the rough edges of reality. If you’ve created a nifty new test of artistic sensitivity, the mean might be 123.73 and the standard deviation might be 23.2391. Interpreting these results and communicating them to others would be easier if the distribution was smooth and conformed exactly to the shape of a normal distribution. Converting each score on your artistic sensitivity test to a z score, converts the raw distribution’s bumps and nicks into a smooth normal distribution with a mean of 0 and a standard deviation of 1. Z scores make life prettier.

Fourth, once converted to a standardized distribution, the variable can be linearly transformed to have any mean and standard deviation desired. By reversing the process, z-scores are converted back to raw scores by multiplying each by the desired standard deviation and adding the desired mean. Most intelligence tests have a mean of 100 and a standard deviation of 15 or 16. But these numbers didn’t magically appear. The original data looked as wobbly as your test of artistic sensitivity. The original distribution was converted to z scores and then the entire distribution was shifted.

To change a normal distribution (a distribution of z scores) to a new distribution, simply multiply by the standard deviation you want and add the mean you want. It’s easy to take a normalized distribution and convert it to a distribution with a mean of 100 and a standard deviation of 20. Begin with the z scores and multiply by 20. A z of 0 (at the mean) is still 0, a z of 1 is 20 and a z of -1 is -20. Now add 100 to each, and the mean becomes 100 and the z of 1 is now 120. The z of -1 becomes 80, because 100 plus -20 equals 80. The resulting distribution will have a mean of 100 and a standard deviation of 20.

Fifth, two distributions with different means and standard deviations can be converted to z-scores and compared. Comparing distributions is possible after each distribution is converted into z’s. The conversion process allows previously incomparable variables to be compared. If a child comes to your school but her old school used a different math ability test, you can estimate her score on your school’s test by converting both to z scores.

If her score was 65 on a test with a mean of 50 and a standard deviation of 10, her z score was 1.5 on the old test (65-50 divided by 10 equals 1.5). If your school’s test has a mean of 80 and a standard deviation of 20, you can estimate her score on your test as being 1.5 standard deviations above the mean; a score of 110 on your test.
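That two-step conversion as a small sketch, using the means and standard deviations from the example:

```python
def z_score(raw, mean, sd):
    return (raw - mean) / sd

def raw_score(z, mean, sd):
    return z * sd + mean

# Old test: mean 50, sd 10; your school's test: mean 80, sd 20
z_old = z_score(65, 50, 10)            # 1.5 standard deviations above the mean
estimate = raw_score(z_old, 80, 20)    # equivalent score on the new test

print(z_old, estimate)                 # 1.5 110.0
```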
