Statistics

March 30, 2023 by ktangen

Calc Basics

Measurement leads to data collection. When you end up with a pile of papers, surveys and numbers, here are a few terms you will encounter.

  • N & n
  • Min & Max
  • Totals
  • Data Matrix

How To Calculate

  • Basics
  • Percent & Percentile
  • Degrees of Freedom (df)
  • Central Tendency
  • Dispersion
    • Range, MAD (mean absolute deviation), variance
  • Sum of Squares
  • Correlation
  • Regression
  • ANOVA
  • ANOR
  • z scores
  • t-test
  • chi square

 

Filed Under: Statistics

March 29, 2023 by ktangen

Advanced Procedures

Day 10: Complex models

It is helpful to have an overview of designs more advanced than those covered in a typical statistics course. Complex models build on the principles we already discussed. Although their calculation is beyond the scope of this discussion (that’s what computers are for), here is an introduction to procedures that use multiple predictors, multiple criteria and multivariate techniques to test interactions between model components.


Until now, our models have been quite simple. One individual, one group, or one variable predicting another. We have explored the levels of measurement, the importance of theories and how to convert theoretical constructs into model variables. We have taken a single variable, plotted its frequency distribution and described its central tendency and dispersion. We have used percentiles and z-scores to describe the location of an individual score in relation to the group.

In addition to single variable models, we studied two variable models, such as correlations, regressions, t-tests and one-way ANOVAs. We have laid a thorough foundation of research methods, experimental design, and descriptive and inferential statistics.

Despite their simplicity, these procedures are very useful. You can use a correlation to measure the reliability and validity of a test, machine or system of management, training or production. You can use a linear regression to date a rare archaeological find, predict the winner of a race or analyze a trend in the stock market. You can use the t-test to test a new drug against a placebo or compare 2 training conditions. You can use the 1-way ANOVA to test several psychotherapies, compare levels of a drug or brands of computers.

Also, the procedures we’ve studied so far can be combined into more complex models. The most complex models have more variables, but they are variations on themes we have already encountered.

ANOR

Analysis of regression (ANOR) tests a regression to see how straight a line it is. It is a goodness-of-fit test: it tests how well the data fit our straight-line model.

Starting off, we assume our data looks like chance. It is not an organized pattern; it’s a circle with no linearity. Our null hypothesis is that our data has no significant resemblance to a straight line. We are assuming our data will not match (fit) our model (straight line). We will keep that assumption until it is clear that the data fits the model. But the fit has to be good; it has to be significant.

We are using X to predict Y. We are hoping the variations in Y can be explained by the variations in X. Prediction is based on commonality. When X and Y are highly correlated, it is easy to make predictions from one variable to another. When there is little or no correlation, X is not a good predictor of Y; they are operating independently.

In statistics talk, an ANOR partitions the variance into mean squares regression (what we understand) and mean squares error (what we can’t explain). Mean squares is another name for variance. We are going to make a ratio of understood variance to not-understood variance. We will compare this ratio with the values in an F table.
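To make that ratio concrete, here is a minimal sketch in Python (with made-up X and Y values, purely for illustration) of how an analysis of regression partitions variance into mean squares regression and mean squares error and forms the F ratio:

```python
import numpy as np
from scipy import stats

# Hypothetical data: X predicting Y (values invented for illustration)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2, 3, 5, 4, 6, 7, 8, 9], dtype=float)

n = len(x)
slope, intercept = np.polyfit(x, y, 1)           # least-squares straight line
y_hat = intercept + slope * x                    # predicted Y values

ss_regression = np.sum((y_hat - y.mean()) ** 2)  # variance we understand
ss_error = np.sum((y - y_hat) ** 2)              # variance we can't explain

ms_regression = ss_regression / 1                # df regression = 1 predictor
ms_error = ss_error / (n - 2)                    # df error = n - 2
f_ratio = ms_regression / ms_error

p_value = stats.f.sf(f_ratio, 1, n - 2)          # area beyond F in the F distribution
print(f"F(1, {n - 2}) = {f_ratio:.2f}, p = {p_value:.4f}")
```

If the calculated F is at least as large as the critical value in the F table (or, equivalently, if p is at or below the alpha level), the data fit the straight-line model better than chance would.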

 

Factorial ANOVA

A factorial ANOVA is good for testing interactions. It is like combining 1-way ANOVAs together. The purpose of combining the designs is to test for interactions. A 1-way ANOVA can test to see if different levels of salt will influence compliments, but what happens if the soft drink is both salty and sweet?

Interactions can be good or bad. Some heart medications work better when given together. For example, Digoxin and calcium channel blockers go together because they work on different channels. Together they are better than each would be separately. But other heart medications (phenylpropanolamine with MAO inhibitors) can result in fast pulse, increased blood pressure, and even death. This is why we’re often warned not to mix drugs without checking with our doctor.

The ability to check how variables interact is the primary advantage of complex research designs and advanced statistical techniques. Although a 1-Way ANOVA can test to see if different levels of aspirin help relieve headaches, a factorial ANOVA can be used to test both aspirin and gender as predictors of headaches. Or aspirin, gender, time of day, caffeine, and chicken soup. Any number of possible explanations and combinations of explanations can be tested with the techniques of multiple regression, MANOVA, factorial ANOVA and causal modeling.

A factorial ANOVA tests the impact of 2 or more independent variables on one dependent variable. It tests the influence of many discrete variables on one continuous variable. It has multiple independent variables and one dependent variable.

1-Way ANOVA

A 1-way ANOVA model tests multiple levels of 1 independent variable. Let’s assume the question is whether stress affects how well people work multiplication problems. Subjects are randomly assigned to a treatment level (high, medium and low, for example) of one independent variable (stress, for example). And their performance on one dependent variable (number of errors) is measured.

If stress impacts performance, you would expect errors to increase with the level of stress. The variation between the cells is due to the treatment given. Variation within each cell is thought to be due to random chance.

A 2-way ANOVA has 2 independent variables. Here is a design which could look at gender (male, female) and stress (low, medium and high).

It is called a 2×3 (“two by three”) factorial design. If each cell contained 10 subjects, there would be 60 subjects in the design. A design for amount of student debt (low, medium and high) and year in college (frosh, soph, junior and senior) would have 1 independent variable (debt) with 3 levels and 1 independent variable (year in school) with 4 levels.

This is a 3×4 factorial design. Notice that each number (3, 4, etc.) tells how many levels are in an independent variable. The number of numbers tells you how many independent variables there are. A 2×4 has 2 independent variables. A 3×7 has 2 independent variables (one with 3 levels and one with 7 levels). A 2×3×4 factorial design has 3 independent variables.

Factorial designs can do something 1-way ANOVAs can’t. Factorial designs can test the interaction between independent variables. Taking pills can be dangerous and driving can be dangerous; but it often is the interaction between variables that interests us the most.

Analyzing a 3×4 factorial design involves 3 steps: columns, rows and cells. The factorial ANOVA tests the columns of the design as if each column was a different group. Like a 1-way ANOVA, this main effect tests the columns as if the rows didn’t exist.

The second main effect (rows) is tested as if each row was a different group. It tests the rows as if the columns didn’t exist. Notice that each main effect is like doing a separate 1-way ANOVA on that variable.

The cells also are tested to see if one cell is significantly larger (or smaller) than the others. This is a test of the interaction and checks to see if a single cell is significantly different from the rest. If one cell is significantly higher or lower than the others, it is the result of a combination of the independent variables.
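As a sketch of how those column, row and cell tests come out of a single analysis, here is a small example using the statsmodels library on a made-up 2×3 data set (gender by stress, two scores per cell); the variable names and numbers are only placeholders:

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical 2x3 factorial data: 2 levels of gender x 3 levels of stress
data = pd.DataFrame({
    "gender": ["m"] * 6 + ["f"] * 6,
    "stress": ["low", "low", "med", "med", "high", "high"] * 2,
    "errors": [3, 4, 5, 6, 9, 10, 2, 3, 6, 5, 14, 15],
})

# One model gives both main effects and the interaction
model = smf.ols("errors ~ C(gender) * C(stress)", data=data).fit()
print(anova_lm(model, typ=2))   # F tests for gender, stress, and gender x stress
```

The row labeled C(gender):C(stress) is the interaction test described above: it asks whether some combination of the two independent variables produces cells higher or lower than the main effects alone would predict.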

Multiple Regression

An extension of simple linear regression, multiple regression is based on observed data. In the case of multiple regression, two or more predictors are used; there are multiple predictors and a single criterion.

Let’s assume that you have selected 3 continuous variables as predictors and 1 continuous variable as criterion. You might want to know if gender, stress and time of day impact typing performance.

Each predictor is tested against the criterion separately. If a single predictor appears to be primarily responsible for changes in the criterion, its influence is measured. Every combination of predictors is also tested, so both main effects and interactions can be examined. If this sounds like a factorial ANOVA, you’re absolutely correct.

You could think of Multiple Regression and ANOVA as siblings. Factorial ANOVAs use discrete variables; Multiple Regression uses continuous variables. If you were interested in using income as one of your predictors (independent variables), you could use discrete categories of income (high, medium and low) and test for significance with an ANOVA. If you wanted to measure income as a continuous variable (actual income earned), the procedure would be a Multiple Regression.

You also could think of Multiple Regression as the parent of ANOVA. Analysis of Variance is actually a specific example of Multiple Regression; it is the discrete variable version. Analysis of Variance uses categorical predictors. Multiple Regression can use continuous or discrete predictors (in any combination); it is not restricted to discrete predictors.

Both factorial ANOVA and Multiple Regression produce an F statistic and both have only one outcome measure. Both produce an F score that is compared to the Critical Values of F table. Significance is ascribed if the calculated value is larger than the standard given in the table.

Both procedures have only one outcome measure. There may be many predictors in a study but there is only one criterion. You may select horse weight, jockey height, track condition, past winnings and phase of the moon as predictors of a horse race, but only one outcome measure is used. Factorial ANOVA and Multiple Regression are multiple predictor-single criterion procedures.
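Here is a minimal multiple-regression sketch (invented horse-race numbers, with the predictor names used only as placeholders) showing several predictors feeding a single criterion:

```python
import numpy as np

# Hypothetical predictors: horse weight (kg), jockey height (cm), track condition (0-2)
X = np.array([
    [480, 160, 0],
    [500, 155, 1],
    [470, 158, 2],
    [510, 150, 1],
    [495, 162, 0],
    [505, 157, 2],
], dtype=float)
y = np.array([71.2, 69.8, 72.5, 68.9, 70.4, 71.9])  # single criterion: finishing time

# Add a column of ones so the fitted model has an intercept
X1 = np.column_stack([np.ones(len(X)), X])
coefs, _, _, _ = np.linalg.lstsq(X1, y, rcond=None)  # least-squares weights

y_hat = X1 @ coefs
ss_total = np.sum((y - y.mean()) ** 2)
ss_error = np.sum((y - y_hat) ** 2)
r_squared = 1 - ss_error / ss_total   # variance in the criterion accounted for

print("intercept and slopes:", np.round(coefs, 3))
print("R squared:", round(r_squared, 3))
```

However many predictors go in, there is still only one column of outcome scores and one overall R squared.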

Multivariate Analysis

Sometimes called MANOVA (pronounced man-o-va), multivariate analysis is actually an extension of multiple regression. Like multiple regression, multivariate analysis has multiple predictors. In addition to multiple predictors, multivariate analysis allows multiple outcome measures.

Now it is possible to use gender, income and education as predictors of happiness AND health. You are no longer restricted to only a single criterion. With multivariate analysis, the effects and interactions of multiple predictors can be examined. And their impact on multiple outcomes can be assessed.

The analysis of a complex multiple-predictor, multiple-criteria model is best left to a computer, but the underlying process is the calculation of correlations and linear regressions. As variables are selected for the model, a decision is made about whether each is a predictor or a criterion. Obviously, aside from the experimenter’s theory, the choice of predictor or criterion is arbitrary. In multivariate analysis, a variable such as annual income could be either a predictor or a criterion.

Complex Modeling

There are a number of statistical procedures at the high end of modeling. Relax! You don’t have to calculate them. I just want you to know about them.

In particular, I want to make the point that there is nothing scary about the complex models. They are involved and require lots of tedious calculations, but that’s why God gave us computers. Since we are blessed to have stupid but remarkably fast mechanical slaves, we should let them do the number crunching.


It is enough for us to know that a complex model—at its heart—is a big bundle of correlations and regressions. Complex models hypothesize directional and nondirectional relationships between variables. Each factor may be measured by multiple measures. Intelligence might be defined as the combination of 3 different intelligence tests, for example.

And income might be a combination of both salary plus benefits minus vacation. And education might be years in school, number of books read and number of library books checked out. The model, then, becomes the interaction of factors that are more abstract than single variable measures.

Underlying the process, however, are principles and procedures you already know. Complex models might try to determine if one more predictor helps or hurts but the model is evaluated just like a correlation: percentage of variance accounted for by the relationships.

Next: How To Calculate Statistics


Filed Under: Statistics

March 29, 2023 by ktangen

One-Way ANOVA

Day 9: Pre-analysis

One-Way ANOVA is a pretest of variance before you do other analyses. When more than 2 groups are to be compared, multiple t-tests are not conducted because of the increased likelihood of Type I error. Instead, before subgroup comparisons are made, the variance of the entire design is analyzed. This pre-analysis is called an Analysis of Variance (ANOVA for short). Using the F-test (like an Analysis of Regression), an ANOVA makes a ratio of variance between the subgroups (due to the manipulation of the experimenter) to variance within the subgroups (due to chance).


Essentially, a 1-Way ANOVA is an overgrown t-test. A t-test compares two means. A 1-Way ANOVA lets you test the differences between more than two means. Like a t-test, there is only one independent variable (hence the “1-way”). It is an ANOVA because it analyzes the variance in the scores. The acronym ANOVA stands for ANalysis Of VAriance.

In general, you can design experiments where people are re-used (within-subjects designs) or used only once (between-subjects design). The difference is all about time.

 

Within-Subjects Designs

Sometimes we want to take repeated measures of the same people over time. These specialized studies are called within-subjects or repeated measures designs. Conceptually, they are extensions of the correlated t-test; the means are compared over time.

Like correlated t-tests, the advantages are that subjects act as their own controls, eliminating the difficulty of matching subjects on similar backgrounds, skills, experience, etc. Also, within-subject designs have more power (require fewer people to find a significant difference) and consequently are cheaper to run (assuming you’re paying your subjects).

They also suffer from the same disadvantages. There is no way of knowing if the effects of trial one wear off before the subjects get trial 2. The more trials in a study the larger the potential problem. In a multi-trial study, the treatment conditions could be impossibly confounded.

A more detailed investigation of within-subject designs is beyond the scope of this discussion. For now, realize that it is possible, and sometimes desirable, to construct designs with repeated measures on the same subjects. But it is not a straightforward proposition and requires more than an elementary understanding of statistics. So we’re going to focus on between-subjects designs.

Between-Subjects Designs

In a between-subjects design, subjects are randomly assigned to groups. The groups vary along one independent variable. It doesn’t matter if you have 3 groups (high, medium and low) or ten groups or 100 groups…as long as they only vary on one dimension. Three types of cars is one independent variable (cars) with 3 groups. Ten types of ice cream can also be one independent variable: flavor.

Like an Analysis of Regression, an Analysis of Variance uses an F test. If F is equal to or larger than the value in the standard table, the F is considered significant, and the results are unlikely to be due to chance.

1-Way

It is called 1-way because there is one independent variable in this design. It is called an ANOVA because that’s an acronym for ANalysis Of VAriance. A 1-way analysis of variance is a pre-test to prevent Type I error.

Although we try to control Type I error by setting our alpha level at a reasonable level of error (typically 5%) for one test, when we do several tests, we run into increased risk of seeing relationships that don’t exist. One t-test has a 5/100 chance of having Type I error. But multiple t-tests on the same data set destroy the careful controls we set in place.

We can use a t-test to compare the means of two groups. But to compare 3, 4 or more groups, we’d have to do too many t-tests; so many that we’d risk finding a significant t-test when none existed. If there were 4 groups (A, B, C and D, we’ll call them), to compare each condition to another you’d have to make the following t-tests: AB, AC, AD, BC, BD and CD.

The chances are too good that one of those tests will look significant but not be. What we need is a pre-analysis of the data to test the overall design and then go back, if the overall variance is significant, and conduct the t-tests.

Comparisons

A Ratio

The premise of an ANOVA is to compare the amount of variance between the groups to the variance within the groups.

The variance within any given group is assumed to be due to chance (one subject had a good day, one was naturally better, one ran into a wall on the way out the door, etc.). There is no pattern to such variation; it is all determined by chance.

If no experimental conditions are imposed, it is assumed that the variance between the groups would also be due to chance. Since subjects are randomly assigned to the groups, there is no reason other than chance that one group would perform better than another.

After the independent variable is manipulated, the differences between the groups are due to chance and the independent variable. Dividing the between-group variance by the within-group variance should cancel out the chance parts. The result should be a measure of the impact the independent variable had on the dependent variable. At least that’s the theory behind the F test.
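Here is a minimal sketch of that ratio, using made-up scores for three groups; scipy’s built-in one-way ANOVA is included only as a check on the hand calculation:

```python
import numpy as np
from scipy import stats

# Hypothetical error scores for three randomly assigned groups
groups = [
    np.array([2, 3, 4, 3, 2], dtype=float),   # group A
    np.array([4, 5, 5, 6, 4], dtype=float),   # group B
    np.array([7, 8, 6, 9, 8], dtype=float),   # group C
]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k, N = len(groups), len(all_scores)

# Between-group variability: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group variability: chance variation around each group's own mean
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

ms_between = ss_between / (k - 1)   # df between = k - 1
ms_within = ss_within / (N - k)     # df within = N - k
f_ratio = ms_between / ms_within

print(f"F({k - 1}, {N - k}) = {f_ratio:.2f}")
print(stats.f_oneway(*groups))      # same F, with a p value
```

If the treatment did nothing, both mean squares estimate the same chance variation and the F ratio hovers around 1.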

 

The F Test

Yes, this is the same F test we used doing an Analysis of Regression. And it has the same summary table.

Notice that the titles have changed. We now talk about Between Sum of Squares, not Regression SS. The F test (named after its author, R.A. Fisher) is the ratio of between-group variance (called between mean squares or mean squares between) to within-group variance (called within mean squares or mean squares within).

What To Do

After you calculate the F, you compare it to the critical value in a table of Critical Values of F. There are several pages of critical values to choose from because the shape of the F distribution changes as the number of subjects in the study decreases. To find the right critical value, go across the table on the degrees of freedom between (df between) and down on the df within.

Simply compare the value you calculated for F to the one in the table. If your F is equal to or higher than the book’s value, you win: what you see is significantly different from chance. The table we most often use is the .05 alpha level because our numbers aren’t very precise, so we’re willing to accept 5% error in our decisions. In other words, our alpha level is set at .05 (the amount of error we are willing to accept). Setting the criterion at .05 alpha indicates that we want to be wrong no more than 5% of the time. Being wrong in this context means seeing a significant relationship where none exists.
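If a printed table isn’t handy, a statistics library will reproduce the same critical value; this sketch assumes a .05 alpha and illustrative degrees of freedom:

```python
from scipy import stats

alpha = 0.05
df_between, df_within = 2, 12                      # example degrees of freedom

critical_f = stats.f.ppf(1 - alpha, df_between, df_within)
print(f"critical F({df_between}, {df_within}) at alpha .05 = {critical_f:.2f}")

calculated_f = 5.10                                # pretend this came from your ANOVA
print("significant" if calculated_f >= critical_f else "not significant")
```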

 

5% Error

Two points should be made: (a) 5% is a lot of error and (b) seeing things that don’t exist is not good. Five percent of the population of the US is over 50 million people; that’s a lot of error. If elevators failed 5% of the time, no one would ride them. If OPEC trims production by 5%, it cuts 1.5 million barrels a day. There are 230 million people using the internet, about 5% of the world’s population.

We use a relatively low standard of 5% because our numbers are fuzzy. Social science research is like watching a tennis game through blurry glasses which haven’t been washed in months. We have some understanding of what is going on—better than if we hadn’t attended the match—but no easy way to summarize the experience.

Second, seeing things that don’t exist is dangerous. In statistics, it is the equivalent of hallucination. We want to see the relationships that exist and not see additional ones that live only in our heads. Decisions which produce conclusions about relationships that don’t exist are called Type I errors.

If Type I error is statistical hallucination, Type II error is statistical blindness. It is NOT seeing relationships when they do exist. Not being able to see well is the pits (I can tell you from personal experience) but it’s not as bad as hallucinating. So we put most of our focus on limiting Type I error.

We pick an alpha level (how much Type I error we are willing to accept) and look up its respective critical value. If the F we calculate is smaller than the critical value, we assume the pattern we see is due to chance. And we continue to assume that it is caused by chance until it is so clear, so distinct, so accurate that it can’t be ignored. We only accept patterns that are significantly different from chance.

When the F we calculate is larger than the critical value, we are 95% sure that the pattern we see is not caused by chance. By setting the alpha level at .05, we have set the amount of Type I decision error at 5%.

 

Interpretation

If the F is significant, what do we do now?

Now, all of those t-tests we couldn’t do because we were afraid of Type I error are available for our calculating pleasure. So we do t-tests between:

AB
AC
BC

We might find that there is a significant difference between each group. Or we might find that there is a not a significant difference between two of the groups but that there is a significant difference between them and the third group.

Also, which group did best depends on whether the numbers are money (you want the higher means) or errors (you want the lower means). Doing the t-tests between each combination of means will tell us which ones are significant, and which are likely to be due to chance.

Just think, if the F had not been significant, there would not be anything left to do. We would have stopped with the calculating of F and concluded that the differences we see are due to chance. How boring, huh? It’s a lot more fun to do lots of t-tests. Where’s my calculator?
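For what those follow-up comparisons look like in practice, here is a sketch of the AB, AC and BC t-tests on made-up data for three groups (in a real analysis you would run these only after a significant F):

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Hypothetical scores for groups A, B and C
scores = {
    "A": np.array([2, 3, 4, 3, 2], dtype=float),
    "B": np.array([4, 5, 5, 6, 4], dtype=float),
    "C": np.array([7, 8, 6, 9, 8], dtype=float),
}

# Pairwise independent t-tests: AB, AC, BC
for g1, g2 in combinations(scores, 2):
    t, p = stats.ttest_ind(scores[g1], scores[g2])
    print(f"{g1} vs {g2}: t = {t:.2f}, p = {p:.4f}")
```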


Next: Advanced Procedures


Filed Under: Statistics

March 29, 2023 by ktangen

Independent t-Test

Day 8: Testing 2 means

A t-test asks whether two means are significantly different. If the means, as representatives of two samples of the same variable, are equal or close to equal, the assumption is that the differences seen are due to chance. If the means are significantly different, the assumption is that the differences are due to the impact of an independent variable.

Independent t-test

Independent t-tests are an extension of z scores. Instead of comparing a score to a mean, t-tests compare two means. Two means are thought to be from the same population until they are so different they are very unlikely to be the same. The question is how different can you be and still be the same.


Assume that the t you calculated was a person. If that score is close to the mean of the t distribution, it is not significant; there are too many scores hanging around the mean to make it special. But if your calculated score is at one extreme of the distribution, it would be unusual (or in stats terms: “significant”).

When subjects are randomly assigned to groups, the t-test is said to be independent. That is, it tests the impact of an independent variable on a dependent variable. The independent variable is dichotomous (yes/no; treatment/control; high/low) and the dependent variable is continuous. If significant, the independent t-test supports a strong inference of cause-effect.

When subjects are given both conditions (both means are measures of the same subjects at different times), the t-test is said to be dependent or correlated. Because it uses repeated measures, the correlated-t is often replaced by using a regression (where the assumptions of covariance are more clearly stated).

You know what it is to be independent. It means you are in control. In research, an independent test means that the experimenter is in control. Subjects get the treatment condition that the experimenter chooses. The choice is independent of what the subject does, thinks or feels.

One of the most common approaches is for experimenters to randomly assign subjects to treatment or control. Subjects don’t know, and don’t choose, which treatment they get.

 

Independent t-test

The independent t-test assumes that one group of subjects has been randomly assigned to 2 groups. Each group contains the same number of subjects, has its own mean and has its own standard deviation.

Conceptually, the t-test is an extension of the z-score. A z score compares the difference between a raw score and the mean of the group to the standard deviation of the group. The result is the number of standard deviations between the score and the group mean.

Similarly, a t-test compares the difference between 2 means to the standard deviation of the pooled variance. That is, one mean pretends to be a raw score and the other mean is the mean of the group. The difference between these means is divided by a standard deviation; it’s calculated a little funny but conceptually it’s equivalent to the standard deviation used in calculating a z score.

An independent t-test is the difference between two means divided by their pooled variance. Like a z score, a t-test is evaluated by comparing the calculated value to a standard. In the case of a z score, the standard is the Area Under the Normal Curve. Similarly, a t-test compares its calculated value to a table of critical values. When N is large (infinity, for example), the values in the two tables are identical.
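Here is a minimal sketch of that calculation with made-up scores for two randomly assigned groups; scipy’s t-test is included only to confirm the hand-calculated value:

```python
import numpy as np
from scipy import stats

# Hypothetical scores: treatment group vs control group
group1 = np.array([12, 14, 11, 15, 13, 12], dtype=float)
group2 = np.array([9, 10, 8, 11, 10, 9], dtype=float)

n1, n2 = len(group1), len(group2)

# Pooled variance: the two group variances averaged, weighted by their df
sp2 = ((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))       # "standard deviation" of the difference

t = (group1.mean() - group2.mean()) / se    # difference between means over its error
print(f"t({n1 + n2 - 2}) = {t:.2f}")
print(stats.ttest_ind(group1, group2))      # same t, with a p value
```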

For example, in a one-tailed test at .02 alpha, the critical region would be the top 2% of the distribution. A more common standard is an alpha level of .05; here the critical region would be the top 5% of the distribution. The z-score would be the one where 5% was beyond the z and 45% was between the mean and z (there’s another 50% below the mean but the table doesn’t include them). The appropriate z-score for the location where there is 5% beyond is 1.65. In the Critical Values of Student’s t, the critical value at the bottom of the .05 alpha 1-tailed column is 1.65.

Similarly, in a two-tailed test at the .05 alpha, the critical region would be the bottom 2.5% and the top 2.5%. The z-score for the bottom 2.5% is -1.96 and the z-score for the top 2.5% is +1.96. In the Critical Values of Student’s t table, the critical value at the bottom of the .05 alpha 2-tailed column is 1.96.

When the t-test has an infinite number of subjects, its critical value is the same as a z-score. At infinity, t-tests could be evaluated by referring to the Areas Under the Normal Curve table. A t-test, however, usually has a small number of subjects. Consequently, the values are modified to adjust for the small sample size.
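A quick sketch of that bottom-of-the-table claim: as the degrees of freedom grow, the one-tailed .05 critical value of t shrinks toward the z value of about 1.65 (the degrees of freedom here are just a demonstration):

```python
from scipy import stats

alpha = 0.05
print("z critical, one-tailed .05:", round(stats.norm.ppf(1 - alpha), 3))

for df in (10, 30, 120, 10_000):
    t_crit = stats.t.ppf(1 - alpha, df)     # critical t for this many degrees of freedom
    print(f"t critical, df = {df}: {t_crit:.3f}")
```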

 

Significance

The t-test tells us if there is a significant difference between the means. It is as if two armies met and decided that each side would send a representative to battle it out. The representative would not be the best from each side but the average, typical member of their respective groups. Similarly, by comparing the means, we are comparing the representatives of two groups. The entire cast is not involved, only a representative from each side.

An independent t-test is compared to a critical region
We typically do a two-tailed test. That is, we want to know if Group 2 is significantly better than Group 1 AND if it is significantly worse. We want to know both things, so we start at the mean and assume that in order to be significantly different from chance, the t statistic has to be at either of the 2 tails. At .05 alpha (the amount of Type I error we are willing to accept), the critical region is split into two parts, one at each tail. Although the overall alpha level is 5%, there is only 2.5% at each tail.

In one-tailed tests, the entire 5% in a .05 alpha test would be at one end. That is, we would only want to know if Group 2 was significantly better than Group 1; we wouldn’t care if it was worse. It doesn’t happen very often that our hypotheses are so finely honed that we are interested in only one end of the distribution. We generally conduct a 2-tailed test of significance. Consequently, the t statistic might be positive or negative, depending on which mean was put first. There is no theoretical reason why one mean should be placed first in a two-tailed test, so apart from identifying which group did better, the sign of the t-test can be ignored.

 

Correlated t-test

Correlated t-tests are sometimes called repeated-measures or within-subjects designs.
Instead of randomly assigning subjects, some studies reuse people. The advantage is that each person acts as their own control group. Since no one is more like you than you, the control group couldn’t be more like the treatment group. The t-test for repeated measures designs is called a correlated t-test.

The second advantage is that a correlated t-test has more power (is able to use fewer people to conduct the study). An independent t-test has N-2 degrees of freedom. So if 20 people are randomly assigned to 2 groups, the study has 18 degrees of freedom. In a correlated t-test, if we use all 20 people, the study has 19 degrees of freedom.

The third advantage to correlated designs (also called within-subjects or repeated measures designs) is cost. Reusing people is cheaper. If subjects are paid to participate, they are paid for being in the study, regardless of how many trials it takes. Reusing people is also cheaper in time, materials and logistical effort. Once you have a willing subject, it’s hard to let them go.

The primary disadvantage of a correlated t-test is that it is impossible to tell if the effects of receiving one treatment will wear off before receiving the second condition. If people are testing 2 drugs, for example, will the first drug wear off before subjects are given the second drug?

A second problem with the pre- and post-test design often used with correlated t-tests is in its mathematical assumptions. Although the arguments are beyond the scope of this discussion, statisticians differ on the theoretical safety of using difference scores. Some worry that subtracting post-tests from pre-tests may add additional error to the process.

Consequently, a better way of testing correlated conditions is to use a correlation, a linear regression or an analysis of regression. Correlations test for relationship and can be used on ordinal and ratio data. Similarly, linear regression and analysis of regression make predictions and test for goodness of fit without relying on difference scores.

 

Using t-tests for hypothesis testing and estimation

Hypothesis testing is like venturing out onto a frozen lake. The primary hypothesis is that the lake is frozen but you proceed as if it weren’t. You’re cautious until you’re sure the ice is thick enough to hold you. The H0 is that the ice is not frozen; this is your null hypothesis (no difference from water). When you have tested the ice (jumping up and down on it or cutting a hole in it to measure the thickness of the ice), you then decide to accept the null hypothesis (no difference from water) or reject that hypothesis and accept the H1 hypothesis that the lake is frozen and significantly different from water.

We use t-tests to make confidence estimations. When t is significant, we are saying that we are confident that our findings are true 95% of the time (assuming the alpha level is set at .05). Our confidence estimates are interval estimates of a distribution of t scores. A significant t says that its value falls in a restricted part of the distribution (the upper 5%, for example).

Estimation is like getting your car fixed. If you go to a repair shop and they estimate the cost to repair your car is $300, that’s a point estimate. An interval estimate would be a range of numbers. If the shop says it will cost between $200 and $400, that’s an interval estimate.
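As a sketch of the difference, here is a point estimate and a t-based 95% interval estimate computed from a handful of made-up repair quotes:

```python
import numpy as np
from scipy import stats

# Hypothetical repair quotes (dollars) from a sample of shops
quotes = np.array([280, 310, 295, 330, 305, 290, 315], dtype=float)

point_estimate = quotes.mean()                    # single best guess
se = quotes.std(ddof=1) / np.sqrt(len(quotes))    # standard error of the mean
t_crit = stats.t.ppf(0.975, df=len(quotes) - 1)   # two-tailed, .05 alpha

low, high = point_estimate - t_crit * se, point_estimate + t_crit * se
print(f"point estimate: ${point_estimate:.0f}")
print(f"95% interval estimate: ${low:.0f} to ${high:.0f}")
```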


Next: One-Way ANOVA


Filed Under: Statistics

March 29, 2023 by ktangen

Probability

Day 7: Chance

Moving from describing events to predicting their likelihood involves probabilities, odds, and the use of Fisher’s F test. Probabilities compare the likelihood of an event to the total number of possible events (4 aces out of 52 cards equals a probability of .077). Odds compare sides: 4 aces in a deck against 48 cards that aren’t aces (which equals odds of 1:12).


We base many of our decisions on probabilities. Is it likely to rain tomorrow? What is the probability of a new car breaking down? What are the chances our favorite team will win their next game?

We are in search of causation. We want to know if what we see is likely to be due to chance. Or are we seeing a pattern that has meaning? So we begin by calculating the likelihood of events occurring by chance. We calculate probabilities and odds.

Probabilities and odds are related but not identical. They are easy to tell apart because probabilities are stated as decimals and odds are stated as ratios. But the big difference between them is what they compare. Probabilities compare the likelihood of something occurring to the total number of possibilities. Odds compare the likelihood of something occurring to the likelihood of its not occurring.

If you roll your ordinary, friendly six-sided die with the numbers 1 through 6 (one on each side), the probability of getting a specific number is .167. This is calculated by taking how many correct answers there are (1), dividing by how many total possibilities there are (6), and expressing it in decimal form (.167). The odds of getting a specific number are how many correct answers (1) against how many incorrect answers (5). So the odds of rolling a 4 are 1:5…or 5:1 against you.

Let’s try another example. The odds of pulling a king out of a deck of cards are the number of possible correct answers (4) against the number of incorrect answers (48). So the odds are 4:48, which can be reduced to 1:12. The probability of pulling a king is 4 divided by 52, which equals .077. Probabilities are always decimals, and odds are always ratios.

To calculate the probability of two independent events occurring at the same time, we multiply the probabilities. If the probability of you eating ice cream is .30 (you really like ice cream) and the probability of your getting hit by a car is .50 (you sleep in the middle of the street), the probability that you’ll be eating ice cream when you get hit by a car is .15. Flipping a coin twice (2 independent events) is calculated by multiplying .5 times .5. So the probability of flipping 2 heads in a row is .25. Rolling snake eyes (ones) on a single roll of a pair of dice has a probability of .03 (.167 times .167).
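The examples above translate directly into a few lines of arithmetic; this sketch simply restates them in Python:

```python
# Probability compares an event to all possibilities; odds compare for to against
aces, deck = 4, 52
probability = aces / deck                     # 4/52 = .077
odds_for, odds_against = aces, deck - aces    # 4:48, which reduces to 1:12

# Independent events multiply
p_two_heads = 0.5 * 0.5                       # flipping heads twice = .25
p_snake_eyes = (1 / 6) * (1 / 6)              # two ones on a pair of dice, about .03

print(f"P(ace) = {probability:.3f}, odds = {odds_for}:{odds_against}")
print(f"P(two heads) = {p_two_heads}, P(snake eyes) = {p_snake_eyes:.3f}")
```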

A major question in research is whether or not the data looks like chance. Does the magic drug we created really cure the common cold, or is it a random pattern that just looks like the real thing?

To answer our question, we compare things we think aren’t due to chance to those which we believe are due to chance. We know people vary greatly in performance. We all have strengths and weaknesses. So we assume that people in a control group will vary because of chance, not because of what we did to them. But people in different treatment groups should vary because of the experiment, and not just because of chance. Later, we will use this procedure to compare differences between experimental groups to variation within each group. That is, we will compare between-subjects variance to error variance (within-subjects variance).

For the present, we can use the same test (Fisher’s F) to test the significance of a regression. Does the data we collected approximate a straight line? An Analysis of Regression (ANOR) tests whether data is linear. That is, it tests the fit of the data to a straight line. It, like regression, assumes the two variables being measured are both changing. It works well for testing two continuous variables (like age and height in children) but not so well when one of the variables no longer varies (like age and height in adults).

Analysis of Regression (ANOR) is an application of probability to linear regression. The ANOR uses an F-test, which is a ratio of variances: the ratio of understood variance to unexplained variance. To find the likelihood that a regression can be explained by a straight line, the number derived from an F test is compared to a table of probabilities. If the value you calculated is bigger than (or equal to) the value in the book, the pattern you see in the data is unlikely to be due to chance.

When one thing causes another it tends to have a linear pattern. Someone stomps on your toes, you scream. The harder the stomp, the more you scream. If you graphed this pattern, you’d find that pressure and loudness looked somewhat like a diagonal line. It would only be somewhat straight because high pressure doesn’t always hurt more than light pressure, it just usually does.

In contrast, if there were no relationship between pressure and loudness, the pattern would be more like a circle. The relationship between chance variables has its own pattern: no consistent pattern. Chance tends to change. It might look exactly like a circle, then somewhat like a circle, somewhat like a straight line, and then back to looking like a circle. The trick with chance is inconsistency.

The reason we replicate studies is to show consistency. We know that the results of one study might look like causation but the next study will show opposite results. The results of a single study might be due to chance. So we conduct a series of studies in the area, publish the results, and let other researchers try to replicate our findings.

 

Null Hypothesis

Our other protection against chance is the null hypothesis. We start with the hypothesis that what we see is going to be due to chance. And we discard that hypothesis only when there is a substantial reason to do so. We assume that a variable has no significant impact on another. Unless we have evidence to the contrary, we accept the null hypothesis. We need a significant amount of evidence to reject the null hypothesis and say that there is a significant relationship between variables.

Our approach is equivalent to presuming the innocence of a criminal suspect. People are assumed to be innocent until proved guilty. And if they are not proved guilty, they are not declared innocent; they are “not guilty.” Similarly, the lack of a significant finding in research doesn’t mean that a causal relationship doesn’t exist, only that we didn’t find it.

 

Alpha Level

We also limit the amount of error we are willing to accept. That is, we decide ahead of time how far away from being a circle (how close to being a straight line) the data has to be before we say it is significant. It’s good to set the standard beforehand. Otherwise, we’d be tempted to change the standard to fit the data. To avoid that problem, we typically only allow an error rate of 5%, which is the same cutoff we used for z-scores and correlations.

We could use a higher standard: 3% error or 1% error. But we use a relatively low standard of 5% because our numbers are fuzzy. Social science research is like watching a tennis game through blurry glasses that haven’t been washed in months. We have some understanding of what is going on—better than if we hadn’t attended the match—but no easy way to summarize the experience. So 5% is a good level for us.

 

Decision Error

The kind of error we are limiting to 5% is decision error. We’re setting the point in the distribution beyond which we’re going to discard our null hypothesis (nothing is going on) and accept our alternative hypothesis (something is going on). If we set the standard too low, everything we test will look significant. Seeing significant findings in random events is the equivalent of a statistical hallucination. We only want to see the relationships that exist and not see additional ones that live only in our heads. Decisions which produce conclusions about relationships that don’t exist are called Type I errors.

If Type I error is statistical hallucination, Type II error is statistical blindness. It is NOT seeing relationships when they do exist. Not being able to see well is the pits (I can tell you from personal experience) but it’s not as bad as hallucinating. So we put most of our focus on limiting Type I error.

We pick an alpha level (amount of Type I error) and look up its respective critical value in a table. If the value we calculate is smaller than the critical value, we assume the pattern we see is due to chance. And we continue to assume that it is caused by chance until it is so clear, so distinct, so significant that it can’t be ignored. We only accept patterns that are significantly different from chance.


Next: Independent t-Test


Filed Under: Statistics

March 29, 2023 by ktangen

Regression

Day 6: Prediction

With regression, you can make predictions. Accurate predictions require a strong correlation. When there is a strong correlation between two variables (positive or negative), you can make accurate predictions from one to the other. If sales and time are highly correlated, you can predict what sales will be in the future…or in the past. You can enhance the sharpness of an image by predicting what greater detail would look like (filling in the spaces between the dots with predicted values). Of course, the accuracy of your predictions depends on the strength of the correlation. Weak correlations produce lousy predictions.

Regression is like looking into the future or the past

An extension of the correlation, a regression allows you to compare your data to a specific model: a straight line. Instead of using a normal curve (bell-shaped hump) as a standard, regression draws a straight line through the data. The more linear your data, the better it will fit the regression model. Once a line of regression is drawn, it can be used to make specific predictions. You can predict how many shoes people will buy based on how many hats they buy, assuming there is a strong correlation between the two variables.

Just as a correlation can be seen in a scatterplot, a regression can be represented graphically too. A regression would look like a single straight line drawn through as many points on the scatterplot as possible. If your data points all fit on a straight line (extremely unlikely), the relationship between the two variables would be very linear.


Most likely, there will be a cluster or cloud of data points. If the scatterplot is all cloud and no trend, a regression line won’t help…you wouldn’t know where to draw it: all lines would be equally bad.

But if the scatterplot reveals a general trend, some lines will obviously be better than others. In essence, you try to draw a line that follows the trend but divides or balances the data points equally.

In a positive linear trend, the regression line will start in the bottom left part of the scatterplot and go toward the top right part of the figure. It won’t hit all of the data points but it will hit most or come close to them.


You can use either variable as a predictor. The choice is yours. But the results most likely won’t be the same, unless the correlation between the two variables is perfect (either +1 or -1). So it matters which variable is selected as a predictor and which is characterized as the criterion (outcome variable).

Predicting also assumes that the relationship between the two variables is strong. A weak correlation will produce a poor line of prediction. Only strong (positive or negative) correlations will produce accurate predictions.

A regression allows you to see if the data looks like a straight line. Obviously, if your data is cyclical, a straight line won’t represent it very well. But if there is a positive or negative trend, a straight line is a good model. It is not so much that we apply the model to the data; it’s more that we collect the data and ask if it looks like this model (linear), that model (circular or cyclic) or that model (chance).

If the data approximates a straight line, you can then use that information to predict what will happen in the future. Predicting the future assumes, of course, that conditions remain the same. The stock market is hard to predict because it keeps changing: up and down, slowly up, quickly down. It’s too erratic to predict its future, particularly in the short run.

Formula for a straight line

If you roll a bowling ball down a lane and measure the angle it is traveling, you can predict where the ball will hit when it reaches the pins. The size, temperature and shape of the bowling lane are assumed to remain constant for the entire trip, so a linear model would work well with this data. If you use the same ball on a grass lane which has dips and bulges, the conditions are not constant enough to accurately predict its path.


A regression is composed of three primary characteristics. Any two of these three can be used to draw a regression line: pivot point, slope and intercept.

Formula for slope

First, the regression line always goes through the point where the mean of X and the mean of Y meet. This is reasonable since the best prediction of a variable (knowing nothing else about it) is its mean. Since the mean is a good measure of central tendency (where everyone is hanging out), it is a good measure to use.

Second, a regression line has slope. For every change in X, slope will indicate the change in Y. If the correlation between X and Y is perfect, slope will be 1; every time X gets larger by 1, Y will get larger by 1. Slope indicates the rate of change in Y, given a change of 1 in X.

Third, a regression line has a Y intercept: the place where the regression line crosses the Y axis. Think of it as the intersection between the sloping regression line and the vertical axis.

Regression means to go back to something. We can regress to our childhood; regress out of a building (leave the way we came in). Or regress back to the line of prediction. Instead of looking at the underlying data points, we use the line we’ve created to make predictions. Instead of relying on real data, we regress to our prediction line.

There are two major determinants of a prediction’s accuracy: (a) the amount of variance the predictor shares with the criterion and (b) the amount of dispersion in the criterion.

Taking them in order, if the correlation between the two variables is not strong, it is very difficult to predict from one to the other. In a strong positive correlation, you know that when X is low, Y is low. Knowing where one variable is makes it easy to find the general location of the other variable.

A good measure of predictability, therefore, is the coefficient of determination (calculated by squaring r). R-squared (r²) indicates how much the two variables have in common. If r² is close to 1, there is a lot of overlap between the variables and it becomes quite easy to predict one from the other.

Even when the correlation is perfect, however, predictions are limited by the amount of dispersion in the criterion. Think of it this way: if everyone has the same score (or nearly so), it is easy to predict that score, particularly if the variable is correlated with another variable. But if everyone has a different score (lots of dispersion from the mean), guessing the correct value is difficult.

The standard error of estimate (see) takes both of these factors into consideration and produces a standard deviation of error around the prediction line. A prediction is presented as plus or minus its see.

The true score of a prediction will be within 1 standard error of estimate of the regression line 68% of the time. If the predicted score is 15 (just to pick a number), we’re 68% sure that the real score is 15 plus or minus 3 (or whatever the see is).

Similarly, we’re about 95% sure that the real score falls within two standard errors of estimate of the regression line (15 plus or minus 6). And we’re about 99.7% sure that the real score falls within 3 see of the prediction (15 plus or minus 9).
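Pulling the pieces of this post together, here is a small sketch (with invented hat-and-shoe numbers) that fits the regression line, reports slope, intercept and r squared, and expresses a prediction as plus or minus one standard error of estimate:

```python
import numpy as np

# Hypothetical data: hats bought (X) predicting shoes bought (Y)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2, 3, 3, 5, 4, 6, 7, 8], dtype=float)

slope, intercept = np.polyfit(x, y, 1)   # the line passes through (mean of X, mean of Y)
y_hat = intercept + slope * x

r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2                       # coefficient of determination

n = len(x)
see = np.sqrt(np.sum((y - y_hat) ** 2) / (n - 2))   # standard error of estimate

new_x = 9
prediction = intercept + slope * new_x
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r squared = {r_squared:.2f}")
print(f"prediction for X = {new_x}: {prediction:.1f} plus or minus {see:.1f} (68% of the time)")
```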

Next: Probability


Filed Under: Cognition, Statistics
