Day 7: Chance
Moving from describing events to predicting their likelihood involves probabilities, odds, and the use of Fisher’s F test. Probabilities compare the number of ways an event can occur to the total number of possible outcomes (4 aces out of 52 cards equals a probability of .077). Odds compare sides: 4 aces in a deck against the 48 cards that aren’t aces (which equals odds of 1:12).
We base many of our decisions on probabilities. Is it likely to rain tomorrow? What is the probability of a new car breaking down? What are the chances our favorite team will win their next game?
We are in search of causation. We want to know whether what we see is likely to be due to chance, or whether we are seeing a pattern that has meaning. So we begin by calculating the likelihood of events occurring by chance. We calculate probabilities and odds.
Probabilities and odds are related but not identical. They are easy to tell apart because probabilities are stated as decimals and odds are stated as ratios. But the big difference between them is what they compare. Probabilities compare the likelihood of something occurring to the total number of possibilities. Odds compare the likelihood of something occurring to the likelihood of it not occurring.
If you roll your ordinary, friendly six-sided die with the numbers 1 through 6 (one on each side), the probability of getting a specific number is .167. This is calculated by taking how many correct answers there are (1), dividing by how many total possibilities there are (6), and expressing the result in decimal form (.167). The odds of getting a specific number are how many correct answers there are (1), against how many incorrect answers there are (5). So the odds of rolling a 4 are 1:5…or 5:1 against you.
Let’s try another example. The odds of pulling a King out of a deck of cards are the number of possible correct answers (4), against the number of incorrect answers (48). So the odds are 4:48, which reduces to 1:12. The probability of pulling a King is 4 divided by 52, which equals .077. Probabilities are always decimals, and odds are always ratios.
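To make the arithmetic concrete, here is a minimal Python sketch of both calculations. The numbers (a six-sided die, a 52-card deck, 4 Kings) come straight from the examples above; the function names are just illustrative.

```python
# A minimal sketch of the probability and odds calculations described above.
from fractions import Fraction

def probability(favorable, total):
    """Probability = favorable outcomes / total outcomes, as a decimal."""
    return favorable / total

def odds(favorable, total):
    """Odds = favorable outcomes : unfavorable outcomes, reduced to lowest terms."""
    ratio = Fraction(favorable, total - favorable)
    return f"{ratio.numerator}:{ratio.denominator}"

print(round(probability(1, 6), 3))   # rolling a specific number: 0.167
print(odds(1, 6))                    # 1:5
print(round(probability(4, 52), 3))  # drawing a King: 0.077
print(odds(4, 52))                   # 4:48 reduces to 1:12
```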
To calculate the probability of two independent events occurring at the same time, we multiply the probabilities. If the probability of you eating ice cream is .30 (you really like ice cream) and the probability of your getting hit by a car is .50 (you sleep in the middle of the street), the probability that you’ll be eating ice cream when you get hit by a car is .15. Flipping a coin twice (2 independent events) is calculated by multiplying .5 times .5, so the probability of flipping 2 heads in a row is .25. Rolling snake eyes (two ones) on a single roll of a pair of dice has a probability of about .03 (.167 times .167).
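Here is a small illustrative snippet of the multiplication rule, using the same made-up probabilities as the paragraph above.

```python
# Multiplying probabilities of independent events, using the examples above.
def joint_probability(*probs):
    """Probability that several independent events all occur."""
    result = 1.0
    for p in probs:
        result *= p
    return result

print(joint_probability(0.30, 0.50))          # ice cream and car: 0.15
print(joint_probability(0.5, 0.5))            # two heads in a row: 0.25
print(round(joint_probability(1/6, 1/6), 3))  # snake eyes: about 0.028
```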
A major question in research is whether or not the data looks like chance. Does the magic drug we created really cure the common cold, or is it a random pattern that just looks like the real thing?
To answer our question, we compare things we think aren’t due to chance to those which we believe are due to chance. We know people vary greatly in performance. We all have strengths and weaknesses. So we assume that people in a control group will vary because of chance, not because of what we did to them. But people in different treatment groups should vary because of the experiment, and not just because of chance. Later, we will use this procedure to compare differences between experimental groups to variation within each group. That is, we will compare between-subjects variance to error variance (within-subjects variance).
For the present, we can use the same test (Fisher’s F) to test the significance of a regression. Does the data we collected approximate a straight line? An Analysis of Regression (ANOR) tests whether data is linear. That is, it tests the fit of the data to a straight line. It, like regression, assumes the two variables being measured are both changing. It works well for testing two continuous variables (like age and height in children) but not so well when one of the variables no longer varies (like age and height in adults).
Analysis of Regression (ANOR) is an application of probability to linear regression. The ANOR uses an F test, which is a ratio of variances: the ratio of explained variance to unexplained variance. To find the likelihood that a regression can be explained by a straight line, the value derived from the F test is compared to a table of probabilities. If the value you calculated is bigger than (or equal to) the value in the table, the pattern you see in the data is unlikely to be due to chance.
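Below is a hedged sketch of how that F ratio could be computed for a simple one-predictor regression. The small age/height numbers are invented purely for illustration; scipy’s linregress is used only to fit the line, and the F ratio itself is built from the explained (regression) and unexplained (error) sums of squares.

```python
# A sketch of an Analysis of Regression (ANOR) F ratio for one predictor.
# The age/height values below are made up purely for illustration.
import numpy as np
from scipy import stats

age = np.array([4, 5, 6, 7, 8, 9, 10], dtype=float)       # predictor (X)
height = np.array([100, 108, 114, 121, 128, 133, 140.0])  # outcome (Y)

slope, intercept, r, p_value, se = stats.linregress(age, height)
predicted = intercept + slope * age

# F is the ratio of explained (regression) variance to unexplained (error) variance.
ss_regression = np.sum((predicted - height.mean()) ** 2)
ss_error = np.sum((height - predicted) ** 2)
df_regression, df_error = 1, len(age) - 2
F = (ss_regression / df_regression) / (ss_error / df_error)

print(round(F, 2))        # calculated F value
print(round(p_value, 4))  # linregress reports the equivalent test's p-value
```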
When one thing causes another, it tends to have a linear pattern. Someone stomps on your toes, and you scream. The harder the stomp, the more you scream. If you graphed this pattern, you’d find that pressure and loudness looked somewhat like a diagonal line. It would only be somewhat straight because high pressure doesn’t always hurt more than light pressure; it just usually does.
In contrast, if there were no relationship between pressure and loudness, the pattern would be more like a circle. The relationship between chance variables has its own pattern: no consistent pattern. Chance tends to change. It might look exactly like a circle, then somewhat like a circle, somewhat like a straight line, and then back to looking like a circle. The trick with chance is inconsistency.
The reason we replicate studies is to show consistency. We know that the results of one study might look like causation, but the next study might show the opposite. The results of a single study might be due to chance. So we conduct a series of studies in the area, publish the results, and let other researchers try to replicate our findings.
Null Hypothesis
Our other protection against chance is the use of a null hypothesis. We start with the hypothesis that what we see is due to chance, and we discard that hypothesis only when there is a substantial reason to do so. We assume that a variable has no significant impact on another. Unless we have evidence to the contrary, we accept the null hypothesis. We need a significant amount of evidence to reject the null hypothesis and to say that there is a significant relationship between variables.
Our approach is equivalent to presuming the innocence of a criminal suspect. People are assumed to be innocent until proved guilty. And if they are not proved guilty, they are not declared innocent; they are “not guilty.” Similarly, the lack of a significant finding in research doesn’t mean that a causal relationship doesn’t exist, only that we didn’t find it.
Alpha Level
We also limit the amount of error we are willing to accept. That is, we decide ahead of time how far away from being a circle (how close to being a straight line) the data has to be before we say it is significant. It’s good to set the standard beforehand. Otherwise, we’d be tempted to change the standard to fit the data. To avoid that problem, we typically only allow an error rate of 5%, which is the same cutoff score we used for z-scores and correlations.
We could use a higher standard: 3% error or 1% error, but we use a relatively low standard of 5% because our numbers are fuzzy. Social science research is like watching a tennis game through blurry glasses that haven’t been washed in months. We have some understanding of what is going on—better than if we hadn’t attended the match—but no easy way to summarize the experience. So 5% is a good level for us.
Decision Error
The kind of error we are limiting to 5% is decision error. We’re setting the point in the distribution beyond which we’re going to discard our null hypothesis (nothing is going on) and accept our alternative hypothesis (something is going on). If we set the standard too low, everything we test will look significant. Seeing a significant finding in random events is the equivalent of a statistical hallucination. We only want to see the relationships that exist, not additional ones that live only in our heads. Decisions that produce conclusions of relationships that don’t exist are called Type I errors.
If Type I error is statistical hallucination, Type II error is statistical blindness. It is NOT seeing relationships when they do exist. Not being able to see well is the pits (I can tell you from personal experience) but it’s not as bad as hallucinating. So we put most of our focus on limiting Type I error.
We pick an alpha level (amount of Type I error) and look up its respective critical value in a table. If the value we calculate is smaller than the critical value, we assume the pattern we see is due to chance. And we continue to assume that it is caused by chance until it is so clear, so distinct, so significant that it can’t be ignored. We only accept patterns that are significantly different from chance.
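As a rough illustration, the table lookup can be mimicked in Python: scipy.stats.f.ppf returns the critical value for a chosen alpha, and the calculated F is compared against it. The degrees of freedom below match the small regression sketch earlier, and the calculated F is a purely hypothetical value.

```python
# A sketch of comparing a calculated F to a critical value at alpha = .05.
from scipy import stats

alpha = 0.05
df_regression, df_error = 1, 5          # degrees of freedom from the earlier sketch
critical_value = stats.f.ppf(1 - alpha, df_regression, df_error)

calculated_F = 12.3                     # hypothetical value from your own data
if calculated_F >= critical_value:
    print("Reject the null hypothesis: the pattern is unlikely to be due to chance.")
else:
    print("Retain the null hypothesis: assume the pattern is due to chance.")
```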