Day 3: Score diversity
We found the middle of the group because most people score about the same on any variable we measure. Now that you’ve found a representative for the group, how representative is the mean? Is the group unified, with nearly everyone having the same score? Or are there wide fluctuations within the group? We want one number that tells us whether the scores are very similar to each other or whether the group is composed of heterogeneous scores.
There are five measures of dispersion:
- Range
- Mean Absolute Deviation (Mean Variance)
- Sum of Squares
- Variance
- Standard Deviation
All measures of dispersion get larger when the distribution of scores is more widely varied. A narrow distribution (a lot of similar scores) has a small amount of dispersion. A wide distribution (lots of different scores) has a large amount of dispersion. The more dispersion, the more heterogeneous (dissimilar) the scores will be.
Range
Range is easy to calculate. It is the highest score minus the lowest score. If the highest score is 11 and the lowest score is 3, the range equals 8.
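Here is a minimal Python sketch of the range. The list `scores` is a made-up set of six scores, chosen only so that the highest is 11 and the lowest is 3; it is an assumption for illustration, not data from this tour.

```python
# Hypothetical scores (an assumption for illustration only)
scores = [3, 5, 5, 7, 9, 11]

# Range = highest score minus lowest score
score_range = max(scores) - min(scores)
print(score_range)  # 8
```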
Mean Absolute Deviation (MAD)
As the name suggests, mean variance (or mean absolute deviation) is a measure of variation from the mean. It is the average of the absolute values of the deviations from the mean. That is, the mean is subtracted from each raw score, and the absolute values of the resulting deviations (called “little d’s”) are averaged, ignoring whether they are positive or negative.
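As a rough sketch, the calculation looks like this in Python. The `scores` list is the same made-up example (an assumption, not data from this tour), and the variable names are only illustrative.

```python
# Hypothetical scores (an assumption for illustration only)
scores = [3, 5, 5, 7, 9, 11]

mean = sum(scores) / len(scores)

# "Little d's": each raw score minus the mean
deviations = [x - mean for x in scores]

# Average the absolute values, ignoring the signs
mad = sum(abs(d) for d in deviations) / len(scores)
print(round(mad, 2))  # 2.33
```

Without the absolute values, the positive and negative deviations would cancel each other out and the average would come out at essentially zero, which is why the signs are ignored.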
Sum of Squares
Conceptually, Sum of Squares (abbreviated SS) is an extension of mean variance. Instead of taking the absolute values of the deviations, we square the critters (deviations), and add them up.
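A short Python sketch of Sum of Squares, again using a made-up `scores` list (an assumption for illustration only):

```python
# Hypothetical scores (an assumption for illustration only)
scores = [3, 5, 5, 7, 9, 11]

mean = sum(scores) / len(scores)

# Square each deviation from the mean, then add the squares up
ss = sum((x - mean) ** 2 for x in scores)
print(round(ss, 2))  # 43.33
```

Squaring does the same job as the absolute value (it removes the signs), but it also gives extra weight to scores that sit far from the mean.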
Variance
Variance of a population is always SS divided by N. This is true whether it is a large population or a small one. By the convention used here, variance of a large sample (N of 30 or more) is also calculated as Sum of Squares divided by N. If there are 40 or 400 in the sample, variance is SS divided by N.
However, if a sample is smaller than 30, it is easy to underestimate the variance of the population. Consequently, it is common practice to adjust the formula for a small sample variance. If N<30, variance is SS divided by N-1. Using N-1 instead of N results in a slightly larger estimate of variance and mitigates the problem of using a small sample. (Dividing by N-1 is valid for large samples too; as N grows, the two formulas give nearly identical answers, so the choice matters mainly for small samples.)
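The two formulas can be compared with Python's standard `statistics` module: `pvariance` divides SS by N, and `variance` divides SS by N-1. The `scores` list is a made-up example (an assumption for illustration only).

```python
import statistics

# Hypothetical scores (an assumption for illustration only)
scores = [3, 5, 5, 7, 9, 11]

n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)

var_n = ss / n                # SS divided by N
var_n_minus_1 = ss / (n - 1)  # SS divided by N-1

print(round(var_n, 2))          # 7.22
print(round(var_n_minus_1, 2))  # 8.67

# The standard library gives the same two answers
print(round(statistics.pvariance(scores), 2))  # 7.22
print(round(statistics.variance(scores), 2))   # 8.67
```

With only six scores the gap between the two results is easy to see. With hundreds of scores it would nearly vanish.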
Standard deviation
This measure of dispersion is calculated by taking the square-root of variance. Regardless of whether you used N or N-1 to calculate variance, standard deviation is the square-root of variance. If variance is 7.22, the standard deviation is 2.69. If variance is 8.67, the standard deviation equals 2.94.
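The arithmetic above can be checked in a couple of lines of Python. The `scores` list at the end is a made-up example (an assumption for illustration only) whose variances happen to round to the same two values.

```python
import math
import statistics

# Standard deviation is the square root of variance
print(round(math.sqrt(7.22), 2))  # 2.69
print(round(math.sqrt(8.67), 2))  # 2.94

# Hypothetical scores (an assumption for illustration only)
scores = [3, 5, 5, 7, 9, 11]
sd_n = statistics.pstdev(scores)         # square root of SS / N
sd_n_minus_1 = statistics.stdev(scores)  # square root of SS / (N-1)
print(round(sd_n, 2), round(sd_n_minus_1, 2))  # 2.69 2.94
```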
Technically, the square-root of a population variance is called sigma (σ) and the square-root of a sample variance is called the standard deviation (s). As a general rule, population parameters use Greek symbols and sample statistics use English letters.
Conceptually, all measures of dispersion work the same. The more dissimilar the scores, the larger the values. In practical terms, the standard deviation is the most useful.
With a mean and standard deviation you can describe any normal distribution. The mean is the center point. From there we can cover the entire distribution in three steps in one direction or three steps in the other. Each step is one standard deviation.
Although the steps are the same size, the amount of ground covered (or in our case, the number of scores) is not. A normal distribution is a symmetrical mountain of scores. Most scores are at or close to the mean. There are fewer and fewer scores the farther you are from the mean.
The percentages in a normal curve, from left to right, are roughly 2, 14, 34, 34, 14, 2. If you add the middle two steps together, you have accounted for about 68% of the scores. If you select the 4 middle-most steps, you’ve accounted for about 96% of the scores. If you select all six steps, you’ve included virtually everyone. We won’t say everyone because, when you’ve tested millions of people, someone is just amazingly extreme.
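These step percentages are rounded values. A small sketch with `statistics.NormalDist` from Python's standard library (available in Python 3.8 and later) recovers them from the normal curve itself; the variable names are just illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

# Six steps, each one standard deviation wide, from -3 to +3
edges = [-3, -2, -1, 0, 1, 2, 3]
bands = [100 * (z.cdf(hi) - z.cdf(lo)) for lo, hi in zip(edges, edges[1:])]
print([round(b) for b in bands])  # [2, 14, 34, 34, 14, 2]

# Percentage of scores within 1, 2, and 3 standard deviations of the mean
within = {k: 100 * (z.cdf(k) - z.cdf(-k)) for k in (1, 2, 3)}
print({k: round(v, 1) for k, v in within.items()})  # {1: 68.3, 2: 95.4, 3: 99.7}
```

The unrounded figure for the four middle steps is closer to 95% than 96%; the 96 comes from adding the already-rounded 14s and 34s. The last line also shows why three steps each way covers virtually everyone but not quite everyone.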
A mean plus and minus 3 standard deviations accounts for everyone, everyone but Ralph (whoever Ralph is). With a mean of 100 and a standard deviation of 15, you know that a normal IQ is between 85 and 115. You know that 68% of the scores fall within that range.
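The same stepping can be sketched in Python for the IQ example. The helper name `score_interval` is hypothetical, introduced only for this illustration.

```python
def score_interval(mean, sd, steps):
    """Return (low, high) for the mean plus and minus `steps` standard deviations."""
    return mean - steps * sd, mean + steps * sd

# IQ example from the text: mean 100, standard deviation 15
for steps in (1, 2, 3):
    low, high = score_interval(100, 15, steps)
    print(steps, low, high)
# 1 85 115
# 2 70 130
# 3 55 145
```

One step each way gives the 85 to 115 range that holds about 68% of the scores; three steps each way (55 to 145) covers virtually everyone.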
And
It doesn’t matter what the topic is or who the people are. With any normal distribution, you get the same pattern.
This is the pattern of chance. The universe is composed of many random things, and many of them follow this pattern: musical ability, running speed, background radiation in space, foot size, and bad luck. If a variable is normally distributed, this pattern holds.
If you take a bucket of blocks and dump it on the floor, the blocks will be randomly distributed. They will form a pile with a center spot. It will be a mountain of toys.
I have some blocks that don’t work that way. When you dump them out, they line up in rows and columns. This immediately tells you it is a trick. It is not a random arrangement. My blocks are magnetic. When a variable doesn’t look like a mountain, look for a cause.
In addition
With a mean and standard deviation, you can find anyone. Half the scores are above the mean in a normal curve, and half the scores are below. Positive standard deviations are above the mean. Negative standard deviations are below the mean. You can find any score or point in a distribution if you know the mean and standard deviation. That’s what z-scores are all about. Check out the next stop on our 10-day tour: z-scores.
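As a preview, a z-score is just the number of standard deviations a score sits above or below the mean. The function names `z_score` and `score_from_z` below are hypothetical, used only for this sketch, and the numbers reuse the IQ example (mean 100, standard deviation 15).

```python
def z_score(score, mean, sd):
    """How many standard deviations a score is from the mean."""
    return (score - mean) / sd

def score_from_z(z, mean, sd):
    """Go the other way: turn a z-score back into a raw score."""
    return mean + z * sd

print(z_score(130, 100, 15))      # 2.0  (two steps above the mean)
print(z_score(85, 100, 15))       # -1.0 (one step below the mean)
print(score_from_z(-2, 100, 15))  # 70
```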
Here is How To Calculate Statistics.
If you know nothing about statistics, start with the video series Square One.
Photo by Jerry Zhang on Unsplash