Day 2: Finding the middle
We look for central tendency because there is one. If you take an armload of toys and drop them on the floor, they don’t line up in rows. They don’t arrange themselves in a triangle or circle. They fall in a heap, a pile that looks like a small mountain: a frequency distribution.
Although some researchers use case studies, naturalistic observation, and single-subject studies (N = 1), most research is group-based. Usually there are lots of numbers from lots of subjects, all waiting to be crunched.
The first step is to collect the data in one place. If left unattended, data would cover researchers’ desks and gather dust. To be useful, it must be organized into a data matrix: a row-column table of scores. The spreadsheet of scores has a row for each subject, and each row holds all of that individual’s scores, neatly laid out in columns. A quick look at the spreadsheet will show whether any scores are missing (empty cells).
Each row is a person; each column is a variable. Traditionally, the leftmost column contains the subject’s ID number. The simplest data matrix has two columns: one for the ID number and one for the score. It has as many rows as there are subjects in the study.
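Here is a minimal sketch of such a data matrix in Python. The column names (`quiz`, `exam`) and scores are made up for illustration, `None` stands in for an empty cell, and `find_missing` is a hypothetical helper that does the "quick look" for missing scores.

```python
# Hypothetical data matrix: one row per subject, one column per variable.
# Column 0 is the subject ID; None marks a missing score (an empty cell).
columns = ["id", "quiz", "exam"]
data_matrix = [
    [101, 7, 82],
    [102, 9, None],   # subject 102 has no exam score
    [103, 6, 75],
    [104, 8, 90],
]

def find_missing(matrix, names):
    """Return (subject ID, column name) for every empty cell."""
    missing = []
    for row in matrix:
        for name, value in zip(names, row):
            if value is None:
                missing.append((row[0], name))
    return missing

print("N =", len(data_matrix))                  # one row per subject
print("Missing:", find_missing(data_matrix, columns))
```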
After forming a data matrix, the next step is usually to plot the data. Each variable is plotted separately: one graph for each factor being measured. Sometimes the variables are summarized in histograms (vertical bar graphs). Often the graphs are frequency distributions: overviews of the raw data. Each score is listed from lowest to highest (left to right). If more than one person has the same score, the graph points are stacked vertically.
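A frequency distribution can be drawn with nothing but text. The sketch below is illustrative only: it assumes whole-number scores, and `frequency_plot` is a hypothetical helper (a real project would more likely use a plotting library). Scores run from lowest to highest, left to right, and tied scores stack vertically.

```python
from collections import Counter

def frequency_plot(scores, mark="x"):
    """Return a text frequency distribution for whole-number scores.

    Scores run left to right from lowest to highest; when several
    people share a score, their marks stack vertically.
    """
    counts = Counter(scores)
    axis = list(range(min(scores), max(scores) + 1))  # every score in the range
    width = max(len(str(s)) for s in axis)            # keep columns aligned
    rows = []
    for level in range(max(counts.values()), 0, -1):  # draw from the top down
        cells = [(mark if counts[s] >= level else " ").rjust(width) for s in axis]
        rows.append(" ".join(cells).rstrip())
    rows.append(" ".join(str(s).rjust(width) for s in axis))  # the score axis
    return "\n".join(rows)

scores = [3, 4, 4, 5, 5, 5, 6, 6, 7]  # made-up scores for illustration
print(frequency_plot(scores))
```

For these scores the plot is a small mountain: one mark above 3 and 7, two above 4 and 6, and three above 5.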
Once the data is organized and graphed, it’s time to describe it. The shortest description of a group is how many people are in it (N). A more graphic and informative description is a frequency distribution. The “frequency” is indicated by the height of the graph: the more people who have the same score, the taller the graph is. The “distribution” (width) shows how many different scores there were.
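Those descriptions can be computed directly. In this minimal sketch with invented scores, `len` gives N, and `collections.Counter` tallies how many people share each score; from that tally come the height of the tallest stack and the number of different scores.

```python
from collections import Counter

scores = [3, 4, 4, 5, 5, 5, 6, 6, 7]   # made-up scores for illustration

n = len(scores)                     # shortest description: how many people
frequency = Counter(scores)         # score -> how many people got it
tallest = max(frequency.values())   # "frequency": height of the tallest stack
different = len(frequency)          # "distribution": how many different scores

print("N =", n)
print("Frequencies:", dict(sorted(frequency.items())))
print("Tallest stack:", tallest, "| Different scores:", different)
```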
The major challenge of descriptive statistics is finding a representative of the entire group of scores. We look for a score to represent an entire group because we believe that people are more alike than different. We believe that chance follows a pattern, and that pattern is a heap in the middle with fewer and fewer scores toward the edges. We look for a center because there is a center to a group. To understand most people, all we need to do is describe the middle of the group; the middle of the distribution is where most scores are.
There are three major measurements of central tendency: mean, median and mode.
The mode is the most popular person: the most common score (the highest point of the frequency distribution). But it’s hard to read graphs accurately, and the mode isn’t very useful for advanced statistical analysis. The general rule is: if it’s easy to calculate, it’s not very helpful.
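Python’s standard library can find the mode without reading a graph: `statistics.mode` returns the most common score, and `statistics.multimode` (Python 3.8 and later) returns every score tied for most common, which helps with bimodal data. The scores below are invented for illustration.

```python
from collections import Counter
from statistics import mode, multimode

scores = [2, 3, 3, 4, 4, 4, 5, 6]      # made-up scores

counts = Counter(scores)
print(counts.most_common(1))           # the score with the tallest stack, and its count
print(mode(scores))                    # the same score via the statistics module

# A distribution can have two peaks; multimode reports both.
print(multimode([1, 1, 2, 3, 3]))
```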
A better measure of a group representative is the median. The median is the middlemost score. If you start at both ends and count toward the middle, whichever score you end on is the median; with an even number of scores, the median is halfway between the two middle scores. It’s fairly easy to calculate, but the scores have to be arranged from lowest to highest (or highest to lowest) before you can count toward the middle. And the median isn’t very useful for advanced statistical analysis either.
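The count-in-from-both-ends procedure can be sketched as below. `median_by_counting` is a hypothetical helper, checked here against the standard library’s `statistics.median`. With an even number of scores there are two middle scores, and the sketch takes the point halfway between them, which is the usual convention.

```python
from statistics import median

def median_by_counting(scores):
    """Sort the scores, then step in from both ends until reaching the middle."""
    if not scores:
        raise ValueError("need at least one score")
    ordered = sorted(scores)        # counting only works on ordered scores
    left, right = 0, len(ordered) - 1
    while right - left > 1:         # set aside one score from each end
        left += 1
        right -= 1
    # Odd N: left == right, a single middle score.
    # Even N: left and right are the two middle scores; split the difference.
    return (ordered[left] + ordered[right]) / 2

print(median_by_counting([7, 1, 5, 3, 9]), median([7, 1, 5, 3, 9]))
print(median_by_counting([4, 1, 3, 2]), median([4, 1, 3, 2]))
```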
That leaves the mean (also called the average). If a frequency distribution were a seesaw, the mean would be the point where it balanced. The mean represents the average, typical person. It’s the hypothetical middle point that balances the entire distribution, which is why we end up with 2.4 children or 3.1 cars. Unlike the median and mode, the mean is very sensitive to outlying scores.
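The seesaw idea can be checked with numbers: the distances of the scores from the mean cancel out, so the mean is the balance point. The same sketch shows the mean’s sensitivity to outliers by adding one extreme score. All scores are invented, and only the standard `statistics` module is used.

```python
from statistics import mean, median

scores = [2, 3, 3, 4, 8]               # made-up scores
m = mean(scores)                       # the balance point
deviations = [s - m for s in scores]   # signed distances from the balance point
print(m, sum(deviations))              # left and right sides cancel out

# One outlying score drags the mean a long way but barely moves the median:
# the mean jumps from 4 to about 11.67, while the median goes from 3 to 3.5.
with_outlier = scores + [50]
print(mean(scores), median(scores))
print(mean(with_outlier), median(with_outlier))
```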
Calculation is harder than pointing or counting, but not really all that tough. You add up all the scores and divide by the number of scores. Pretty simple.
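The recipe (add up all the scores, then divide by the number of scores) fits in a short function. `mean_by_hand` is a hypothetical name, and the family sizes are invented to show how a mean of 2.4 children arises even though no single family has 2.4 children.

```python
from statistics import mean

def mean_by_hand(scores):
    """Add up all the scores and divide by the number of scores."""
    if not scores:
        raise ValueError("need at least one score")
    total = 0
    for score in scores:
        total += score
    return total / len(scores)

children = [2, 3, 1, 4, 2]          # hypothetical children per family
print(mean_by_hand(children))       # 12 / 5
print(mean(children))               # the standard library agrees
```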
There is little difference between the mean and the scores next to it. Everyone in the middle of the pack is about the same. It’s the way nature is built.
The goal is to find a way to summarize a large group of numbers. One part of that process is to find a group’s representative: one number that will tell us about the entire group.
If no one had the same score, the frequency distribution would look like a straight horizontal line. If everyone had the same score, it would be represented by a vertical line. If there is some variability in scores but several people share the same score, the distribution will have both width and height. The typical frequency distribution spreads from left to right, but most scores are in the middle. The result is a graph that looks like a mountain…or a dome…or the bottom of a bell.
If frequency distributions are not “normal bell-shaped curves,” they might be positively skewed, negatively skewed, or bimodal.
Notice that the “bell-shaped” curve is symmetrical. There are a few scores at the ends, but most are in the middle. Philosophically, we believe this describes people well. If we measure them on almost anything, most will be in the middle of the distribution but a few will be at each end. Although there are a few very musical people and a few very unmusical people, most are in the middle of the musical ability distribution. This is normal. It’s how we define average.
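One way to see a symmetrical heap emerge from chance is to list every possible roll of three six-sided dice and tally the totals. This is an illustration chosen here, not an example from the lesson: middle totals can be made in many ways, extreme totals in only one.

```python
from collections import Counter
from itertools import product

# Every possible roll of three six-sided dice (6 * 6 * 6 = 216 outcomes),
# tallied by the total of the three dice.
sums = Counter(sum(roll) for roll in product(range(1, 7), repeat=3))

for total in sorted(sums):
    print(f"{total:2d} {'#' * sums[total]}")   # one # per way to roll that total
```

Totals of 10 and 11 each turn up 27 times out of 216, while 3 and 18 turn up only once each, and the bars mirror each other around the middle.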
Sometimes the data doesn’t look like a normal bell-shaped curve. Usually, it’s because the researcher did something wrong (asked only highly gifted people) or limited the sample in some manner. The result is a skewed distribution: a normal curve with a long tail. The direction of the tail gives the distribution its name. A tail to the right (toward the high scores) is a positively skewed distribution: most people scored low on the variable, but a few (maybe only one person) scored quite high. A negatively skewed distribution is normal except for outlying scores toward the left (the lowest scores).
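The direction of skew can also be spotted without a graph by comparing the mean and the median. Because the mean is sensitive to outlying scores, in examples like these it gets pulled toward the long tail while the median stays with the crowd. The scores are invented for illustration.

```python
from statistics import mean, median

# Hypothetical scores: most people scored low, one person scored very high.
positive_skew = [1, 2, 2, 3, 3, 3, 4, 20]
# The reverse pattern: most people scored high, one outlying low score.
negative_skew = [1, 17, 18, 18, 18, 19, 19, 20]

for label, scores in [("positive", positive_skew), ("negative", negative_skew)]:
    m, md = mean(scores), median(scores)
    side = "right (high)" if m > md else "left (low)"
    print(f"{label}: mean={m:.2f}, median={md}, mean pulled to the {side}")
```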
Book
Statistics Safari
Photo by Xavi Cabrera on Unsplash