ktangen

March 29, 2023 by ktangen

Guided Tour

Think of this process as touring statistics. It’s a tour because we’re not going to see everything there is to see. There are lots of books and courses on the subject, so we’ll only hit the highlights. And since I’m not going to leave you on your own, I think of it as a guided tour.

Like any good tour, our trip is divided into days. I’ve divided statistics into ten days. By the end of the tour, you’ll have learned a lot about statistics. You’ll know when to use what procedure. You’ll learn how to interpret data. And you’ll learn how to be a thoughtful user of statistics.

Each “day” follows the same format. We’ll start with a brief overview of the topic. Then you’ll be given a choice about what you want to do next. The choices will include:

Reading “a bit more” about the topic
Reading “even more” (a longer description)
Listening to a lecture or two
Learning how to calculate the procedure we’re highlighting
Reviewing vocabulary and formulas
Taking a quiz
And anything else I can think of that will help you with the material.

If you’re a people-person who is facing statistics for the first time, this tour is for you. The math won’t be hard, we’ll go step by step, and I won’t make you memorize formulas. I think statistics is scary when you face it alone. So let’s do it together.

The funny thing about being a people-person: the more people you meet, the more numbers you collect. There are test scores, evaluations, key performance indicators, and…more paperwork. People have to be supervised and coordinated. And the numbers that describe them have to be collected, understood, summarized and reported. Where there are people, you’ll find numbers.

Fortunately, we are born with great measuring skills. We are natural collectors of information. We sense sights, sounds and smells. Our brains automatically calculate how far we are from the car ahead of us, how deep the potholes are, and how fast the bicyclist is going. We calculate slopes, shapes, shades and angles. We make thousands of decisions, all based on data we collect. In our everyday lives, we are scientists.

In statistics, we are going to hone the skills we already possess. In particular, we are going to use numbers to describe people. It’s a fairly indirect approach but highly useful. The indirectness means, of course, that we’ll make mistakes. But the usefulness of the system makes up for that problem.

Think of the difference between maps and the real world. Maps are representations of the real thing. They are subject to error (ever have a set of directions lead you astray?). But they are highly portable and useful. Numbers are the same way for people. We shouldn’t confuse the score on an intelligence test with intelligence. Just as maps aren’t reality, scores aren’t reality. But representations of reality can be helpful.

Research focuses on group data. This is both an advantage and a disadvantage. On the good side, we can see the broad picture. We can use group data to make generalizations, heuristics and rules that often work for most people. On the bad side, looking at a group means we will miss some individuals. It’s like coming to the conclusion that peanuts can safely be added to a school lunch. As a group generalization, it’s true. Most people can eat peanuts and benefit from doing so. But such a finding will be fatal to some children.

So remember that group data is helpful for a generalization. Statistics is a great tool for general surveying of vast wildernesses, not so good at telling what’s over the next hill, and really rotten at being able to tell whether you will fall into quicksand. So statistics looks at patterns, trends, and things that are typically true.

We often use statistics to describe things. Descriptive statistics tries to reduce a large pile of data into as few numbers as it possibly can. The ultimate goal is to find a single number that describes a whole group. In contrast, inferential statistics is when we use data as the basis for estimates, predictions and guesses. Descriptive stats is telling you that it’s raining right now. Inferential stats tries to figure out if it will rain tomorrow.

Both descriptive and inferential statistics rely on data. Data is any collection of numbers. But before we get to numbers, we have to design our study. In Day 1, you’ll learn the kind of questions we need to ask before we do a study. If you’re math phobic, relax. There are no numbers in Day 1!

Statistics is an adventure. So put on your walking shoes, grab your hat, and let’s go.

March 29, 2023 by ktangen

Dispersion

Day 3: Score diversity

We found the middle of the group because most people score about the same on any variable we measure. Now that you’ve found a representative for the group, how representative is the mean? Is the group unified, with nearly everyone having the same score? Or are there wide fluctuations within the group? We want one number that will tell us if the scores are very similar to each other or if the group is composed of heterogeneous scores.

Dispersion is like a mountain.


There are five measures of dispersion:

Range

Mean Absolute Deviation (Mean Variance)

Sum of Squares

Variance

Standard Deviation

All measures of dispersion get larger when the distribution of scores is more widely varied. A narrow distribution (a lot of similar scores) has a small amount of dispersion. A wide distribution (lots of different scores) has a large amount of dispersion. The more dispersion, the more heterogeneous (dissimilar) the scores will be.

Range

Range is easy to calculate. It is the highest score minus the lowest score. If the highest score is 11 and the lowest score is 3, the range equals 8.
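A minimal Python sketch of the range, using a hypothetical list of scores whose highest value is 11 and lowest is 3 (the list itself is made up for illustration):

```python
def score_range(scores):
    """Range: the highest score minus the lowest score."""
    return max(scores) - min(scores)


# Hypothetical scores; the highest is 11 and the lowest is 3
scores = [3, 5, 6, 7, 8, 9, 11]
print(score_range(scores))  # 8
```

Because only the two most extreme scores are used, a single unusual score can change the range a lot.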

Mean Absolute Deviation (MAD)

As the name suggests, mean variance (or mean absolute deviation) is a measure of variation from the mean. It is the average of the absolute values of the deviations from the mean. That is, the mean is subtracted from each raw score and the resulting deviations (called “little d’s”) are averaged (ignoring whether they are positive or negative).
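Here is a small Python sketch of the same steps, again on a made-up list of scores. It also prints the plain sum of the little d’s: signed deviations from the mean always add up to zero, which is why the signs have to be ignored before averaging.

```python
def mean_absolute_deviation(scores):
    """Average distance of each score from the mean, ignoring sign."""
    n = len(scores)
    mean = sum(scores) / n
    little_ds = [x - mean for x in scores]      # deviations from the mean
    return sum(abs(d) for d in little_ds) / n  # drop the signs, then average


scores = [3, 5, 6, 7, 8, 9, 11]  # hypothetical scores; mean = 7
mean = sum(scores) / len(scores)
print(sum(x - mean for x in scores))    # 0.0  (signed deviations cancel out)
print(mean_absolute_deviation(scores))  # 2.0
```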

Sum of Squares

Conceptually, Sum of Squares (abbreviated SS) is an extension of mean variance. Instead of taking the absolute values of the deviations, we square the critters (deviations), and add them up.
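A minimal Python sketch of SS on the same hypothetical scores. Squaring, like taking absolute values, gets rid of the negative signs, but it also makes large deviations count much more than small ones.

```python
def sum_of_squares(scores):
    """SS: square each deviation from the mean, then add them up."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


scores = [3, 5, 6, 7, 8, 9, 11]  # hypothetical scores; mean = 7
print(sum_of_squares(scores))  # 42.0  (16 + 4 + 1 + 0 + 1 + 4 + 16)
```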

Variance

Variance of a population is always SS divided by N. This is true whether it is a large population or a small one. Variance of a large sample (N of 30 or more) is also calculated by Sum of Squares divided by N. If there are 40 or 400 in the sample, variance is SS divided by N.

However, if a sample is smaller than 30, it is easy to underestimate the variance of the population. Consequently, it is common practice to adjust the formula for small-sample variance. If N<30, variance is SS divided by N-1. Using N-1 instead of N results in a slightly larger estimate of variance and mitigates the problem of using a small sample.
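The sketch below follows the rule of thumb described above: divide SS by N for a population or a sample of 30 or more, and by N-1 for a smaller sample. The function name `tour_variance` and its `is_population` flag are hypothetical, made up for illustration. For comparison, Python’s standard `statistics` module offers `pvariance` (divides by N) and `variance` (divides by N-1); the module itself does not apply any N<30 cutoff.

```python
from statistics import pvariance, variance


def sum_of_squares(scores):
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def tour_variance(scores, is_population=False):
    """Variance using this tour's rule of thumb (hypothetical helper).

    Populations, and samples with N of 30 or more: SS / N.
    Samples with N below 30: SS / (N - 1).
    """
    n = len(scores)
    ss = sum_of_squares(scores)
    if is_population or n >= 30:
        return ss / n
    return ss / (n - 1)


scores = [3, 5, 6, 7, 8, 9, 11]  # hypothetical small sample: N = 7, SS = 42
print(tour_variance(scores))                      # 7.0  (42 / 6)
print(tour_variance(scores, is_population=True))  # 6.0  (42 / 7)

# Standard-library cross-check: pvariance uses N, variance uses N - 1
print(pvariance(scores), variance(scores))
```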

Standard deviation

This measure of dispersion is calculated by taking the square-root of variance. Regardless of whether you used N or N-1 to calculate variance, standard deviation is the square-root of variance. If variance is 7.22, the standard deviation is 2.69. If variance is 8.67, the standard deviation equals 2.94.
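A minimal Python sketch that reproduces the two examples in the text and shows the standard library’s `pstdev` (square root of SS/N) and `stdev` (square root of SS/(N-1)) on hypothetical scores:

```python
import math
from statistics import pstdev, stdev


def standard_deviation(variance):
    """Square root of variance, however the variance was computed."""
    return math.sqrt(variance)


# The two variances used in the text
print(round(standard_deviation(7.22), 2))  # 2.69
print(round(standard_deviation(8.67), 2))  # 2.94

# Standard-library versions work straight from the scores
scores = [3, 5, 6, 7, 8, 9, 11]  # hypothetical scores; SS = 42, N = 7
print(round(pstdev(scores), 2))  # 2.45  (square root of 42 / 7 = 6)
print(round(stdev(scores), 2))   # 2.65  (square root of 42 / 6 = 7)
```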

Technically, the square-root of a population variance is called sigma (σ) and the square-root of a sample variance is called the standard deviation (s). As a general rule, population measures use Greek symbols and sample measures use English letters.

Conceptually, all measures of dispersion work the same. The more dissimilar the scores, the larger the values. In practical terms, the standard deviation is the most useful.

With a mean and standard deviation, you can describe any normal distribution. The mean is the center point. From there we can cover the entire distribution in three steps in one direction or three steps in the other. Each step is one standard deviation wide.

Although the steps are the same size, the amount of ground covered (or in our case, the number of scores) is not. A normal distribution is a symmetrical mountain of scores. Most scores are at or close to the mean. There are fewer and fewer scores the farther you are from the mean.

The percentages in a normal curve, from left to right, are 2, 14, 34, 34, 14, 2. If you add the middle two steps together, you have accounted for 68% of the scores. If you select the 4 middle-most steps, you’ve accounted for about 96% of the scores (closer to 95% before rounding). If you select all six steps, you’ve included virtually everyone. We won’t say everyone because, once you’ve tested millions of people, one of them will be amazingly extreme.
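The rounded percentages above can be checked with Python’s `statistics.NormalDist` (Python 3.8 or later). This sketch measures the area under a standard normal curve between whole-number standard deviations. The six steps come out to roughly 2.1%, 13.6%, 34.1%, 34.1%, 13.6% and 2.1%, so the middle four steps hold about 95.4% before rounding, and only about 0.3% of scores lie beyond three standard deviations.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of all scores falling in each one-standard-deviation step, left to right
edges = [-3, -2, -1, 0, 1, 2, 3]
for low, high in zip(edges, edges[1:]):
    share = standard_normal.cdf(high) - standard_normal.cdf(low)
    print(f"{low} to {high} SD: {share:.1%}")

within_1 = standard_normal.cdf(1) - standard_normal.cdf(-1)  # about 68.3%
within_2 = standard_normal.cdf(2) - standard_normal.cdf(-2)  # about 95.4%
within_3 = standard_normal.cdf(3) - standard_normal.cdf(-3)  # about 99.7%
print(f"{within_1:.1%} {within_2:.1%} {within_3:.1%}")
```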

A mean plus and minus 3 standard deviations accounts for everyone, everyone but Ralph (whoever Ralph is). With a mean of a hundred and a standard deviation of 15, you know that a normal IQ is between 85 and 115. You know that 68% of the scores fall within that range.
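A minimal sketch of the same reasoning in Python: given a mean and standard deviation, it lists the score ranges one, two and three steps out. The helper name `sd_bands` is hypothetical; the IQ numbers are the ones used in the text.

```python
def sd_bands(mean, sd, steps=3):
    """Map k -> (low, high) score range for the mean plus and minus k SDs."""
    return {k: (mean - k * sd, mean + k * sd) for k in range(1, steps + 1)}


# IQ example from the text: mean 100, standard deviation 15
for k, (low, high) in sd_bands(100, 15).items():
    print(f"within {k} SD: {low} to {high}")
# within 1 SD: 85 to 115
# within 2 SD: 70 to 130
# within 3 SD: 55 to 145
```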

And

It doesn’t matter what the topic is or who the people are. With any normal distribution, you get the same pattern.

This is the pattern of chance. The universe is composed of many random things, and they all follow this pattern: musical ability, running speed, background radiation in space, foot size, and bad luck. If a variable is normally distributed, this pattern holds.

If you take a bucket of blocks and dump it on the floor, the objects will be randomly distributed. It will be a pile with a center spot. It will be a mountain of toys.

I have some blocks that don’t work that way. When you dump them out, they line up in rows and columns. This immediately tells you it is a trick. It is not a random arrangement. My blocks are magnetic. When a variable doesn’t look like a mountain, look for a cause.

In addition

With a mean and standard deviation, you can find anyone. In a normal curve, half the scores are above the mean and half are below. Scores a positive number of standard deviations from the mean are above it; scores a negative number of standard deviations from the mean are below it. You can find any score or point in a distribution if you know the mean and standard deviation. That’s what z-scores are all about. Check out Day 4 of our 10-day tour: z-scores.
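As a small preview of z-scores, here is a minimal sketch of locating a score relative to the mean, and going the other way from a number of standard deviations back to a score. The helper names `sd_from_mean` and `score_at` are hypothetical; the values reuse the IQ mean of 100 and standard deviation of 15.

```python
def sd_from_mean(score, mean, sd):
    """How many standard deviations a score lies from the mean.

    Positive results are above the mean; negative results are below it.
    """
    return (score - mean) / sd


def score_at(steps, mean, sd):
    """The score that lies a given number of standard deviations from the mean."""
    return mean + steps * sd


print(sd_from_mean(130, 100, 15))  # 2.0   (two steps above the mean)
print(sd_from_mean(85, 100, 15))   # -1.0  (one step below the mean)
print(score_at(1.5, 100, 15))      # 122.5
```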

Here is How To Calculate Statistics.

Z Scores

Want to jump ahead?

Book

Statistics Safari

If you know nothing about statistics, start with the video series Square One.

Photo by Jerry Zhang on Unsplash

March 29, 2023 by ktangen

Central Tendency

Day 2: Finding the middle

We use central tendency because there is one. If you take an armload of toys and drop them on the floor, they don’t line up in rows. They don’t arrange themselves in a triangle or circle. They fall in a heap. A pile that looks like a small mountain: a frequency distribution.

Central tendency is like trying to find the middle of a pile of toys

Although some use case studies, naturalistic observation, and single subject studies (N=1), most research is group based. Usually, there are lots of numbers from lots of subjects, all waiting to be crunched.

The place to begin is to collect the data together. If left unattended, data would cover the desks of researchers and gather dust. To be useful, it must be organized into a data matrix: a row-column table of scores. The spreadsheet of scores has a row for each subject. Each row contains all of the scores for that individual but neatly laid out in columns. A quick view of the spreadsheet will show if there are any missing scores (empty cells).

Each row is a person; each column is a variable. Traditionally, the farthest column to the left contains the ID number of the subject. The simplest data matrix has two columns: one for the ID number and one for the score. And it would have as many rows as subjects in the study.
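To make the idea concrete, here is a tiny hypothetical data matrix in plain Python: each inner list is one subject’s row, the leftmost column is the ID, and `None` stands in for an empty cell. The IDs and scores are invented for illustration.

```python
# Hypothetical data matrix: one row per subject, one column per variable
columns = ["id", "score"]
data_matrix = [
    [101, 7],
    [102, 9],
    [103, None],  # missing score (an empty cell)
    [104, 6],
]

# A quick scan for empty cells, like eyeballing a spreadsheet
missing = [
    (row[0], columns[j])
    for row in data_matrix
    for j, cell in enumerate(row)
    if cell is None
]
print(missing)  # [(103, 'score')]
```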

After forming a data matrix, the next step is usually to plot the data. Each variable is plotted separately: a graph for each factor being measured. Sometimes the variables are summarized in histograms (vertical bar graphs). Often the graphs are frequency distributions: overviews of the raw data. Each score is listed from lowest to highest (left to right). If more than one person has the same score, the graph points are stacked vertically.

Once the data is organized and graphed, it’s time to describe it. The shortest description of a group is how many people are in it (N). A more graphic and informative description would be a frequency distribution. The “frequency” is indicated in the height of the graph: the more people who have the same score, the taller the graph is. The “distribution” (width) shows how many different scores there were.
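A quick text-only sketch of a frequency distribution, using Python’s `collections.Counter` on hypothetical scores. It also reports N and how many different scores there were. The bars are drawn sideways: each row is one score (lowest first) and each `x` is one person, so a longer row corresponds to a taller stack on a graph.

```python
from collections import Counter

scores = [3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 11]  # hypothetical scores
freq = Counter(scores)  # score -> how many people earned it

print("N =", len(scores))              # N = 11
print("different scores:", len(freq))  # different scores: 7
for score in sorted(freq):             # lowest score first
    print(f"{score:>3} | {'x' * freq[score]}")
```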

The major challenge of descriptive statistics is finding a representative of the entire group of scores. We look for a score to represent an entire group because we believe that people are more alike than different. We believe that chance follows a pattern. And that pattern is a heap in the middle with fewer and fewer scores at the edges. We look for a center because there is a center to a group. To understand most people, all we need to do is describe the middle of the group; the middle of the distribution is where most scores are.

There are three major measurements of central tendency: mean, median and mode.

The mode is the most popular score: the one that occurs most often (the highest point of the frequency distribution). But it’s hard to be accurate reading graphs, and the mode isn’t very useful for advanced statistical analysis. The general rule is: if it’s easy to calculate, it’s not very helpful.
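Reading the mode off a graph can be imprecise; counting is exact. A minimal sketch using Python’s `statistics.mode` and `statistics.multimode` (Python 3.8 or later) on hypothetical scores. `multimode` returns every score tied for most common, which is how a bimodal set shows up.

```python
from statistics import mode, multimode

scores = [3, 5, 6, 6, 7, 7, 7, 8, 8, 9, 11]  # hypothetical scores
print(mode(scores))                # 7  (the most common score)
print(multimode([2, 2, 4, 4, 5]))  # [2, 4]  (a tie: two modes)
```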

A better measure of a group representative is the median. The median is the middlemost score. If you start at the ends and count toward the middle, whichever score you end on is the median. It’s fairly easy to calculate but the scores have to be arranged from lowest to highest (or highest to lowest) in order to count toward the middle. And the median isn’t very useful for advanced statistical analysis.
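A sketch of the counting-in approach. The text doesn’t say what to do when N is even and two scores share the middle; the sketch assumes the common convention of taking the point halfway between them, which is also what Python’s `statistics.median` does.

```python
from statistics import median as library_median


def median(scores):
    """Middlemost score after putting the scores in order."""
    ordered = sorted(scores)  # arrange lowest to highest first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]   # odd N: one score sits in the middle
    # even N: halfway between the two middle scores (a common convention)
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([9, 3, 11, 5, 7]))      # 7
print(median([9, 3, 5, 7]))          # 6.0
print(library_median([9, 3, 5, 7]))  # 6.0
```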

That leaves the mean (also called the average). If a frequency distribution were a seesaw, the mean would be the point where it balanced. The mean represents the average, typical person. It’s the hypothetical middle point that balances the entire distribution, which is why we end up with 2.4 children or 3.1 cars. Unlike the median and mode, the mean is very sensitive to outlying scores.

Calculation is harder than pointing or counting, but not really all that tough. You add up all the scores and divide by the number of scores. Pretty simple.
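A minimal sketch of the mean, plus a quick demonstration of the point that the mean is sensitive to outlying scores while the median is not. The numbers are invented for illustration.

```python
from statistics import median


def mean(scores):
    """Add up all the scores and divide by the number of scores."""
    return sum(scores) / len(scores)


scores = [3, 5, 6, 7, 8, 9, 11]    # hypothetical scores
with_outlier = scores[:-1] + [60]  # swap the top score for an extreme one

print(mean(scores), median(scores))              # 7.0 7
print(mean(with_outlier), median(with_outlier))  # 14.0 7
```

One extreme score doubles the mean here, while the median does not move.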

There is little difference between the mean and the score next to it. Everyone in the middle of the pack is about the same. It’s the way nature is built. The goal is to find a way to summarize a large group of numbers. One part of that process is to find a group’s representative. We want one number that will tell us about the entire group.

If no two people have the same score, the frequency distribution would look like a straight horizontal line. If everyone had the same score, it would be represented by a vertical line. If there is some variability in scores but several people with the same score, the distribution will have both width and height. The typical frequency distribution varies from left to right but most scores are in the middle. The result is a graph that looks like a mountain…or a dome…or the bottom of a bell.

If frequency distributions are not “normal bell-shaped curves,” they might be positively skewed, negatively skewed, or bimodal.

Notice that this “bell-shaped” curve is symmetrical. There are more scores in the middle than at the ends. There are a few scores at the ends but most are in the middle. Philosophically, we believe this describes people well. If we measure them on almost anything, most will be in the middle of the distribution but a few will be at each end. Although there are a few very musical people, and a few very unmusical people, most are in the middle of the musical ability distribution. This is normal. It’s how we define average.

Sometimes the data doesn’t look like a normal bell-shaped curve. Usually, it’s because the researcher did something wrong (asked only highly gifted people) or limited the sample in some manner. The result is a skewed distribution: a normal curve with a long tail. The direction of the tail gives the distribution its name: a tail to the right (toward the high scores) is a positively-skewed distribution. Most folk scored low on the variable but a few (maybe only one person) scored quite high. A negatively-skewed distribution is normal except for a tail to the left (toward the lowest scores).

Dispersion

Want to jump ahead?

Book

Statistics Safari

Photo by Xavi Cabrera on Unsplash

March 29, 2023 by ktangen

Social Psychology

Social psychology is the study of how being in a group influences us. Groups can increase or decrease aggression, creativity, and independent thinking. We can act exactly the same or quite differently, depending on which group we join.

We start off in a family, go to school and play with friends. We are never not part of a group. Those groups help shape us. We influence them, and they influence us. It is a reciprocal relationship.


March 28, 2023 by ktangen

Quiz: Escape

1. How many trials does taste aversion require:

a. one
b. ten
c. fifty
d. hundreds

2. Something bad has to happen before you can develop:

a. assimilation
b. avoidance
c. anxiety
d. trace conditioning

3. PTSD:

a. occurs only in war
b. occurs only in men
c. can be caused by bad dreams
d. can occur in children

4. Which is the best treatment for PTSD:

a. dream analysis
b. discussing parental misdeeds
c. shock therapy
d. desensitization

5. Running away from a room filling with smoke is:

a. avoidance
b. aversion
c. escape
d. common sense

March 28, 2023 by ktangen

Quiz: Study Skills

1. When reading a textbook, you should:

a. read the headlines
b. reread it often
c. take notes
d. underline

2. Immediately knowing you don’t know a word is called:

a. negative recognition
b. latent inhibition
c. positive bonding
d. chunking

3. When you think you know it, teach a bunch of:

a. kindergarteners
b. teddy bears
c. artichokes
d. puppies

4. The best way to consolidate what you know is:

a. study a different topic
b. eat some chocolate
c. do situps
d. sleep

5. Remember that studying:

a. is a skill that needs practice
b. is best if spaced over time
c. is hard work
d. all of the above

1. When reading a textbook, you should:

a. read the headlines
b. reread it often
c. take notes
d. underline

Take notes so you can combine the book info with your other notes.

2. Immediately knowing you don’t know a word is called:

a. negative recognition
b. latent inhibition
c. positive bonding
d. chunking

3. When you think you know it, teach a bunch of:

a. kindergarteners
b. teddy bears
c. artichokes
d. puppies

Teddy bears were mentioned in the notes but any of these will work. The idea is to practice saying it out loud.

4. The best way to consolidate what you know is:

a. study a different topic
b. eat some chocolate
c. do situps
d. sleep

All are good; sleep is best.

5. Remember that studying:

a. is a skill that needs practice
b. is best if spaced over time
c. is hard work
d. all of the above


KenTangen.com

My Channel
