
Correlation

Day 5: Relationship

This is the first two-variable model we’ll consider. Both variables (designated X and Y) are measures obtained from the same subjects. Basically a mathematical representation of a scatterplot, a correlation indicates whether the variables move together in the same direction (+ correlation), move in opposite directions (- correlation), or move separately (0 correlation). Correlations are widely used to measure reliability, validity and commonality.

A correlation asks whether the two wings of a bird flap together.

When one wing goes up, does the other wing go up, down or stay the same? In a strong positive correlation, when one wing goes up, the other usually goes up too. In a strong negative correlation, when one wing goes up, the other usually goes down. In a weak correlation, either positive or negative, when one wing goes up, the other wing does whatever it wants.

With correlations we are only observing: we look at two variables and see how they are related to each other. When one variable changes, we want to know what happens to the other variable. In a perfect correlation, the two variables move together. When there is no correlation, the variables act independently of each other.

To use this simple and yet powerful method of description, we must collect two pieces of information on every person. These are paired observations. They can’t be separated. If we are measuring height and weight, it’s not fair to use one person’s height and another person’s weight. The data pairs must remain linked. That means you can’t reorganize one variable (from highest to lowest, for example) without reorganizing the other variable. The pairs must stay together.

Sign & Magnitude

A correlation has both sign and magnitude. The sign (+ or -) tells you the direction of the relationship. If one variable is getting larger (2, 4, 5, 7, 9) and the other variable is headed in the same direction (2, 3, 6, 8, 11), the correlation’s sign is positive. In a negative correlation, while the first variable is getting larger (2, 4, 5, 7, 9), the second variable is getting smaller (11, 8, 6, 3, 2).

Correlation is about direction of change, like arrows

The magnitude of a correlation is found in the size of the number. Correlation coefficients can’t be bigger than 1. If someone says they found a correlation of 2.48, they did something wrong in the calculation. Since the sign can be positive or negative, a correlation must be between -1 and +1.

The closer the coefficient is to 1 (either + or -), the stronger the relationship. Weak correlations (such as .13 or -.08) are close to zero. Strong correlations (such as .78 or -.89) are close to 1. Consequently, a coefficient of -.92 is a very strong correlation. And +.25 indicates a fairly weak positive correlation.
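To see sign and magnitude in action, here is a minimal Python sketch (assuming NumPy is available) using the illustrative number series from above:

  import numpy as np

  x = [2, 4, 5, 7, 9]
  y_same = [2, 3, 6, 8, 11]      # moves with x
  y_opposite = [11, 8, 6, 3, 2]  # moves against x

  print(np.corrcoef(x, y_same)[0, 1])      # about +0.98: strong positive
  print(np.corrcoef(x, y_opposite)[0, 1])  # about -0.98: strong negative

Both coefficients are nearly 1 in magnitude; only the sign differs.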

Magnitude is how close the coefficient is to 1; sign is whether the relationship is positive (headed the same way) or negative (inverse).

Correlations don’t prove causation. A strong correlation is a necessary indicator of causation but it is not sufficient. When a cause-effect relationship exists, there will be a strong correlation between the variables. But a strong correlation does not mean that variable A causes variable B.

In correlations, A can cause B. Or, just as likely, B can cause A. Or, just as likely, something else (call it C) causes both A and B to occur.

For a simple example, let’s assume that we know nothing about science. But we do notice that when the sun comes up, it gets warm outside. From a statistical point of view, we can’t tell which causes which. Perhaps the sun coming up makes it get warm. But it is as likely that when it gets warm the sun comes up. Or the sun and warmth are caused by something else: a dragon (pulling the sun behind it) flies across the sky blowing its hot breath on the earth (making it warm).

You might laugh at this illustration but think how shocked you’d be if tomorrow it got warm and the sun didn’t come up!

It is, of course, perfectly OK to infer causation from correlational data. But we must remember that these inferences are not proofs; they are leaps of faith. Leaping is allowed, but we must clearly indicate that it is an assumption, not a fact.

Reliability & Validity

Although correlations can’t prove cause and effect, they are very useful for measuring reliability and validity. Reliability means that you get the same results every time you use a test. If you’re measuring the temperature of a liquid and get a reading of 97 degrees, you would expect a reliable thermometer to yield the same result a few seconds later. If your thermometer gives different readings of the same source over a short period of time, it is unreliable and you would throw it away.

We expect many things in our lives to be reliable. When you flip on a light switch, you expect the light to come on. When you get on an elevator and push the “down” button, you don’t expect the elevator to go sideways. If you twice measure the length of a table, a reliable tape measure will yield the same result. Even if your measuring skill is poor, you expect the results to be close (not 36 inches and then 4 inches). You expect the same results every time.

Reliability, then, is the correlation between two observations of the same event. Test reliability is determined by giving the test once and then giving the same test to the same people 2 weeks later. With this test-retest method, you would expect a high positive correlation between the first time the test was given and the second time.

A test with a test-retest reliability of .90 (which many intelligence tests have) is highly reliable. A correlation of .45 shows a moderate amount of reliability, and a coefficient close to zero indicates the test is unreliable. Obviously, a negative test-retest reliability coefficient would indicate something was wrong. People who got high scores the first time should be getting high scores the second time, if the test is reliable.

There are 3 basic types of reliability correlations. A test-retest coefficient is obtained by giving and re-giving the test. A “split half” correlation is found by correlating the total score for the first half with the total score for the second half for each subject. A parallel forms correlation shows the reliability of two tests with similar items.
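As a rough sketch of the split-half idea, the Python below correlates hypothetical first-half and second-half totals. The Spearman-Brown correction, a standard adjustment (not specific to this site), estimates full-length reliability from the half-length correlation:

  import numpy as np

  # Hypothetical half-test totals for six people
  first_half = [10, 14, 9, 16, 12, 11]
  second_half = [11, 13, 10, 15, 12, 10]

  r_half = np.corrcoef(first_half, second_half)[0, 1]   # about .95
  r_full = (2 * r_half) / (1 + r_half)                  # about .97
  print(round(r_half, 2), round(r_full, 2))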

Correlations also can be used to measure validity. Although a reliable test is good, it is possible to be reliably (consistently) wrong. Validity is the correlation between a test and an external criterion. If you create a test of musical ability, you expect that musicians will score high on the test and those judged by experts to be unmusical will score low on the test. The correlation between the test score and the experts’ ratings is a measure of validity.

Validity is whether a test measures what it says it measures; reliability is whether a test is consistent. Clearly, reliability is necessary but not sufficient for a test to be valid.

Significance

It is possible to test a correlation coefficient for significance. A significant correlation means the relationship is not likely to be due to chance. It doesn’t mean that X causes Y. It doesn’t mean that Y causes X; or that another variable causes both X and Y. Although a correlation cannot prove which causes what, r can be tested to see if it is likely to be due to chance.

First, determine the degrees of freedom for the study. The degrees of freedom (df) for a correlation are N-2. If there are 7 people (pairs of scores), the df = 5. If there are 14 people, df = 12.

Second, enter the statistical table “Critical Values of the Pearson r” with the appropriate df. Let’s assume there were 10 people in the study (10 pairs of scores). That would mean the degrees of freedom for this study equals 8.

Go down the df column to eight, and you’ll see that for a Pearson r with this few people to be significant, the magnitude of the coefficient has to be .632 or larger.

Notice that the table ignores the sign of the correlation. A negative correlation of -.632 or larger (closer to -1) would also be significant.
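The table lookup can also be done in code. This sketch (assuming SciPy is installed) converts r to a t statistic with N - 2 degrees of freedom, which is the calculation behind the critical-values table; the r of .65 and N of 10 are made-up examples:

  from math import sqrt
  from scipy import stats

  r, n = 0.65, 10
  df = n - 2                          # 8 degrees of freedom
  t = r * sqrt(df) / sqrt(1 - r**2)   # convert r to a t statistic
  p = 2 * stats.t.sf(abs(t), df)      # two-tailed p value
  print(round(t, 2), round(p, 3))     # t = 2.42, p = .042: significant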

 

Evaluate r-squared

A correlation can’t prove that A causes B; it could be that B causes A…or that C causes both A & B. The coefficient of determination is an indication of the strength of the relationship between the two variables. It gives the percentage of variance that is accounted for by the relationship between the two variables.

To calculate the coefficient of determination, simply take the Pearson r and square it. So, .89 squared = .79. In this example, 79% of the variance can be explained by the relationship between the two variables. Using a Venn diagram, you can picture the relationship between the two variables as the area of overlap between two circles.

To calculate the amount of variance that is NOT explained by the relationship (called the coefficient of non-determination), subtract r-squared from 1. In our example, 1 - .79 = .21. That is, 21% of the variance is unaccounted for.
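Here is the same arithmetic as a quick Python check:

  r = 0.89
  r_squared = r ** 2               # coefficient of determination: about .79
  unexplained = 1 - r_squared      # coefficient of non-determination: about .21
  print(round(r_squared, 2), round(unexplained, 2))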


z Scores

Day 4: Where am I?

An entire distribution can often be reduced to a mean and standard deviation. A z-score uses that information to indicate the location of an individual score. Essentially, z-scores indicate how many standard deviations you are away from the mean. If z = 0, you’re at the mean. If z is positive, you’re above the mean; if negative, you’re below the mean. In practical terms, z scores can range from -3 to +3.

z scores show how you compare to others, like one red flower in a field of yellow flowers.

Composed of two parts, the z-score has both magnitude and sign. The magnitude can be interpreted as the number of standard deviations the raw score is away from the mean. The sign indicates whether the score is above the mean (+) or below the mean (-). To calculate the z-score, subtract the mean from the raw score and divide that answer by the standard deviation of the distribution. In formal terms, the formula is z = (X - M) / SD, where X is the raw score, M is the mean, and SD is the standard deviation.

Using this formula, we can find z for any raw score, assuming we know the mean and standard deviation of the distribution. What is the z-score for a raw score of 110, a mean of 100 and a standard deviation of 10? First, we find the difference between the score and the mean, which in this case would be 110-100 = 10. The result is divided by the standard deviation (10 divided by 10 = 1). With a z score of 1, we know that the raw score of 110 is one standard deviation above the mean for this distribution being studied.
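As a minimal sketch, here is the same calculation in Python (the function name is our own, for illustration):

  def z_score(raw, mean, sd):
      # Distance from the mean, measured in standard deviations
      return (raw - mean) / sd

  print(z_score(110, 100, 10))  # 1.0: one standard deviation above the mean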


Applications

There are 5 primary applications of z-scores.

First, z-scores can be used for describing the location of an individual score. If your score is at the mean, your z score equals zero. If z = 1, you are one standard deviation above the mean. If z = -2, you are two standard deviations below the mean. If z = 1.27, your score is a bit more than one and a quarter standard deviations above the mean.

What is the z-score for a raw score of 104, a mean of 110 and a standard deviation of 12? 104-110 equals -6; -6 divided by 12 equals -.5. The raw score of 104 is one-half a standard deviation below the mean.

Second, raw scores can be evaluated in relation to some set z-score standard: a cutoff score. For example, all of the scores above a cutoff z-score of 1.65 could be accepted. In this case, z-scores provide a convenient way of describing a frequency distribution regardless of what variable is being measured.

Each z score’s location in a distribution is associated with an area under the curve. A z of 0 is at the 50th percentile and indicates that 50% of the scores are below that point. A z score of 2 is associated with the 98th percentile. If we wanted to select the top 2% of the individuals taking a musical ability test, we would want those who had a z score of 2 or higher. Z scores allow us to compare an individual to a standard regardless of whether the test had a mean of 60 or 124.

Most statistics textbooks have a table that shows the percentage of scores at any given point of a normal distribution. You can begin with a z score and find an area, or begin with an area and find the corresponding z score. Areas are listed as decimals: .5000 instead of 50%. To save space, only positive values are listed. The tables also assume you know that 50% of the scores fall below the mean and 50% above the mean. The table usually has 3 columns: the z score, the area between the mean and z, and the area beyond z.

The area between the mean and z is the percentage of scores located between z and the mean. A z of 0 has an area between the mean and z of 0, and an area beyond (the area toward the end of the distribution) of .5000. Although the table lists no negatives, notice that a z score of -0 would also have an area beyond (toward the low end of the distribution) of .5000.

A z score of .1, for example, has an area between the mean and z of .0398. That is, 3.98% of the scores fall within this area. And the third column shows that the area beyond (toward the positive end of the distribution) is .4602. If z is -.1, the area from the mean down to that point accounts for 3.98% of the scores, and the area beyond (toward the negative end of the distribution) is .4602.

Areas under the curve can be combined. For example, to calculate the percentile of a z of .1, the area between the mean and z (.0398) is added to the area below z (which you know to be .5000). So the total percentage of scores below a z of .1 is 53.98 (that is, .0398 plus .5000). A z score of .1 is at the 53.98th percentile.
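If you would rather compute areas than look them up, the normal distribution in SciPy reproduces all three table columns; a sketch, assuming SciPy is installed:

  from scipy import stats

  z = 0.1
  below = stats.norm.cdf(z)   # total area below z: about .5398
  between = below - 0.5       # area between the mean and z: about .0398
  beyond = 1 - below          # area beyond z: about .4602
  print(round(below, 4), round(between, 4), round(beyond, 4))

So a z of .1 sits at about the 53.98th percentile, matching the table.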

Third, an entire variable can be converted to z-scores. This process of converting raw scores to z-scores is called standardizing and the resulting distribution of z-scores is a normalized or standardized distribution. A standardized test, then, is one whose scores have been converted from raw scores to z-scores. The resultant distribution always has a mean of 0 and a standard deviation of 1.

Standardizing a distribution gets rid of the rough edges of reality. If you’ve created a nifty new test of artistic sensitivity, the mean might be 123.73 and the standard deviation might be 23.2391. Interpreting these results and communicating them to others is much easier on a standard scale. Converting each score on your artistic sensitivity test to a z score puts the distribution on a clean scale with a mean of 0 and a standard deviation of 1. Z scores make life prettier.

Fourth, once converted to a standardized distribution, the variable can be linearly transformed to have any mean and standard deviation desired. By reversing the process, z-scores are converted back to raw scores: multiply each by the desired standard deviation and add the desired mean. Most intelligence tests have a mean of 100 and a standard deviation of 15 or 16. But these numbers didn’t magically appear. The original data looked as wobbly as your test of artistic sensitivity. The original distribution was converted to z scores and then the entire distribution was shifted.

To change a normal distribution (a distribution of z scores) to a new distribution, simply multiply by the standard deviation you want and add the mean you want. It’s easy to take a normalized distribution and convert it to a distribution with a mean of 100 and a standard deviation of 20. Begin with the z scores and multiply by 20. A z of 0 (at the mean) is still 0, a z of 1 is 20 and a z of -1 is -20. Now add 100 to each, and the mean becomes 100 and the z of 1 is now 120. The z of -1 becomes 80, because 100 plus -20 equals 80. The resulting distribution will have a mean of 100 and a standard deviation of 20.

Fifth, two distributions with different means and standard deviations can be converted to z-scores and compared. Comparing distributions is possible after each distribution is converted into z’s. The conversion process allows previously incomparable variables to be compared. If a child comes to your school but her old school used a different math ability test, you can estimate her score on your school’s test by converting both to z scores.

If her score was 65 on a test with a mean of 50 and a standard deviation of 10, her z score was 1.5 on the old test (65-50 divided by 10 equals 1.5). If your school’s test has a mean of 80 and a standard deviation of 20, you can estimate her score on your test as being 1.5 standard deviations above the mean; a score of 110 on your test.
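Here is a small sketch of that two-step conversion (the function name is ours):

  def to_new_scale(raw, old_mean, old_sd, new_mean, new_sd):
      # Convert to z, then stretch z onto the new test's scale
      z = (raw - old_mean) / old_sd
      return new_mean + z * new_sd

  # Old test: mean 50, SD 10. Your school's test: mean 80, SD 20.
  print(to_new_scale(65, 50, 10, 80, 20))  # 110.0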


Guided Tour

Think of this process as touring statistics. It’s a tour because we’re not going to see everything there is to see. There are lots of books and courses on the subject, so we’ll only hit the highlights. And since I’m not going to leave you on your own, I think of it as a guided tour.

Like any good tour, our trip is divided into days. I’ve divided statistics into ten days. By the end of the tour, you’ll have learned a lot about statistics. You’ll know when to use what procedure. You’ll learn how to interpret data. And you’ll learn how to be a thoughtful user of statistics.

Each “day” follows the same format. We’ll start with a brief overview of the topic. Then you’ll be given a choice about what you want to do next. The choices will include:

  • Reading “a bit more” about the topic
  • Reading “even more” (a longer description)
  • Listening to a lecture or two
  • Learning how to calculate the procedure we’re highlighting
  • Reviewing vocabulary and formulas
  • Taking a quiz
  • And anything else I can think of that will help you with the material.

If you’re a people-person who is facing statistics for the first time, this tour is for you. The math won’t be hard, we’ll go step by step, and I won’t make you memorize formulas. I think statistics is scary when you face it alone. So let’s do it together.

The funny thing about being a people-person: the more people you meet, the more numbers you collect. There are test scores, evaluations, key performance indicators, and…more paperwork. People have to be supervised and coordinated. And the numbers that describe them have to be collected, understood, summarized and reported. Where there are people, you’ll find numbers.

Fortunately, we are born with great measuring skills. We are natural collectors of information. We sense sights, sounds and smells. Our brains automatically calculate how far we are from the car ahead of us, how deep the potholes are, and how fast the bicyclist is going. We calculate slopes, shapes, shades and angles. We make thousands of decisions; all based on data we collect. In our everyday lives, we are scientists.

In statistics, we are going to hone the skills we already possess. In particular, we are going to use numbers to describe people. It’s a fairly indirect approach but highly useful. The indirectness means, of course, that we’ll make mistakes. But the usefulness of the system makes up for that problem.

Think of the difference between maps and the real world. Maps are representations of the real thing. They are subject to error (ever have a set of directions lead you astray?). But they are highly portable and useful. Numbers are the same way for people. We shouldn’t confuse the score on an intelligence test with intelligence. Just as maps aren’t the reality, scores aren’t the reality. But representations of reality can be helpful.

Research focuses on group data. This is both an advantage and disadvantage. On the good side, we can see the broad picture. We can use group data to make generalizations, heuristics and rules that often work for most people. On the bad side, looking at a group means we will miss some individuals. It’s like coming to the conclusion that peanuts can safely be added to a school lunch. As a group generalization, it’s true. Most people can eat peanuts and benefit from doing so. But such a finding will be fatal to some children.

So remember that group data is helpful for a generalization. Statistics is a great tool for general surveying of vast wildernesses, not so good at telling what’s over the next hill, and really rotten at being able to tell whether you will fall into quicksand. So statistics looks at patterns, trends, and things that are typically true.

We often use statistics to describe things. Descriptive statistics tries to reduce a large pile of data into as few numbers as it possibly can. The ultimate goal is to find a single number that describes a whole group. In contrast, inferential statistics is when we use data as the basis for estimates, predictions and guesses. Descriptive stats is telling you that it’s raining right now. Inferential stats tries to figure out if it will rain tomorrow.

Both descriptive and inferential statistics rely on data. Data is any collection of numbers. But before we get to numbers, we have to design our study. In Day 1, you’ll learn the kind of questions we need to ask before we do a study. If you’re math phobic, relax. There are no numbers in Day 1!

Statistics is an adventure. So put on your walking shoes, grab your hat, and let’s go.


Dispersion

Day 3: Score diversity

We found the middle of the group because most people score about the same on any variable we measure. Now that you’ve found a representative for the group, how representative is the mean? Is the group unified, with nearly everyone scoring the same? Or are there wide fluctuations within the group? We want one number that will tell us if the scores are very similar to each other or if the group is composed of heterogeneous scores.

Dispersion is like a mountain.

All measures of dispersion get larger when the distribution of scores is more widely varied. A narrow distribution (a lot of similar scores) has a small amount of dispersion. A wide distribution (lots of different scores) has a large amount. The more dispersion, the more heterogeneous (dissimilar) the scores will be.

There are five measures of dispersion:

Range
Mean Absolute Deviation (Mean Variance)
Sum of Squares
Variance
Standard Deviation


Range

Range is easy to calculate. It is the highest score minus the lowest score. If the highest score is 11 and the lowest score is 3, the range equals 8.

Mean Absolute Deviation (MAD)

As the name suggests, mean variance (or mean absolute deviation) is a measure of variation from the mean. It is the average of the absolute values of the deviations from the mean. That is, the mean is subtracted from each raw score and the resulting deviations (called “little d’s”) are averaged (ignoring whether they are positive or negative).

Sum of Squares

Conceptually, Sum of Squares (abbreviated SS) is an extension of mean variance. Instead of taking the absolute values of the deviations, we square the critters (deviations), and add them up.

Variance

Variance of a population is always SS divided by N. This is true whether it is a large population or a small one. Variance of a large sample (N is larger than 30) is also calculated by Sum of Squares divided by N. If there are 40 or 400 in the sample, variance is SS divided by N.

However, if a sample is less than 30, it is easy to underestimate the variance of the population. Consequently, it is common practice to adjust the formula for a small sample variance. If N<30, variance is SS divided by N-1. Using N-1 instead of N results in a slightly larger estimate of variance and mitigates the problem of using a small sample.

Standard deviation

This measure of dispersion is calculated by taking the square-root of variance. Regardless of whether you used N or N-1 to calculate variance, standard deviation is the square-root of variance. If variance is 7.22, the standard deviation is 2.69. If variance is 8.67, the standard deviation equals 2.94.

Technically, the square-root of a population variance is called sigma and the square-root of a sample variance is called the standard deviation. As a general rule, population parameters use Greek symbols and sample statistics use English letters.

Conceptually, all measures of dispersion work the same. The more dissimilar the score, the larger the values. In practical terms, the standard deviation is the most useful.
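As a minimal Python sketch, here are all five measures computed on a made-up sample of five scores (N < 30, so the variance uses N-1):

  import math

  scores = [3, 5, 7, 9, 11]   # hypothetical small sample
  n = len(scores)
  mean = sum(scores) / n

  score_range = max(scores) - min(scores)        # 11 - 3 = 8
  mad = sum(abs(x - mean) for x in scores) / n   # mean absolute deviation: 2.4
  ss = sum((x - mean) ** 2 for x in scores)      # sum of squares: 40
  variance = ss / (n - 1)                        # small sample, so N-1: 10
  sd = math.sqrt(variance)                       # standard deviation: about 3.16

  print(score_range, mad, ss, variance, round(sd, 2))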

With a mean and standard deviation you can describe any normal distribution. The mean is the center point. From there we can cover the entire distribution in three steps in one direction or three steps in the other. Each step is one standard deviation.

Although the steps are the same size, the amount of ground covered (or in our case, the number of scores) is not. A normal distribution is a symmetrical mountain of scores. Most scores are at or close to the mean. There are fewer and fewer scores the farther you are from the mean.

The percentages in a normal curve, from left to right, are 2, 14, 34, 34, 14, 2. If you add the middle two steps together, you have accounted for 68% of the scores. If you select the 4 middle-most steps, you’ve accounted for 96% of the scores. If you select all six steps, you’ve included virtually everyone. We won’t say everyone because even if you’ve tested millions of people, one score can be just amazingly extreme.

A mean plus and minus 3 standard deviations accounts for everyone, everyone but Ralph (whoever Ralph is). With a mean of 100 and a standard deviation of 15, you know that a normal IQ is between 85 and 115. You know that 68% of the scores fall within that range.
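Those step percentages are rounded. If you want more exact areas, the normal distribution gives them directly; a sketch assuming SciPy:

  from scipy import stats

  # Proportion of scores within 1, 2, and 3 standard deviations of the mean
  for k in (1, 2, 3):
      area = stats.norm.cdf(k) - stats.norm.cdf(-k)
      print(k, round(area, 4))   # about .6827, .9545, .9973

For IQ (mean 100, SD 15), the one-step band is 85 to 115, just as above.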

And

It doesn’t matter what the topic is or who the people are. With any normal distribution, you get the same pattern.

This is the pattern of chance. The universe is composed of many random things; all follow this pattern. Musical ability, running speed, background radiation in space, foot size, and bad luck. If a variable is normally distributed, this pattern holds.

If you take a bucket of blocks and dump it on the floor, the blocks will be randomly distributed. It will be a pile with a center spot. It will be a mountain of toys.

I have some blocks that don’t work that way. When you dump them out, they line up in rows and columns. This immediately tells you it is a trick. It is not a random arrangement. My blocks are magnetic. When a variable doesn’t look like a mountain, look for a cause.

In addition

With a mean and standard deviation, you can find anyone. Half the scores are above the mean in a normal curve, and half the scores are below. Positive z scores are above the mean; negative z scores are below the mean. You can find any score or point in a distribution if you know the mean and standard deviation. That’s what z-scores are all about. Check out Day 4 of our 10-day tour: z-scores.

Here is How To Calculate Statistics.


Central Tendency

Day 2: Finding the middle

We use central tendency because there is one. If you take an armload of toys and drop them on the floor, they don’t line up in rows. They don’t arrange themselves in a triangle or circle. They fall in a heap. A pile that looks like a small mountain: a frequency distribution.

Central tendency is like trying to find the middle of a pile of toys

Although some use case studies, naturalistic observation, and single subject studies (N=1), most research is group based.  Usually, there are lots of numbers from lots of subjects, all  waiting to be crunched.

The place to begin is to collect the data together. If left unattended, data would cover the desks of researchers and gather dust. To be useful, it must be organized into a data matrix: a row-column table of scores. The spreadsheet of scores has a row for each subject. Each row contains all of the scores for that individual but neatly laid out in columns. A quick view of the spreadsheet will show if there are any missing scores (empty cells).

Each row is a person; each column is a variable. Traditionally, the farthest column to the left contains the ID number of the subject. The simplest data matrix has two columns: one for the ID number and one for the score. And it would have as many rows as subjects in the study.

After forming a data matrix the next step is usually to plot the data. Each variable is plotted separately: a graph for each factor being measured. Sometimes the variables are summarized in histograms (vertical bar graphs). Often the graphs are frequency distributions: overviews of the raw data. Each score is listed from lowest to highest (left to right). If more than one person has the same score, the graph points are stacked vertically.

Once the data is organized and graphed, it’s time to describe it. The shortest description of a group is how many people are in it (N). A more graphic and informative description is a frequency distribution. The “frequency” is indicated in the height of the graph: the more people who have the same score, the taller the graph. The “distribution” (width) shows how many different scores there were.

The major challenge of descriptive statistics is finding a representative of the entire group of scores. We look for a score to represent an entire group because we believe that people are more alike than different. We believe that chance follows a pattern, and that pattern is a heap in the middle with less and less on the edges. We look for the center because a group has one: the middle of the distribution is where most scores are. To understand most people, all we need to do is describe the middle of the group.

There are three major measurements of central tendency: mean, median and mode.

The mode is the most popular person. It is the most common score (the highest point of the frequency distribution). But it’s hard to be accurate reading graphs, and the mode isn’t very useful for advanced statistical analysis. The general rule is: if it’s easy to calculate, it’s not very helpful.

A better measure of a group representative is the median. The median is the middlemost score. If you start at the ends and count toward the middle, whichever score you end on is the median. It’s fairly easy to calculate but the scores have to be arranged from lowest to highest (or highest to lowest) in order to count toward the middle. And the median isn’t very useful for advanced statistical analysis.

That leaves the mean (also called the average). If a frequency distribution was a seesaw, the mean would be the point where it balanced. The mean represents the average, typical person. It’s the hypothetical middle point that balances the entire distribution, which is why we end up with 2.4 children or 3.1 cars. Unlike the median and mode, the mean is very sensitive to outlying scores.

Calculation is harder than pointing or counting, but not really all that tough. You add up all the scores and divide by the number of scores. Pretty simple.
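Python’s standard library computes all three representatives; a quick sketch with made-up scores:

  import statistics

  scores = [2, 3, 3, 4, 5, 7, 11]
  print(statistics.mode(scores))    # 3: the most common score
  print(statistics.median(scores))  # 4: the middlemost score
  print(statistics.mean(scores))    # 5: the balance point

Notice how the single outlying 11 pulls the mean above the median, the sensitivity to outliers mentioned above.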

There is little difference between the mean and the score next to it. Everyone in the middle of the pack is about the same. It’s the way nature is built. The goal is to find a way to summarize a large group of numbers. One part of that process is to find a group’s representative. We want one number that will tell us about the entire group.

So, if no one has the same score, the frequency distribution would look like a straight horizontal line. If everyone had the same score, it would be represented by a vertical line. If there is some variability in scores but several people with the same score, the distribution will have both width and height. The typical frequency distribution varies from left to right but most scores are in the middle. The result is a graph that looks like a mountain…or a dome…or the bottom of a bell.

If frequency distributions are not “normal bell-shaped curves,” they might be positively skewed, negatively skewed, or bimodal.

Notice that this “bell-shaped” curve is symmetrical. There are more scores in the middle than at the ends. There are a few scores at the ends but most are in the middle. Philosophically, we believe this describes people well. If we measure them on almost anything, most will be in the middle of the distribution but a few will be at each end. Although there are a few very musical people, and a few very unmusical people, most are in the middle of the musical ability distribution. This is normal. It’s how we define average.

Sometimes the data doesn’t look like a normal bell-shaped curve. Usually, it’s because the researcher did something wrong (asked only highly gifted people) or limited the sample in some manner. The result is a skewed distribution: a normal curve with a long tail. The direction of the tail gives the distribution its name: a tail to the right (toward the high scores) is a positively-skewed distribution. Most folks scored low on the variable but a few (maybe only one person) scored quite high. A negatively-skewed distribution is normal except for a tail toward the low scores.


Measurement

Day 1: Before data collection

Measurement is a pre-number crunching activity in statistics. No math is required! But to do research, you must know–at least in general–what you’re trying to prove. Let’s summarize it in five questions:

Measurement is like a flexible ruler

Before you conduct a study, use your theory to answer five questions: (1) what are you trying to prove, (2) what is it like in practice, (3) who is predicting whom, (4) who is being studied, and (5) what do the numbers mean? Theories are used to guide research; models are used to test theories. Theories are composed of constructs, which are untested theoretical realities. Models are built for the purpose of being tested; they are composed of variables.

