
Captain Psychology


Statistics

April 1, 2023 by ktangen

Not Your Average American City

As you can see, this is not your average American city. For one thing, this is Hong Kong. Minimally, to be the average American city, it has to be in America. 🙂

But Hong Kong can serve as an example of how to best approach the problem of description.

Every city can be described by numbers. There are certain items that could be counted: number of lights shining, height of buildings, number of people living in it, etc. And each of these variables (as opposed to constants) could be used to describe where you live. So, clearly, the first step is to decide what to measure. Are we looking for average population, average rainfall, or average income?

Let’s stick with population for now, and see where it leads. After all, how hard could this be? Let’s just take the population of the US and divide it by the number of cities in the country. That will give us the average city size.

But what exactly is a city? If we count only people who live within the actual boundaries of the city limits, aren’t we underestimating its size? For example, Hong Kong Island has a population of around 1.3 million people, but the greater Hong Kong area has a population of nearly 7 million people. In the US, the greater Chicago area would be comparable in population, depending on how big you make your “greater area.”

A related issue is that defining a city turns out to be a bit tricky. For example, Maza, North Dakota claims to be a city, and yet boasts a population of five. Another contender, Marineland in Florida, has a population of 7. Just think, if a family of five moves in, they could double your population.

And yet Framingham, MA, which has a population of about 67,000, claims it is not a city. It says it is the largest town in the US. Apparently, being a city may depend on your type of local government, on whether the place has formed itself into a corporation, or just on how you feel about it.

Calculating something as simple as an average can have its complications. Numbers look so clear cut and stable. But even descriptive statistics depend a lot on our definitions. So when we see a number, we have to remember that assumptions went into it. And those assumptions are critically important. We can’t separate our numbers from our assumptions.
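To see how much the definition matters, here is a minimal Python sketch that uses only the three populations quoted above. The two definitions of "city" are assumptions made for illustration, and three places are obviously not a real sample of American cities.

```python
# Populations quoted in the text above. Which places count as a "city"
# is the assumption that changes the answer.
populations = {
    "Maza, ND": 5,
    "Marineland, FL": 7,
    "Framingham, MA": 67_000,  # calls itself a town, not a city
}

def mean_population(places):
    """Arithmetic mean of the populations for the named places."""
    return sum(populations[p] for p in places) / len(places)

# Definition A (assumed): only places that call themselves cities.
self_declared = ["Maza, ND", "Marineland, FL"]
# Definition B (assumed): every place on the list, whatever it calls itself.
everything = list(populations)

print(mean_population(self_declared))      # 6.0
print(round(mean_population(everything)))  # 22337
```

Same data, same formula, and the "average city" goes from 6 people to more than 22,000 just by changing who gets counted.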

Photo by Andres Garcia on Unsplash

Filed Under: Statistics

April 1, 2023 by ktangen

Baseball Stats

In the 1950s, it wasn’t unusual for children to be quite conversant about baseball statistics. Part of that ability was tied to the popularity of the game, but part of it must be attributed to bubble gum.

It was a pretty good deal, compared to the chewable cigarettes, for example. You got a big piece of (not terribly flavorful) gum. And with the gum, you got a baseball card. It was a single card for a penny, or a five pack (sometimes 6) for a nickel.

The cards were comparable to playing cards in size and shape. The front of the card had a picture of the player, his name, team affiliation, and the position he played. The back of the card listed the player’s stats: height, weight, bats (left or right), throws (left or right), some major achievements, where he was born, and his birthday.

Then the good stuff. There were numbers for games, AB, runs, hits, 2B, 3B, HR, RBI and B ave. There also were some numbers about fielding (PO, A, and E). For pitchers, there were stats for wins, earned run average (ERA) and strikeouts.

Here’s what the abbreviations mean:

AB = at bats. It’s the number of times up to bat, not counting being hit by a pitch, walks (bases on balls), and other unusual events.

2B = a two-base hit, also called a double.

3B = a three-base hit, also called a triple.

HR = home runs (a four base hit).

RBI = runs batted in (number of runners who cross the plate because of the player’s batting).

B Ave = batting average (hits divided by at bats).

PO = put outs (tagging out opposing runners)

A = assists (helping other fielders)

E = errors (mistakes)
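The batting average is the one stat in the list that is calculated rather than counted. A small Python sketch of that calculation follows; the function name and the season numbers are made up for illustration and do not come from any real card.

```python
def batting_average(hits, at_bats):
    """Hits divided by at bats, as defined above; 0.0 if there were no at bats."""
    if at_bats == 0:
        return 0.0
    return hits / at_bats

# Hypothetical season line, not a real player's record.
avg = batting_average(hits=150, at_bats=500)
print(f"{avg:.3f}")  # 0.300
```

Cards traditionally print the result to three decimal places, which is why the sketch formats it that way.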

In the case of baseball cards, statistics can be profitable too. Check your attic. If you have a 1951 Mickey Mantle rookie card (made by Bowman) or the 1952 Mickey Mantle card (made by Topps), I’ll give you a dollar for it. That’s just the kind of guy I am.

The card’s worth a lot of money but I’ll give you a dollar. As I said, that’s just the kind of guy I am.

 

Filed Under: Statistics

April 1, 2023 by ktangen

Why Can We Predict Stars But Not Wall Street

Some predictions are easier to make than others. Accurately predicting the movement of the stars and planets is possible because they maintain regular patterns. The planets have relatively set paths around the sun. There is some variation but the pattern is well established and occurs repeatedly. Similarly, stars follow consistent trajectories. They don’t jump erratically; they maintain reliable courses.

Star and planet data fit the assumptions of prediction well. They have set courses, reliable patterns, and replicable observations. Statistics work well on data that fit these parameters. Any behavioral pattern that is consistent is relatively easy to predict.

In humans, scores on intelligence tests are quite consistent. Year after year, you tend to get about the same score. Moods, however, change quickly, and don’t follow a consistent pattern. Consequently, predicting moods is very hard to do.

Financial markets would be predictable if they were consistent. But stock prices jump, fall, slowly rise, and fade away. There are too many twists and turns for good prediction. So don’t blame statistics for not predicting the next major financial collapse. The data simply doesn’t meet the requirement of consistency.

If it’s any comfort, statistics is equally bad at predicting financial turnarounds. Prosperity could suddenly appear. A new discovery could be about to happen. Great news could be at hand. Statistics can sometimes explain patterns of the past but it’s not very good at seeing into the future.

Filed Under: Statistics

April 1, 2023 by ktangen

What Type Of Error Am I Making

Science does a terrible job predicting individual behavior. It’s not that we don’t try. We’re just not very good at it.

Science is really good at predicting a group. We collect data on a group, and use it to predict what the group, in general, will do. If you want to know if a drug works, we gather a group of people together, randomly assign them to wonder-drug and sugar-pill treatment conditions, and we see who lives and dies.

Then, we tell you that the new drug is great. But what we mean to say is that it is generally great. Most of the people in the study did better on the drug. A few individuals had fabulous results. And a few people died. But, overall, it’s a great drug.

Unfortunately, you’re not a group. As an individual, it is extremely hard to predict what will happen to you on the new drug. You might be like most people. If so, take the drug.

You might be unusual. If you’re unusual, the drug will do wonders or kill you. But we can’t say which it will be.

Can we do any better predicting what movies you will watch? Yes and no.

On the yes side, there are several helpful factors. First, you do this behavior more than once. Science is always better at predicting events that repeat themselves. If you regularly rent movies, there might well be a pattern in your behavior. So if you’ve rented thousands of movies, predicting your behavior is feasible. In fact, if you watch lots of movies, you’re probably up for watching anything. But if you have only rented one movie, guessing your next movie is extremely difficult.

Science would prefer to predict a group, not an individual. And it would prefer to predict regularly repeating behavior, not occasional, sporadic, or spurious behavior. If you blow up buildings on a regular basis, predicting that you’ll be violent in the future isn’t so hard. If you only blow up a building here or there, it’s difficult to model that behavior.

In 2006, Netflix offered a million dollars to anyone who could improve the prediction accuracy of Cinematch, their how-about-renting-this-movie software, by 10 percent. To reach that goal, programmers tried to model consumer behavior.

The contestants were given a large file containing movie titles, ratings, and dates. No personal information about the customers was included, so predicting a particular person’s behavior wasn’t possible. As in testing a new drug, the ratings predict what will happen to a group of scores. The software will only predict you to the extent that you are a lot like the people in the data set.

Predicting movie ratings is even harder than it might seem. Remember, the ratings are on a 5-point scale. And that scale uses ordinal numbers.

Ordinal numbers give us 1st, 2nd and 3rd place, but no information about how close the race was. First and second places could be really close, or quite far apart. So you might score Jaws high but is it a lot higher or only slightly higher than any other Spielberg film?
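Here is a small Python sketch of what gets lost on an ordinal scale. The "liking" scores are invented for illustration; the point is only that two very different sets of scores produce identical rankings.

```python
def ranks(scores):
    """Rank scores from highest (1st) to lowest; assumes there are no ties."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

# Two hypothetical sets of underlying "liking" scores for three films.
close_race = [9.1, 9.0, 8.9]
blowout = [9.1, 5.0, 1.2]

print(ranks(close_race))  # [1, 2, 3]
print(ranks(blowout))     # [1, 2, 3]
```

Once the scores are turned into places, the close race and the blowout look exactly the same.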

A related problem is that people aren’t consistent in their ratings. When you ask people to re-rate a movie, they often don’t give the same answer. This isn’t surprising, given that moods change. Although inconsistent ratings probably happen more often with movies in the middle, we have to be in the right mood even for movies we love.

Predicting starts with a simple linear regression (see Day 5). You gather data and see what the general pattern is. If there is one simple straight line, your task will be quite easy. But more complicated data sets often require more work.
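For the one-straight-line case, the fit can be sketched in a few lines of Python. This is a plain least-squares fit written out by hand, not anything Netflix is known to have used, and the ratings are hypothetical. The names a (intercept) and b (slope) follow the usual regression notation.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: one viewer's ratings of five films (x) and their sequels (y).
originals = [1, 2, 3, 4, 5]
sequels = [1.0, 1.5, 2.0, 2.5, 3.0]

a, b = fit_line(originals, sequels)
print(a, b)       # 0.5 0.5
print(a + b * 4)  # predicted sequel rating for an original rated 4: 2.5
```

With data this tidy, every point falls on the line. Real ratings scatter around it, which is where the extra work comes in.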

The general term for this is modeling. Essentially, you calculate the correlation (see Day 4) between all of the variables, and see if you can find patterns or clusters of correlations. If you rated one funny movie high, you might do the same with another funny movie.
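A rough Python sketch of that idea follows, assuming the correlation in question is the ordinary Pearson r. The three movies and the five viewers' ratings are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical ratings from the same five viewers, in the same order.
ratings = {
    "funny_a": [5, 4, 2, 1, 3],
    "funny_b": [4, 5, 1, 2, 3],
    "scary_a": [1, 2, 5, 4, 3],
}

print(round(pearson_r(ratings["funny_a"], ratings["funny_b"]), 2))  # 0.8
print(round(pearson_r(ratings["funny_a"], ratings["scary_a"]), 2))  # -0.9
```

The two funny movies rise and fall together, while the scary one runs the other way. A cluster is just a set of titles that correlate highly with each other.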

Here’s how modeling works. Start by imagining a room where movie titles are floating in air. When you look closer, you can see that all of the funny movies are floating near the ceiling; and the dark, scary films are near the floor.

You also notice that they are arranged left to right by their target age group: movies for kids to the left, movies for senior citizens to the right. The third dimension, the depth of the room, indicates popularity (most-liked to least-liked). This three-dimensional space is a model of what these movies have in common.
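The room can be sketched in Python by giving each title three coordinates and measuring straight-line distance between them. The titles, the 0-to-1 scales, and the coordinates are all assumptions made up for illustration.

```python
from math import dist

# Hypothetical coordinates: (humor, target age, popularity), each on a 0-1 scale.
movies = {
    "Comedy for kids": (0.9, 0.1, 0.8),
    "Comedy for adults": (0.8, 0.6, 0.7),
    "Horror for adults": (0.1, 0.6, 0.5),
}

def nearest(title):
    """Return the other title that floats closest to the given one."""
    here = movies[title]
    others = (t for t in movies if t != title)
    return min(others, key=lambda t: dist(here, movies[t]))

print(nearest("Comedy for kids"))    # Comedy for adults
print(nearest("Horror for adults"))  # Comedy for adults
```

Titles that sit close together in the room are the ones the model treats as similar, so a nearby title is a reasonable guess for what to suggest next.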

Now think of these titles as flowing through the room, changing every few seconds. It’s a stream of information that has spurts, lulls, waves and transitions. In this rapidly changing sea of data, try to hit one title with a dart. Think of it as “pin the tail on the movie.” It is not an easy task.

In addition, you need to add more dimensions; three is not enough to describe them all. Movies vary on theme, quality of photography, cleverness of titles, the fame of actors, the quality of directors, the skill of editors, and on their cultural, spiritual, religious and political context. They could also be rated on happy endings, exotic settings, and intricate costumes.

Descriptions of these inter-correlations are called maps. Not only do they help clarify the data, they can also be quite pretty (see http://www.the-ensemble.com/).

Statistics often looks for consistent patterns. We predict what is best for groups of people based on other group data. We are great at predicting what large groups of people repeatedly do. We’re pretty good at predicting what large groups of people sometimes do. And we’re lousy at predicting what you will do.

Filed Under: Statistics

April 1, 2023 by ktangen

Netflix Uses Stats To Target You


Filed Under: Statistics

April 1, 2023 by ktangen

Terms Statistics

Vocab

 

Day 1: Measurement

  • constant vs variable
  • construct
  • convenient pool
  • continuous vs discrete
  • criteria vs predictors
  • CUSSIT
  • dependent vs independent variable
  • descriptive vs inferential statistics
  • dichotomous
  • hypothesis
  • interval scale
  • intervening variable
  • levels of measurement
  • model vs theory
  • moderator variable
  • nominal scale
  • operational definition
  • ordinal scale
  • population vs sample
  • population of interest
  • random sampling
  • random selection
  • ratio
  • stratification
  • suppressor variable
  • testable hypothesis

 

Day 2: Central Tendency

  • atheoretical
  • average
  • bell-shaped curve
  • bimodal distribution
  • case notes; case study
  • central tendency
  • data matrix
  • frequency distribution
  • histogram
  • interval scale
  • mean
  • median
  • mode
  • N
  • N = 1 experiment
  • naturalistic observation
  • negatively skewed distribution
  • nominal scale
  • normal curve
  • observer effect
  • ordinal scale
  • outlying score
  • positively skewed distribution
  • ratio scale
  • replication
  • self-report
  • skewed distribution
  • Skinner box
  • X
  • ΣX

 

Day 3: Dispersion

  • intervening variable
  • levels of measurement
  • model vs theory
  • moderator variable
  • nominal scale
  • operational definition
  • ordinal scale
  • population vs sample
  • population of interest
  • random sampling
  • random selection
  • ratio
  • stratification
  • suppressor variable
  • testable hypothesis

 

Day 4: z-scores

  • baseline
  • checklist
  • cutoff score
  • grade equivalent
  • linear transformation
  • magnitude
  • normalized distribution
  • percentile
  • sign
  • standardized distribution
  • standardized score
  • z-score

 

Day 5: Correlation

  • coefficient of determination
  • coefficient of nondetermination
  • correlation
  • correlation coefficient
  • df
  • linear
  • magnitude
  • monotonic
  • negative correlation
  • parallel forms
  • Pearson r
  • phi
  • point-biserial
  • positive correlation
  • r
  • r²
  • reliability
  • scatterplot
  • sign
  • significance
  • split half
  • test-retest
  • validity

 

 

Day 6: Regression

  • a
  • b
  • regression line
  • extrapolation
  • intercept
  • interpolation
  • regression
  • slope
  • standard error of estimate (SEE)
  • straight line
  • time series
  • X predicting Y
  • Y predicting X

 

 

Day 7: Probability 

  • alpha level
  • Analysis of Regression (ANOR)
  • checklist
  • criterion reference testing
  • critical values of F
  • df
  • F
  • k
  • k-1
  • mean squares
  • mean squares (error)
  • mean squares (regression)
  • N-1
  • N-k
  • partitioning
  • Type I error
  • Type II error

 

 

Day 8: Independent t-test 

  • 1-tailed test
  • 2-tailed test
  • correlated t-test
  • critical value
  • df
  • estimation
  • hypothesis testing
  • independent t-test

 

 

Day 9: One-Way ANOVA

  • 1-Way ANOVA
  • ANOVA
  • confounded
  • controls
  • F
  • mean squares
  • MS between
  • MS within
  • repeated measures design
  • SS within
  • SS between
  • SS total
  • within-subjects design

 

Day 10: Advanced Procedures

  • 2×3
  • causal modeling
  • cell
  • continuous predictor
  • dependent variable
  • discrete predictor
  • factorial ANOVA
  • independent variable
  • interaction
  • main effect
  • MANOVA
  • modeling
  • Multiple Regression
  • multivariate
  • multivariate analysis
  • n
  • N
  • two by three

Filed Under: Statistics

