Simulation
- Describe the clinic’s clients
- Does Daniel qualify for anger management training? (checklist)
- Thomas is acting up in school (base rate; ABA reversal)
- Intervention team (case study)
- Customer satisfaction (survey)
- Knowledge (test construction)
- (speeded)
- (mastery; criterion referenced)
- Carlos wants to take advanced physics
- Rosa wants in a gifted program (z)
- Transform a distribution (normalize)
- Fire Dept wants to hire 3 people (reliability)
- School wants to use personality test to hire teachers (validity)
- Wants to know how well child is doing in school? (ethics)
- Predict school performance from IQ (regress)
- Predict clinic income for next year (time series)
- Predict summer employment (curvilinear)
- Does giving clients homework have a significant impact on therapy? (ANOR)
- Does magnetic therapy work? (t-test)
- (1-way)
- Gender & counselor style (factorial)
- (multiple regression)
- (MANOVA)
1 Theory. Square 1 is thinking up a theory.
2 Lit Review. If Square 1 is thinking up a theory, Square 2 is seeing what others have to say on the matter.
3 Select variable. Theories are not directly testable. So using your theory and literature review, you select variables that can be tested.
4 After you build your theory, conduct a lit review, and select your variables, it is time to generate operational definitions.
5 Pick a design. Which design is best for you? Here are some of your options.
6 Who to study. You have determined the general parameters of the study; now it is time to figure out who you are going to study.
7 Random selection. Now that you know who to study, you must decide how to choose them.
8 Prepare your materials. After choosing your subjects, it’s time to get everything ready for them. You have to write your tests, create your slides, build your maze and set up your equipment.
9 Write a proposal. When everything is set, you need to get permission from your Institutional Review Board (IRB) to conduct your study. To do so, you write a proposal.
10 Conduct study. The first 9 squares were getting you ready for this. In Square 10, you get informed consent from your participants and collect the data.
11 Data table, matrix. What do you do when faced with a pile of numbers? Here’s where you begin:
Square 12: Levels of Measurement
July 11, 2009 by kltangen
Filed under Square One, Videos
After you have organized your data, you need to consider what the numbers mean. Are they being used as names, rankings, quasi-numbers or ratios? Which level of measurement is involved?
In other words, you’re at Square 12.
Square 13: Graph It
Before you do any major number-crunching, it’s a good idea to get an overview of the data. It’s easy to see things by using histograms, pie charts and frequency distributions.
When you’re ready to make pretty pictures, you’re in Square 13.
14 Central Tendency. The 14th episode is about how to find the center of a distribution and why you’d want to. If you’re doing descriptive statistics, you’re at Square 14.
Summary of Measurement
NOMINAL
Used as a name
Makes no mathematical assumption.
0, 12 and 1 have no preference.
Examples:
The # on a race car
Bank ID number
Airplane model #
Part numbers
The # on the side of your horse
ORDINAL
Used to report rank or order.
Assumes the numbers can be arranged in order. Allows descriptions of 1st, 2nd and 3rd place but steps need not be the same size. Winning a close race receives the same score as an easy win.
Examples:
Finish order in contest
College sports ranking
Rating scales
The finish order of your horse
INTERVAL
Used to count conceptual characteristics (IQ, aggression, etc.)
Assumes numbers indicate equal units. Allows distinctions to be made between difficult and easy races but does not allow “twice as much” comparisons. 0 does not mean lack of intelligence, etc.
Examples:
The # of test items passed.
Temperature in Fahrenheit
Temperature in Celsius
The # of hurdles your horse jumps
RATIO
Used to measure physical characteristics.
Assumes 0 is absolute (indicates lack of entity being measured). Allows 2:1, 3:2, “twice as much” and “half as much” comparisons. 0 means no time has elapsed or no distance has been traveled, etc.
Examples:
Distance, time and weight
Temperature in Kelvin
Miles per gallon
How fast your horse runs
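The difference between interval and ratio scales can be illustrated with the temperature examples listed above. The following is a minimal sketch; the two temperatures are made-up values, and `celsius_to_kelvin` is a helper name chosen only for this illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15

# Hypothetical temperatures, chosen only for illustration.
warm_c, cool_c = 20.0, 10.0

# On an interval scale the ratio can be computed, but it is not meaningful:
# 20 C is not "twice as hot" as 10 C, because 0 C is not an absolute zero.
naive_ratio = warm_c / cool_c

# On the Kelvin scale 0 is absolute, so the ratio is meaningful.
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Differences (equal units) are meaningful on both scales.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

print(f"Celsius ratio: {naive_ratio:.2f}")
print(f"Kelvin ratio:  {true_ratio:.3f}")
print(f"Difference in C: {diff_c:.1f}, in K: {diff_k:.1f}")
```

The Celsius ratio comes out as 2 while the Kelvin ratio is only slightly above 1, which is why “twice as much” statements are reserved for ratio scales.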
Summary of Central Tendency
ITEM A
Scores: 11, 3, 12, 1, 3, 6, 4, 3
Mean: 5.375
Median: 3.5
Mode: 3
Positively-skewed distribution. Mean will be higher than median and mode. Median is a better representative of where most scores are located.
ITEM B
Scores: 9, 8, 8, 7, 6, 5, 8, 1, 6
Mean: 6.44
Median: 7
Mode: 8
Negatively-skewed distribution. Mean will be lower than median and mode. Median is a better representative of where most scores are located.
ITEM C
Scores: 5, 5, 5, 5, 5, 5, 5
Mean: 5
Median: 5
Mode: 5
This is a constant. Everyone has the same score.
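The three items above can be checked with Python's standard `statistics` module. This is a minimal sketch; the score lists are copied from Items A, B and C, while the `summarize` helper and the dictionary layout are assumptions made for the example.

```python
from statistics import mean, median, multimode

# Scores copied from Items A, B and C above.
items = {
    "A": [11, 3, 12, 1, 3, 6, 4, 3],
    "B": [9, 8, 8, 7, 6, 5, 8, 1, 6],
    "C": [5, 5, 5, 5, 5, 5, 5],
}

def summarize(scores):
    """Return the three measures of central tendency for a list of scores."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "mode": multimode(scores),  # a list, since ties are possible
    }

summaries = {name: summarize(scores) for name, scores in items.items()}

for name, s in summaries.items():
    print(f"Item {name}: mean={s['mean']:.2f} median={s['median']} mode={s['mode']}")
```

For Item A the mean sits above the median and mode (positive skew); for Item B it sits below them (negative skew); for Item C all three measures coincide.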
Summary of Dispersion
If everyone has the same score, there is no dispersion from the mean. If everyone has a different score, dispersion is at its maximum but there is no commonality in the scores. In a normal distribution, there are both repeated scores (height) and dispersion (width).
Percentiles, quartiles and stanines imply that distributions look like plateaus. Scores are assumed to be spread out evenly, like lines on a ruler. People are nicely organized in equal-sized containers.
SS, variance and standard deviation imply that distributions look like a mountain. Scores are assumed to be clustered in the middle; people are more alike than different. People are mostly together at the bottom of the bowl with a few sticking to the sides.
You can describe an entire distribution as 3 steps (standard deviations) to the left and 3 steps to the right of the mean. The percentages go approximately 2, 14, 34, 34, 14, and 2. This is believed to be true of all normally distributed variables, regardless of what they measure.
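SS, variance and standard deviation, along with the 2-14-34 percentages, can be sketched as follows. The scores are hypothetical, and the functions divide by N (the population formulas); dividing by N - 1 for a sample is a common alternative that the notes above do not specify.

```python
from math import sqrt
from statistics import NormalDist

def sum_of_squares(scores):
    """SS: sum of squared deviations from the mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

def population_variance(scores):
    """Variance as SS divided by N (population formula; an assumption here)."""
    return sum_of_squares(scores) / len(scores)

def population_sd(scores):
    """Standard deviation: the square root of the variance."""
    return sqrt(population_variance(scores))

# Hypothetical scores, made up for illustration (mean = 5).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print("SS:", sum_of_squares(scores))
print("Variance:", population_variance(scores))
print("SD:", population_sd(scores))

# Percentage of a normal distribution in each one-SD band from -3 to +3.
standard_normal = NormalDist()  # mean 0, SD 1
edges = [-3, -2, -1, 0, 1, 2, 3]
band_percents = [
    100 * (standard_normal.cdf(hi) - standard_normal.cdf(lo))
    for lo, hi in zip(edges, edges[1:])
]
print([round(p) for p in band_percents])
```

The six bands add up to a little under 100% because a small fraction of a normal distribution lies beyond 3 standard deviations on either side.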
Summary of z-Score
A z-score indicates how many steps a person is from the mean. A raw score below the mean corresponds to a negative z-score; a raw score above the mean corresponds to a positive z-score. The standard deviation indicates how big each step is. Approximately 68% of the scores lie within one standard deviation of the mean. That is, a majority of the distribution is from z = -1 to z = +1.
There are 5 primary applications of z-scores:
a. locating an individual score
b. using z as a standard. Individual raw scores are converted to z-scores and compared to a set standard. Two common standards are z = 1.65, which marks off a 1-tailed area of 95%, and z = +1.96 or -1.96 (between which lies a 2-tailed area of 95%).
c. standardizing a distribution and smoothing its data.
d. making linear transformations of variables; converting the mean and standard deviation to numbers that are easier to remember or handle.
e. comparing 2 raw score distributions with different means and standard deviations.
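Several of these applications can be sketched in a few lines. The raw scores, the two test distributions, and the target mean of 50 with a standard deviation of 10 are all hypothetical values chosen for illustration; `to_z` and `rescale` are helper names invented for this example.

```python
from statistics import NormalDist, mean, pstdev

def to_z(raw_scores):
    """Convert raw scores to z-scores using the group's own mean and SD."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # population SD (divides by N); an assumption
    return [(x - m) / sd for x in raw_scores]

def rescale(z_values, new_mean, new_sd):
    """Linear transformation: give z-scores a more convenient mean and SD."""
    return [new_mean + new_sd * z for z in z_values]

# a. and c.: locate individual scores and standardize a distribution.
raw = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean 5 and SD 2
z = to_z(raw)

# d.: convert to a scale with mean 50 and SD 10 (hypothetical target).
rescaled = rescale(z, 50, 10)

# e.: compare scores from two distributions with different means and SDs.
z_test_one = (80 - 70) / 10  # raw 80 on a test with mean 70, SD 10
z_test_two = (60 - 50) / 5   # raw 60 on a test with mean 50, SD 5

# b.: the common standards, recovered from the standard normal curve.
one_tailed_cutoff = NormalDist().inv_cdf(0.95)
two_tailed_cutoff = NormalDist().inv_cdf(0.975)

print(z)
print(rescaled)
print(z_test_one, z_test_two)
print(round(one_tailed_cutoff, 2), round(two_tailed_cutoff, 2))
```

In the comparison, the lower raw score (60) is the more impressive result because it sits two steps above its mean rather than one.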
Summary of Correlation
- To measure the strength of the relationship between two variables, it is best to use a correlation.
- A correlation can only be between -1 and +1.
- The closer the correlation coefficient is to 1 (either + or -), the stronger the relationship.
- The sign indicates the direction of the relationship.
- The coefficient of determination is calculated by squaring r. The coefficient of determination shows how much area the two variables share; the percentage of variance explained (accounted for).
- The coefficient of nondetermination is calculated by subtracting the coefficient of determination from 1. The coefficient of nondetermination shows how much the two variables don’t share; the percentage of unexplained variance.
- To calculate the correlation between two continuous variables, the Pearson product-moment coefficient is used. To calculate the correlation between two discrete (dichotomous) variables, the phi coefficient is used. To calculate the correlation between one discrete (dichotomous) and one continuous variable, the point biserial coefficient is used.
- Correlations are primarily a measure of consistency, reliability, and repeatability.
- Correlations are based on two paired observations of the same subjects.
- A cause-effect relationship has a strong correlation, but a strong correlation doesn’t guarantee a cause-effect relationship. In a correlation, A can cause B or B can cause A or both A and B can be caused by another variable. Inferences of cause-effect based on correlations are dangerous. A correlation shows that a relationship is not likely to be due to chance but it cannot indicate which variable was cause and which effect.
- Test-retest coefficients are correlations.
- In order to make good predictions between two variables, a strong correlation is necessary.
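The Pearson coefficient and the two coefficients derived from it can be computed by hand. This is a minimal sketch on made-up paired observations; the variable names `hours` and `scores` are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation for paired observations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and the two sums of squares.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(ss_x * ss_y)

# Hypothetical paired observations on the same five subjects.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
determination = r ** 2               # proportion of variance explained
nondetermination = 1 - determination  # proportion of variance unexplained

print(f"r = {r:.3f}")
print(f"coefficient of determination = {determination:.2f}")
print(f"coefficient of nondetermination = {nondetermination:.2f}")
```

With these numbers the correlation is fairly strong and positive, yet 40% of the variance is still unexplained.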
Summary of Regression
The variable with the smallest standard deviation is the easiest to predict. The less dispersion, the easier to predict.
Without knowing anything else about a variable, the best predictor of it is its mean.
The angle of a regression line is called the slope. Slope is calculated by dividing Sxy (the sum of cross-products of X and Y) by SSx (the sum of squares of the predictor).
The point where the regression line crosses the criterion axis is called the intercept.
Predicting the future based on past experience is best done with a regression.
Predicting scores between known values is called interpolation.
Predicting scores beyond known values is called extrapolation.
Regression works best when a relationship is strong and linear.
Regression works best when the correlation is strong.
The error around a line of prediction is consistent along the whole line. It doesn’t vary or waver along the line, so there is only 1 standard error of estimate for the entire line.
The error around a line of prediction can be estimated with the standard error of estimate.
Plus or minus one SEE accounts for 68% of the prediction errors.
A regression is based on paired-observations on the same subjects.
Pre- and Post-test performance is best analyzed by using a regression.
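The slope, intercept and standard error of estimate can be sketched as follows. The data are hypothetical, and the SEE divides the residual sum of squares by N - 2, which is one common convention; the notes above do not say which denominator is intended.

```python
from math import sqrt

def fit_line(x, y):
    """Return (slope, intercept), with slope = Sxy / SSx."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / ss_x
    intercept = mean_y - slope * mean_x  # where the line crosses the criterion axis
    return slope, intercept

def standard_error_of_estimate(x, y, slope, intercept):
    """Typical size of a prediction error (N - 2 denominator; an assumption)."""
    residual_ss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return sqrt(residual_ss / (len(x) - 2))

def predict(value, slope, intercept):
    """Predicted criterion score for a given predictor score."""
    return intercept + slope * value

# Hypothetical paired observations on the same five subjects.
predictor = [1, 2, 3, 4, 5]
criterion = [2, 4, 5, 4, 5]

slope, intercept = fit_line(predictor, criterion)
see = standard_error_of_estimate(predictor, criterion, slope, intercept)

inside = predict(2.5, slope, intercept)  # interpolation: between known values
outside = predict(8, slope, intercept)   # extrapolation: beyond known values

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, SEE = {see:.3f}")
print(f"interpolated: {inside:.2f}, extrapolated: {outside:.2f}")
```

The extrapolated value deserves less trust than the interpolated one, since nothing in the data shows that the relationship stays linear beyond the observed range.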
Summary of Advanced Procedures
The General Linear Model is “general” because it includes a broad group of procedures, including correlation, regression and the more complex linear models. And it includes both continuous and discrete variables. It’s linear because it assumes that the relationship between model components is consistent. When one variable goes up (has larger numbers), the other variable consistently reacts. The reaction can go the same way (positive) or go the opposite way (negative). But the assumption is that changes in one variable will be accompanied by changes in the other variable.
Another assumption is that causation may not be proved but it can be inferred. Although random assignment might increase one’s confidence in cause-effect conclusions, causation can be inferred based simply on consistency. Such an assumption can be risky but we do it all the time. We assume that the earth gets warm because the sun rises. We’ve never randomly assigned the sun to rising and not rising conditions. But we feel quite confident in our conclusion that the sun causes the heat, and not the other way around.
Here are nine applications of the General Linear Model:
Continuous Models Compare:
a. frequency distribution: one variable (predictor or criterion)
b. correlation: two regressions
c. regression: single predictor; single criterion (same as F test or t-squared)
d. multiple regression: multiple predictors; single criterion (same as ANOVA)
e. multivariate analysis: multiple predictors; multiple criteria
f. causal modeling: multiple measures of a factor
Discrete Models Compare:
a. t-test: 2 means; 1 independent variable
b. one-way ANOVA: 3 or more means; 1 independent variable
c. factorial ANOVA: multiple means on 2+ independent variables
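One way to see that these procedures belong to the same family is to check that, with two groups, the F from a one-way ANOVA equals the square of the independent-groups t. The following is a minimal sketch on made-up data; the group names and helper functions are hypothetical, and the t-test uses the pooled-variance form.

```python
from math import sqrt

def mean_of(values):
    """Arithmetic mean."""
    return sum(values) / len(values)

def ss_of(values):
    """Sum of squared deviations from the group's own mean."""
    m = mean_of(values)
    return sum((v - m) ** 2 for v in values)

def pooled_t(group_1, group_2):
    """Independent-groups t-test statistic with pooled variance."""
    df = len(group_1) + len(group_2) - 2
    pooled_variance = (ss_of(group_1) + ss_of(group_2)) / df
    standard_error = sqrt(pooled_variance * (1 / len(group_1) + 1 / len(group_2)))
    return (mean_of(group_1) - mean_of(group_2)) / standard_error

def one_way_f(*groups):
    """One-way ANOVA F ratio: mean square between / mean square within."""
    all_scores = [v for g in groups for v in g]
    grand_mean = mean_of(all_scores)
    ss_between = sum(len(g) * (mean_of(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(ss_of(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for two groups, made up for illustration.
treatment = [6, 7, 8, 9, 10]
control = [3, 4, 5, 6, 7]

t = pooled_t(treatment, control)
f = one_way_f(treatment, control)

print(f"t = {t:.2f}, t squared = {t ** 2:.2f}, F = {f:.2f}")
```

With three or more groups only the F ratio applies, which is where the one-way ANOVA takes over from the t-test.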
stem-leaf graph
histogram
box plot
scattergram
mean
median
mode
range
variance
standard deviation
quartiles
interquartile range
correlation coefficient
population
sample
probability theory
binomial distribution
normal distribution
t-test and t distribution
F test and F distributions
chi-squared probability distributions
estimation procedures
confidence interval
hypothesis testing
t-tests
analysis of variance
goodness of fit
contingency tables