Day 10: Complex models

It is helpful to have an overview of designs more advanced than those covered in a typical statistics course. Complex models build on the principles we already discussed. Although their calculation is beyond the scope of this discussion (that’s what computers are for), here is an introduction to procedures that use multiple predictors, multiple criteria and multivariate techniques to test interactions between model components.

Until now, our models have been quite simple. One individual, one group, or one variable predicting another. We have explored the levels of measurement, the importance of theories and how to convert theoretical constructs into model variables. We have taken a single variable, plotted its frequency distribution and described its central tendency and dispersion. We have used percentiles and z-scores to describe the location of an individual score in relation to the group.

In addition to single variable models, we studied two variable models, such as correlations, regressions, t-tests and one-way ANOVAs. We have laid a thorough foundation of research methods, experimental design, and descriptive and inferential statistics.

Despite their simplicity, these procedures are very useful. You can use a correlation to measure the reliability and validity of a test, machine or system of management, training or production. You can use a linear regression to date a rare archaeological find, predict the winner of a race or analyze a trend in the stock market. You can use the t-test to test a new drug against a placebo or compare 2 training conditions. You can use the 1-way ANOVA to test several psychotherapies, compare levels of a drug or compare brands of computers.

Also, the procedures we’ve studied so far can be combined into more complex models. The most complex models have more variables but they are variations of the themes we have already encountered.

ANOR

Analysis of regression (ANOR), an analysis of variance applied to a regression, tests how well a straight line describes the data. It is a goodness-of-fit test: it tests how well the data fits our straight-line model.

Starting off, we assume our data looks like chance: no organized pattern, just a circle with no linearity. Our null hypothesis is that our data has no significant resemblance to a straight line; in the population, X and Y have no linear relationship (the slope is zero). We are assuming our data will not match (fit) our model (a straight line), and we will keep that assumption until it is clear that the data fits the model. But the fit has to be good; it has to be significant.

We are using X to predict Y. We are hoping the variations in Y can be explained by the variations in X. Prediction is based on commonality. When X and Y are highly correlated, it is easy to make predictions from one variable to another. When there is little or no correlation, X is not a good predictor of Y; they are operating independently.

In statistical terms, an ANOR partitions the variation in Y into mean squares Regression (what we understand) and mean squares Error (what we can’t explain). Mean squares is another name for variance: each is a sum of squares divided by its degrees of freedom. We make a ratio of understood variance to not-understood variance (F = MS Regression / MS Error) and compare this ratio with the critical value in an F table.
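
A minimal Python sketch of an ANOR for one predictor, assuming an ordinary least-squares line and a handful of made-up scores. The function name `anor` and the data are hypothetical; 10.13 is the tabled critical value of F at α = .05 with 1 and 3 degrees of freedom (1 for the regression, n - 2 = 3 for error).

```python
def anor(x, y):
    """Analysis of variance for a simple linear regression of y on x (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * xi for xi in x]

    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_reg = sum((p - mean_y) ** 2 for p in predicted)          # what we understand
    ss_err = sum((yi - p) ** 2 for yi, p in zip(y, predicted))  # what we can't explain

    df_reg, df_err = 1, n - 2   # 1 predictor; slope and intercept are estimated
    ms_reg = ss_reg / df_reg    # mean squares Regression
    ms_err = ss_err / df_err    # mean squares Error
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": ss_reg / ss_total,  # proportion of variance accounted for
        "F": ms_reg / ms_err,
        "df": (df_reg, df_err),
    }


# Hypothetical scores, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
result = anor(x, y)

# Critical value of F at alpha = .05 for df = (1, 3), as read from an F table.
F_CRITICAL = 10.13
significant = result["F"] > F_CRITICAL
print(round(result["F"], 2), significant)  # 4.5 False
```

The line accounts for 60% of the variance in Y (r² = .60), yet F = 4.5 falls short of 10.13, so with only five points we keep the null hypothesis.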

Factorial ANOVA

A factorial ANOVA is like several 1-way ANOVAs combined into one design, and the purpose of combining them is to test for interactions. A 1-way ANOVA can test whether different levels of salt influence compliments for a soft drink, but what happens if the soft drink is both salty and sweet?

Interactions can be good or bad. Some heart medications work better when given together. For example, Digoxin and calcium channel blockers go together because they work on different channels. Together they are better than each would be separately. But other heart medications (phenylpropanolamine with MAO inhibitors) can result in fast pulse, increased blood pressure, and even death. This is why we’re often warned not to mix drugs without checking with our doctor.

The ability to check how variables interact is the primary advantage of complex research designs and advanced statistical techniques. A 1-way ANOVA can test whether different levels of aspirin help relieve headaches, but a factorial ANOVA can test both aspirin and gender as predictors of headaches. Or aspirin, gender, time of day, caffeine and chicken soup. Any number of possible explanations, and combinations of explanations, can be tested with multiple regression, MANOVA, factorial ANOVA and causal modeling.

A factorial ANOVA tests the impact of 2 or more independent variables on one dependent variable. It tests the influence of many discrete variables on one continuous variable. It has multiple independent variables and one dependent variable.

A 1-way ANOVA model tests multiple levels of 1 independent variable. Let’s assume the question is whether stress affects how accurately people work multiplication problems. Subjects are randomly assigned to a treatment level (high, medium and low, for example) of one independent variable (stress), and their performance on one dependent variable (number of errors) is measured.

If stress impacts performance, you would expect errors to increase with the level of stress. The variation between the cells is attributed to the treatment given; variation within each cell is thought to be due to random chance.
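
The between-cells versus within-cells logic can be sketched in a few lines of Python. The error counts below are invented for illustration, and `one_way_anova` is a hypothetical helper, not a library function; 5.14 is the tabled critical F at α = .05 for df = (2, 6).

```python
def one_way_anova(groups):
    """Return F and its degrees of freedom for a list of groups of scores (hypothetical helper)."""
    scores = [s for g in groups for s in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]

    # Between-cells variation: attributed to the treatment.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-cells variation: attributed to random chance.
    ss_within = sum((s - m) ** 2 for g, m in zip(groups, means) for s in g)

    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, (df_between, df_within)


# Hypothetical error counts for three stress levels.
low, medium, high = [2, 3, 4], [4, 5, 6], [7, 8, 9]
f, df = one_way_anova([low, medium, high])

F_CRITICAL = 5.14  # F table, alpha = .05, df = (2, 6)
print(round(f, 2), df, f > F_CRITICAL)  # 19.0 (2, 6) True
```

Because the group means (3, 5 and 8 errors) differ much more than the scores inside each group, F = 19 clears the critical value.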

A 2-way ANOVA has 2 independent variables. For example, a design could cross gender (male, female) with stress (low, medium, high).

It is called a 2×3 (“two by three”) factorial design. If each cell contained 10 subjects, there would be 60 subjects in the design. A design for amount of student debt (low, medium and high) and year in college (frosh, soph, junior and senior) would have 1 independent variable (debt) with 3 levels and another independent variable (year in school) with 4 levels.

This is a 3×4 factorial design. Notice that each number (3, 4, etc.) tells how many levels an independent variable has, and the number of numbers tells you how many independent variables there are. A 2×4 has 2 independent variables. A 3×7 has 2 independent variables (one with 3 levels and one with 7 levels). A 2×3×4 factorial design has 3 independent variables.
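
The naming rule can be checked with a tiny Python helper (hypothetical, written for this illustration). It uses an ASCII "x" in place of "×" and needs Python 3.8 or later for `math.prod`.

```python
from math import prod


def describe_design(levels):
    """levels[i] is the number of levels of the i-th independent variable (hypothetical helper)."""
    return {
        "name": "x".join(str(k) for k in levels),  # e.g. "3x4"
        "independent_variables": len(levels),      # how many numbers
        "cells": prod(levels),                     # product of the numbers
    }


gender_by_stress = describe_design([2, 3])
subjects = gender_by_stress["cells"] * 10  # 10 subjects per cell
debt_by_year = describe_design([3, 4])
three_way = describe_design([2, 3, 4])
print(gender_by_stress, subjects)
print(debt_by_year)
print(three_way)
```

The 2×3 gender-by-stress design has 6 cells, so 10 subjects per cell gives 60 subjects; the 3×4 debt-by-year design has 12 cells, and a 2×3×4 design has 3 independent variables and 24 cells.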

Factorial designs can do something 1-way ANOVAs can’t: they can test the interaction between independent variables. Taking pills can be dangerous and driving can be dangerous, but it is often the interaction between variables that interests us most.

Analyzing a 3×4 factorial design involves 3 steps: columns, rows and cells. The factorial ANOVA tests the columns of the design as if each column were a different group. Like a 1-way ANOVA, this main effect tests the columns as if the rows didn’t exist.

The second main effect (rows) is tested as if each row were a different group; it tests the rows as if the columns didn’t exist. Notice that each main effect is like doing a separate 1-way ANOVA on that variable.

The cells also are tested to see if one cell is significantly larger (or smaller) than the others. This is the test of the interaction: it checks whether the cell means depart from what the row and column main effects alone would predict. If one cell is significantly higher or lower than expected, the difference is the result of a combination of the independent variables.
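
A minimal sketch of the three tests (columns, rows, cells) for a balanced two-way design, in plain Python. The function `two_way_anova` and the 2×2 data are hypothetical; it assumes every cell holds the same number of scores, and 7.71 is the tabled critical F at α = .05 for df = (1, 4). The interaction is computed as the standard textbook quantity: how far each cell mean departs from the value predicted by adding the row and column effects.

```python
def two_way_anova(cells):
    """F ratios for a balanced two-way design (hypothetical helper).

    cells[r][c] holds the scores for row level r and column level c; every cell
    is assumed to contain the same number of scores.
    """
    rows, cols, n = len(cells), len(cells[0]), len(cells[0][0])
    cell_mean = [[sum(cell) / n for cell in row] for row in cells]
    grand = sum(sum(row) for row in cell_mean) / (rows * cols)
    row_mean = [sum(row) / cols for row in cell_mean]
    col_mean = [sum(cell_mean[r][c] for r in range(rows)) / rows for c in range(cols)]

    # Main effects: each factor is tested as if the other didn't exist.
    ss_rows = n * cols * sum((m - grand) ** 2 for m in row_mean)
    ss_cols = n * rows * sum((m - grand) ** 2 for m in col_mean)
    # Interaction: how far each cell mean departs from the additive prediction
    # (grand mean + row effect + column effect).
    ss_inter = n * sum(
        (cell_mean[r][c] - row_mean[r] - col_mean[c] + grand) ** 2
        for r in range(rows)
        for c in range(cols)
    )
    # Error: scores around their own cell mean.
    ss_within = sum(
        (s - cell_mean[r][c]) ** 2
        for r in range(rows)
        for c in range(cols)
        for s in cells[r][c]
    )

    df_within = rows * cols * (n - 1)
    ms_within = ss_within / df_within
    return {
        "rows": ss_rows / (rows - 1) / ms_within,
        "columns": ss_cols / (cols - 1) / ms_within,
        "interaction": ss_inter / ((rows - 1) * (cols - 1)) / ms_within,
        "df_within": df_within,
    }


# Hypothetical 2x2 data, 2 scores per cell: factor A in rows, factor B in columns.
cells = [
    [[1, 3], [3, 5]],
    [[3, 5], [13, 15]],
]
result = two_way_anova(cells)
F_CRITICAL = 7.71  # F table, alpha = .05, df = (1, 4)
for effect in ("rows", "columns", "interaction"):
    print(effect, round(result[effect], 2), result[effect] > F_CRITICAL)
```

With cell means of 2, 4, 4 and 14, all three F ratios (36, 36 and 16) exceed 7.71. The additive prediction for the bottom-right cell is 6 + 3 + 3 = 12, but its observed mean is 14; in a 2×2 design every cell misses its additive prediction by the same amount (2 points here), and those misses make up the interaction sum of squares.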

Multiple Regression

Multiple regression is an extension of simple linear regression and, like it, is fit to observed data. The difference is that two or more predictors are used; there are multiple predictors and a single criterion.

Let’s assume that you have selected 3 variables as predictors and 1 continuous variable as criterion. You might want to know if gender, stress and time of day impact typing performance.

Each predictor is tested against the criterion separately. If a single predictor appears to be primarily responsible for changes in the criterion, its influence is measured. Every combination of predictors is also tested, so both main effects and interactions can be examined. If this sounds like a factorial ANOVA, you’re absolutely correct.
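
In multiple regression an interaction is commonly tested by adding the product of two predictors as an extra column. The sketch below fits such a model by ordinary least squares using only the Python standard library. `solve` and `least_squares` are hypothetical helpers written for this example, and the data are invented and noise-free (built from y = 1 + 2·x1 + 3·x2 + 0.5·x1·x2) so you can see the fit recover the coefficients. A real analysis would use a statistics package, which also reports a significance test for each coefficient.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(design, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical, noise-free data built from y = 1 + 2*x1 + 3*x2 + 0.5*x1*x2.
x1 = [0, 1, 2, 0, 1, 2, 0, 1, 2]
x2 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y = [1 + 2 * a + 3 * b + 0.5 * a * b for a, b in zip(x1, x2)]

# Columns: intercept, main effect of x1, main effect of x2, x1-by-x2 interaction.
design = [[1, a, b, a * b] for a, b in zip(x1, x2)]
coefficients = least_squares(design, y)
print([round(c, 6) for c in coefficients])  # [1.0, 2.0, 3.0, 0.5]
```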

You could think of Multiple Regression and ANOVA as siblings. Factorial ANOVAs use discrete variables; Multiple Regression uses continuous variables. If you were interested in using income as one of your predictors (independent variables), you could use discrete categories of income (high, medium and low) and test for significance with an ANOVA. If you wanted to measure income as a continuous variable (actual income earned), the procedure would be a Multiple Regression.

You also could think of Multiple Regression as the parent of ANOVA. Analysis of Variance is actually a specific example of Multiple Regression; it is the discrete variable version. Analysis of Variance uses categorical predictors. Multiple Regression can use continuous or discrete predictors (in any combination); it is not restricted to discrete predictors.
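
One way to see that ANOVA is a special case of regression: code the categorical predictor as 0/1 dummy variables, fit a regression, and compare its F with the ordinary one-way ANOVA F. The sketch assumes invented error counts for three stress levels, with "low" as the baseline group; `solve` and `least_squares` are hypothetical standard-library helpers written for this example.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(design, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


groups = {"low": [2, 3, 4], "medium": [4, 5, 6], "high": [7, 8, 9]}  # hypothetical data

# Regression version: dummy-code the categorical predictor ("low" is the baseline).
y, design = [], []
for name, scores in groups.items():
    for s in scores:
        y.append(s)
        design.append([1, int(name == "medium"), int(name == "high")])
b = least_squares(design, y)
fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in design]
mean_y = sum(y) / len(y)
k, n = len(groups), len(y)
ss_reg = sum((fi - mean_y) ** 2 for fi in fitted)
ss_err = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
f_regression = (ss_reg / (k - 1)) / (ss_err / (n - k))

# ANOVA version: between-groups versus within-groups mean squares.
means = {g: sum(s) / len(s) for g, s in groups.items()}
ss_between = sum(len(s) * (means[g] - mean_y) ** 2 for g, s in groups.items())
ss_within = sum((x - means[g]) ** 2 for g, s in groups.items() for x in s)
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

print([round(v, 6) for v in b], round(f_regression, 6), round(f_anova, 6))
# [3.0, 2.0, 5.0] 19.0 19.0
```

The regression intercept (3) is the baseline group's mean, each dummy coefficient (2 and 5) is that group's difference from the baseline, and both routes give F = 19.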

Both factorial ANOVA and Multiple Regression produce an F statistic that is compared to the Critical Values of F table. Significance is ascribed if the calculated value is larger than the critical value given in the table.

Both procedures have only one outcome measure. There may be many predictors in a study but there is only one criterion. You may select horse weight, jockey height, track condition, past wins and phase of the moon as predictors of a horse race, but only one outcome measure is used. Factorial ANOVA and Multiple Regression are multiple predictor-single criterion procedures.

Multivariate Analysis

Sometimes called MANOVA (pronounced man-o-va), multivariate analysis is actually an extension of multiple regression. Like multiple regression, multivariate analysis has multiple predictors. In addition to multiple predictors, multivariate analysis allows multiple outcome measures.

Now it is possible to use gender, income and education as predictors of happiness AND health. You are no longer restricted to only a single criterion. With multivariate analysis, the effects and interactions of multiple predictors can be examined. And their impact on multiple outcomes can be assessed.
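
A minimal sketch of one common MANOVA test statistic, Wilks' lambda, for the simplest case: one grouping predictor and two outcomes. Λ is the determinant of the within-groups sums-of-squares-and-cross-products matrix divided by the determinant of the total (within plus between); values near 1 mean the groups barely differ on the outcomes taken together, values near 0 mean they differ a lot. The helper names and the (happiness, health) scores are hypothetical, and converting Λ to an F for the table is a further step a statistics package would handle.

```python
def mean_vector(rows):
    """Mean of each outcome across a list of observations."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sscp(deviations):
    """Sums of squares and cross-products for a list of 2-D deviation vectors."""
    return [[sum(d[i] * d[j] for d in deviations) for j in range(2)] for i in range(2)]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(groups):
    """Wilks' lambda for groups of (outcome1, outcome2) observations (hypothetical helper)."""
    everyone = [obs for g in groups for obs in g]
    grand = mean_vector(everyone)
    within_devs, between_devs = [], []
    for g in groups:
        m = mean_vector(g)
        within_devs += [[o[i] - m[i] for i in range(2)] for o in g]
        # Each member contributes its group's deviation from the grand mean.
        between_devs += [[m[i] - grand[i] for i in range(2)]] * len(g)
    within = sscp(within_devs)    # scatter of people around their group means
    between = sscp(between_devs)  # scatter of group means around the grand mean
    total = [[within[i][j] + between[i][j] for j in range(2)] for i in range(2)]
    return det2(within) / det2(total)


# Hypothetical (happiness, health) scores for two groups.
group_a = [(1, 2), (2, 1), (3, 3)]
group_b = [(4, 5), (5, 4), (6, 6)]
lam = wilks_lambda([group_a, group_b])
print(round(lam, 4))  # 0.1818
```

Here Λ = 12/66 ≈ .18, meaning the two groups differ substantially on happiness and health taken together. The cross-product matrices are built from the same deviation products that underlie correlations and regressions.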

The analysis of a complex multiple-predictor, multiple-criteria model is best left to a computer, but the underlying process is the calculation of correlations and linear regressions. As variables are selected for the model, a decision is made whether each is a predictor or a criterion. Aside from the experimenter’s theory, the choice of predictor or criterion is arbitrary: in multivariate analysis, a variable such as annual income could be either a predictor or a criterion.

Complex Modeling

There are a number of statistical procedures at the high end of modeling. Relax! You don’t have to calculate them. I just want you to know about them.

In particular, I want to make the point that there is nothing scary about the complex models. They are involved and require lots of tedious calculations, but that’s why God gave us computers. Since we are blessed to have stupid but remarkably fast mechanical slaves, we should let them do the number crunching.

It is enough for us to know that a complex model—at its heart—is a big bundle of correlations and regressions. Complex models hypothesize directional and nondirectional relationships between variables. Each factor may be measured by multiple measures. Intelligence might be defined as the combination of 3 different intelligence tests, for example.

And income might be a combination of salary plus benefits minus vacation. And education might be years in school, number of books read and number of library books checked out. The model, then, becomes the interaction of factors that are more abstract than single variable measures.
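
A simple way to combine several measures into one factor is a unit-weighted composite: convert each measure to z-scores and average them. This is an assumption for illustration; causal and structural models usually estimate a separate weight (loading) for each measure rather than weighting them equally. The helper names and test scores are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Location of each score relative to the group, in standard deviation units."""
    m, sd = mean(scores), pstdev(scores)
    return [(s - m) / sd for s in scores]


def composite(*measures):
    """Unit-weighted composite: average each person's z-scores across measures."""
    standardized = [z_scores(m) for m in measures]
    return [mean(person) for person in zip(*standardized)]


# Hypothetical scores for four people on three intelligence tests.
test_a = [90, 100, 110, 120]
test_b = [50, 45, 60, 55]
test_c = [9, 11, 10, 12]
intelligence = composite(test_a, test_b, test_c)
print([round(v, 2) for v in intelligence])  # [-1.04, -0.45, 0.45, 1.04]
```

The fourth person, who scores high on all three tests, gets the highest composite even though the tests use very different scales.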

Underlying the process, however, are principles and procedures you already know. Complex models might try to determine whether one more predictor helps or hurts, but the model is evaluated just like a correlation: the percentage of variance accounted for by the relationships.
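
To illustrate "percentage of variance accounted for" and whether one more predictor helps, the sketch below computes R² for one predictor and for two, using only correlations. The two-predictor formula is the standard one, R² = (r1y² + r2y² - 2·r1y·r2y·r12) / (1 - r12²); the helper names are hypothetical, and the data are made up and deliberately built as y = x1 + x2 so the result is easy to check.

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def r_squared_two_predictors(r1y, r2y, r12):
    """Variance in y accounted for by two predictors, from the three correlations."""
    return (r1y ** 2 + r2y ** 2 - 2 * r1y * r2y * r12) / (1 - r12 ** 2)


# Hypothetical data; y was constructed as x1 + x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [3, 3, 7, 7, 10]

r1y, r2y, r12 = correlation(x1, y), correlation(x2, y), correlation(x1, x2)
r2_one = r1y ** 2                                 # x1 alone
r2_two = r_squared_two_predictors(r1y, r2y, r12)  # x1 and x2 together
print(round(r2_one, 3), round(r2_two, 3), round(r2_two - r2_one, 3))  # 0.9 1.0 0.1
```

Adding x2 raises the variance accounted for from 90% to 100%, an increase of 10 percentage points, so here the extra predictor helps.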
