Science does a terrible job predicting individual behavior. It’s not that we don’t try. We’re just not very good at it.
Science is really good at predicting a group. We collect data on a group, and use it to predict what the group, in general, will do. If you want to know if a drug works, we gather a group of people together, randomly assign them to wonder-drug and sugar-pill treatment conditions, and we see who lives and dies.
Then, we tell you that the new drug is great. But what we mean to say is that it is generally great. Most of the people in the study did better on the drug. A few individuals had fabulous results. And a few people died. But, overall, it’s a great drug.
Unfortunately, you’re not a group. As an individual, it is extremely hard to predict what will happen to you on the new drug. You might be like most people. If so, take the drug.
You might be unusual. If you’re unusual, the drug will do wonders or kill you. But we can’t say which it will be.
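To make the group-versus-individual point concrete, here is a small sketch with made-up numbers. The scores, the group sizes, and the names `drug` and `placebo` are all invented for illustration; they are not from any real trial.

```python
from statistics import mean

# Hypothetical change in a health score for each patient
# (positive = improved, negative = got worse). Invented data.
drug = [4, 3, 5, 2, 12, 3, 4, -9, 3, 3]
placebo = [0, 1, -1, 0, 1, 0, -1, 1, 0, -1]

# What science reports: the difference between the group averages.
group_effect = mean(drug) - mean(placebo)

# What science cannot tell you in advance: which patient ends up here.
worse_on_drug = [change for change in drug if change < 0]
best_on_drug = max(drug)

print("Average benefit of the drug:", group_effect)
print("Patients who got worse on the drug:", len(worse_on_drug))
print("Best individual result on the drug:", best_on_drug)
```

The average says the drug helps. It says nothing about whether you are the patient who scored 12 or the one who scored -9.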
Can we do any better predicting what movies you will watch? Yes and no.
On the yes side, there are several helpful factors. First, you do this behavior more than once. Science is always better at predicting events that repeat themselves. If you regularly rent movies, there might well be a pattern in your behavior. So if you’ve rented thousands of movies, predicting your behavior is feasible. In fact, if you watch lots of movies, you’re probably up for watching anything. But if you have only rented one movie, guessing your next movie is extremely difficult.
Science would prefer to predict a group, not an individual. And it would prefer to predict regularly repeating behavior, not occasional, sporadic, or spurious behavior. If you blow up buildings on a regular basis, predicting that you’ll be violent in the future isn’t so hard. If you only blow up a building here or there, it’s difficult to model that behavior.
In 2006, Netflix offered a million dollars to anyone who could improve the prediction accuracy of Cinematch, their how-about-renting-this-movie software. To reach that goal, programmers tried to model consumer behavior.
The contestants were given a large file containing movie titles, ratings and dates. No personal information about the customers was included. So predicting individual behavior wasn’t possible. As with testing a new drug, the models predict what will happen to a group of scores. The software will only predict you to the extent that you are a lot like the people in the data set.
Predicting movie ratings is even harder than it might seem. Remember, the ratings are on a 5-point scale. And that scale uses ordinal numbers.
Ordinal numbers give us 1st, 2nd and 3rd place, but no information about how close the race was. First and second places could be really close, or quite far apart. So you might score Jaws high, but is it a lot higher or only slightly higher than any other Spielberg film?
A related problem is that people aren’t consistent in their ratings. When you ask people to re-rate a movie, they often give a different answer. This isn’t surprising, given that moods change. Although inconsistent ratings probably happen more often with movies in the middle, we have to be in the right mood even for movies we love.
Predicting starts with a simple linear regression (see Day 5). You gather data and see what the general pattern is. If there is one simple straight line, your task will be quite easy. But more complicated data sets often require more work.
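Here is a minimal sketch of that straight-line idea, using ordinary least squares. The data are invented, and the names `fit_line`, `crowd_avg` and `my_rating` are hypothetical. Nothing here is Cinematch’s actual method.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented data: the average rating everyone else gave a movie (x)
# and the rating one viewer gave the same movie (y).
crowd_avg = [1.0, 2.0, 3.0, 4.0, 5.0]
my_rating = [1.5, 2.0, 2.5, 3.0, 3.5]

slope, intercept = fit_line(crowd_avg, my_rating)

# Predict this viewer's rating for a movie the crowd scored 4.5.
predicted = slope * 4.5 + intercept
print(slope, intercept, predicted)
```

These invented points fall exactly on one line, which is the easy case. Real ratings scatter around the line, and the fit only describes the general pattern.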
The general term is modeling. Essentially, you calculate the correlation (see Day 4) between all of the variables, and see if you can find patterns or clusters of correlations. If you rated one funny movie high, you might do the same with another funny movie.
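As a rough sketch of hunting for those clusters, the code below computes the Pearson correlation between every pair of movies. The three movie names and the five viewers’ ratings are invented. It also treats the 5-point ratings as ordinary numbers, which is a simplifying assumption.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented ratings: each list holds the same five viewers, in the same order.
ratings = {
    "comedy_a": [5, 4, 2, 1, 3],
    "comedy_b": [4, 5, 1, 2, 3],
    "horror_c": [1, 2, 5, 4, 3],
}

# Correlate every pair of movies and look for ones that move together.
correlations = {
    (a, b): pearson(ratings[a], ratings[b])
    for a, b in combinations(ratings, 2)
}

for pair, r in correlations.items():
    print(pair, round(r, 2))
```

In this toy data the two comedies rise and fall together, while the horror film runs opposite to both. That is the kind of cluster a model tries to find.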
Here’s how modeling works. Start by imagining a room where movie titles are floating in air. When you look closer, you can see that all of the funny movies are floating near the ceiling; and the dark, scary films are near the floor.
You also notice that they are arranged left to right by their target age group: kid movies to the left, senior citizens to the right. The third dimension, the depth of the room, indicates popularity (most-liked to least-liked). This three-dimensional space is a model of what these movies have in common.
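One way to picture the room in code is to give every movie three coordinates and measure how far apart two movies float. The titles, the coordinates and the function name `nearest` are all invented for this sketch. Each axis runs from 0 to 1.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

# Invented coordinates: (humor, target_age, popularity).
# humor: 0 = dark and scary (floor), 1 = funny (ceiling)
# target_age: 0 = kids (left), 1 = senior citizens (right)
# popularity: 0 = least-liked, 1 = most-liked
movies = {
    "kids_comedy": (0.9, 0.1, 0.8),
    "teen_comedy": (0.8, 0.3, 0.6),
    "adult_thriller": (0.2, 0.6, 0.7),
    "senior_drama": (0.4, 0.9, 0.5),
}


def nearest(title, catalog):
    """Return the other movie floating closest to `title` in the room."""
    here = catalog[title]
    others = (t for t in catalog if t != title)
    return min(others, key=lambda t: dist(here, catalog[t]))


print(nearest("kids_comedy", movies))
print(nearest("senior_drama", movies))
```

If you liked one movie, a model like this would suggest its closest neighbor in the room. The suggestion is only as good as the coordinates, and here they are guesses.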
Now think of these titles as flowing through the room, changing every few seconds. It’s a stream of information that has spurts, lulls, waves and transitions. In this rapidly changing sea of data, try to hit one title with a dart. Think of it as “pin the tail on the movie.” It is not an easy task.
In addition, you need to add more dimensions; three is not enough to describe them all. Movies vary on theme, quality of photography, cleverness of titles, the fame of actors, the quality of directors, the skill of editors, and on their cultural, spiritual, religious and political context. They could also be rated on happy endings, exotic settings, and intricate costumes.
Descriptions of these inter-correlations are called maps. Not only do they help clarify the data, they can also be quite pretty (see http://www.the-ensemble.com/).
Statistics often looks for consistent patterns. We predict what is best for groups of people based on other group data. We are great at predicting what large groups of people repeatedly do. We’re pretty good at predicting what large groups of people sometimes do. And we’re lousy at predicting what you will do.