What do you get when you mix behaviorism and humanism? You get social learning theory.
The learning part of social learning theory comes from Skinner and Hull. The social part comes from Maslow. Mix them together and you get Bandura, Dollard & Miller, and Rotter.
Skinner, B.F. (1904-1990)
Born in Susquehanna, Pennsylvania, Skinner was an English major in college (Hamilton College) and then pursued psychology (at Harvard). In contrast to Hull, Skinner approached psychology inductively. He proposed an atheoretical methodology which preferred operational definitions to intervening variables.
Best known for his model of learning, Skinner emphasized the importance of what happens after a response: not S-R, but S-R-C (stimulus-response-consequence). He expanded Thorndike’s law of effect into an entire system of reinforcement.
In place of classical respondent conditioning, Skinner proposed operant conditioning. According to his model, behavior which is followed by a positive reinforcer (reward) is more likely to occur.
Conceding that there are too many stimuli to categorize, Skinner focused on the response and its consequence. Positive reinforcers increase behavior strength; positive punishment decreases behavior temporarily (as long as the punisher is present). Only extinction (the continued absence of a reward) decreases behavior permanently (e.g., if they stop paying you, you don’t go to work). Negative reinforcement (the removal of something bad) increases the likelihood of behavior and negative punishment (the removal of something good) temporarily decreases it.
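These four consequences form a simple two-by-two grid: you either add or remove a stimulus, and the behavior is either strengthened or weakened. Here is a minimal Python sketch of that grid (the labels are Skinner’s; the function and variable names are mine, invented for illustration):

```python
# Skinner's consequence taxonomy as a 2x2 grid:
# (add vs. remove a stimulus) x (strengthen vs. weaken the behavior).
CONSEQUENCES = {
    ("add",    "strengthen"): "positive reinforcement",  # give a reward
    ("remove", "strengthen"): "negative reinforcement",  # take away something bad
    ("add",    "weaken"):     "positive punishment",     # give something bad
    ("remove", "weaken"):     "negative punishment",     # take away something good
}

def classify(action: str, effect: str) -> str:
    """Return Skinner's label for a consequence."""
    return CONSEQUENCES[(action, effect)]

print(classify("remove", "strengthen"))  # -> negative reinforcement
```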
Note that Skinner did not hypothesize drive, insight or any internal process. He didn’t so much deny their existence as consider them unknowable. For Skinner, if it didn’t impact behavior, whatever went on in the black box of the mind was unimportant.
Basing his findings on animal research (mostly rats and pigeons), Skinner identified five schedules of reinforcement: continuous reinforcement, fixed interval (FI), fixed ratio (FR), variable interval (VI) and variable ratio (VR). Continuous reinforcement is used to shape (refine) a behavior. Every time the subject performs the desired behavior, it is rewarded. Continuous reinforcement leads to quick learning and, once the reinforcement stops, quick extinction.
Fixed Interval (FI) describes the condition where a certain amount of time must pass before a correct response is rewarded (e.g., getting paid every two weeks). FI produces a “scalloped” pattern: the closer it gets to payday, the more often the proper response is given.
Fixed Ratio (FR) requires a certain number of responses to be made before a behavior is rewarded (e.g., 10 widgets must be made before you are paid). In Variable Interval (VI) and Variable Ratio (VR) schedules, the required amount of time or number of responses varies. These partial reinforcement schedules (you’re never quite sure when you’ll be rewarded) are quite resistant to extinction.
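One way to see why partial schedules resist extinction is to simulate them. The sketch below is a toy illustration, not anything from Skinner’s lab; the schedule sizes are made up:

```python
import random

def fixed_ratio(n):
    """FR schedule: reward every n-th response."""
    count = 0
    def schedule():
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True   # reinforced
        return False
    return schedule

def variable_ratio(mean_n):
    """VR schedule: reward after a random number of responses averaging mean_n."""
    state = {"count": 0, "target": random.randint(1, 2 * mean_n - 1)}
    def schedule():
        state["count"] += 1
        if state["count"] >= state["target"]:
            state["count"] = 0
            state["target"] = random.randint(1, 2 * mean_n - 1)
            return True
        return False
    return schedule

# On FR-10 the subject learns that the reward always follows the 10th response;
# on VR-10 the very next response might be the one, so responding stays steady
# and persists long after rewards stop (resistance to extinction).
fr, vr = fixed_ratio(10), variable_ratio(10)
print(sum(fr() for _ in range(1000)), "FR rewards in 1000 responses")
print(sum(vr() for _ in range(1000)), "VR rewards in 1000 responses")
```

Both schedules pay off about 100 times per 1000 responses; the behavioral difference lies entirely in the predictability of the payoff.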
According to Skinner, rewards should be given appropriately. Parents should reward behaviors they want and ignore (extinguish) behaviors they don’t want. Giving attention to a child (such as when giving a punishment) actually rewards the child with your presence and sends a mixed message. Behavior can be shaped by rewarding successive approximations but practice without reinforcement doesn’t improve performance.
Skinner relied heavily on replication. His experimental evidence did not rely on statistical analyses or large subject pools. He performed carefully designed experiments with strict controls and simply counted the responses.
In an attempt to apply his research to practical problems, Skinner adapted his operant conditioning chamber (he hated the popular title of “Skinner box”) to child rearing. His “Baby Tender” crib was an air conditioned glass box which he used for his own daughter for two and a half years. Although commercially available, it was not a popular success.
During WWII, Skinner designed a missile guidance system using pigeons as “navigators.” Although his system was feasible, the military rejected it out of hand. The PR problems of pigeon bombers must have been extensive.
Skinner also originated programmed instruction. Using a teaching machine (or books with small quizzes that lead to different material), small bits of information are presented in an ordered sequence. Each frame or bit of information must be learned before one is allowed to proceed to the next section. Proceeding to the next section is thought to be rewarding.
Hull, Clark (1884-1954)
Clark Leonard Hull had a tough childhood. In his late teens, an outbreak of typhoid fever took the lives of several of his classmates. Then, at the age of 24, Hull contracted polio, which precipitated his change from mining engineer to psychologist.
Hull was skilled at inventing the equipment he needed to perform an experiment. For a study on the effect of tobacco on performance, he designed a system for delivering heated air (tobacco-free and tobacco-filled) to the subjects so they would not know which experimental treatment they were receiving. Similarly, Hull constructed a machine to calculate inter-item correlations for a series of studies he performed on aptitude testing.
Not surprisingly, Hull believed that people are basically machines. His complex theory of learning is a combination of Newton’s deductive method, Pavlov’s classical conditioning, and Euclidean geometry. For Hull, experimental observations were validity checks on the internal postulates he had previously deduced.
Hull’s Hypothetico-Deductive Theory includes habit strength (the tendency to respond), evenly spaced trials, and reinforcement. Using inferred states and intervening variables, Hull described learning as an interactive system of probabilities.
Too complex for many and too theoretical for others, Hull was a pioneer in using animal research to generalize to human behavior. Despite poor eyesight and poor health, Hull set a standard of experimental excellence and theoretical integrity which still serves as a model today.
Bandura, Albert (1925-2021)
Although trained in behaviorism, Albert Bandura maintained that it would take too long for people to learn everything by associating stimuli or being rewarded. We are much more capable than that. According to Bandura, people primarily learn by watching others.
Vicarious learning resonates with personal experience. Most can remember learning how to do something by watching their parents, siblings or friends do it. We watch people we know to learn how to fit in, what to do in a crisis, how to behave in public, and how to treat those we love. We watch famous people and learn how to wear our clothes, have elaborate weddings or adopt children from other countries. Demonstration learning is also a popular pastime on television: how to cook, how to buy a house, how to play golf, and how to behave when you’ve made a winning goal.
The process of observational learning is pretty straightforward. We watch what someone does. We make a mental note (representation) of how they did it. And we use our mental model as a guide for how we should behave. Bandura suggests four stages in the modeling process: attention (tracking the environment), retention (converting observations into a cognitive rule), reproduction (being able to apply the rule correctly) and motivation (having a reason to do the behavior).
When I was growing up, I watched my older brothers and used them as models of how (and how not) to behave. I learned how to tie a Windsor knot, lift weights, snap a wet towel at a friend, and, most of all, how to look cool. On more than one occasion, I watched what they did with the intention of learning from them. I found that modeling can help, but it’s not without its difficulties.
Before digital photography, making even a small contact print seemed like magic. With the lights on, you opened the lid of a little box, and put a negative on the glass. In the dark, you put a piece of photosensitive paper on top of the negative, closed the door, pushed a button to turn on a light which would shine through the negative onto the paper, counted for a few seconds, let go of the button (to turn off the light), placed the photosensitive paper into a tray with some chemicals, counted for several seconds, moved the paper to another tray with different chemicals, counted for a while, moved the paper to another tray and swooshed it around for a bit. If you did all of this correctly, an image formed on the paper, and you were the greatest (or at least proudest) photographer in the world. Simple, right?
I watched my brother do it thousands of times (according to my ten-year-old world view). I was sure I could do it too. He didn’t want to be bothered with me, so he went off to do something else. But I could use his equipment, as long as I swore (repeatedly) not to break anything or screw up. So I started. I put the negative in place, added the paper, counted carefully, moved the paper, counted…I did everything just as I’d seen him do. But it didn’t work.
I tried it again; it still didn’t work. Finally, sheepishly, I asked him for help. After yelling at me a bit for wasting his valuable paper and time, he (at my parents’ urging) asked me exactly what I had done. I told him all of the steps I’d taken: placing the negative and the paper, closing the lid, counting…everything. “What about pushing the button?” he asked. “Button?” I said. “What button?”
While watching him, I was able to observe everything he did, except pushing the button; it was on his side of the table. And since he wasn’t explaining the process to me (his tolerance only extended to allowing me to be present), I didn’t know there was a button to be pushed.
I had correctly encoded what I saw. I retained the rule and was able to apply it; my retention, reproduction and motivation stages were above reproach. But my observation was incomplete. If our knowledge of the environment is incomplete, we won’t encode the model properly. Of the four stages, attention is the most important.
Most people don’t have a problem with the retention stage of modeling. We are very efficient at converting our observations into cognitive rules. Parents are often surprised when their children put a rule into action: swearing when you bash your finger with a hammer, kicking the cat when you’re angry, yelling at drivers who have cut you off, licking your fingers while eating, and shopping when you’re depressed.
Reproduction, however, is not quite as simple. I’ve watched many Olympic and professional athletes. I’ve seen basketball players who can jump up, backwards and throw the ball at the same time. I’ve seen marathon runners, speed skaters, and pole-vaulters perform. But I can’t reproduce those behaviors. I get the idea of how to play golf (use a stick to hit a ball into the hole) but I can’t do it well. Reproduction of ideas into practice is the most difficult stage of modeling.
The last stage, motivation, is where Bandura is most at odds with Skinner. According to operant conditioning, reinforcement is necessary for learning to occur. But Bandura believes learning and performance are separate processes. Acquiring the rule and applying it occur in separate stages. According to Bandura, learning occurs prior to performance. You watch and learn, even if you don’t do the behavior. Reinforcement only impacts the likelihood of applying what you know.
To understand people, we must look at the environment and behavior, but we also must include the person: how the model was encoded and which goals were in operation. Bandura calls this interaction between person, behavior and environment “reciprocal determinism.” His theory is behaviorism-plus: environment and behavior, plus what happens inside the person.
Bandura points out that people are motivated by their goals and dreams. People are more likely to perform a modeled behavior if the consequence is something they value (if it helps them reach their goal). Consequently, we tend to model ourselves after people who are similar to us or those we admire (want to be similar to). We provide our own rewards (self-reinforcement), and are capable of delaying gratification. Learning occurs by observing others, getting an idea of how to behave, and using this information as a guide in future situations. Learning is more than imitation. It is a process of active discovery.
Discovery learning is more than just observing. The key difference is active encoding. Observing is the first step, but then the observations have to be converted into symbols. The encoding of a model produces better retention than simply observing. Whether the information is converted into words or images (Bandura doesn’t specify how the model is encoded), the conversion of an observation into a mental representation improves retention.
Modeling can, of course, be extended beyond behaviors. It can also be applied to modeling attitudes and emotions. TV commercials are essentially modeling sessions. You watch people you admire use products the sponsors want to sell. The reasoning is that if you admire the model or the outcome, you’ll want to buy the product. For Bandura, modeling explains the acquisition of behaviors and attitudes.
One application of Bandura’s theory is self-regulation. A three-step process, self-regulation is essentially using a standard as a model. The first step is self-observation. Look at yourself and track the behavior you want to change. If you wish, you can use a behavioral chart or diary for documentation. The second step is judgment. Compare your behavior with a standard: your internal standard or an external one (what your friends do, what your doctor says, etc.). Judging also implies the establishment of a goal (walk a mile per day, read a book a month, etc.). The third step is to convert your personal rule into action. This “self-response” step includes rewarding yourself when you meet your standard and punishing yourself when you don’t.
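The three steps map naturally onto a small tracking tool. Here is a toy sketch; the walking goal, the reward and the penalty are invented examples, not Bandura’s:

```python
from dataclasses import dataclass, field

@dataclass
class SelfRegulation:
    """Bandura's three steps: observe, judge against a standard, respond."""
    standard: float                       # e.g., miles to walk per day
    log: list = field(default_factory=list)

    def observe(self, value: float):
        """Step 1: self-observation -- track the behavior (a diary in miniature)."""
        self.log.append(value)

    def judge(self) -> bool:
        """Step 2: judgment -- compare the latest behavior with the standard."""
        return bool(self.log) and self.log[-1] >= self.standard

    def respond(self) -> str:
        """Step 3: self-response -- reward yourself or punish yourself."""
        return "reward: watch a movie" if self.judge() else "penalty: no dessert"

walking = SelfRegulation(standard=1.0)    # goal: one mile per day
walking.observe(1.4)
print(walking.respond())                  # -> reward: watch a movie
```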
Self-regulation also includes things that didn’t make it into the steps. Some form of environmental planning and intervention is also involved. You should alter your environment to make reaching your goal easier. Throw out the cookies, cakes and ice cream so you won’t eat sweets. Or pour the booze down the sink. Remove things that might interfere with your goal, or at least avoid some of the cues. Bandura also recommends personal contracts. The contracts should be specific, written, and witnessed. They should clearly state the behavior to be performed and the respective consequences for compliance and non-compliance.
Like Skinner, Bandura is not in favor of using punishment excessively. He proposed three possible consequences of excessive punishment: (a) compensation (acting as if you were superior to cover your failures), (b) inactivity (being bored, depressed and inactive), and (c) escape (into fantasy, drugs, etc.). Bandura didn’t specify why one outcome would be more or less likely for a given individual…which pretty much makes the prediction useless. But his general rule still stands: don’t be mean to yourself.
Dollard, John (1900-1980) & Miller, Neal (1909-2002).
It was the 1940s at Yale’s Institute of Human Relations, where cross-disciplinary collaboration was encouraged and old theories were explained in new terms, often by adding a social dimension. One such effort found John Dollard (an anthropologist) and Neal Miller (a psychologist) joining forces to explain psychoanalytic principles in more modern terms. The result was Dollard and Miller’s psychoanalytic learning theory.
They combined Sigmund Freud and Clark Hull. Hull maintained that behavior is reinforced by drive reduction. Drives are strong stimuli that produce discomfort (hunger, thirst, etc.). A drive impels us to action when we encounter a cue. You’re already hungry (drive) when you hear your tummy growl (cue). The cue triggers a behavior designed to reduce the drive (get up and go to the kitchen). If you are successful in reducing the drive (you find a bag of cookies), the reduction in hunger reinforces that sequence, making it more likely to happen next time you’re hungry and hear your tummy growl.
Primary reinforcers are events that reduce primary drives (physiological processes). Secondary reinforcers are events that reduce learned drives (acquired drives). That’s why eating a cookie doesn’t make you feel better about being lonely. Cookies can reduce the primary drive of hunger but not the secondary drive of feeling loved.
For Dollard & Miller, learning combines four processes: drive, cue, response and reinforcement. Drive is the engine. The cue tells you when, where and how to respond. Your response is any behavior or sequence of behaviors you perform. And reinforcement is the consequence of the drive being reduced. If your behavior isn’t reinforced, it will be extinguished (disappear). But the process doesn’t stop there. You keep trying different responses until one of them satisfies the drive. Like most drive theorists, Dollard and Miller don’t explain where drives come from; they settle for treating them as a given.
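Read this way, the drive-cue-response-reinforcement sequence is a trial-and-error loop: keep trying responses until one reduces the drive, then strengthen whatever worked. A toy sketch under that reading (the responses and habit-strength numbers are invented):

```python
import random

# Habit strengths for candidate responses to the cue "tummy growl".
habits = {"go to the kitchen": 1.0, "order pizza": 1.0, "ignore it": 1.0}
REDUCES_DRIVE = {"go to the kitchen", "order pizza"}   # these responses find food

def respond_to_cue():
    """Try responses (weighted by habit strength) until one reduces the drive."""
    while True:
        response = random.choices(list(habits), weights=list(habits.values()))[0]
        if response in REDUCES_DRIVE:
            habits[response] += 0.5                          # reinforcement
            return response
        habits[response] = max(0.1, habits[response] - 0.5)  # extinction

for _ in range(20):
    respond_to_cue()
print(habits)   # reinforced responses dominate; "ignore it" has faded
```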
The best way to understand Dollard and Miller is to pretend you are a mouse in a maze. Having run this maze before, you quickly head toward the food but discover that the path you usually take has been blocked. This is Dollard and Miller’s definition of frustration: a blocked attempt to reduce drive. As a mouse, you scratch at the floor, try to climb the maze, bite at the blockage, and rush around in an agitated state. As a human, you do pretty much the same when your goals are blocked. If you lock your keys in the car, you stomp on the ground, yell at the car and pound on its window. All because your goal is blocked.
Frustration can also come from being unable to do two things at once. When the frustration is severe, Dollard and Miller call it conflict: incompatible responses that occur at the same time. Conflict is the inability to respond simply to the drives that have been triggered; it is trying to do two incompatible things at once.
There are four types of conflicts. Approach-approach is the choice between two things you like. It is the choice between cake and ice cream. In this situation, you tend to choose whichever is closer. Mice do the same thing. If you’re the mouse and are put in the center of a straight maze with food at one end and food at the other, you go to whichever goal is closer. People choose grocery stores, banks and gas stations this way. Assuming they are of about equal value, you choose on the basis of convenience (immediacy of drive reduction).
In approach-avoidance, you’re at one end of a straight maze (no turns). At the other end are both food and an electric shock. An experienced mouse runs toward the food but slows down as it gets closer to the food-shock combination. The conflict is wanting the food while avoiding the shock: two incompatible responses.
Let me put it in cognitive terms (this is completely un-Dollard & Miller, but it will help you remember it). You are in a straight maze. From where you are, the food looks pretty good, so you head toward it. But as you get closer to the target, you remember (having been here before) about the shock. The more you think about the shock, the slower you run toward the food. Thinking about the food is an “approach gradient”: the closer you get to a goal, the more exciting it is. Thinking about the shock is an “avoidance gradient”: the closer you get to something you dread, the stronger the urge to avoid it.
We love approach gradients. Anticipating going to a big event, looking forward to your birthday, or thinking ahead to getting a new car. Remember how excited you were to get your driver’s license? The closer you got to that day, the more excited you were. People underestimate the value of an approach gradient. Children in particular love anticipation. If you want to get your kids excited, don’t surprise them by taking them to Disneyland. About two weeks before the day, tell them you’re going to take them. And then every day, when they ask “Is this the day?,” say “No, but it will be soon.” When the day actually comes, they’ll be super-excited.
The same is true of adults. Adults don’t really like surprise parties either. Surprise parties are the most fun for those putting on the surprise. Most recipients look confused and startled, more than happy and pleased. We look forward to vacation. We look forward to holidays. We love to anticipate events. It’s the approach gradient in us. We also dread visiting relatives, attending meetings and going to the dentist. And the closer we get to negative events, the worse they look. It’s our built-in avoidance gradient.
One of Dollard and Miller’s key principles is that avoidance gradients are steeper than approach gradients. When we’re happy to have a date but sorry we got stuck with a loser, we’re in an approach-avoidance conflict. Blind dates don’t sound too bad from a distance. But the closer we get to the day of the event, the worse it seems. “Why did I ever agree to do this?”
When you back out of something you previously agreed to, your avoidance gradient has climbed above your approach gradient. As long as the avoidance is small (something irritating but not overwhelming) compared to the approach, we perform the behavior. But when avoidance exceeds approach, we opt out of the situation. We take back the clothes we can’t afford. We try to get out of the car lease we signed the day before. When avoidance is greater than approach, we call off the wedding.
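Because the avoidance gradient is steeper, the two gradients cross: far from the goal, approach dominates; up close, avoidance wins. A toy calculation makes the crossover concrete (the slopes and intercepts are arbitrary, chosen only so the lines cross):

```python
def approach(d):
    """Pull toward the goal; a shallow slope (arbitrary numbers)."""
    return 10 - 0.5 * d

def avoidance(d):
    """Push away from the goal; a steeper slope, so the gradients cross."""
    return max(0.0, 14 - 1.5 * d)

for d in (10, 8, 6, 4, 2, 0):   # distance to the blind date
    net = approach(d) - avoidance(d)
    print(f"distance {d:2}: net {net:+.1f} -> {'approach' if net > 0 else 'back out'}")
```

With these numbers the crossover happens at distance 4: beyond it, you keep the date; inside it, you start drafting your excuse.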
Avoidance-avoidance conflicts occur when we’re stuck between two things we don’t want. Given a choice between a toothache and the dentist stabbing you with a needle, we try to do neither. When given the choice between two political candidates, neither of whom they like, many people choose not to vote. They hover in indecision and opt for “none of the above.” In such conflicts, we tend to choose whichever is the least objectionable. Or we avoid whichever is closer.
Conflicts don’t have to be simple, either. A “double approach-avoidance” conflict is a choice between two ends of the maze, each with its own approach-avoidance conflict. For the mouse, this would be food & shock at one end of the maze, and food & shock at the other end too. The mouse begins running toward one end but slows as it gets closer. It then turns and runs toward the other end, where it slows down, turns and runs back. The mouse spends most of its time running back and forth in the maze, never getting shocked but never reducing its hunger. The human version is similar. It is the choice between going home for Thanksgiving to be with your dysfunctional family and staying where you are but being lonely.
Dollard & Miller include unconscious behavior in their model. Although behaviorists typically believe that behavior is automatic, they tend to view the head as being empty. The mind either doesn’t do anything but produce behaviors, or it is a black box of unknown processes. In contrast, Dollard & Miller make unconscious behavior a central theme of their model. In their view, behaviors are unconscious when we’re unaware of the cues that trigger the drive, or unaware of the drive itself. Unconscious simply means unlabeled. When a cognitive label is present, the behavior, drive or emotion is no longer unconscious.
Labeling plays an important part in making us less neurotic. According to Dollard & Miller, neurosis is better understood as the stupidity-misery syndrome. When we are neurotic, we are experiencing a strong, unconscious (unlabeled) emotional conflict. The result is that we can’t discriminate effectively and make bad decisions. That is, when we are unaware of our conflict (stupid), we make bad decisions that make us miserable. Our misery is a result of not labeling our conflicts. The solution is to discover the proper label. Like the fairy tale of Rumpelstiltskin (whose power was broken only when his name was guessed), we have to guess what we’re feeling and label it. Once labeled, the curse is broken: we are no longer stupid (unaware) and won’t make ourselves miserable.
Rotter, Julian (1916-2014)
Like other social learning theorists, Julian Rotter combines behaviorism with cognition. What we know about the environment impacts what we do. And the best way to predict what people will do is to understand how they think.
Rotter maintains that the likelihood of a particular behavior is influenced by our cognition of rewards. Skinner was essentially right: we do respond to rewards. But his system was too simple. We don’t turn off our brains when we’re rewarded. We use our brain power to make calculations about ourselves, the environment and the rewards themselves.
There are three component parts to Rotter’s system. First, as Skinner would predict, we look at the size of the reward. We prefer big rewards over small rewards. Given a choice, we prefer to make more versus less money, bigger versus smaller houses, and faster versus slower cars. If we’re going to receive compliments, we want lots of people to give them. If we’re going to lose weight, we want everyone to notice. In general, we want the biggest reward we can get.
Second, there is the expectancy of the reward. We like rewards, but we really like rewards we know we can get. We’ll turn down a bigger reward if a smaller reward is closer, faster or more of a sure thing. We do risk assessment and determine the likelihood of receiving a reward. The reason we prefer immediate rewards is that they have a higher expectancy of coming true.
The third part is the combination, and it is Rotter’s main point: we weigh expectancy (likelihood) against reinforcement value (reward size). I don’t usually play the lottery because I know the likelihood of winning is very low. I don’t expect to win. But when the jackpot is over $20 million, I’ll buy a ticket…just one. I still don’t expect to win, but I figure it’s worth the shot for a large prize. We will take a risk on a situation with low expectancy if the reward is high. Similarly, we tend to settle for less reward if expectancy is high. This explains why people stay in safe, low-paying jobs, and why people stay in predictable but unhappy marriages.
According to this model, if you believe your chances of getting a job paying $200,000 a year are 20%, the job is worth $40,000 to you. Consequently, you might well choose to apply for $50,000 jobs that you are 90% sure you can get. In your mental calculations, you’d be $5,000 ahead by going for the lower-paying job.
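That mental arithmetic is essentially an expected-value calculation. Rotter never reduced his formula to simple multiplication, but a multiplicative sketch mirrors the numbers in the paragraph above:

```python
def behavior_potential(expectancy: float, reinforcement_value: float) -> float:
    """A Rotter-style calculation: likelihood of the reward times its size."""
    return expectancy * reinforcement_value

long_shot = behavior_potential(0.20, 200_000)  # 20% chance at $200,000
safe_bet  = behavior_potential(0.90, 50_000)   # 90% chance at $50,000

print(long_shot)              # 40000.0
print(safe_bet)               # 45000.0
print(safe_bet - long_shot)   # 5000.0 -- the safer job wins
```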
Our experience isn’t that we’re making mathematical calculations. But we are aware of wrestling with security versus reward. We realize that there are many more jobs available in middle management than in upper management. More available jobs means a higher expectancy. We know that actors who set out to be multi-billionaires probably won’t reach that goal. A few megastars make huge salaries, but most actors make very little money. Rotter is suggesting that we are more rational than we realize. We use value and expectancy to make major life decisions.
We don’t behave randomly. We are not merely responders to Pavlovian stimuli, nor are we solely influenced by rewards. As our environment changes, we use rules to determine what to do. Even in novel situations, we apply our knowledge of the past to the current conditions. Rotter suggests that we have two basic, relatively stable rules: (a) the bigger the reward, the better, and (b) safer is better.
Rotter’s approach is optimistic, goal-driven, and adaptive (interactive with the environment). Notice that the likelihood of a particular behavior in a specific situation is based on subjective probabilities. We calculate what we think the odds are of an event occurring. We don’t know what will happen; we make subjective guesses. Our inconsistencies in action show that from time to time we interpret the same situation differently.
Rotter expanded his concept of expectancy into a broader, more generalized expectation: locus of control. Although we calculate the likelihood of specific events, our general tendencies of calculation can be described. Life’s situations aren’t independent. We actually use a relatively stable set of potentials for responding to situations. Overall, we can be described as relying primarily on internal or external expectations (locus of control).
Our locus of control is our view of the contingency between what we do and what we get. If we have an internal locus of control, we tend to believe that what we do helps us get rewards. “Internals” tend to be more political, proactive, and optimistic. They assume they will be successful because they expect their behavior to produce rewards. Consequently, internals try to gather more information, change their environment, and influence others. They are also more likely to be anxious. Since they believe what they do matters, they take responsibility for everything…whether it’s their fault or not.
In contrast, “externals” tend to conform and don’t expect much of life. They believe life is a matter of chance, fate or luck. Externals tend not to take responsibility for anything. Since they believe that what they do doesn’t impact what they get, there is little reason to work too hard at changing the inevitable. They are more susceptible to what Seligman called “learned helplessness.”
More
For more, check out Learning and Social Psychology.