Thursday, March 8, 2007

Enthusiasm and Intermittent Reinforcement

When I began trying to learn how to use intermittent reinforcement for Lumi's training recently, my objective was to prepare Lumi better for freestyle trials, in which she needs to go about three minutes without food, toys, or play breaks. That is, she needs to go without any primary positive reinforcement (+R), unless performing the freestyle routine itself is an intrinsic reinforcer to some extent.

PARADIGM SHIFT
My reason for beginning to experiment with intermittent reinforcement was based loosely on a suggestion made to me on DogTrek. I speculated that the dog's experience with intermittent reinforcement is to some extent a learning paradigm, like clicker training. According to that concept, a dog accustomed to intermittent reinforcement for some behaviors would also be more comfortable with intermittent reinforcement for others.

So I chose a few behaviors — on doctor's orders, we weren't working on freestyle at the time — and began gradually putting them on intermittent reinforcement, specifically, a variable schedule of reinforcement (VSR).

RATIO STRAIN
This turned out to be more difficult with Lumi than I expected it to be. After only a small number of unreinforced reps, Lumi would seem to become confused by cues she had previously had reliable responses to. She might freeze and just look at me, or she might offer some completely different behavior.

One of my correspondents on DogTrek stated that this showed that Lumi did not have those behaviors on "stimulus control", so I researched the term a bit. What I learned is that stimulus control isn't an absolute; it's the measurable tendency of an organism to respond to a stimulus. According to that understanding, those behaviors and a hundred others that Lumi offers in response to cues are on stimulus control. They just aren't proofed against extinction when they're reinforced less often than she needs.

Which makes Lumi like every other trained dog. That's because every trained dog has behaviors that are not intrinsically reinforcing, behaviors that she does instead because she has been conditioned to do them by positive or negative reinforcement. Those behaviors vary in their resistance to extinction, but they'll all extinguish eventually if not reinforced for a large enough number of reps. No one would suggest that those behaviors are not on stimulus control simply because they won't last forever without reinforcement. The same applies to Lumi's behaviors.

Instead, it was simply a case of ratio strain, exceeding the dog's capacity for performing without reinforcement. According to Pam Reid in Ex-Celerated Learning, dogs vary by individual in how long a behavior will persist without reinforcement, and I'm guessing some of it has to do with how much prior experience the dog has with ratios as well. Lumi has little experience with VSRs, and she also may have less intrinsic tolerance for them anyway. Pam speaks of a BC who might go 30 reps for the chance to chase a tennis ball, while some Saluki might not go 30 reps no matter what reward is available. I'd guess Lumi is somewhere in the middle of that spectrum.

UNEXPECTED RESULT
While putting a few of Lumi's behaviors on intermittent reinforcement proved to be more difficult than I expected, I had another surprise in store as well. Lumi's enthusiasm for the behaviors, as long as we didn't hit ratio strain, seemed to go up during the training, not down. Since motivation affects the amplitude of a response, it seems that intermittent reinforcement was increasing Lumi's motivation. I had heard of this phenomenon, but I guess it was counter-intuitive, because I still didn't expect it.

I've since discovered that I'm not the only one. Other trainers, too, make no connection between intermittent reinforcement and motivation. This seems especially true of those who for one reason or another have little use for VSRs in their training.

ON FURTHER REFLECTION
In thinking about it, I realized that it's actually inevitable that intermittent reinforcement would result in more vigorous, enthusiastic responses. The reason can be given in two words: "extinction burst".

An extinction burst is the phenomenon that sometimes occurs when an organism finds that a previously reinforced behavior is no longer being reinforced. Eventually, the behavior will extinguish (stop occurring), but first, the subject is likely to attempt more vigorous versions of the behavior in hopes that those versions will obtain the reinforcement that the less energetic version did not earn.

To the subject, intermittent reinforcement looks exactly like extinction until the next reinforcement occurs. Looked at in that light, it seems only natural to expect higher motivation for behaviors being trained on an intermittent schedule.

SLIPPING AWAY
That would seem to make intermittent reinforcement a pretty attractive training tool, and truth be told, I am attracted to it for that reason. But just as an extinction burst can quickly slip into extinction, a VSR can quickly slip into extinction, too. That's ratio strain, what I was seeing with Lumi.

In addition, just as an extinction burst can result in other kinds of behavior variability besides energy level, I've found it difficult to apply intermittent reinforcement without losing some precision in some of Lumi's behaviors.

But I think I've found one way to take advantage of this motivational boost without those disadvantages, and that is by means of non-contingent variable-value reinforcement.

DISPENSING TREATS
When you read about lab experiments, it seems as though behavioral scientists generally dispense a single-value reinforcement after each correct response, sometimes using automatic, mechanical dispensing equipment. When dog trainers are training behaviors, they often try to simulate that equipment in various ways: maintaining a neutral composure, giving the treats with precise timing, and dispensing uniform amounts.

In addition, many trainers also sometimes give what they call "jackpots". By this, they do not mean the random payoffs that occur in casino gambling, but extra rewards when the dog seems to show a breakthrough in understanding or higher than usual enthusiasm in her response.

That is not the kind of variable reinforcement I'm talking about for increasing motivation. The kind I'm talking about is actually more like the casino terminology, in that it is random and is not contingent on the quality of the correct response.

Let's say the dog gives five correct responses in a row. Instead of giving an equal reinforcement for each response, say a single treat 7 units in weight, you might give the dog several treats each weighing 2 units. But the number of those treats would vary randomly from one rep to the next: 3 small treats for the first rep, then 1, then 4, then 7, then 3 again, for example. That comes to 18 small treats, or 36 units, against 35 units for five single treats. A dog rewarded either way receives about the same quantity of food, but the dog rewarded the second way also experiences the element of surprise from one rep to the next.
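
For anyone who wants to check the arithmetic in that example, or to let something other than their own habits pick the counts, here is a minimal Python sketch. The function name treat_counts, the 1-to-7 range, and the equal odds for each count are my own assumptions for illustration, not an established protocol; the weights and the 3, 1, 4, 7, 3 sequence are the ones from the example above.

```python
import random

def treat_counts(reps, low=1, high=7, seed=None):
    """Pick a random number of small treats for each correct rep.

    Hypothetical helper: the range and the uniform odds are assumptions.
    The count does not depend on how good the response was.
    """
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(reps)]

# The example sequence and weights from the text.
example_counts = [3, 1, 4, 7, 3]
small_treat_weight = 2   # units per small treat
single_treat_weight = 7  # units per treat when every rep earns the same

variable_total = sum(example_counts) * small_treat_weight
fixed_total = len(example_counts) * single_treat_weight

print("variable:", variable_total, "units; fixed:", fixed_total, "units")
print("a fresh random sequence:", treat_counts(5))
```

The two totals come out within a unit of each other, which is the point: the amount of food stays roughly the same and only its predictability changes.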

DOES IT WORK?
From my experience, many trainers have never tried that approach to giving reinforcement and are convinced that it would not have any effect on the dog's acquisition of the skill being trained. For them, trying to understand why it would work is beside the point because they are convinced it would not.

And for some dogs in some circumstances, I'm sure they're right. If the dog is particularly hungry, for example, as the typical laboratory subject is, then variations in motivation, if they exist at all, may be too slight to observe. Also, motivation isn't the same as speed of acquisition. Two dogs may learn the same skill but with very different attitudes toward the training process and the skill itself, and no data may be collected to show those differences. It's comfortable to assume that if a scientist says he has not seen improved results with a particular approach, it would be a waste of time for you to try it yourself on your own dog.

Nonetheless, at the recommendation of my friend Lee Baragona, a trainer with national championships in multiple sports with multiple dogs, I began experimenting with randomly varying numbers of treats for sequences of correct responses as described above. The result was immediate and striking: a distinct boost in Lumi's level of excitement for the game. I could even feel it affecting me as the treat giver. For example, I might think, "Here's a smaller amount than last time, Lumi, but you know what that means, don't you? Maybe more next time!"

VARIATIONS ON A THEME
Once you have seen for yourself that this way of giving reinforcement does affect the dog's attitude, you find yourself wondering about an explanation. For me, the explanation is that intermittent reinforcement is really just a special case of random non-contingent variations in reinforcement.

That is, in a VSR, the number of treats is still varying randomly from one correct rep to another. In a VSR, the number of treats is either 0 or 1, whereas in the more general case that I'm describing, the number of treats includes other possibilities. You might vary among counts of 1 through 7, or you might also respond to some correct responses with no reward. You would only do that, I guess, if the dog has enough reinforcement history for the correct response that she won't take zero to mean that she had performed the behavior incorrectly.
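
One way to picture the "special case" idea is a single routine that draws the treat count for each correct rep from a list of allowed counts. This is only a sketch under my own assumptions: the name reward_schedule is made up, and each allowed count is given equal odds, whereas a real VSR is usually planned around a target average ratio.

```python
import random

def reward_schedule(reps, allowed_counts, seed=None):
    """Draw a treat count for each correct rep from allowed_counts.

    Hypothetical helper for illustration; equal odds per count is an
    assumption, not how any particular trainer sets a schedule.
    """
    rng = random.Random(seed)
    return [rng.choice(allowed_counts) for _ in range(reps)]

# A VSR as the special case: every correct rep earns either 0 or 1 treat.
vsr = reward_schedule(10, [0, 1])

# The more general case: anywhere from 0 to 7 treats per correct rep.
general = reward_schedule(10, list(range(0, 8)))

print("VSR:    ", vsr)
print("general:", general)
```

Dropping the 0 from the allowed counts gives the version for a dog who might still read an unrewarded rep as a mistake.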

FOLLOWING FOOTSTEPS
In all of the above, I may well be following in the footsteps of researchers and authors who have explored learning and motivation in the past. Some may have reached the same conclusions, some the opposite. Nonetheless, it's interesting to experiment and try to understand what I'm seeing on my own.

It's one of the things that makes dog training so rewarding.
