Thursday, February 14, 2013

Reinforcement Schedules

Once you understand reinforcement and punishment and the different ways they work, the next question is WHEN to reinforce. This isn't quite as critical as knowing why training works the way it does in the first place, so I'll keep it short and sweet for those who are interested in the various types of reinforcement schedules and the results they produce.

The first main type of reinforcement schedule is continuous reinforcement, which is the schedule most commonly used in clicker training. It means that the behavior is reinforced every time it's given. In other words, if I'm teaching my horse to pick up his feet, I click and treat each time he picks a foot up. This works best in the initial stages of learning because it creates a strong association between the behavior and the reinforcement.

However, once the behavior is firmly associated with the reinforcement (and your horse knows what you expect from him), you can do two things: ask for more and/or switch to a partial reinforcement schedule. Personally, I do both. I'll explain "asking for more" in the next post, but the basic idea is that the horse has to take the behavior one step further before he gets reinforced (now he has to hold his foot up longer... and longer...), so you're actually asking him to learn something new (i.e., holding a foot up rather than just picking it up). In this post, I'm going to explain switching to a partial reinforcement schedule to reinforce the SAME behavior that was already taught. This helps prevent what we call "extinction": the behavior fading away because we're no longer reinforcing it (for those of you who think that a clicker trained horse will ALWAYS need a clicker, listen up!).

Partial reinforcement: this means that the horse doesn't get reinforced every time it does what you're asking. Instead, it only gets reinforced part of the time. This way, you can ask for the behavior without reinforcing it every time (i.e., you can ask for the behavior without clicking), and the horse will still respond even though it doesn't get a treat every time.

I'm only going to worry about the things that are most important here - if you want to know more, Google "reinforcement schedules".

Here are the key terms you need to know to understand partial reinforcement:
Fixed = the point at which you reinforce doesn't change.
Variable = the point at which you reinforce is unpredictable.
Ratio = when you reinforce depends on the number of times the behavior is performed.
Interval = when you reinforce depends on the amount of time that has passed (I'm not going to discuss this one here, though).

Combining these gives four types of partial reinforcement (fixed ratio, variable ratio, fixed interval, and variable interval), and each schedule leads to different results. I've included a graph below that illustrates these. I'm only going to explain fixed ratio and variable ratio here, though, because they apply directly to clicker training.

Fixed ratio means that you reinforce after a specific number of correct responses. Generally, this leads to a steady rate of responding to earn the reward, with only a brief pause after each reward. For example, if a kid gets a piece of candy every time he completes three math problems, he does three problems, receives his candy, eats it and pauses, then decides he wants another one and gets back to work. The weakness here is that if the horse doesn't receive a treat after the expected number of responses, the behavior can break down and the horse may stop responding.
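To make the fixed ratio rule concrete, here is a minimal Python sketch of it as a simple counter. The `FixedRatio` class and its `respond` method are hypothetical names made up for illustration: each call stands for one correct response, and the return value says whether that response earns a click and treat. A ratio of 1 is just continuous reinforcement.

```python
class FixedRatio:
    """Fixed-ratio (FR) schedule: reinforce every `ratio`-th correct response."""

    def __init__(self, ratio: int) -> None:
        if ratio < 1:
            raise ValueError("ratio must be at least 1")
        self.ratio = ratio
        self.count = 0  # correct responses since the last reinforcement

    def respond(self) -> bool:
        """Record one correct response; return True if it earns a click and treat."""
        self.count += 1
        if self.count == self.ratio:
            self.count = 0
            return True
        return False


# FR 3, like the "three math problems per piece of candy" example.
schedule = FixedRatio(3)
print([schedule.respond() for _ in range(9)])
# [False, False, True, False, False, True, False, False, True]
```

The reward always lands on the same count, which is exactly what makes it predictable, and why a missing treat at that count is so noticeable.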

Variable ratio solves this problem. With variable ratio reinforcement, the horse never knows when it's going to get a reward: it can perform the desired behavior any number of times and may or may not receive the reinforcer. This is the most powerful reinforcement schedule, as it produces a high and steady rate of the desired behavior. Don't believe me? This is how gambling addiction works. You never know when you're going to win, so even though most rounds bring no reward (and even lose money!), people keep on gambling and gambling because every now and then they win $5, $2, or $10 back, and they think they just might hit the jackpot on the next round.
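Here is a similar hypothetical sketch of a variable ratio schedule. It assumes that the number of responses required for each reward is drawn at random between 1 and `2 * mean_ratio - 1`, so it averages out to `mean_ratio`. That is only one way to generate an unpredictable ratio, and the `VariableRatio` name is made up for illustration.

```python
import random
from typing import Optional


class VariableRatio:
    """Variable-ratio (VR) schedule: the number of correct responses needed
    for each reinforcement varies unpredictably around `mean_ratio`."""

    def __init__(self, mean_ratio: int, rng: Optional[random.Random] = None) -> None:
        if mean_ratio < 1:
            raise ValueError("mean_ratio must be at least 1")
        self.mean_ratio = mean_ratio
        self.rng = rng if rng is not None else random.Random()
        self.count = 0                      # correct responses since the last reward
        self.required = self._next_ratio()  # responses needed for the next reward

    def _next_ratio(self) -> int:
        # Uniform over 1..(2 * mean_ratio - 1); its average is exactly mean_ratio.
        return self.rng.randint(1, 2 * self.mean_ratio - 1)

    def respond(self) -> bool:
        """Record one correct response; return True if it earns a click and treat."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._next_ratio()
            return True
        return False


# VR 3: on average every third response is reinforced, but never predictably.
schedule = VariableRatio(3, rng=random.Random(42))
rewards = [schedule.respond() for _ in range(30)]
print(rewards.count(True), "rewards for 30 responses")  # roughly 10; depends on the seed
```

Because the required count is redrawn after every reward, there is no fixed number of responses after which the horse "should" get paid, so a run of unrewarded responses doesn't signal that the game is over.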

This applies to clicker training when teaching the horse to respond the way you want without the clicker. I use it for behaviors that my horse knows and that I expect, but still want to reward every now and then. When I'm "phasing out" the clicker, I'll ask the horse to do what I want and only click and treat every now and then, so he learns that he can respond even without the clicker. Eventually, I won't use the clicker at all when asking for that behavior: it's expected, and the horse knows what he's supposed to be doing (so I avoid the horse trying something else or getting confused because he didn't get a reinforcing click and treat). For things my horse knows REALLY well, I still click and treat every now and then just to say "good boy" in a way that's meaningful to him. I could probably be just fine without it, but I like to reinforce these behaviors every once in a while (maybe once a week or even once a month) just because. Since he never knows when what he's doing might earn him a treat, he's always listening, even when he doesn't get one!
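The "phasing out" idea can be sketched as gradually raising the average number of correct responses per click over several sessions. This is a hypothetical illustration, not a training plan: `thinning_plan`, `run_session`, and all the numbers are assumptions. For simplicity, each response gets an independent 1-in-n chance of a click, which is a random-ratio variant that behaves much like variable ratio.

```python
import random


def thinning_plan(start_ratio: int, step: int, sessions: int) -> list[int]:
    """Hypothetical plan: the average number of correct responses per click
    grows by `step` from one session to the next."""
    return [start_ratio + step * i for i in range(sessions)]


def run_session(mean_ratio: int, responses: int, rng: random.Random) -> int:
    """Simulate one session and return how many clicks-and-treats were given.

    Each correct response independently has a 1-in-`mean_ratio` chance of
    being reinforced (a random-ratio schedule)."""
    return sum(rng.random() < 1 / mean_ratio for _ in range(responses))


rng = random.Random(1)
plan = thinning_plan(start_ratio=1, step=2, sessions=5)  # [1, 3, 5, 7, 9]
for session, ratio in enumerate(plan, start=1):
    treats = run_session(ratio, responses=20, rng=rng)
    print(f"Session {session}: about 1 in {ratio} reinforced, {treats} treats for 20 responses")
```

The first session (a ratio of 1) is plain continuous reinforcement. By the last session in this made-up plan, most correct responses go unreinforced, which mirrors clicking "every now and then" before dropping the clicker for that behavior.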
