Positive, negative; punishment, reinforcement. What does it all mean?

snowbunny · 4 July 2018

It is very easy to get a bit discombobulated by the definitions of punishment and reinforcement, and by whether something is considered positive or negative. We have been programmed to believe that "punishment" means something harsh or cruel, always bad, and that reinforcement is always good. As someone recently asked, "Isn't all punishment negative?"

In actual fact, when talking about learning theory (which we are when training our dogs), there are very precise definitions of these four words:

Negative: something is being removed (think of it as a minus sign)
Positive: something is being added (think of it as a plus sign)
Reinforcement: something that makes a behaviour more likely to happen again in future
Punishment: something that makes a behaviour less likely to happen again in future

You can see from this that there is no ambiguity in the definitions themselves, and that there is no reference to whether the things being added or taken away are good or bad. Similarly, there is no judgement in the words "reinforcement" or "punishment"; they are merely defined by the result they have on the frequency of the preceding behaviour.

These four words can be combined into what we call "the four quadrants of operant learning".

But what does this mean in the real world? Let's think of some examples. Before we get into that, though, we have to understand that it is never the handler who gets to decide if something is reinforcing or punishing. Remember that those terms are defined purely by their outcome. The learning happens through cause and effect: a behaviour presented by the dog results in a consequence. If that behaviour diminishes over time, the consequence was, by definition, punishment. If the behaviour increases over time, it was reinforcement. We can never tell, in that moment, whether we are reinforcing, punishing or neither, because the only way of determining that is by observing what happens in the future. So what we actually do when training is make a hypothesis or prediction that we can then test. Some predictions are obvious, others less so.

On to the examples:

----------------------------
Positive Reinforcement: Something is added, which makes the behaviour more likely to be repeated.

Behaviour: Dog walks on a loose lead
Added: Food
Prediction: Behaviour of walking on a loose lead will increase over time

----------------------------
Positive Punishment: Something is added, which makes the behaviour less likely to be repeated.

Behaviour: Dog pulls on lead
Added: Jerk of lead
Prediction: Behaviour of pulling on lead will decrease over time

----------------------------
Negative Reinforcement: Something is removed, which makes the behaviour more likely to be repeated

This is a little more complicated to describe. Imagine a scenario where a dog is pulling whilst wearing a prong collar.

Behaviour: Dog stops pulling
Removed: Pain of collar biting
Prediction: Behaviour of not pulling will increase over time

Negative reinforcement is also in play when a dog is working to avoid something he doesn't like.

----------------------------
Negative Punishment: Something is removed, which makes the behaviour less likely to be repeated

Behaviour: Dog pulls on lead
Removed: Access to forward motion (i.e. the handler stops walking)
Prediction: Behaviour of pulling will decrease over time

You can see from the above that these are four different ways of achieving essentially the same result, a dog that walks on a loose lead, each using a different quadrant.

This video from eileenanddogs shows some more examples:

Did you notice how she used a spray of water both to reinforce and to punish? This is very important, as it demonstrates that it is the learner who defines what is reinforcing and what is punishing, not the handler. Some dogs love to be sprayed, others hate it. Some puppies will be upset by someone shouting at them, others will revel in the attention.

Now, obviously not all the quadrants are created equal. Some are more acceptable to us than others. But it is important to note that this is simply an ethical point, not something based on the efficacy of the individual quadrants. It is wrong to say "positive punishment doesn't work" because, here's the rub: it does.

This post simply defines the four quadrants and tries to explain what they mean. It is not the place to discuss the ethics of each one; I'll leave that for another discussion. Suffice it to say that the ethos of this forum is very much about finding the least aversive way to teach the behaviours we want, so I am by no means promoting the use of prong collars!

To recap:
Positive does not mean "good" and negative does not mean "bad". They mean "adding" and "removing".
Punishment and reinforcement are defined only by their outcomes, not by the actions taken.
Punishment is anything that reduces the likelihood of the preceding behaviour happening in the future.
Reinforcement is anything that increases the likelihood of the preceding behaviour happening in the future.
It is the learner who decides what is punishing and reinforcing, not the handler.
Punishment is not synonymous with cruelty.
Reinforcement is not always kind.

Any questions?
