Reward vs reinforcement

snowbunny · 9 October 2018

I was pondering on this on my walk this evening. We often use the words "reward" and "reinforcement" interchangeably, but that's not necessarily accurate. Case in point: I've mentioned elsewhere that Shadow is very scent-driven at the moment, and is ranging farther than I'd like. He gets a bit like this every autumn as the bunnies, deer, and other beasties tend to be out more at our walking time. So I've gone back to basics with him and have been heavily rewarding his check-ins like I did when he was a young'un. But here's the rub: it's not had any noticeable effect on his behaviour. This means that, however generously I've been rewarding, I haven't been reinforcing his checking in.

Do I hear tumbleweed?

Let's remember how we define reinforcement: it is a consequence of a behaviour that increases the likelihood of that behaviour happening in the future. If the behaviour isn't increasing, then there's been no reinforcement, however many sausages I've given to the dog. I'm not saying Shadow doesn't like the sausage; he definitely does. But it's not enough to cause the behaviour of checking in to increase.

So I need to take a step back and come up with a plan that doesn't involve putting food into him for no return. What can I do to change that reward into a reinforcer? I have a few ideas

Lab_adore · 9 October 2018

Can't wait to hear them please @snowbunny! :idea:

Selina27 · 9 October 2018

snowbunny said:
What can I do to change that reward into a reinforcer? I have a few ideas

Exactly what I am facing at the present time. Many things have improved recently, but here we have a step back! She can be right by me, and then she is gone on a scent! I am racking my brains to make a plan, but haven't come up with anything apart from using her hunting as a reward. But how to do that? :mmm:

Snowy · 10 October 2018

snowbunny said:
I was pondering on this on my walk this evening.

Gosh I'm envious of those who can multi-task

I don't manage to ponder anything whilst we are out on a walk. I am 100% focussed on the boy, where he is, what he's doing and remembering to give him a happy acknowledgement every time he gives me a "check-in look", plus always keeping an eye on who might be coming the other way. After 5 km walking with the boy, I am about as exhausted as after 25 km of normal walking. :whew:

We're also having more issues with Nelson roaming outside the preferred 30m zone. I also returned to heavy rewards for him coming to check-in physically. However I've since dropped the level of rewards, made it more random, sometimes nothing and sometimes lots. He is now much more keen to hang around near me. But I'm also on the edge of my seat to hear what extra ideas you have SB. :happyfeet:

snowbunny · 10 October 2018

Heh, if I didn’t multi-task, I’d never get everything done that needs to be. I’ve been known to brush my hair and my teeth at the same time

Lowering the rate of reward doesn’t cut it for Shadow. After all, his reinforcement is out in the environment. All those scents! If giving him lots of sausage doesn't reinforce him for coming back, then giving him less sausage won't. Here's an interesting article on why variable reinforcement schedules don't work better than a fixed reinforcement schedule outside of a laboratory: Doesn’t Intermittent Reinforcement Create a Stronger Behavior? - eileenanddogs

So, it's about finding out what Shadow finds reinforcing. Food isn't cutting it. He'll eat it with pleasure, but if it's not increasing the behaviour of staying close and checking in, then it's not reinforcing. Don't get me wrong, Shadow does check in - a lot. It's a habit that we've built in since he was a tiny puppy, and what you do regularly you tend to do. However, it's nothing like as strong as the same behaviour in the girls, and with the reinforcement in the outside world being so strong at the moment, every time he disconnects and roams that little bit farther than I want, the balance described by the matching law tips further away from my favour.
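
For anyone who wants the matching law spelled out, here is a rough sketch of its simplest two-choice form. The subscripts are just labels chosen for this example ("me" for behaviour directed at the handler, "env" for behaviour directed at the environment), and a real walk offers far more than two options, so treat it as an illustration rather than a model of Shadow:

```latex
% Two-choice matching law (illustrative labels, not measured values)
% B = amount of behaviour allocated to an option
% R = amount of reinforcement obtained from that option
\[
  \frac{B_{\text{me}}}{B_{\text{me}} + B_{\text{env}}}
  =
  \frac{R_{\text{me}}}{R_{\text{me}} + R_{\text{env}}}
\]
```

Read this way, if the reinforcement coming from the environment goes up in autumn while the reinforcement from me stays the same, the share of behaviour directed at me shrinks. More sausage only helps if it actually counts towards the "me" reinforcement term.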

That's why I need to tap into his real reinforcers. In this environment, the list will be pretty short:
Following scent
Chasing balls
Doing stuff

The plan is:
1. Go back to less exciting environments - working in our fields rather than in the woods.
2. When we do go out and about, keep him on a lead for much of the time so he can't rehearse. I don't "do" long lines because they're not suitable for this sort of terrain, where the trees are very dense and there's only a narrow footpath. If it were more open, I may choose to use one, although I find them exhausting! While he is on lead, reinforce heavily with games.
3. Make being in the invisible circle brilliant fun. That is, continue to build that reinforcement history for staying with me in the first place. Shadow loves "doing stuff", so lots of little games.
4. It's very telling that he tends to stick close when we're one on one, but when we're walking all together, he wanders more. That tells me that he is finding it punishing to wait his turn, so I need to heavily reinforce him for doing so. Make the waits shorter for now, and reinforce with brilliant stuff from my list above.
5. Get him to hunt for stuff for me. Lay some scent trails and have him follow them. Use those as reinforcers for staying close. There are two reinforcers at play there - following scent and "doing stuff".
6. Reward each and every check-in (even an ear flick) with the throw of a ball. I've moved away from using balls because of the way he catches them, so I'll see if I can manage this in the short term. If not, I'll use the Chuckit on a strap, which isn't as high value as the Chuckit alone, but the proof will be in the pudding as far as whether it's reinforcement enough. I'm continuing to build value in the frisbee, but we're not quite there yet. I won't be able to throw the ball in the woods (it really is very dense!) but I can do this in our fields and I can try having him hunt for the ball in the woods to see if that works there.
7. Once he has retrieved the ball, keep him with me with "doing stuff" for a very short time and then release him to the environment. His reward for staying with me for a little bit will be getting to go back to where all the marvellous scents are, which will be very reinforcing. I have to be careful I keep the duration of staying with me short for now so I'm continuously building his reinforcement history rather than relying on his previous training for him staying with me until released, which could end up frustrating and punishing. So, give me the ball - either chuck it again, or have a game of tug or a quick run through a few behaviours, or even a bit of new training for less than a minute, and then release.

There's probably more, but he's telling me it's time to go out

Boogie · 10 October 2018

I think in human terms that rewards tend to be more tangible than reinforcers.

For me, rewards involve cash, chocolate, gin and wine.

But I reinforce behaviours by turning them into habits. I go to the gym three times a week. I don't especially like it, but once it's a habit I find it hard not to do it. I do all my chores as pure habit (I always do this after this etc) even though I don't want to do any of them.

Does this translate to dogs?

snowbunny · 10 October 2018

A reinforcement occurs due to the consequence of a behaviour - in any animal. So a habit isn't a reinforcer as that isn't a consequence of the behaviour itself. We think in terms of A-B-C where A = Antecedent (the set up of the environment, including cues, which cause the behaviour to occur), B = Behaviour, C = Consequence. When the C is a reinforcer, the B will become more likely to happen. When the C is a punisher, the B will become less likely to happen.

In your example, you go to the gym because you know it's good for you. That is your reinforcement, and it's something you can rationalise. If you find it disagreeable while you're there, then that is likely punishing, so on its own (without the knowledge that what you're doing is for your benefit), it would reduce the likelihood of you going - but if you do continue going, then your reinforcer must be stronger than your punisher. Of course, dogs don't rationalise in the same way, so we can't say "you should do this because it's good for you, even though you don't enjoy it". There has to be a more tangible reinforcer.

That said, habits (as much as they don't reinforce in themselves) do make a behaviour more likely to occur. A habit is created because of reinforcement history, and that history makes it persist after the reinforcement is no longer available. For a behaviour to be truly habitual, it has to be what is called "overtrained".

Habits can be both a problem and a delight when it comes to training, either other animals or oneself. Habits caused by overtraining can be very difficult to change as they are inflexible and resistant to the consequence. That's great when it's a behaviour we like, more of a problem when it's something we don't like. This is an interesting piece of research on it: Habits, action sequences, and reinforcement learning

So, during goal-based learning (operant conditioning), the stronger we make the reinforcement history for a particular behaviour, the stronger the habit and the more likely that behaviour will continue.
I would suggest that, if you learned that going to the gym had no benefit to your health, no social benefits etc, you would stop going almost immediately? If that is the case, then it's not truly a habit; you're working within the goal-based model - seeking the reinforcement, rather than the antecedent driving the behaviour.

snowbunny · 10 October 2018

Talking about a reward being tangible, that's often the case, but a reward can actually end up being aversive.
I heard an example recently, someone talking about it in terms of a Christmas bonus. If you got a nice addition to your pay packet at the end of the year, this would be a nice reward. Not really reinforcement, as you don't go to work every day of the year in order to earn your Christmas bonus; your pay packet is your reinforcer, plus the social aspects, the job itself if you enjoy that etc etc etc. No, the Christmas bonus isn't a reinforcer, it's a reward. But if you are expecting a nice cash bonus and you end up with a £10 Woolworths gift card, then you'd probably be pretty pissed off! The reward is still that, a reward (because that's defined by the person awarding it), but to the recipient, it's aversive.

Snowy · 10 October 2018

This morning I may have unexpectedly found a reward/reinforcer (I'm not going to get bogged down analysing which it was). We went out for a couple of hours and I took a shotgun along. I split the walk up between free sniffing and sessions of hunting (in the bits of the forest where I know there are Black Grouse).

Now for the hunting parts I got him to walk behind me - this is something we have trained since early summer, as it's really useful for stairs, duckboards through swamps on nature trails and narrow paths through young forest/bushes.

As we switched from "free sniffing with check-ins" to the hunting part, he really seemed to enjoy returning to me and then "getting to do a task". He was moving at my same slow pace, peering around my legs and scrutinising each tree in turn. It's as if he was trying to emulate my actions.

And the best thing is, after each session of free sniffing, it felt like he was happy to come back because he "got to do the hunting bit again".

snowbunny · 10 October 2018

Snowy said:
(I'm not going to get bogged down analysing which it was)

Hehe, you're in the wrong section of the forum then

Does it make the preceding behaviour more likely? If so, it's a reinforcer. A reward is simply a construct of the giver rather than of any necessary benefit to the receiver.

Snowy · 10 October 2018

"Behaviour Geeks United "

Aghhhhh! I hadn't noticed. I've clearly taken a wrong turn at the "view new posts" link and ended up somewhere where I don't belong. :run:

snowbunny · 10 October 2018

Joy · 10 October 2018

snowbunny said:
This is an interesting piece of research on it: Habits, action sequences, and reinforcement learning

I found parts of that article hard-going but was very interested in its general theme. It talks about habitual behaviour being a reflex, automatic rather than a conscious choice, but says that habits are acquired through extended reinforcement of behaviour. I was pondering classical and operant conditioning - do you think the article implies that if a behaviour is conditioned operantly for long enough, it will become a conditioned emotional response (i.e. classically conditioned)?

We often think of needing to train recall through classical conditioning - we want a reflex response not a decision made each time about whether to come or not - but (apart from the very early stages when the dog doesn't have to do anything but is just fed for being next to us) we do in fact use operant conditioning - reinforcing when the dog responds to the recall cue. Is it just the fact that we constrain the environment so that the dog always responds and therefore always is reinforced for a long period of time, that makes it a reflex response?

I suppose there are occasions when a person wants a dog to make decisions - sniffer work, hunting - but in most things I think we probably want to develop behaviours which are habits. I wonder if the most successful dog trainers realise the need to reinforce again and again and again. It tends to be people new to dog owning who are keen to drop the treats. The trainer that I've had a few obedience sessions with told me that I wasn't 'rewarding' (i.e. reinforcing) heelwork frequently enough, which really took me by surprise. She didn't really give me a rationale, but now I've read this article I feel much more inclined to try 'overtraining', increasing reinforcement for what I consider known behaviours to try to develop more good habits in Molly.

Lots to think about...
