The matching law

I've been thinking about this a bit recently, and wanted to bring it up here for discussion.

The matching law states that if there is a choice between two behaviours, the relative proportion of responses matches the relative proportion of reinforcers that are earned by those behaviours.

In practical terms, say there are two pots in front of the dog and the dog gets rewarded for touching them. If the dog earns the same piece of cheese for touching either pot, then the dog will touch each pot 50% of the time (everything else being equal).
If the dog gets reinforced for touching the pot on the left only half as often as he gets reinforced for touching the pot on the right, then he will touch the pot on the left half as often as the pot on the right. But he will still touch it.
In that example, I'm only talking about quantity of the rewards, but the same can be said for desirability of the reinforcer: if behaviour X is reinforced with a reinforcer that is twice as desirable as that used for behaviour Y, then behaviour X will occur twice as often as behaviour Y.

Let's consider how this affects something like walking at heel. The environmental rewards for pulling are obvious and large: getting to move forwards, sniffing a lamppost, whatever it is. Compare that reinforcer to the piece of cheese in your treat pouch. What is the ratio of desirability of the two reinforcers? I think it's pretty obvious that the environmental reinforcers trump our cheese, and by a large amount. So, what do we have to do to even up the balance? Easy! Reinforce more with our pitiful cheese.
It basically boils down to some simple maths:

(Value of reinforcer A x Frequency of reinforcement A) : (Value of reinforcer B x Frequency of reinforcement B)

That is the ratio with which Fido will choose behaviour A over behaviour B. So even though the value of reinforcer B is super high, if we restrict how often the dog is able to get that reinforcement, then that side of the ratio tends to zero. Meanwhile, if we work with the highest-value reinforcer we can on "our" side of the ratio, and make the rate of reinforcement as high as possible, then we are massively tipping the balance of the ratio in our favour.
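Spelled out as a worked equation, as a sketch of the simplified ratio above (the symbols are chosen here for illustration: $V$ for the value of a reinforcer, $r$ for how often it is delivered, $B$ for how often the behaviour happens), the share of choices going to behaviour A is:

$$
\frac{B_A}{B_B} = \frac{V_A \, r_A}{V_B \, r_B}
\qquad\Longrightarrow\qquad
P(A) = \frac{V_A \, r_A}{V_A \, r_A + V_B \, r_B}
$$

For the two pots, with the same cheese for both ($V_A = V_B = 1$) and the left pot paying half as often as the right ($r_A = 1$, $r_B = 2$):

$$
P(\text{left}) = \frac{1 \cdot 1}{1 \cdot 1 + 1 \cdot 2} = \frac{1}{3}
$$

So the left pot gets a third of the touches, half as many as the right pot, but never none.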

Dogs do what pays best. If your dog is disengaging, then you have to find a way to up your rate of reinforcement.
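A minimal Python sketch of the same simplified ratio, applied to the heelwork example. The function name and every number in it are hypothetical: the environment is scored, purely for illustration, as ten times as valuable as a piece of cheese, and rates are rewards per minute.

```python
def share_of_behaviour_a(value_a, rate_a, value_b, rate_b):
    """Proportion of choices going to behaviour A under the simplified
    (value x rate) : (value x rate) ratio from the post."""
    payoff_a = value_a * rate_a
    payoff_b = value_b * rate_b
    total = payoff_a + payoff_b
    if total == 0:
        raise ValueError("at least one behaviour must be reinforced")
    return payoff_a / total


# Hypothetical values: the environment (sniffing, moving forwards) is
# scored as 10x as valuable as one piece of cheese.
CHEESE, ENVIRONMENT = 1.0, 10.0

scenarios = {
    # (value of heel reward, heel rate, value of pulling reward, pulling rate)
    "cheese rarely, pulling usually pays": (CHEESE, 1.0, ENVIRONMENT, 5.0),
    "cheese often, pulling rarely pays": (CHEESE, 10.0, ENVIRONMENT, 0.5),
}

for label, args in scenarios.items():
    print(f"{label}: heel chosen {share_of_behaviour_a(*args):.0%} of the time")
```

Running it prints a heel share of about 2% in the first scenario and 67% in the second: the cheese is no more valuable, but paying it out ten times as often while pulling rarely pays flips the ratio.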
 

Boogie

Moderator
Location
Manchester UK
I’m not able to use even cheese. No human food, just kibble and fish treats.

So I reward with a stream of food until the correct position becomes the ‘default’. After that I reward every 30 paces or so, varying it as much as possible. In the early days I used the ‘fixed tether’ method and this has worked brilliantly. But now, if the pup pulls I don’t move forward, so they never self-reward for pulling. To do this I usually use the ‘continental heel’ (moving backwards with a treat so the dog turns tail and comes back to the right place). Sometimes I do an ‘about turn’ and occasionally I just stop and move on when the lead goes slack. If a big distraction is approaching I ask for a ‘sit’ and give a stream of treats. I’ve taught ‘leave it’ for distractions on the ground.

The first walks, until at least 16 weeks, involve walking well away from any interesting smells, so sniffing on lead never becomes an option. Once we are walking in places which are dog-busy, the pups already have a good response to ‘leave it’. It then becomes simple habit to always walk with their heads up.

Spencer, at 19 weeks, very rarely pulls now. Today I took them both for a free run and he didn’t pull to the off-lead place. Keir never pulled the whole time I had him, but he’s an easy-going Golden Retriever.

I don’t find I can do it with no ‘punishment’ at all, but I work hard to keep to 95% positive and it doesn’t seem to affect their willingness to work at all :)

 

Beanwood

Administrator
Oh God! Maths was always my weakness! %) But I think I understand... but not entirely sure I agree, or at least only some of the time.
I have found reinforcement history a HUGE reinforcer, though maybe not for all dogs. I guess we are looking at other variables here, such as dopamine working in our favour.

The other thing for me is understanding disengagement....
 

Joy

Location
East Sussex
If your dog is disengaging, then you have to find a way to up your rate of reinforcement.
Yes, I agree - but also perhaps change the activity to make it intrinsically more rewarding. In the case of lead walking, perhaps running, about-turns, sudden stops etc. (Whiskey, the dog I walk, is a huge sniffer and I've found I need food and variation in the things I do when he's on lead to retain his attention.)
 
but not entirely sure I agree
Well the matching law is the matching law. Not agreeing is akin to saying you don't agree with gravity. Of course, there may be tweaks that can be made to the theory, and there will be holes in my understanding (and explanation), for sure, but the law is just that, a psychological law. It has been modified since its first introduction, to include bias in the equation (my mathematical equation is embarrassingly simplistic and just there for a bit of a clue). For example, going back to the pot example, most animals are "handed" so there would be a bias for one over the other simply because of that, making it another factor in the equation.
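For anyone curious about the modified version: the generalized matching law (Baum, 1974) is usually written as B1/B2 = b(R1/R2)^s, where b is bias (a side preference such as handedness) and s is sensitivity to the reinforcement ratio; notation varies between sources. A minimal Python sketch, with hypothetical function names and a made-up bias of 1.5 towards the dog's preferred pot:

```python
def response_ratio(r1, r2, bias=1.0, sensitivity=1.0):
    """Generalized matching law: B1/B2 = bias * (R1/R2) ** sensitivity.

    bias > 1 favours option 1 regardless of payoff (e.g. a "handed" dog);
    sensitivity < 1 is undermatching, > 1 is overmatching.
    """
    if r1 <= 0 or r2 <= 0:
        raise ValueError("both reinforcement rates must be positive")
    return bias * (r1 / r2) ** sensitivity


def share_of_option_1(r1, r2, bias=1.0, sensitivity=1.0):
    """Convert the response ratio into the proportion of choices for option 1."""
    ratio = response_ratio(r1, r2, bias, sensitivity)
    return ratio / (1.0 + ratio)


# Two pots paying out equally; hypothetical bias of 1.5 towards the left pot.
print(f"no bias:   left pot {share_of_option_1(1, 1):.0%}")
print(f"with bias: left pot {share_of_option_1(1, 1, bias=1.5):.0%}")
```

With equal payouts the unbiased dog splits 50/50, while the biased one touches the left pot 60% of the time even though neither pot pays better. With bias and sensitivity both left at 1.0, it reduces to plain matching.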

The picture can never be as straightforward as the one I have painted, as we're in real-world environments, but it does still stand true; we just need to take apart the picture as much as we can to work out what is happening.

I guess the point is, if the dog is free to do something that will gain him reinforcement, then that behaviour will never be fully extinguished. What we have to do is make the conflicting behaviour that we do want as highly reinforced as possible to make the ratio as unbalanced in our favour as possible.

I'm going to be strict and say this isn't a conversation about teaching a dog to walk on a loose leash, that was simply an example, and we should stick to discussing the topic of the matching law :)

Here's where it can get interesting. Let's take something like working on a straight sit. By thinking about the matching law, we would say that we don't want to ever reward a crooked sit because by doing so, we are making that crooked sit more likely - we're paying in to the "wrong" side of the ratio. That makes complete sense, and you'll hear it argued for all over the place. But let's think about the consequence that could have. If the dog believes he has performed the behaviour and earned reward, yet that reward is withheld, is he more or less likely to perform the behaviour in the future? I'd argue that by withholding the reward from a dog who has tried to do what was asked, we're potentially going to punish a sensitive dog who doesn't fully understand the criteria, meaning that the behaviour of sitting as a whole becomes less likely. Or we dent the dog's confidence so they will do it, but a lot more slowly; we extinguish the spark - and those of us with sensitive dogs know that it's really hard to reignite it once it's lost for a particular behaviour.

So what we need to do is set up our training environment really cleanly so that the dog is highly likely to make the correct decision and invest heavily in that. If we focus on rewarding attitude (that is, the dog is trying to do what you are asking, even if he's lacking precision), and make it incumbent upon ourselves to ensure the dog performs the actual behaviour we want to reinforce, then we can keep the confidence and energy levels where we need them, and continue paying into our side of the ratio.

At a practical level, this means that if the dog goes wrong, we still pay for it, but then we use that information to set up our training environment in such a way that we are stacking the deck in our favour in terms of getting the behaviour we do want.
 

Boogie

Moderator
Location
Manchester UK
But let's think about the consequence that could have. If the dog believes he has performed the behaviour and earned reward, yet that reward is withheld, is he more or less likely to perform the behaviour in the future?
Gamblers very rarely win the ‘reward’ but they keep playing nonetheless. Do dogs gamble? I’m not sure, but it is the reason we use jackpot rewards, isn’t it? So that the dog is hoping ‘this will be the one’.

 
Gamblers very rarely win the ‘reward’ but they keep playing nonetheless. Do dogs gamble? I’m not sure, but it is the reason we use jackpot rewards, isn’t it? So that the dog is hoping ‘this will be the one’.

This is an interesting read on the subject of variable reinforcement schedules: “Doesn’t Variable Reinforcement Create a Stronger Behavior?” on eileenanddogs.
The TL;DR is that they don't work well in the real world, outside of laboratory conditions. Ooh look, and she talks about the matching law, too :D

The problems with using variable reinforcement schedules in the real world fall into three areas. A problem in any area can be enough to punch holes in the expected benefits. First, “resistance to extinction” is not the best measure of behavior when our goal is to get enthusiastic, consistent responses exactly when we want them. Second, even if resistance to extinction were our goal, it’s difficult for humans to perform the necessary randomized schedules. Third, in the real world, there are many alternative sources of reinforcement (we call them distractions). That means even when done correctly, the possible value of a variable reinforcement schedule can be demolished by something called the Matching Law.


And here's more of a specific answer to your question about the gambling effect. That was once widely touted, but is considered quite old hat these days.


When discussing variable reinforcement, people often present the idea of a slot machine. They talk about the excitement for the player of wondering if this is the time she will get a payout. They theorize about the excitement and persistence the parallel situation could invoke in their dogs.

But the slot machine model has a problem. Let’s say you are gambling on a slot machine that makes payouts up to $100. The most common payouts are $5 and $10. As you are gambling, someone regularly strolls through the casino, taps you on the shoulder, and hands you a $100 bill. Do you stop and accept the free money, or do you turn away and concentrate on your lever? Of course you take the money! Your machine will still be there after you pocket the cash. (Although you may decide to follow around the money guy instead!)

We are walking around in a world full of free $100 bills for our dogs. Being a slot machine putting out random $5’s and $10’s on a thin schedule is not good protection against them.



And for those of you who don't get through the whole article, this is her take home:


My goal is not to get the most behavior out of them for the cheapest payout on my part. My goal is for them to have fun, enriching lives and fit into our human world with the most ease possible. Being generous with all sorts of reinforcers works beautifully for agility and daily life.
 

Joy

Location
East Sussex
I read the article from Eileenanddogs a few days ago and I have to say it warmed my heart! I've never been convinced by the gambling argument, but have lacked the courage to say so as it seemed so widely accepted. I'd read about Skinner's rat experiments years ago and thought that the fact that the rats were in a container with no other stimulus must affect the outcome. Perhaps having virtually no gambling instinct myself also influenced me (I have never bought a lottery ticket, scratchcard etc.)
 
I read the article from Eileenanddogs a few days ago
I saw it posted on the Positive Gundogs FB page a few days back but hadn't got around to reading it. I've just read it now, as I knew it was about variable reinforcement schedules. I had no idea it talked about the matching law, too. I've heard the matching law mentioned a few times in various circles recently so had decided to do some thinking about it.

I think we have to be careful with the results of experiments in laboratories as they can't necessarily say how things happen in the real world; there are too many uncontrollable variables. It doesn't mean we should simply discount any laboratory research, far from it, but it also means we shouldn't simply grab hold of something because it's been proven to hold true inside of a Skinner box and extrapolate wildly from there. The variables that are outside of our control are some of the most powerful.
 
Just to be clear so that I am not confused for a change :rolleyes:, are we saying that giving different-value treats or games/play for different levels of training, and NO gamble/jackpot treats, is the correct way? x
 

HAH

Moderator
Location
Devon, UK
set up our training environment really cleanly
Ooh, this is all very interesting! The Matching Law makes intuitive sense to me - the main challenge in my mind being how to control the training environment when in the real world (rather than e.g. a brilliantly fenced training field with no birds flying overhead or smells underfoot). And the answer I guess is a) control what you can, so that a behaviour is started in a simple, more easily-controlled environment like your kitchen; and b) perhaps grade the reward with level of precision required as well as level of difficulty/challenge. So in @snowbunny’s example, a crooked sit gets a reward, but a straight sit gets a more valuable reward. Would this fit your thinking?
@Charlie - in my mind you’re spot on (although maybe correct/incorrect isn’t the way to think about it, rather what gets best results for you and your dog); gambling isn’t ‘bad’, it’s just less effective than matching your rewards to the challenge.
 
We're saying that offering great reinforcement for every behaviour is a good thing. Don't try to reduce your rate of reinforcement to a variable schedule (my dog gets "paid" one out of ten times). Don't even try to get it so that your dog is only working for kibble. Mix it up, by all means, but bin the idea that you "should" reduce your rate or value of reinforcement. The higher you keep that, the stronger the behaviours you will have, and the snappier.

Of course, we change what the reinforcement is, and giving a jackpot reward every now and again is a good idea, but not when that's combined with the so-called "gambling effect". That is when people believe that a dog will work for reinforcement one time out of ten because, isn't gambling exciting! In a laboratory it might work, but in the laboratory you're not fighting against all the other things that the dog can do to get reinforcement without involving us at all.
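One way to put numbers on the laboratory-versus-real-world point is Herrnstein's hyperbola, a single-behaviour form of the matching law: B = kR / (R + R_e), where R is the reinforcement we deliver for the behaviour, R_e is the reinforcement available from everything else, and k is the maximum response rate. A minimal Python sketch; k and all the rates are made-up illustrative values, and the reward rates are treated as fixed inputs, which is a simplification:

```python
def response_rate(r, r_e, k=60.0):
    """Herrnstein's hyperbola: B = k * R / (R + R_e).

    r   -- reinforcement we deliver for the behaviour (rewards per minute)
    r_e -- reinforcement available from everything else (same units)
    k   -- maximum possible response rate (hypothetical value)
    """
    if r + r_e == 0:
        return 0.0
    return k * r / (r + r_e)


# Hypothetical rates: "paying every time" = 10 rewards/min, "1 in 10" = 1/min.
for setting, r_e in [("Skinner box (nothing else to do)", 0.0),
                     ("real world (distractions pay 5/min)", 5.0)]:
    every_time = response_rate(10.0, r_e)
    one_in_ten = response_rate(1.0, r_e)
    print(f"{setting}: {every_time:.0f} vs {one_in_ten:.0f} responses/min")
```

In the box, thinning the schedule costs nothing (60 vs 60), because there is nothing else to do. Once distractions pay, the same thinning drops responding from 40 to 10 per minute.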

Now, people often need to get their dogs working for no obvious reinforcement - for example, in competition, but that can be done by getting smart about building up delayed reinforcement, or what we call "secondary reinforcers" - that is, something that the dog learns to like, rather than something like food that they like because they need it to survive. That might be something like a smile, saying "good boy", or a million other things. We can even build the value in the "work" itself so that the dog loves that. Lots of things can be reinforcing, so people see a dog who is doing a great job, loving it and getting no food and they think, "I shouldn't need treats!".
 
So in @snowbunny’s example, a crooked sit gets a reward, but a straight sit gets a more valuable reward. Would this fit your thinking?
Well, even this can be difficult! Let's say you're looking forward to a nice cup of tea. You take a big gulp and then find out it's coffee. Yuk! It's not that you don't like coffee, but because it's not what you were expecting, it's aversive. That can happen with rewards, too. A dog can be punished by something he likes because it's not what he was expecting. Gagh!
You can do what you suggest, but you have to think about how you implement it so you don't end up accidentally punishing the sit and losing the enthusiasm. I'd be more inclined to go back to basics, maybe use a place board or a barrier to give me a much higher likelihood of the straight sits which I can heavily reinforce and then fade that tool out.
 
Mix it up, by all means, but bin the idea that you "should" reduce your rate or value of reinforcement. The higher you keep that, the stronger the behaviours you will have, and the snappier.
Marvellous! I've always struggled with reducing the rate and value of rewards, and worried that I just confuse us both a lot of the time. My mind just doesn't work methodically enough. I can definitely see the value in strong reinforcement for certain behaviours, i.e. in our case rewarding proximity to me in areas where there is a tempting opportunity for her to self-reward.
 
I struggle with this too, @Selina27. I also use different-value rewards for certain behaviours, or high-value treats for new ones. If I get an amazing 'stop' from chasing a rabbit, pheasant, hare etc., Hattie gets my entire treat bag scattered on the ground. That's hugely rewarding to her because she is a massive foodie; a retrieve or tug would definitely not cut the mustard! Charlie loves cheese or fish treats, which are rewarding to him, or a 'find it', and now 'scatter'. I do change up my treats, introducing different things; I have dried anchovies to try this week. I agree with @HAH and have always said that whatever works for your dog is the right way. I need to take my own advice! :rolleyes: xx
 
I haven't had a chance to catch up on much on here lately, but I have really enjoyed spending time reading this thread, especially as I've been working on some similar things with Stanley lately for a variety of reasons. The sit versus crooked sit part of the conversation has been interesting for us. I've been working on fitness with Stanley as he's wobbly and wonky, bless him, but he's also very sensitive.

With many people's opinion being not to reward a crooked sit, that's the aim I started with, but it didn't last long. My thinking was that Stanley is a wee bit wonky, so we may never get a straight sit; we may just get better posture, muscle tone and fitness, because physically it may not be quite possible for him. My next thought was: if he thinks he's done as I've asked and he doesn't get a reward because he's crooked, is this punishing, and will it knock his confidence? I can say that for Stanley it was noticeable he wasn't doing well with this way of doing things.

Then I went with the suggestion of releasing/resetting the dog and asking again to see if they can sit straighter. For Stanley, a release/reset involves a verbal cue followed by launching a treat, which is absolutely the best game ever to him. So what I had at that stage was wonky sits that Stanley wasn't getting rewarded for, his confidence obviously not great about this, and then a release to something he actually loves doing. That made me less likely to get either a straight sit or a dog that's interested in sitting happily, and more likely to get wonky sits offered to earn the release he does find rewarding.

So I switched it about and stopped focusing on straight sits, just building value in sitting. He sits however he sits, I give him a high-value treat (maybe two) and release him to one kibble, and he comes shooting back for the next sit, gets rewarded with high value again, but also finds it rewarding to run off after the kibble I ping. We've gone from him not being confident about sitting, because waiting for a straight sit knocked his confidence, to him now throwing out sits because he enjoys the games we've played, even choosing to do it without being asked in environments that are a little more worrying for him. I'll hold my hand up: to start with I tipped the balance the wrong way, but now he's happy as Larry!

I've also been using this type of thing for counter surfing and LLW/proximity. Now that Stanley feels more confident in the big wide world, we don't get the fear pulling on lead and he's in a head space to listen and learn, so I've been trying to address those times when he'll just pull on the lead like a normal dog does sometimes. Again, I ping out a bit of kibble, as he does like that game, so it's also quite a rewarding distraction, but I reward with high value, and frequently, next to me on and off lead in easy environments, and he's tootling next to me with an attitude of "er, I don't need to be off over there, it's way better here".
 