betty's Blog.

Why the Poisson distribution does not apply to football

Dirk Paulsen

Many people in the betting and football world run their own calculations. Plenty of them keep Excel files in which they try to work out some kind of prediction or odds calculation system, and quite often they do it by estimating the goals expected in upcoming fixtures. They may not use the same algorithms as we do (which is fine, because we prefer to be unique in that area), but they may still come pretty close with somewhat simpler functions. There are also people in the betting market who try to determine the expected goals for upcoming fixtures by a kind of reverse engineering from market prices. The small "problem" with this approach is that they rely on the market being "good" and having the right prices in the end. It is not a bad approach, but it is one we would not wish to use at all, because we want to determine the goal expectancies, the probabilities and all the markets to bet on entirely on our own, and we are able to do so.

Just to mention it once again: there is no obvious reason why the market "closing line", which many of the pros refer to as "the best you can get", should really be the best. It is driven by many obscure and also curious factors, and it is definitely not the result of one single proper algorithm. It is just a mixture of everything, on the assumption that this mixture will come to a well-balanced average.

Anyway, whichever algorithm or system people use in this market, some of the more advanced ones will reach a certain point: they have goal expectancies for an upcoming fixture. So do we. Theirs may not even differ strikingly from ours, but they do differ, and we would still insist that ours are better, although it takes some data analysis to prove that.

Now anyone who has such goal expectancies for an upcoming fixture will at some point turn to mathematics, and there they will encounter the Poisson distribution. This is a very clean and really beautiful function. It builds on the fact that an expected value can well be fractional, although the values occurring in a random experiment are discrete. In the long run the average of one die will be 3.5 (1 + 2 + 3 + 4 + 5 + 6 = 21, and 21 divided by the number of outcomes, which is 6, gives 21/6 = 3.5), but 3.5 will never be the outcome of one single roll.

The same happens in football. We expect, say, 1.63 goals for one side, which is the average we would see if this same fixture were played a thousand times (or any high number), but it will never be 1.63 in one single game. So here is the perfect moment to turn to the Poisson distribution. We simply determine the probabilities for this team to score 0, 1, 2, 3, 4, 5, … goals. Poisson gives the answer. Sounds perfect, doesn't it?
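As a minimal sketch of this step (not betty's own code), the Poisson probabilities for a single side can be computed with the Python standard library alone. The function name `poisson_pmf` is our own choice for illustration, and 1.63 is simply the example value used above.

```python
import math


def poisson_pmf(goals: int, expectancy: float) -> float:
    """Poisson probability of exactly `goals` goals given a goal expectancy."""
    return math.exp(-expectancy) * expectancy ** goals / math.factorial(goals)


# 1.63 is the example goal expectancy from the text.
expectancy = 1.63
probabilities = [poisson_pmf(goals, expectancy) for goals in range(11)]

for goals, probability in enumerate(probabilities[:6]):
    print(f"{goals} goals: {probability:.1%}")

# The probability-weighted average number of goals recovers the expectancy
# (up to the tiny tail beyond 10 goals that is cut off here).
mean_goals = sum(goals * p for goals, p in enumerate(probabilities))
print(f"average goals: {mean_goals:.2f}")
```

The last two lines mirror the die example: no single game ends with 1.63 goals, but the weighted average over all possible goal counts comes back to that figure.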

Well, now we do the same thing with the opponent's goal expectancy, which may be 1.28 (so we have a fixture which our or anyone else's algorithms put at 1.63 - 1.28 goals). OK, well done.

Now we take these two goal expectancies together. We simply set up a matrix in which we determine, for example, how often this team scores 0 goals and how often the other team scores 1 goal. We multiply these two values and have the probability of the 0-1 scoreline.

We repeat this calculation for every possible result in the matrix, up to 10-10 or even 20-20. So every result is covered in this matrix (21 by 21 cells when each side runs from 0 to 20 goals), and the sum of all the values within it will be extremely close to 100%. Every outcome is covered: we now know not only how often this team wins and how often that team wins, but also how often the game will end in a draw. Could not be any better. We have everything we need.
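The matrix step can be sketched as follows. This is only an illustration of the plain independent-Poisson approach described above, not of betty's method; the names `outcome_probabilities`, `win_a`, `win_b` and the cut-off `max_goals` are assumptions made for the example.

```python
import math


def poisson_pmf(goals: int, expectancy: float) -> float:
    """Poisson probability of exactly `goals` goals given a goal expectancy."""
    return math.exp(-expectancy) * expectancy ** goals / math.factorial(goals)


def outcome_probabilities(lam_a: float, lam_b: float, max_goals: int = 20):
    """Sum the scoreline matrix into (team A wins, draw, team B wins).

    Each cell is the product of the two sides' Poisson probabilities,
    which is exactly where the independence assumption enters.
    """
    win_a = draw = win_b = 0.0
    for goals_a in range(max_goals + 1):
        for goals_b in range(max_goals + 1):
            cell = poisson_pmf(goals_a, lam_a) * poisson_pmf(goals_b, lam_b)
            if goals_a > goals_b:
                win_a += cell
            elif goals_a == goals_b:
                draw += cell
            else:
                win_b += cell
    return win_a, draw, win_b


# The 1.63 - 1.28 fixture from the text.
win_a, draw, win_b = outcome_probabilities(1.63, 1.28)
print(f"team A {win_a:.1%}, draw {draw:.1%}, team B {win_b:.1%}")
print(f"matrix total: {win_a + draw + win_b:.6f}")
```

Cutting the matrix at 20 goals per side loses practically nothing for goal expectancies of this size, which is why the total lands so close to 100%.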

Or do we? Here comes the problem. You can even read it on the net, on Wikipedia, where it simply says: "For simplification reasons we assume those two goal expectancies are independent of each other." OK. If this were the case, then Poisson would really be the perfect solution.

A simple example of how this kind of game could be played to make it independent: the two teams "facing" each other (they do not really face each other in this example) both play their game in a locked room or on different pitches. They are not informed about how many goals the other side has scored. They play as well as they can and try to score as many goals as they can, and only afterwards are they told how many goals the other side scored. So they will simply try to score and score and score, the more the better. If this were still based on the same assumption (team A 1.63 goals, team B 1.28 goals), then we would have the obvious and required independence.

But now we face a very sad answer for our oh-so-nice Poisson distribution. Those goal expectancies may be well calculated, but they are not independent. The players on the pitch, the coaches, everyone around: they all KNOW the actual scoreline. They know it, and they also know by experience how to react to it. If a side is trailing by one goal (it could not be any more realistic), they may take risks they would not take if they were ahead or if it were still level. This "risk" includes a higher chance of conceding another goal, but it also raises the chance of scoring a goal themselves.

Imagine now that they manage to do so and equalize: they will instantly fall back and relax a bit more. They have achieved their primary goal, they are not losing any more, so they lower the risk again. It is something that anyone involved in the game of football can observe every day. A player or a coach may even have noticed it himself through receiving or giving different orders: "stay behind" or "go forward", to put it that simply. And this depends on the current scoreline.
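To make this mechanism concrete, here is a toy simulation in which the scoring rates react to the current scoreline. It is a sketch under loudly stated assumptions, not betty's algorithm: the three multipliers (`level`, `chasing`, `exposed`) and their values are invented purely for illustration and are not estimated from any data. With all three set to 1.0 the simulation reduces to the independent Poisson case.

```python
import random


def simulate_match(lam_a, lam_b, level, chasing, exposed, rng):
    """Simulate one match in which scoring rates depend on the scoreline.

    lam_a, lam_b: goal expectancies per full match when nobody adjusts.
    level:   multiplier on both rates while the score is level.
    chasing: multiplier on the trailing side's own scoring rate.
    exposed: multiplier on the leading side's rate against a chasing opponent.
    All three multipliers are hypothetical illustration parameters.
    """
    elapsed, goals_a, goals_b = 0.0, 0, 0
    while True:
        rate_a, rate_b = lam_a, lam_b
        if goals_a == goals_b:
            rate_a, rate_b = rate_a * level, rate_b * level
        elif goals_a < goals_b:
            rate_a, rate_b = rate_a * chasing, rate_b * exposed
        else:
            rate_a, rate_b = rate_a * exposed, rate_b * chasing
        # Waiting time until the next goal, as a fraction of the match.
        elapsed += rng.expovariate(rate_a + rate_b)
        if elapsed >= 1.0:
            return goals_a, goals_b
        # Decide which side scored, in proportion to the current rates.
        if rng.random() < rate_a / (rate_a + rate_b):
            goals_a += 1
        else:
            goals_b += 1


def draw_share(level, chasing, exposed, matches=20000, seed=1):
    """Share of simulated 1.63 - 1.28 matches that end level."""
    rng = random.Random(seed)
    draws = 0
    for _ in range(matches):
        goals_a, goals_b = simulate_match(1.63, 1.28, level, chasing, exposed, rng)
        draws += goals_a == goals_b
    return draws / matches


independent = draw_share(level=1.0, chasing=1.0, exposed=1.0)
reactive = draw_share(level=0.85, chasing=1.3, exposed=1.1)
print(f"draw share, independent rates: {independent:.1%}")
print(f"draw share, reactive rates:    {reactive:.1%}")
```

With these made-up settings the reactive version should produce more draws than the independent one, because teams slow down when level and the trailing side pulls back more often. How large the gap is depends entirely on the multipliers chosen, so the output says nothing about real football until they are fitted to data.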

There are lots of other things around it. People using the Poisson distribution may have noticed this as well and tried some adjustments here or there. But be sure you understand first of all that all of this is well taken care of within our software. betty does it for you.

If you want to work out a proof yourself, or you ask us to provide one further proof: you can simply sum up the draws expected under the Poisson distribution for a certain campaign or period and compare the total with the number of draws that actually occurred. You will find a pretty large difference. It can amount to 5 percentage points or even more. So with Poisson the expected share of draws could be something like 22.5%, while the draws that actually occurred amount to 27.5% or even more.
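This check could be sketched as follows, assuming you have, for each fixture, the two goal expectancies and the final score. The tuple layout and the function names are our own assumptions, and the four sample rows are placeholders made up only to show the data format; they are not real matches and four rows prove nothing either way.

```python
import math


def poisson_pmf(goals: int, expectancy: float) -> float:
    """Poisson probability of exactly `goals` goals given a goal expectancy."""
    return math.exp(-expectancy) * expectancy ** goals / math.factorial(goals)


def poisson_draw_probability(lam_a: float, lam_b: float, max_goals: int = 20) -> float:
    """Draw probability under independent Poisson: the matrix diagonal."""
    return sum(
        poisson_pmf(k, lam_a) * poisson_pmf(k, lam_b) for k in range(max_goals + 1)
    )


def compare_draws(fixtures):
    """Return (expected draw share under Poisson, observed draw share).

    fixtures: iterable of (lam_a, lam_b, goals_a, goals_b) tuples.
    """
    fixtures = list(fixtures)
    expected = sum(
        poisson_draw_probability(lam_a, lam_b) for lam_a, lam_b, _, _ in fixtures
    )
    observed = sum(1 for _, _, goals_a, goals_b in fixtures if goals_a == goals_b)
    return expected / len(fixtures), observed / len(fixtures)


# Placeholder rows only (hypothetical, NOT real results): replace them with
# a full campaign of your own goal expectancies and final scores.
sample = [
    (1.63, 1.28, 1, 1),
    (1.10, 1.45, 0, 2),
    (2.05, 0.90, 3, 1),
    (1.30, 1.30, 0, 0),
]
expected_share, observed_share = compare_draws(sample)
print(f"expected draws: {expected_share:.1%}, observed draws: {observed_share:.1%}")
```

The comparison only becomes meaningful over a whole campaign or more, where the two shares can be set side by side as described above.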

So this would surely provide a proof, if you had not already noticed it by just comparing the draw odds coming out of Poisson with the draw odds the market offers. There will be a pretty large difference. But this is not the real problem. The problem is this: the probability that Poisson takes away from the draw has to show up somewhere else, and it shows up in the probabilities for this or that team to win the game. The (falsely) raised probability of one side winning lowers the odds you calculate as fair for that side. So with this small error, the approach will quite often show you apparent value that you may be tempted to bet on. But this indication is misleading, because the true win probability is really a bit smaller. So in the long run you will start to place bad bets and lose money on them.
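A small worked example may help here. Everything in it is hypothetical: the Poisson-style probabilities (45% / 22.5% / 32.5%), the market price of 2.30, and above all the choice to take the missing draw probability from both sides proportionally, which is just one simple assumption and not how betty does it. Only the 22.5% and 27.5% draw figures are taken from the text above.

```python
def fair_odds(probability: float) -> float:
    """Decimal odds at which a bet with this win probability breaks even."""
    return 1.0 / probability


def rescale_for_draw(p_a: float, p_draw: float, p_b: float, true_draw: float):
    """Set the draw to `true_draw` and shrink both win probabilities.

    The proportional shrink is an assumption made for illustration only.
    """
    scale = (1.0 - true_draw) / (p_a + p_b)
    return p_a * scale, true_draw, p_b * scale


# Hypothetical Poisson output with the 22.5% draw share mentioned in the text.
poisson = (0.45, 0.225, 0.325)
# Same fixture with the draw lifted to the 27.5% mentioned in the text.
corrected = rescale_for_draw(*poisson, true_draw=0.275)

market_price_a = 2.30  # hypothetical price offered on team A

print(f"fair odds on team A, Poisson:   {fair_odds(poisson[0]):.2f}")
print(f"fair odds on team A, corrected: {fair_odds(corrected[0]):.2f}")
print(f"looks like value under Poisson: {market_price_a > fair_odds(poisson[0])}")
print(f"is value after the correction:  {market_price_a > fair_odds(corrected[0])}")
```

In this made-up case the offered price sits between the two fair prices, so the Poisson numbers flag a bet that the corrected numbers reject. That is the kind of misleading indication described above.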

Be assured at least that within betty all those things are taken care of. You can easily compare its draw probabilities with those Poisson would give, and in the end compare both with market prices. This will tell you.