Saturday, October 24, 2015

On the probability of losing the EuroMillions transnational lottery

joão pestana

I want to talk about probability now. With a random event, you cannot predict the outcome, nor can you expect past results to reveal a pattern that tells you what comes next. All you can do is estimate the likelihood of a certain event, and even if you can be 99.9% certain, there is still the remaining 0.1% playing against you. Probability is a way of organizing randomness.

The classical example is that of a perfectly balanced dice: a perfectly crafted cube with each of its sides marked with a different number from 1 to 6. If you throw the dice, you do not know which number will land face up. You do know that you will get a number from 1 to 6 and that, because it's perfectly balanced, each number has an equal likelihood of coming up.

If we define 0 to mean that an event is impossible and 1 that an event is certain, any number between 0 and 1 means you are uncertain about whether the event will happen. This number is called a probability and you can also think of it as a percentage ranging from 0% to 100%. It means that if you repeat the same experiment many, many times, the fraction of repetitions in which the event occurs gets close to its probability. The more times you repeat, the closer it tends to get, something called the law of large numbers.

Think about the dice again: you can guarantee that some number will come up, you just don't know which one. Since exactly one of the six numbers must come up, their probabilities add up to one.

`P(1)+P(2)+P(3)+P(4)+P(5)+P(6)=1`

As I said before, for a perfectly balanced dice, the probability of each number is the same. We can put it this way: `P(1)=P(2)=P(3)=P(4)=P(5)=P(6)=P(n)`, where `n` is an element of the set of possible results `N={1,2,3,4,5,6}`. Six equal probabilities that add up to 1 mean that `P(n)=1/6=0.1666...`, so the probability of getting any particular number is about 16.7%.
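To see the law of large numbers at work on this example, here is a small Python sketch of my own. The helper name `relative_frequency` and the fixed seed are arbitrary choices for illustration. It throws a simulated fair dice many times and reports how often a 4 comes up.

```python
import random


def relative_frequency(face, rolls, seed=0):
    """Fraction of `rolls` simulated throws of a fair dice that show `face`."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so runs are repeatable
    hits = sum(1 for _ in range(rolls) if rng.randint(1, 6) == face)
    return hits / rolls


# The observed frequency of a 4 should drift towards 1/6 as the number of throws grows.
for rolls in (10, 1_000, 100_000):
    print(rolls, relative_frequency(4, rolls))
```

With only 10 throws the fraction can be far from 1/6, while with 100 000 throws it should sit very close to 0.1667.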

Let's go back to our subject. If the probability of rolling a dice and getting a 4 is 16.7%, what is the probability of not getting a 4? I believe you know the answer. Intuitively, you say that it's the probability of getting any number except 4, but instead of adding up all those probabilities we might do something smarter. Since we know that some number must come out, the probabilities all sum to 1. If we subtract from that the probability of getting a 4, we get the probability of getting any number except 4. Put another way, it's about 83.3%.

`P("not "4)=1-P(4)=5/6=0.8333...`
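As a quick check that the shortcut agrees with the long way round, this sketch uses Python's exact fractions. The variable names are mine, chosen only for illustration.

```python
from fractions import Fraction

faces = range(1, 7)
p = {n: Fraction(1, 6) for n in faces}  # fair dice: every face has probability 1/6

long_way = sum(p[n] for n in faces if n != 4)  # add up the five other faces
short_way = 1 - p[4]                           # certainty minus P(4)

print(long_way, short_way, float(short_way))
```

Both routes give the same fraction, 5/6, which is why subtracting from 1 is the convenient choice when the list of losing outcomes is long.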

Can we do something more interesting than just playing with the odds of a regular dice? Of course! I want to know the probability of not winning the first prize of EuroMillions, a transnational lottery played across several European nations.

Rather than add up all the immense possibilities of losing, I'm going to do the same thing I did before. If we find the probability of winning, all we must do is subtract it from certainty. The rules of the game are:

  1. The player must choose any 5 main numbers in the range of 1 to 50 and
  2. Choose 2 more numbers, called stars, in the range of 1 to 11.

The important thing here is that the numbers do not repeat themselves and the order in which they come out doesn't matter, unlike in permutations. For small numbers, we may be able to count all the unique outcomes from a set of possible events regardless of the order in which they happen. Imagine that we simultaneously draw two balls at random from a jar of balls numbered 1 to 3 and add their values together. We can expect to draw the numbers `{1,2}`, `{1,3}` or `{2,3}`. Even if they come out as `{2,1}`, `{3,1}` or `{3,2}` it won't matter. It's the same result, because we get the values 3, 4 or 5, respectively, in either case. That is, there are 3 unique possible results for the draw. How can we count this for larger combinations?

If we count 3 possibilities for the first ball and the remaining 2 for the second ball, we have a permutation with `3xx2=6` different outcomes, where the order matters. We must remove the repeated results. Each pair of balls can come out in 2 different orders, so we divide the previous result by the number of repeated orderings, `6/2=3`, and we get what we want. Luckily, there is a simple mathematical expression for this.

`C(n,k)=(n!)/(k!(n-k)!)`

The `C` stands for combinations and `!` means the factorial product `n! = n xx (n-1) xx (n-2) xx ... xx 2 xx 1`. From a set of `n` different elements we draw `k` elements at once and this expression gives us the total number of possible combinations regardless of the order in which they come out. For the previous example, we'd use it as `C(3,2)=(3!)/(2!*(3-2)!)=(3xx2xx1)/(2xx1xx1)=3`.
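The jar example is small enough to enumerate by brute force, so here is a sketch in Python that lists the ordered and unordered draws and compares the count with the formula. The function `C` below is just my direct transcription of the expression above. Python's standard library already offers the same thing as `math.comb`.

```python
from itertools import combinations, permutations
from math import comb, factorial


def C(n, k):
    """Number of ways to pick k items out of n when order does not matter."""
    return factorial(n) // (factorial(k) * factorial(n - k))


balls = [1, 2, 3]
ordered = list(permutations(balls, 2))    # (1, 2) and (2, 1) count separately
unordered = list(combinations(balls, 2))  # each pair counted once
sums = sorted(a + b for a, b in unordered)

print(len(ordered), len(unordered), sums)  # 6 3 [3, 4, 5]
print(C(3, 2), comb(3, 2))                 # 3 3
```

Enumerating stops being practical very quickly, which is where the closed formula earns its keep.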

For the lottery in question, we have a total of `C(50,5)=2" "118" "760` unique possibilities for the main numbers and `C(11,2)=55` different combinations of stars. To find the total number of different draws we just multiply the two to obtain `2" "118" "760 xx 55 = 116" "531" "800`. This means that the probability of winning the jackpot is simply `1/(116" "531" "800)=0.0000000085813...`, so the probability of not winning the jackpot is about `99.999999141865%`.
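The same arithmetic can be reproduced in a few lines of Python. This is only a sketch with variable names of my choosing, using the rules stated above (5 out of 50 main numbers, 2 out of 11 stars).

```python
from math import comb

main_combinations = comb(50, 5)  # ways to pick the 5 main numbers
star_combinations = comb(11, 2)  # ways to pick the 2 stars
total_draws = main_combinations * star_combinations

p_win = 1 / total_draws  # a single ticket matches exactly one draw
p_lose = 1 - p_win       # complement: everything except the jackpot

print(main_combinations, star_combinations, total_draws)  # 2118760 55 116531800
print(f"{p_win:.4e}", f"{100 * p_lose:.10f}%")
```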

Now I know you're thinking that there are more prizes besides the jackpot, but everyone wants the big one! Nevertheless, there is a simple formula that we can use to find the probability of getting any of the lesser prizes.

`1/(P(m,s))=(C(n,k))/(C(k,m)C(n-k,k-m)) xx (C(z,t))/(C(t,s)C(z-t,t-s))`

Where `n=50` is the total number of main numbers and `z=11` is the total number of star numbers, `k=5` is the number of choices for the main numbers and `t=2` is the number of choices for the star numbers. The only variables we are left with are `m` and `s` which are the number of main numbers and stars that we got right, respectively. So we get a nicer formula for `m in M={0,1,2,3,4,5}` and `s in S={0,1,2}`. I'm presenting all the information in Table 1.

`1/(P(m,s))=(C(50,5))/(C(5,m)C(45,5-m)) xx (C(11,2))/(C(2,s)C(9,2-s))`
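As a sketch of how this formula could be evaluated, the Python function below (the name `odds` is my own) computes `1/(P(m,s))` for every combination of `m` and `s`, with the lottery's parameters as defaults.

```python
from math import comb


def odds(m, s, n=50, k=5, z=11, t=2):
    """Return 1/P(m, s): about one ticket in this many matches m main numbers and s stars."""
    main = comb(n, k) / (comb(k, m) * comb(n - k, k - m))
    stars = comb(z, t) / (comb(t, s) * comb(z - t, t - s))
    return main * stars


for m in range(6):
    for s in range(3):
        print(m, s, round(odds(m, s), 1))

# Sanity check: the 18 outcomes cover every possible ticket, so their probabilities sum to 1.
total_probability = sum(1 / odds(m, s) for m in range(6) for s in range(3))
print(total_probability)
```

The final sum is a useful guard against mistakes in the formula: if a term were wrong, the probabilities would no longer add up to 1.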

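| `m` | `s` | `1/(P(m,s))` | Prize distribution (fraction of fund) | Average prize |
|---|---|---|---|---|
| 0 | 0 | 2.6 | 0.000 | 0 € |
| 0 | 1 | 5.3 | 0.000 | 0 € |
| 0 | 2 | 95.4 | 0.000 | 0 € |
| 1 | 0 | 4.3 | 0.000 | 0 € |
| 1 | 1 | 8.7 | 0.000 | 0 € |
| 1 | 2 | 156.4 | 0.065 | 10.82 € |
| 2 | 0 | 22.8 | 0.180 | 4.07 € |
| 2 | 1 | 45.6 | 0.176 | 8.08 € |
| 2 | 2 | 821.2 | 0.023 | 20.26 € |
| 3 | 0 | 327.0 | 0.037 | 12.26 € |
| 3 | 1 | 653.9 | 0.022 | 14.73 € |
| 3 | 2 | 11 770.9 | 0.005 | 64.19 € |
| 4 | 0 | 14 386.6 | 0.007 | 105.57 € |
| 4 | 1 | 28 773.3 | 0.007 | 213.39 € |
| 4 | 2 | 517 919.1 | 0.008 | 4 813.08 € |
| 5 | 0 | 3 236 994.4 | 0.016 | 80 099.60 € |
| 5 | 1 | 6 473 988.9 | 0.048 | 468 741.33 € |
| 5 | 2 | 116 531 800.0 | 0.320 | 46 466 582.27 € |
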
Table 1 — Odds of winning, prize distribution and average prizes for the EuroMillions lottery.

The value `1/(P(m,s))` should be read as "about 1 chance in that many": for instance, 22.8 means roughly a 1 in 23 probability. The average prize displayed was taken from the euro-millions.com website for the period between 10/05/2011 and 23/10/2015. If you add all the distributions, there is a 0.086 missing that is intended for the booster fund. As you can see, both `1/(P(m,s))` and the average prize grow extremely fast, so I took the natural logarithm, a very slowly growing function, of each and plotted them against each other.
Figure 1 — Prize and probability plot

As expected, Figure 1 shows that the prize grows as the probability of losing increases. Even if we add up the probabilities of hitting any of the prize-winning combinations, that only amounts to about 7.81%, leaving you with an astonishing 92.19% probability of remaining empty-handed.
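For anyone who wants to verify that last figure, here is a sketch in Python. It assumes the prize tiers are exactly the rows of Table 1 with a non-zero average prize, that is, at least 2 main numbers, or 1 main number plus both stars. The helper names `probability` and `wins_prize` are mine.

```python
from fractions import Fraction
from math import comb


def probability(m, s, n=50, k=5, z=11, t=2):
    """Exact probability of matching m main numbers and s stars with one ticket."""
    ways = comb(k, m) * comb(n - k, k - m) * comb(t, s) * comb(z - t, t - s)
    return Fraction(ways, comb(n, k) * comb(z, t))


def wins_prize(m, s):
    """Prize tiers as read off Table 1 (an assumption based on the non-zero rows)."""
    return m >= 2 or (m == 1 and s == 2)


p_prize = sum(
    probability(m, s) for m in range(6) for s in range(3) if wins_prize(m, s)
)
p_nothing = 1 - p_prize

print(f"{float(p_prize):.2%} win something, {float(p_nothing):.2%} win nothing")
```

Using exact fractions avoids any rounding drift while adding up thirteen very different magnitudes.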