This article is written with a coworker of mine in mind, given the disbelief shown when I presented what is know as the birthday problem or the birthday paradox — which is not a paradox, it just feels like it due to our weak probability intuition.
Let's start easy with the pigeonhole principle and a clear example of something everyone will surely agree with me — I hope. If we have 5 red balls and only 4 boxes to put them in (fig. 1), then surely one of the boxes must contain, at least, 2 balls. What does this have to do with people and their birthday? Well... Think of it this way. If we put 367 random people in the same room, then at least 2 of them must share their birthday. Like putting 367 red balls into 366 boxes, see?
Figure 1 — Visual representation of 5 red balls and 4 boxes.
Before you start to curse my name, because surely all those people can have their birthday occur at any day — including February, 29 — so why should they necessarily overlap? Try thinking about the problem the other way around. Imagine how you can make sure no one shares a birthday! Easy, right? Just have all the people celebrate their birthday in sequence. Let's say, Mark's birthday is January, 1, Lisa's is January, 2, Brian's January, 3, ..., Sarah's December, 31, and so on until every person is assigned to a non-overlaping birthday. Then... What about the 367th person? What about John? As soon as John enters the room, he must share a birthday with someone, because he must have been born in one of the different 366 days that make up a full year.
It still doesn't feel right, does it? Of course that happens, if I assign a day per person, but surely we can reason that some of the days might not even be celebrated! And that's were your reason it's tricking you. You are not thinking about people having birthdays, but rather you're thinking about you and your own experience. In your daily life, birthday's are a special and not frequent event and double birthdays are even scarcer! In light of this, you think about all the empty days in which you don't know a single person who's celebrating their birthday on. When you meet someone new, most likely he or she will have his or her birthday in some of those days that are empty to you. You assume that, because birthdays are a rare event to you, then they surely must always be a rare event — and that's just not true.
Just to complete the above idea, as soon as you start to create empty days, you must create shared birthdays. If you move Lisa's birthday from January, 2 to January, 1, then no one celebrates their birthday on January, 2, but Lisa is automatically sharing January, 1 with Mark. Hopefully, by now, you agree with me that when the number of people is 367, there is a 100% guarantee that, at least, 2 among the 367 must share their birthday. I can tell you now that you only need 70 people to reach the 99.9% guarantee and just 23 people will suffice for those fifty-fifty odds.
Do you not believe me? Here comes the fun part! Let's prove it... mathematically. First, let's forget about February, 29 and assume each day is equally probable. The problem is as follows. We want to find the probability that among N people, at least, 2 share a birthday. Since there are only two possible situations: 2 different persons share a birthday (`A`) and 2 different persons do not share a birthday (`bar A`), among N people, we can analyse it either way. If `P(A)` is the probability of 2 people sharing a birthday, then `P(bar A)=1-P(A)` is the probability of 2 people not sharing a birthday, among N people. I shall use this form, as it makes the process easier.
We shall think in sequence and analyse each person individually, comparing him/her to all the others we've analysed before. Starting with person 1, the probability of his/her birthday not being the same as the people before is 1, because we haven't analysed anyone yet. Let's write it as `365/365`. Next, we analyse the second person and the probability that he/she doesn't share a birthday with person 1 is having been born in any possible day, except for person 1's birthday, so it becomes `364/365`. Again, using the same reasoning, the probability that person 3 doesn't share a birthday with person 1 and 2 is having been born in any possible day, except for person 1 and 2's birthdays, so it becomes `363/365`. Generally, we can say that `P(X)=1-(X-1)/365` is the probability of person X not sharing a birthday with all the X persons before him/her.
Now, we are in a position to (almost) obtain the result we want! The probability that among N people no ones shares a birthday is thus `P(bar A)=P(1)*P(2)*P(3)*...*P(N-1)*P(N)` and we can write it as a formula:
`P(bar A)=prod_(X=1)^N (1-(X-1)/365)=(365!)/(365^N(365-N)!)`
You can test it yourself that for the values `N=1` and `N=366` we get the extreme results `P(bar A)=1` and `P(bar A)=0`, respectively, meaning that the probability no one shares a birthday among a group of 1 single person is 100% and the probability of no one sharing a birthday among 366 people is 0%, as explained in the beginning. If we calculate the complement of `P(bar A)`, then we get what we wanted all along: the probability of, among N people, at least 2 sharing a birthday `P(A)=1-P(bar A)`. Input the values `N=23` and `N=70` and we get, as promised, `P(A)=50.73%` and `P(A)=99.92%`, respectively.
Meeting someone who shares a birthday with you is one thing, but among a group of N random people, 2 people sharing the same birthday is another thing. That's why your reason tricks you. You fail to distinguish the two situations even though the first is somehow rare and the second is clearly not.
Just in case you're wondering how to calculate the likelihood of finding someone with the same birthday as you among N people, it's not difficult. We can do a similar reasoning as before, but instead of comparing will all previously analysed people, we compare only with the first — you. And thus, it becomes the probability of person 1 not sharing his/her birthday with you times the probability of person 2 not sharing his/her birthday with you, and so on. That is, `P(bar A)=364/365xx364/365xx364/365xx...=(364/365)^N` and again `P(A)=1-P(bar A)` is the probability of finding someone with the same birthday as you. So how many people do we need for the fifty-fifty chance? `N=253` will give you `P(A)=50.05%`.