Hacker News

I’ve never been able to wrap my head around this one.

If in a group of 23 people at least two share a birthday with a ~50% chance, would that mean that if I have a group of 22 people and pick a random day of the year, there's also a ~50% chance of somebody having a birthday on that day?




The trick is to realize that "same birthday" isn't a property of people, it's a question about pairs of people.

A group of 23 people has 253 unique pairs. So if I ask, "In a room with 253 pairs of people, what are the odds that in at least one of those pairs, both people have the same birthday?" Now around 50% seems a lot more intuitive.

It's unintuitive because the number of pairs increases quadratically with the number of people.
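A minimal Python sketch of the pair-counting view. It assumes uniform birthdays over 365 days (no leap years), and the helper names `pair_count` and `approx_shared_probability` are made up for illustration. The estimate treats every pair as an independent 1-in-365 chance, which the pairs aren't quite, so it is only an approximation:

```python
from math import comb

def pair_count(n: int) -> int:
    """Unordered pairs in a group of n people: n choose 2 = n(n-1)/2."""
    return comb(n, 2)

def approx_shared_probability(n: int, days: int = 365) -> float:
    """Rough P(at least one shared birthday), pretending each pair is an
    independent 1-in-`days` chance of matching (they are not fully independent)."""
    return 1 - (1 - 1 / days) ** pair_count(n)

for n in (10, 23, 40, 70):
    print(n, pair_count(n), round(approx_shared_probability(n), 3))
```

Doubling the group from 23 to 46 people roughly quadruples the pairs (253 to 1035), which is the quadratic growth described above.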


Complete yet succinct explanation. Bravo, and I might steal this way of explaining it!



No. There's about a 22/365 chance that somebody has a birthday on that day. But this is getting close to the intuition. Go through the 23 people. For each one of them, there's about a 22/365 chance that somebody else in the group has the same birthday as they do. So adding these probabilities up over all 23, you get 23 * 22/365. Of course, adding up the probabilities is obviously wrong since the events are not mutually exclusive (and 23 * 22/365 > 1), but it gives the intuition that the probability of a shared birthday grows like N^2. [and actually, you can add up expectations] [Edits for correctness]
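To put numbers on this, here is a small Python sketch (hypothetical helper names, uniform 365-day year assumed). It compares the fixed-day question with the "add up the probabilities" heuristic, and shows the expectation version, which does add up correctly:

```python
def p_someone_on_given_day(n: int, days: int = 365) -> float:
    """P(at least one of n people has their birthday on one fixed day)."""
    return 1 - (1 - 1 / days) ** n

def expected_matching_pairs(n: int, days: int = 365) -> float:
    """Expected number of pairs sharing a birthday. By linearity of expectation
    this is (number of pairs) * (1 / days), even though pairs are not independent."""
    return n * (n - 1) / 2 / days

n = 23
naive_sum = n * (n - 1) / 365  # the per-person heuristic; exceeds 1
print(round(p_someone_on_given_day(22), 4))  # about 0.0586, close to 22/365
print(round(naive_sum, 3))                   # 1.386, so it cannot be a probability
print(round(expected_matching_pairs(n), 3))  # 0.693, i.e. 253/365
```

The naive 23 * 22/365 is exactly twice the expected number of matching pairs, because going person by person counts each pair once from each side.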


No. The chance would be much smaller. If you choose 22 random days though...


Your comment and erehweb’s finally made it click for me. Thanks.


For all these problems I find it easier to just simulate it and plot the results. Do a loop with 100k iterations. In each iteration generate 23 numbers between 1 and 365 from a uniform distribution. Then check in how many iterations you had duplicate entries.

You will see that asymptomatically you will reach 50k cases
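A minimal version of that simulation in Python, using only the standard library (so it prints instead of plotting). `simulate_shared_birthday` is a hypothetical name and the seed is arbitrary:

```python
import random
from typing import Optional

def simulate_shared_birthday(n_people: int = 23, iterations: int = 100_000,
                             days: int = 365, seed: Optional[int] = None) -> int:
    """Count iterations in which at least two of n_people uniform random
    birthdays (1..days) coincide."""
    rng = random.Random(seed)  # private generator so a seed makes runs repeatable
    hits = 0
    for _ in range(iterations):
        birthdays = [rng.randint(1, days) for _ in range(n_people)]
        # A duplicate exists exactly when the set is smaller than the list.
        if len(set(birthdays)) < n_people:
            hits += 1
    return hits

hits = simulate_shared_birthday(seed=0)
print(hits, hits / 100_000)
```

With 100k iterations the count should land near 50.7k rather than exactly 50k, since the exact probability for 23 people is about 50.7%.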


Covid cases are asymptomatically approaching zero. However, your duplicate entries will reach their target asymptotically :)

https://en.wikipedia.org/wiki/Asymptote


I know; the Apple iOS keyboard does not.


Statistics aren't reflective. They're aggregate.

It's not a 50% chance that someone's birthday is on the date you chose, but a 50% chance that any two birthdays are the same.

If you pick a random date, Alice has a 364/365 chance of not having a birthday that day. Given that she doesn't, Bob has a 363/365 chance of not having a birthday on that day or on Alice's birthday. So we multiply Alice's and Bob's chances (and continue the same way for everyone else) to get our odds that none of these people share a date, with each other or with the chosen date. And of course, if we know the odds of something not happening, the odds of it happening are one minus that.


I think the easiest way to think about it is to invert the problem. What is the probability that no one shares a birthday?

One person, nobody shares a birthday because they’re alone.

Two people, it’s going to be a 364/365 ≈ 99.7% chance that they don’t share a birthday.¹

Now add a third person. We keep the probability for the first pair, and the third person must not share a birthday with either of them, which happens with probability 363/365. Since both events must occur, we multiply the probabilities. That gives us (363 × 364)/365² ≈ 99.2%

For the fourth person it becomes (362 × 363 × 364)/365³ ≈ 98.4%

You’ll notice that the probability is decreasing more and more with each step.²

In general, for n people we would have³ P = 365!/((365 − n)! × 365ⁿ). At n = 23, we drop below 50%.
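Here's a Python sketch checking both the step-by-step product and the closed form (the function names are mine, not standard; no leap years). It uses `math.perm(365, n)`, which equals 365!/(365 − n)!, and `Fraction` so the 50% comparison is exact:

```python
from fractions import Fraction
from math import perm

def p_no_shared_iterative(n: int, days: int = 365) -> float:
    """Multiply one factor per person: person k (0-based) must avoid the k
    birthdays already taken, which has probability (days - k) / days."""
    p = 1.0
    for k in range(n):
        p *= (days - k) / days
    return p

def p_no_shared_closed_form(n: int, days: int = 365) -> Fraction:
    """Exact closed form days! / ((days - n)! * days**n).
    math.perm(days, n) == days! / (days - n)!, and it returns 0 when n > days."""
    return Fraction(perm(days, n), days ** n)

for n in (1, 2, 3, 4):
    print(n, round(p_no_shared_iterative(n), 3))  # 1.0, 0.997, 0.992, 0.984

# Smallest group size at which P(no shared birthday) drops below 1/2.
first_below_half = next(n for n in range(1, 367)
                        if p_no_shared_closed_form(n) < Fraction(1, 2))
print(first_below_half)  # 23
```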

This technique is useful for a lot of statistical calculations, e.g., for calculating the probabilities of poker hands: for some of them it is easier to calculate the probability of not getting the hand than the probability of getting the hand (a pair being the most obvious case).
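As an illustration of the complement trick on the poker example, here's a sketch assuming a standard 52-card deck and 5-card hands, reading "a pair" as "at least two cards of the same rank":

```python
from math import comb

# P(at least two cards share a rank) = 1 - P(all five ranks distinct).
total_hands = comb(52, 5)
# Choose 5 distinct ranks out of 13, then one of 4 suits for each card.
all_ranks_distinct = comb(13, 5) * 4 ** 5
p_at_least_a_pair = 1 - all_ranks_distinct / total_hands
print(round(p_at_least_a_pair, 4))  # 0.4929
```

Counting the complement takes one product; counting directly would mean adding up one pair, two pair, three of a kind, full house and four of a kind separately.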

1. We’re mostly programmers here so we’re going to pretend leap years don’t exist.

2. This will not continue indefinitely, obviously. The first derivative hits a minimum⁴ around n = 20 for this formula; after that the curve flattens out toward 0, becoming essentially 0 long before n = 365 (and exactly 0 at n = 366, by the pigeonhole principle). The graph kind of looks like cos θ over the range [0, π], though not exactly; it's just the first well-known graph of this shape that comes to mind.

3. If you’re eagle-eyed, you might notice that my examples above had a denominator of 365 raised to the n − 1 power, while the general formula uses 365ⁿ. You’ll also notice that the factorial ratio 365!/(365 − n)! has n factors and includes 365, whereas the examples stopped at 364. Remember our lonely birthday dude? His probability of not sharing a birthday (1) can be thought of as 365/365 and belongs in the product too. For small examples it’s easier to leave that factor out, but in a general formula keeping him in the product makes the formula clearer.

4. Minimum because we’re decreasing, remember.
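To check footnote 2 numerically, this sketch (same no-leap-year assumption; variable names are illustrative) finds where the one-step drop in P, a discrete stand-in for the first derivative, is largest:

```python
days = 365
p = 1.0      # P(no shared birthday) for the current group size n, starting at n = 1
drops = {}   # n -> P(n) - P(n + 1)
for n in range(1, days + 1):
    # Person n + 1 avoids the n birthdays already taken w.p. (days - n) / days,
    # so P(n + 1) = P(n) * (days - n) / days and the drop is P(n) * n / days.
    drops[n] = p * n / days
    p *= (days - n) / days

steepest = max(drops, key=drops.get)
print(steepest)  # 19: the biggest single drop is from 19 to 20 people
```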


That's just 1 date that has to be matched, so comparing 22 dates to 1 gives you a much lower probability of a match. In the 23-person scenario, each of the 23 dates can match any of the other 22.



