Hacker News
Think you understand Monty Hall? Try the Tuesday boy problem. (scienceblogs.com)
106 points by ColinWright on Nov 29, 2011 | 141 comments



Doesn't this rest on the simple ambiguity in the phrasing?

> I have two children and one is a son born on a Tuesday.

If by that is meant:

> I have two children. Here is some information about one of them: son, born on Tuesday.

Then the probability of the other child being a son is 1/2.

If on the other hand we mean:

> I have two children. One or more is a son. Exactly one of them was born on a Tuesday.

Then we get the 13/27 probability.

In fact it doesn't seem reasonable at all to assume that only one was born on Tuesday, while at least one is a son. One single interpretation of 'one of them' must be applied to both the gender and day of birth. Otherwise we're picking and choosing our interpretation on a whim.

edit: Colin appears* to think that what I've said here is incorrect, and I'd like to know why. I'm not a maths/stats person at all so am very keen to be re-educated on this matter.

* based on his now-deleted reply to ars which said "no, it doesn't, and no, you're not"

Important Edit Two:

If I'm reading this right, I think the defender of the 13/27 solution would say:

No. We don't discount the possibility of both being Tuesday boys (TB), we just adjust the calculation so that it doesn't count (eldest=TB, youngest=TB) and (youngest=TB, eldest=TB) as two separate possibilities.

To which I respond:

Right, so it's not down to ambiguity. But shouldn't you also discount every other symmetrical pair such as (eldest=TB, youngest=WB) and (youngest=TB, eldest=WB) and thus return the odds to 1/2? Or does that not return the odds to 1/2?


No, you shouldn't "also discount every other symmetrical pair", for exactly the same reason as there's a 1/36 chance of rolling double 6, but 2/36 chance of rolling a six and a one. It's all to do with labellings, and it's the most common source of error[1] in statistics.

[1] By "error" I mean calculations that then don't agree with the experimental results.
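To make the labelling point concrete, here is a minimal Python sketch (an illustration, not from the article) that enumerates the 36 equally likely ordered rolls of two distinguishable dice and counts both events:

```python
from fractions import Fraction
from itertools import product

# All 36 ordered (first die, second die) rolls, each equally likely.
outcomes = list(product(range(1, 7), repeat=2))

# Double six can only happen one way: (6, 6).
double_six = sum(1 for a, b in outcomes if a == b == 6)
# A six and a one can happen two ways: (6, 1) and (1, 6).
six_and_one = sum(1 for a, b in outcomes if {a, b} == {1, 6})

p_double_six = Fraction(double_six, len(outcomes))
p_six_and_one = Fraction(six_and_one, len(outcomes))

print(p_double_six)   # 1/36
print(p_six_and_one)  # 1/18, i.e. 2/36
```

Treating (1, 6) and (6, 1) as a single outcome would make "a six and a one" look exactly as likely as "double six", which is the miscount being warned against.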


What about this:

I have two teenagers. One is a boy of 13.

Do we encounter a similar situation with regard to the odds of the second teenager being a boy?

edit: I'm thinking the odds are exactly the same, 13/27, by coincidence, as there are seven possible teen ages.

So then, what about this: One is a boy named George. Or One wears a black shirt. Or One likes chocolate.

Doesn't this mean that the more information we gain about the boy, the less likely it makes it that his sibling is a brother?


> Doesn't this mean that the more information we gain about the boy, the less likely it makes it that his sibling is a brother?

More likely, but only if that information being true was a precondition for knowing about the boy in the first place. Take the following scenarios, assuming Alice knows Bob has exactly two children.

Alice: Do you have a son?
Bob: Yes.
Alice: Pick one of your sons, and tell me the day of the week he was born.
Bob: Sunday.

Here the day of the week provides no additional information, because Bob will always have an answer (like in Monty Hall, where Monty will always reveal a losing door), so the probability that Bob has two boys is 1/3.

Alice: Do you have a son who was born on a Sunday?
Bob: Yes.

Here having a son isn't enough; he also has to satisfy a condition that occurs with only 1/7 probability. Bob is more likely to be able to answer yes if he has two sons and thus two chances to satisfy that condition.
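Both dialogues can be checked exactly by enumerating all 196 equally likely (sex, weekday) combinations for two children. This is a sketch under assumptions of my own: in the first dialogue Bob picks uniformly among his sons, and weekday index 6 stands for Sunday purely for illustration. The function names are made up for this example.

```python
from fractions import Fraction
from itertools import product

SUNDAY = 6  # illustrative: weekdays are indexed 0..6, and any fixed index works
ONE_CHILD = list(product('BG', range(7)))  # 14 (sex, weekday) options per child


def p_two_boys(weight):
    """P(Bob has two sons | Bob gives the answer in the dialogue).

    weight(family) is the probability that a family produces that answer;
    all 196 two-child families are taken to be equally likely."""
    answered = two_sons = Fraction(0)
    for family in product(ONE_CHILD, repeat=2):
        w = weight(family)
        answered += w
        if all(sex == 'B' for sex, _ in family):
            two_sons += w
    return two_sons / answered


def picks_a_son_who_is_sunday(family):
    # Dialogue 1: Bob picks one of his sons uniformly at random and names
    # that son's weekday; the answer happens to be Sunday.
    sons = [day for sex, day in family if sex == 'B']
    if not sons:
        return Fraction(0)  # "Do you have a son?" "No"
    return Fraction(sum(day == SUNDAY for day in sons), len(sons))


def has_a_sunday_son(family):
    # Dialogue 2: "Do you have a son who was born on a Sunday?" "Yes"
    return Fraction(int(any(sex == 'B' and day == SUNDAY for sex, day in family)))


print(p_two_boys(picks_a_son_who_is_sunday))  # 1/3
print(p_two_boys(has_a_sunday_son))           # 13/27
```

In the first dialogue every family with at least one son names Sunday with the same probability, 1/7, so the answer stays at 1/3; in the second, a two-son family gets two chances to say yes.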


That sounds convincing.


> I have two children. One or more is a son. Exactly one of them was born on a Tuesday.

I'm not sure that's what you are supposed to infer.

Looking at your earlier statement:

> I have two children. Here is some information about one of them: son, born on Tuesday.

There are two ways to interpret this.

(1) I am a man pulled at random from the set of [families with two children of indeterminate gender]. Here is some information about one of them: son, born on Tuesday.

(2) I am a man pulled at random from the set of [families with two children of indeterminate gender, one of whom was born on a Tuesday]. Here is some information about one of them: son, born on Tuesday.

We're not selecting from the same initial set in each case - set (2) is more restrictive. A difference in probability is maybe not surprising.

Still, I totally agree with you that it's a bit of a jump to conclude that the man is referring to scenario (2) in which the birthday information is used to narrow the initial set while the gender information is used to determine the probability. Just seems like a trick question to me.

To expand on the numbers a bit more... In scenario (2) the possible combinations are:

(Combo A) G G (0/49 at least one boy TB)

(Combo B) B G (7/49 at least one boy TB)

(Combo C) G B (7/49 at least one boy TB)

(Combo D) B B (13/49 at least one boy TB)

If the man has one boy, then combo A does not apply and the probability he has two boys must be 13 / (7 + 7 + 13) = 13/27.
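The combo weights above can be verified by brute force. A minimal Python sketch, assuming the seven weekdays are equally likely and independent of sex, with weekday index 1 standing in for Tuesday (the choice of index does not change the counts):

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # illustrative weekday index; any fixed value gives the same counts


def tuesday_boy_count(sexes):
    """How many of the 49 equally likely (eldest, youngest) weekday pairs give
    a family with these sexes at least one boy born on a Tuesday."""
    return sum(
        any(sex == 'B' and day == TUESDAY for sex, day in zip(sexes, days))
        for days in product(range(7), repeat=2)
    )


counts = {sexes: tuesday_boy_count(sexes) for sexes in product('BG', repeat=2)}
print(counts)  # {('B', 'B'): 13, ('B', 'G'): 7, ('G', 'B'): 7, ('G', 'G'): 0}

# Each combo is equally likely a priori, so weight each by its count out of 49.
p_two_boys = Fraction(counts[('B', 'B')], sum(counts.values()))
print(p_two_boys)  # 13/27
```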


Yeah, I was very confused what exactly the "paradox" was at first, too, and why the expected interpretation should be the one it was. Here's how I finally see it:

If I say, "I have two cars, one's a 1994 Porsche 911", then I think it's reasonable to interpret that statement to mean the other car is not also a '1994 Porsche 911', but doesn't speak at all to its Porsche-ness, 1994-ness, or 911-ness. Rather, I'm wrapping them all up in a package, and saying the other is not this exact combination of characteristics.

Similarly, if I say, "I have two children. One's a son born on Tuesday," I think I now understand that I'd probably interpret that to mean the other child is not a son born on Tuesday. It doesn't speak to the gender or day of birth beyond that.

With that knowledge, that the other child is not a (son AND Tuesday-born), that's when the 13/27 arises.


> that the other child is not a (son AND Tuesday-born), that's when the 13/27 arises.

No. The other can be (son AND Tuesday-born) as well and the probability is still 13/27. See Colin's reply to me.


The ambiguity isn't in the phrasing of the father, we understand what he says about his family, it is in the probability universe that surrounds his statement. The scenario doesn't give one; we don't have a model for the probability of people saying anything at all unprompted. We have a simplified model for the probability distribution of genders, and one for how prompts and replies modify an already known probability model, but here there is nothing to anchor any reasoning of this kind.


If exactly one of the children is born on Tuesday, wouldn't the probability of having a son be 5/13 and not 13/27?


I found this post confusing and ambiguous, so I restated it in simpler terms, with pictures:

http://mikeschiraldi.blogspot.com/2011/11/tuesday-boy-proble...


Thank you for this. I slogged through the whole original article and felt like I was being beaten up with words.

However, I still fail to comprehend how the "at least one is a boy" quirk maths out to a 1 in 3 chance that his second child is also a boy.

Taken literally, it does. I understand that, in a set of data, GB is different from BG. But for the sake of our comparison, the order the children were born in doesn't matter. We're seeking if the other child is a boy, or not.

In my mind, the bit about it being the younger or older sibling is irrelevant information. We're comparing gender, not age. Whether she is the younger sister or the older sister, she's still his sister, and therefore not a boy.

I think the introduction of age is convoluting the issue, unnecessarily.

I now stand by, ready to be proven wrong. I'd really like to wrap my head around this one, but I must insist that the age information is irrelevant.


Probability is much less meaningful when you're not talking about large groups of repeated trials.

In the context of the article, imagine that you didn't do this just one time, but that you asked 100 fathers about the composition of their children.

The question posed in the article is essentially, "Of those fathers who responded 'I have one son' (which is likely 75 of the 100), how likely is it that they have another son?" (Likely 25 of that group of 75, or 1/3.)

When the article talks about the father standing next to one of his children at random and the probability of another son being 1/2 at that point, it helps to imagine those same 100 fathers all standing next to their children. Of that group, you're not eliminating the fathers standing next to girls based on the way the situation is posed.

The English words used to describe each case make it much less clear which group of 100 people we're talking about.

Also, when we talk about one father and not a group of fathers, the 1/3 or 1/2 number is much less meaningful. This is where insurance companies make their money (ideally). It's impossible to predict whether a single person will die in a car accident over their lifetime, and any number is essentially a guess. But it's very easy to predict that, say, 1 in 50,000 people will.


> However, I still fail to comprehend how the "at least one is a boy" quirk maths out to a 1 in 3 chance that his second child is also a boy.

Can you program? If so, write a program which runs the following trial over and over:

1. Assign genders at random to two children

2. If at least one is a boy, increment tally T1

3. If both are boys, increment tally T2

You'll find that T2 / T1 approaches 1/3.
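For example, one possible Python version of that trial; the trial count and the fixed seed are arbitrary choices made here so the run is repeatable:

```python
import random


def simulate(trials, rng):
    t1 = t2 = 0
    for _ in range(trials):
        # 1. Assign genders at random to two children.
        kids = [rng.choice('BG') for _ in range(2)]
        # 2. If at least one is a boy, increment tally T1.
        if 'B' in kids:
            t1 += 1
            # 3. If both are boys, increment tally T2 (both boys implies step 2).
            if kids == ['B', 'B']:
                t2 += 1
    return t2 / t1


ratio = simulate(100_000, random.Random(2011))
print(ratio)  # close to 0.333
```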


You could equally well distinguish between BG and GB based on birth weight. Age has nothing to do with it. If the order of the children doesn't matter, the odds of conceiving a BG combination are twice as large as conceiving either a BB or a GG combination. The distinction is made so all combinations carry the same weight, which makes it easier to illustrate the solution.

As another alternative, you could distinguish between BG and GB based on a non-accidental property, like the alphabetic ordering of their names (assuming each starting letter is equally likely and no two siblings have the same name, both of which are probably false in practice).


I've been thinking about this for an hour, and I'm now convinced that the author is wrong. The fact that we found out about one of the children from the father means that all probabilities are not equal, even though they're treated here like they are.

The difference is between the information being offered and the information being determined independently. I'll do this with the boy/girl problem, for simplicity's sake.

If we ask a man if he has at least one boy, and he says yes, we can work out the chance the other child is a boy like so:

Assume all four possibilities are equally likely:

   BB
   BG
   GB
   GG
If we ask him if he has at least one boy, and he says yes, we effectively filter off GG, which brings the list down to:

   BB
   BG
   GB
Therefore, the chance of the other child being a boy is 1/3. Pretty straightforward.

However! Because the father offered the information on his own, it effectively turns it into the author's other problem, where the older child is a boy, find out the gender of the younger child.

The trick is that he's equally likely to give information about either of his children, therefore there are eight possibilities:

He gives information about child A:

   BB - B
   BG - B
   GB - G
   GG - G
He gives information about child B:

   BB - B
   BG - G
   GB - B
   GG - G
There are still 3 possibilities, but BB is twice as likely as the others, because if both children are a boy he's definitely going to reveal the gender of one of them as a boy; whereas if one is a girl and one is a boy, there's only a 50% chance he will.

   BB - 50%
   GB - 25%
   BG - 25%
So, the answer to this:

You meet a man on the street and he says, “I have two children and one is a son born on a Tuesday.” What is the probability that the other child is also a son?

Is 1/2

The answer to this:

A man has two children, and one is a son born on a Tuesday. What is the probability that the other child is also a son?

Is 13/27

It's nitpicky, but I think the author should be very exact about this kind of thing, since he's trying to clear things up.
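The random-child model in this comment can be computed exactly rather than argued. A sketch under the comment's own assumption that the father describes one of his two children chosen uniformly at random; the function name and the `days` parameter are illustrative, and weekday index 1 stands in for Tuesday:

```python
from fractions import Fraction
from itertools import product


def p_other_is_boy(revealed, days=1):
    """P(both children are boys | the father volunteers `revealed`).

    Assumes all families are equally likely and that the father describes one
    of his two children chosen uniformly at random. days=1 ignores weekdays
    (the plain boy/girl version); days=7 models the Tuesday version."""
    one_child = list(product('BG', range(days)))
    says_it = both_boys = Fraction(0)
    for family in product(one_child, repeat=2):
        for described in family:  # each child is described with probability 1/2
            if described == revealed:
                says_it += Fraction(1, 2)
                if family[0][0] == family[1][0] == 'B':
                    both_boys += Fraction(1, 2)
    return both_boys / says_it


print(p_other_is_boy(('B', 0)))          # 1/2
print(p_other_is_boy(('B', 1), days=7))  # 1/2 (weekday index 1 stands for Tuesday)
```

With `days=1` the loop is the eight-row table above, with BB weighted twice as heavily as BG or GB. Under this model the Tuesday detail does not move the answer away from 1/2.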


Yeah, what matters is the contents of the initial set of families over which we determine probability.

"A man has two children, and one is a son born on a Tuesday. What is the probability that the other child is also a son?"

If the man is randomly chosen from the set of all families the answer is 1/2.

If the man is randomly chosen from the set of all families with a son born on a Tuesday then the answer is 13/27.

The reason for the difference is that a boy/girl family has a 1/7 chance that the boy was born on a Tuesday, whereas a boy/boy family has only a 13/49 chance (not 14/49, because the case where both boys were born on a Tuesday is counted once).

B G (7/49 probability of a Tuesday boy)

G B (7/49 probability of a Tuesday boy)

B B (13/49 probability of a Tuesday boy)

13 / (7 + 7 + 13) = 13/27


> If the man is randomly chosen from the set of all families the answer is 1/2.

By assumption, the man has a son born on Tuesday, so this is hardly relevant.

If B is a subset of A, then choosing x uniformly at random from A given that x is in B is the same as choosing uniformly at random from B.


I agree but set B is not a uniformly chosen subset of A in this case. That is the core of the trick. The rule for choosing B is intuitively uniform but actually slightly favours families with a girl and a boy over those with two boys.


You're making the exact mistake the author is cautioning against, which is assuming the day doesn't matter. Write out all the possibilities (see my other comment in this thread), eliminate the dupe, and you get 13/27.


Did you read my post carefully? I do get 13/27, when presented with the information from a neutral third party - i.e. filter for all 2 child families with one son male/Tuesday, what is the chance the other is also male/Tuesday.

But the fact that the father voluntarily offered up the information changes the probability distribution. We can assume he's selecting one of his children at random, and revealing their birthday and gender.

If only one of his children is a male/Tuesday, there's a 50% chance he'll say male/Tuesday.

If both are, there's a 100% chance.

So I'm not counting the possibility twice; I'm saying that given that the father reveals male/Tuesday, it's disproportionately likely that's as a result of having two male children born on a Tuesday compared to any other possibility.


> We can assume he's selecting one of his children at random, and revealing their birthday and gender.

This is the entire point of the article, IMO: we have to make some assumption about how we selected this guy to talk to, and how he chose what to tell us. It's not pinned down by the statement of the problem, and what you might consider a natural assumption is not necessarily what other people might assume.

Which is why these problems tend to suck...


I definitely agree - I hate problems like this, because the difficulty is caused by ambiguity of English, not the problem itself.

But even given other assumptions as to why the father selects the child he does (sort by date, males first, etc.), the author's answer of 13/27 is still almost certainly wrong; he should have just taken the father out of the equation completely.


I don't think this has anything to do with the ambiguity of language or English. It does, however, like the Monty Hall problem, require you to make assumptions about how/why a speaker presents certain information.

Think of it like this. What would the father say if he did in fact have two boys who were both born on Tuesday. Would he really say, "I have two children and one is a son born on a Tuesday"? Wouldn't he instead say, "I have two children and both are sons born on a Tuesday."?

I mean he could say it the first way, but such a comment would be borderline misleading. To say you have one son born on Tuesday when in fact you have two is technically correct, but I think the problem assumes that the man is speaking somewhat plainly.

So I do agree with you that communication intent is ambiguous, but I agree with some others that this is sort of the whole point of the problem, and it's not always immediately obvious that statistical information is hiding in seemingly irrelevant data.


I agree -- this problem should not be discussed in English. Python seems better: http://news.ycombinator.com/item?id=3290313


The probability that he would make that statement given what his children are is a different question than the probability that the other child is also a boy given that he made that statement. Your conditional probabilities are correct, but your conclusion about what they mean for the original question is flawed.

    N = Total number of ways to have 2 children over 7 days = 14^2 = 196
    B = Two sons born on Tuesday.
    O = Exactly one son born on Tuesday.
    A = At least one son born on Tuesday. 
    T = Two sons.
    S = The statement.
Priors:

    P(B) = 1/N = 1/196 = 0.005
    P(O) = 26/N = 26/196 = 0.133
    P(A) = P(B) + P(O) = 27/N = 27/196 = 0.138 
Your conditional probabilities:

    P(S|O) = .5  
    P(S|B) = 1  
An interesting number we can infer from your conditionals is the probability that a father selected at random would make the statement:

    P(S) = P(S|B)P(B) + P(S|O)P(O) = 1.0 * 0.005 + 0.5 * 0.133 = 0.0715 
But the question we asked about the other child already takes into account the fact that he did make that statement, meaning we're back to only caring about those 27 cases:

    P(B|S) = 1/27 = 0.037
    P(O|S) = 26/27 = 0.963
    P(A|S) = 27/27 = 1.0
    P(T|S) = 13/27 = .481 
Another interesting number we can infer from your conditionals is the probability that a father would make the statement given that at least one of his children was a boy born on Tuesday:

    P(S|A) = P(S|B)P(B|A) + P(S|O)P(O|A) = 1.0 * 0.037 + 0.5 * 0.963 = 0.519
If you still think this is incorrect, can you point to exactly which number is wrong and explain why?
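The figures above can be reproduced with exact fractions. A minimal Python sketch that derives the priors by enumerating the 196 (sex, weekday) combinations, then applies the conditionals P(S|O) = 1/2 and P(S|B) = 1 as given; weekday index 1 standing for Tuesday is an arbitrary choice:

```python
from fractions import Fraction
from itertools import product

TB = ('B', 1)  # a son born on a Tuesday; weekday index 1 is an arbitrary choice

# Every (sex, weekday) combination for two children: 14 ** 2 == 196 families.
families = list(product(product('BG', range(7)), repeat=2))
N = len(families)

# Number of Tuesday sons in each family (0, 1 or 2).
tuesday_sons = [sum(child == TB for child in family) for family in families]

P_B = Fraction(tuesday_sons.count(2), N)  # two sons born on Tuesday
P_O = Fraction(tuesday_sons.count(1), N)  # exactly one son born on Tuesday
P_A = P_B + P_O                           # at least one son born on Tuesday

# The conditionals from the comment being replied to.
P_S_given_O = Fraction(1, 2)
P_S_given_B = Fraction(1)

# Law of total probability; only families in A can make the statement.
P_S = P_S_given_B * P_B + P_S_given_O * P_O
P_S_given_A = P_S_given_B * (P_B / P_A) + P_S_given_O * (P_O / P_A)

print(N, P_A, P_S, P_S_given_A)  # 196 27/196 1/14 14/27
```

The exact values are P(S) = 1/14 ≈ 0.0714 and P(S|A) = 14/27 ≈ 0.519; the 0.0715 above comes from rounding the priors before multiplying.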


The problem I have is that you're assuming that each of the 27 outcomes has equal probability, but the chance that we received the information in the way we did makes that a flawed assumption.

The best analogous problem is the German Tank Problem:

http://en.wikipedia.org/wiki/German_tank_problem

If we have destroyed a single German tank with serial number 100, we can at least begin to make an estimate of the size of the German force, by basically asking the question:

"If they have 200 tanks, what was the chance one we randomly killed was this serial number? 500? 1000?"

And then combining n=100->infinity to form a probability distribution. You can then say that there is an x% chance that Germany has 500 tanks, and a y% chance that Germany has 10,000 tanks.

However - if instead, we asked 'does there exist a German tank with a serial number 100', and the answer is yes, this does NOT tell us anything past the fact that their tanks are >= 100 in number.

We have the exact same information, but how it was determined changes the outcome drastically.

Does that make sense?


No, it doesn't make sense, because it doesn't apply here. We aren't estimating the count of anything. We know he has two kids, we know there are two genders, and we know there are seven days. All other relevant counts can be calculated directly from these, no estimation required.

I already showed the probability that we would receive the message the way we did, assuming we're sampling fathers with two children, and it's pretty low. If we drop the sampling assumption, it would go even lower. But that's irrelevant to the actual question, because we've already won that lottery. I've also shown the probability that we would get the statement we got given that the father had at least one son born on a Tuesday, but again, we already won that lottery.

If you still insist, can you please stop talking in hand-wavy fake math and show some actual concrete numbers? To start, if each of the 27 possibilities are not equally likely, what are the actual probabilities and why?


The issue here is with assumptions - you have made a different set of assumptions from the author, and hence are getting a different result.

A lot of people here are having similar issues, by misreading exactly what the initial proposition means.

Your reasoning above relies on the 'likeliness' of a man giving you the information, which is something that is not meant to be a part of the problem. Although it is phrased as a man 'telling' you something, that statement is really a metaphor for 'you determine the following piece of information, 100% truthfully'.

In particular, your explanation assigns agency to the man - that if he does in fact have a son, he may or may not choose to reveal the truth 'I have a son'. However it makes no allowance for the man lying - so you are assuming if he answers it will be truthfully, but you are allowing him the lie of omission.

Whilst there is nothing specific that rules out your interpretation, it is not what is intended. Read it instead as:

----

There exists a man, A.

A has exactly two children.

The statement 'A has at least one Son, B' is true

What is the chance that the statement 'The Non-B child of A, is a son' is true?

----


I agree that that is what the author intended to communicate. But I actually think that's a bigger stretch than my own interpretation - it changes how the information was determined, which has a definite impact on the outcome.

The reason why I posted was to suggest that the author should have worded it the second way, i.e.

A man has two children, and one is a son born on a Tuesday. What is the probability that the other child is also a son?

Which leaves no doubt. I guess I didn't really make that clear enough with my original post.


I'd also observe that carefully read, this article is really about how important assumptions are, and not about the problem per se. The Peter Winkler quote is key.


The distinction you're making is that in your first example both the man and the day of the week are arbitrary.

In the second example, only the man is arbitrary.

With those assumptions what you say is correct.

However, the question is "You meet a man on the street and he says, “I have two children and one is a son born on a Tuesday.”" and not "You meet a man on the street and he tells you he has two children, that one is a son and he tells you what day of the week he was born". So the day is not arbitrary, it's specifically Tuesday.


0.48 ~ 0.5 Close enough for statistical purposes!


That last comparison is false. In both cases p=13/27 .


As the currently top-voted comment does not get it, I'll try to explain the paradox intuitively.

No, it is not ambiguity of language. It says formally:

I have two children. There exists a child of mine who is (boy and born on Tuesday).

And yes, the probability of the other child being a boy is 13/27.

To understand this, try a more extreme case:

When a child is born we generate a random number: rnd(1billion)

Now the man says:

'I have two children. There exists one of them which is boy with generated id=456765234'

As we can see, the probability of the other child being a boy approaches 1/2 as we increase the bound of our random number!

Why is this so, intuitively?

Because by using a very specific id we are closer and closer to specifying one of the boys, as the probability of the id not being unique decreases. We are closer and closer to saying this:

'Randomly pick one of my children. He is a boy. What is the probability of the other being a boy?'

Yes, that would be 1/2.

Edit: Seriously downvoting this??? For most commenters the correct result was not intuitive. (It was not intuitive for me at first either.) I tried to make it intuitive for them (a bit deeper than usual paradoxes, though). Or do you still think the 13/27 result is not correct? In that case you are fighting mathematical truth with the downvote button.

Edit: Thanks, corrected probability from 2 to 1/2


Well, you're talking about probabilities being greater than 1... That doesn't really make sense in this discussion.


I think it's fairly clear that nadam means 1/2, not 2.


It is now that he's edited it. It wasn't clear before, to me at least. But then I "don't get it", so what do I know ;)

edit: can't reply to you as HN isn't giving me the option. But: No worries, I didn't take it personally, just a little dig :)


Personally, I find Tanya Khovanova's explanations (also linked to by the article) much easier to read: she's discussed this and related problems quite a few times.

http://blog.tanyakhovanova.com/?p=221


I did not mean 'not getting it' in a negative way. (English is not my mother language.) Getting a paradox or not is a state. At first I did not get it either. And I only wrote my comment because at the time it was almost a consensus in this thread that the paradox is an uninteresting language ambiguity. I wanted to say that it is more interesting than that.


> No, it is not ambiguity of language.

This is a big call to make, seeing as English is not your native language. As a native English speaker, I find it very ambiguous as to which set of probabilities I should be counting.


I meant that the paradoxical nature is not resolved by simply formalizing the statement into first order logic the following way:

"I have 2 children. Exists child where (Born_on_tuesday(child) and Boy(child))."

(Some other paradoxes completely disappear when you try to write the statement into first order logic. It is not the case here.)

This is what I wanted to say.


You see, the problem is that even your latest attempt to describe the question is ambiguous: the solution depends on how 'I' was selected. And if ever you describe that in a non-ambiguous manner, I suspect that the answer would no longer seem paradoxical. In other words, ambiguity in language is at the very heart of this problem.


  Randomly pick one of my children. He is a boy. What is the probability of the other being a boy? 
  Yes, that would be 1/2.
To be clear, that probability would be 1/3. But I believe I understand what you are getting at. You have to phrase the question in the form "pick one of my 2 children; he is a boy with property x=x0; what's the prob that the other is a boy?".

The probability is prob = ((N/2)-1)/(N-1) where N=2 * 2 * (number of different property values for x). For N>>1, prob tends to 1/2
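That formula can be checked against brute-force enumeration. A Python sketch, assuming the property takes k equally likely values independent of sex (k = 7 for weekdays), with value 0 playing the role of x0; the function names are made up for this example:

```python
from fractions import Fraction
from itertools import product


def p_other_boy(k):
    """P(both boys | at least one boy has property value 0), when each child
    independently gets one of k equally likely property values."""
    one_child = list(product('BG', range(k)))
    hits = both = 0
    for a, b in product(one_child, repeat=2):
        if ('B', 0) in (a, b):
            hits += 1
            both += a[0] == b[0] == 'B'
    return Fraction(both, hits)


def formula(k):
    n = 2 * 2 * k  # N in the comment above
    return Fraction(n // 2 - 1, n - 1)


for k in (1, 7, 100):
    print(k, p_other_boy(k), formula(k))
# 1 1/3 1/3
# 7 13/27 13/27
# 100 199/399 199/399
```

As k grows the answer climbs from 1/3 toward 1/2, which is the rnd(1billion) intuition expressed in numbers.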


Why can't the other boy also be born on Tuesday?


The other can also be born on Tuesday (generated id = 1 in rnd(7)). And the other can also have generated id = 456783123 in rnd(1billion). But as the bound of rnd gets bigger and bigger, the probability of that approaches zero. So the more information we know about one of the children, the closer we are to simply identifying him uniquely, and the probability of the other being a boy approaches 1/2.


If both can be born on Tuesday then telling me one was born on Tuesday does not uniquely identify him! It's not unique if both can do it.

And since it's not unique the rest of your analysis is based on a faulty assumption.

(BTW I did not downmod you, in case you were wondering.)


It is about specifying a child, and then saying it is a boy.

The one extreme is: one of my children is a boy. in this case the other is a boy with 1/3 probability.

The other extreme is: One of my children has a national unique id=... He is a boy. The other is a boy with 1/2 probability.

And other cases are in between. The more information you provide about the first child (the less chance there is that the other can have the same property), the closer you are to 1/2.

Of course in the rnd(7) example we are in-between:

1/3 < 13/27 < 1/2


"the less chance there is that the other can have the same property"

No! That is not true! The two events are independent, specifying information on one has zero impact on the other.


They are independent, but you get information about both ("One of them is a boy."), therefore p=1/3.

If you say "the older is a boy" then you have a statement about one son and no information whatsoever about the younger one, therefore p=1/2.


"One of them is a boy." is a completely different scenario and no one is arguing about it. (BG, GB, BB, GG - by saying one is a boy you exclude GG leaving 1/3.)

In this case we are asking if the other is a boy NOT if the other was born on Tuesday.


Ah, sorry, I misunderstood your question.

You are asking, why the information "one is born on Tuesday" says anything about the gender or birthday of the other, right?

More specifically, this:

> Let us first assume that it is the older child who was a son born on a Tuesday. In this case the second child could be either of two sexes, and could have been born on any of seven days of the week, for a total of 14 possibilities.

> Now let's suppose it is the younger child who was a son born on a Tuesday. Then the older child could, again, be either of two sexes and could have been born on any of seven days of the week, again providing 14 possibilities. Added to our original 14 that would seem to give 28 possibilities.

> But be careful! One possibility got counted twice. Specifically, the one where both children are boys born on Tuesdays. So really there are only 27 possibilities. And since 13 of them involve the second child being a boy, the probability would be 13/27.

Maybe it helps if you first imagine all 196 (2 * 2 * 7 * 7; think of four 7-by-7 tables [draw them, it helps a lot ;)]) possibilities. Eliminate the 49 of them where both are girls (one of the four tables) and you have 147 possibilities/cells left. Then, in each of the BG and GB tables, keep only the one row/column where the boy was born on a Tuesday, i.e. eliminate another 2 * 6 * 7 = 84 possibilities.

Now comes the interesting/tricky part: the last table, where both are boys, contains only 13 possible cases (instead of 2 * 7)! Imagine a 7-by-7 table, where each row and column corresponds to a day. Mark all the cells which are not in a Tuesday row or a Tuesday column, i.e. all possibilities where neither of the two boys was born on a Tuesday, and eliminate them. This eliminates another 49-13=36 cases.

So we have a total of 196-49-84-36=27 cases; in 13 of them "the other is a boy", and in 3 of them (3/27=1/9) "the other is born on a Tuesday".

I hope this makes some sense. Once I had drawn all the possibilities and marked all the impossible ones, it became a lot clearer.


ADDENDUM: The tables:

       A: Boy                     A: Boy
       M T W T F S S              M T W T F S S
                           
 B M   . 1 . . . . .        B M   . 1 . . . . .
   T   1 1 1 1 1 1 1          T   . 1 . . . . .
 B W   . 1 . . . . .        G W   . 1 . . . . .
 o T   . 1 . . . . .        i T   . 1 . . . . .
 y F   . 1 . . . . .        r F   . 1 . . . . .
   S   . 1 . . . . .        l S   . 1 . . . . .
   S   . 1 . . . . .          S   . 1 . . . . .
  
  
       A: Girl                    A: Girl
       M T W T F S S              M T W T F S S
     
 B M   . . . . . . .        B M   . . . . . . .
   T   1 1 1 1 1 1 1          T   . . . . . . .
 B W   . . . . . . .        G W   . . . . . . .
 o T   . . . . . . .        i T   . . . . . . .
 y F   . . . . . . .        r F   . . . . . . .
   S   . . . . . . .        l S   . . . . . . .
   S   . . . . . . .          S   . . . . . . .

 total: 196
 of which are possible ('1'): 27
 of which 'the other is a boy': 13
   (all the ones from the upper left table)
 of which 'the other child is born on a tuesday': 3
   (the T/T-cell of each table)
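The counts under the tables can be reproduced by walking all 196 cells. A short Python sketch; as in the tables, child A and child B are distinguished, and weekday index 1 stands for Tuesday (an arbitrary choice):

```python
from itertools import product

TUE = 1  # weekday index used for Tuesday in this sketch

total = possible = other_is_boy = other_tuesday = 0
for (sex_a, day_a), (sex_b, day_b) in product(product('BG', range(7)), repeat=2):
    total += 1
    # Keep only families with at least one boy born on Tuesday (the '1' cells).
    if not ((sex_a == 'B' and day_a == TUE) or (sex_b == 'B' and day_b == TUE)):
        continue
    possible += 1
    other_is_boy += sex_a == sex_b == 'B'   # every '1' in the boy/boy table
    other_tuesday += day_a == day_b == TUE  # the T/T cell of each table

print(total, possible, other_is_boy, other_tuesday)  # 196 27 13 3
```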


If the older child is the boy born on Tuesday, you have 7 chances that the younger child is a boy. If the younger child is a boy born on Tuesday, you only have 6 chances that the older child is a boy. This is because the 7th chance (the older child is a boy born on Tuesday) was already covered in the first scenario.

Summarized from: http://www.sciencenews.org/view/generic/id/60598/title/Math_...


It's fun to reason these things out, but it's easier and faster to simulate, at least at first.

  import random, time, collections
  
  sample = collections.defaultdict(float)
  
  now = time.time()
  
  # tm_wday counts Monday as 0, so Tuesday is 1
  known_child = ('M', 1)
  
  while sum(sample.values()) < 100000:
      genders = [random.choice('MF') for _ in range(2)]
      # random moments between the epoch and now, reduced to a weekday
      birthtimes = [random.uniform(0, now) for _ in range(2)]
      birthdays = [time.localtime(t).tm_wday for t in birthtimes]
      gendertimes = list(zip(genders, birthdays))
      if known_child in gendertimes:
          # remove one Tuesday boy and record what the other child is
          gendertimes.remove(known_child)
          sample[gendertimes[0]] += 1
          if sum(sample.values()) % 10000 == 0:
              print(sum(sample.values()))
  
  males, females = [sum(c for (g, t), c in sample.items() if g == gender)
                    for gender in 'MF']
  
  print(males / (males + females))
Result when I ran this was .4816, approximately equal to 13/27.


I stopped caring about 'paradoxes' ever since I understood there's no isomorphism between pure mathematical concepts and human language.


The paradoxes are important in highlighting the ambiguities in our language.

The annoying aspect is when someone uses a paradox to show off how mathematically clever they are, instead of to show how ambiguous language is. As XKCD illustrates: https://www.xkcd.com/169/


The more sons you have, the more likely that one of them is born on a tuesday. Thus, if you take all the two-child families with at least one son, and eliminate the families without a son born on tuesday, the two-boy families are more likely to remain than the one-boy families, and you will end up with a higher proportion of two-boy families than before.

Simple.

(edited)


Simple.

And wrong. Note that the correct answer is 13/27 probability of a boy, which is less, not more, than 50%.


Sorry, I should have said a higher proportion of two-boy families, not a higher proportion of boys. 13/27 is more than 1/3, which is the proportion of two-boy families before you eliminate the families without tuesday boys.
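A quick exact check of that argument, as a sketch assuming each child is independently a boy or girl with probability 1/2 and born on each weekday with probability 1/7 (the helper name `p_tuesday_son` is made up): weight each family that has at least one son by the chance that one of its sons is a Tuesday boy, and compare the share of two-boy families before and after.

```python
from fractions import Fraction

# Ordered family types with at least one son, each with prior weight 1/4.
families = {"BB": Fraction(1, 4), "BG": Fraction(1, 4), "GB": Fraction(1, 4)}


def p_tuesday_son(family, p=Fraction(1, 7)):
    """Chance that at least one son in this family was born on a Tuesday."""
    sons = family.count("B")
    return 1 - (1 - p) ** sons  # more sons, more chances to hit Tuesday


before = families["BB"] / sum(families.values())
kept = {f: w * p_tuesday_son(f) for f, w in families.items()}
after = kept["BB"] / sum(kept.values())

print(before, after)  # 1/3 13/27
```

Two-boy families survive the filter with weight 13/49 instead of 7/49, which is why their share rises from 1/3 to 13/27.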


    Let's try a simpler problem. Suppose we know that a 
    certain man has two children and we also know that the
    older one is a boy. In this case we would say that the
    probability that the other child is a boy is 1/2. After
    all, the sex of one child is independent of the sex of
    the other child. That the older child is a boy has no
    bearing on the sex of the younger child.

    Now suppose we know simply that a man has two children
    and that one of them is a son. This time we would reason
    that there is no possibility that the person has two
    girls. It follows that the sexes of his two children,
    ordered from oldest to youngest, are either BB, BG or GB.
    Since these cases are equally likely, and since only one
    of them involves having two boys, we would say the
    probability that the man has two boys is 1/3.
Maybe I'm missing something, but this seems fundamentally wrong. Pr(two boys | older child = boy) is not equal to Pr(two boys | one+ child = boy)? Why is time so special? Could we not order them based on their height, and assert Pr(two boys | taller child = boy) = 1/2. Or, if there were a scale of masculinity, order them based on that and assert Pr(two boys | manlier child = boy) = 1/2, where manlier child = boy <==> one+ child = boy, so Pr(two boys | manlier child = boy) = Pr(two boys | one+ child = boy) (contradiction).


For a randomly selected family with two children, there are four possible boy/girl combinations: B B, B G, G B, G G

In the first case we are told that the older child is a boy. This leaves only two cases: B B, B G. Therefore, there is a 50% chance the second child is a boy.

In the second case, we are told only that [at least] one child is a boy. This leaves three possibilities: B B, B G, G B. Therefore, the probability that both children are boys is 1/3.

Enumerating possible states of the world like this is the fundamental insight you need to have to be able to understand these types of problems - but it does take a while to get used to!
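The same enumeration can be written out in a few lines of Python. This sketch assumes the four (older, younger) orderings are equally likely, so counting surviving cases is enough; `prob_two_boys` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# All (older, younger) combinations, each with probability 1/4.
outcomes = list(product("BG", repeat=2))  # BB, BG, GB, GG


def prob_two_boys(condition):
    """Keep the outcomes consistent with what we were told, then count BB."""
    kept = [o for o in outcomes if condition(o)]
    return Fraction(sum(1 for o in kept if o == ("B", "B")), len(kept))


older_is_boy = prob_two_boys(lambda o: o[0] == "B")  # keeps BB, BG
at_least_one = prob_two_boys(lambda o: "B" in o)     # keeps BB, BG, GB
print(older_is_boy, at_least_one)  # 1/2 1/3
```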


But you are assuming that each of those three possibilities has equal probability; can you explain the rationale for that? (it is clear why BB, BG, GB, and GG have equal probability in the unrestricted case, but less clear why BB, BG, and GB have equal probability in this restricted case)

Besides, this is just a rephrasing of the original article's argument, and doesn't counter mine at all. I am open to the possibility that there is a flaw in my argument, but where is it?


So you say: "it is clear why BB, BG, GB, and GG have equal probability in the unrestricted case"

The restricted case is just the unrestricted case + one additional bit of information, that is, you're told that GG is not an option. This eliminates GG from the unrestricted case, but says nothing more about the probabilities of the other options. So the probabilities stay equal, although they now equal 1/3 each (if you eliminate options, the remaining options all become more likely).

What you're missing is this: The statement "the older child is a boy" has more information than the statement "one of the children is a boy". The first statement allows you to eliminate two options (GB and GG), while the second statement only allows you to eliminate one option (GG).

The "older" part is not fundamental to the problem. Equally, the statement "the taller child is a boy" has more information than "one of the children is a boy". The problem is that the probabilities for height are not as friendly as the 50/50 probabilities for birth order (e.g., boys are likely to be taller than girls, older children are taller than younger ones, etc.), which introduces unnecessary complexity into a logic problem. That's why birth order is used for these types of puzzles.


Using age as the ordering is not important. What matters is that when enumerating possibilities you count the probability of the first and then the second, and the second and then the first.

If manlier child = boy <==> one+ child = boy then surely in your final equation one of the sides boils down to Pr(two boys | impossible event) as the probability of either manlier child = boy or one+ child = boy must be 0?

The problem comes from the fact you've asked an impossible question, not from applying an ordering. If the ordering is used consistently, then it should all work fine.


  > Why is time so special? 
Because it is a part of the condition? How else can you have "older child" in your condition, if time is not important? Stating that the first child is a boy is like opening a door in the Monty Hall problem: it eliminates a particular combination.

If you don't know the order of the kids, there are three combinations with at least one boy: BB, BG, GB. When you know that the first one is a boy, you eliminate the GB case and are left with only BB and BG.


I hate this one because while it says:

  "I have two children and one is a son born on a Tuesday."
It actually means:

  "I have two children and only one of them is a son
   born on a Tuesday."
You are supposed to just assume this modification.


No I don't think that's true. Bear with me :-)

There are 14 possible permutations for a child

Mon-Sun Girl

Mon-Sun Boy

which for two children gives a total of 28:

Child 1 is Boy, born Mon-Sun

Child 1 is Girl, born Mon-Sun

Child 2 is Boy, born Mon-Sun

Child 2 is Girl, born Mon-Sun

Note that we haven't said which child is 1 and which is 2, and in fact we don't know because we haven't been told - this is the important bit.

So, for our unknown child to be a boy, it must be one of the following 14 permutations:

unknown child is Child 1, boy, born Mon-Sun

unknown child is Child 2, boy, born Mon-Sun

but one of those permutations is already taken by our known Tuesday boy, so we have to remove one of the Tuesday permutations leaving 13 possible outcomes out of 27. Note that one of those 13 outcomes is 'unknown child is son born on Tuesday' - it's perfectly valid to have both sons born on a Tuesday.

Note also that we can't say which permutation we're removing until we know which is Child 1 and which is Child 2, just that one of them is taken.

If the original statement were phrased as '... the older child is a son born on a Tuesday' then you would have a constraint on which was Child 1 and which was Child 2 and then the probability would be 1/2 as expected because you would know up front that you were entirely discounting, say, Child 1, AND that the removed Tuesday permutation also belonged to Child 1.

But I stand to be corrected!


First, saying Mon-Sun is confusing almost everyone since most people start the week on Sun, not Mon. (Even in Europe Christians start the week on Sunday.)

In any case, as I replied here http://news.ycombinator.com/item?id=3290118 you cannot remove the duplication! They are two different situations, even though they may appear the same.


ajanuary explains the duplicates problem nicely here: http://news.ycombinator.com/item?id=3290185 so I won't double up.

I've spent the last hour and a half wrestling with the same clash between maths and intuition that you're experiencing so I understand your frustration - 13/27 really is the right answer though.


He explains it, and he's wrong.

Look, I understand your intuition wants you to remove duplicates "They are the same!". But you should resist, because doing that gives incorrect results.


On the contrary - I'm saying that 'removing the duplicates' is counterintuitive and that's why you're having trouble with it, the same as I did.

Besides: please re-read my original post - I didn't mention anything about removing duplicates, and in fact chose to explain the problem in a different way precisely because the duplicate removal thing was so difficult for me to grasp sufficiently to be able to explain it.

Ask yourself this: where does the original 28 come from? One of those 28 permutations appears to be 'Child 1 is Tuesday boy' and 'Child 1 is Tuesday girl'. If you can answer that then you should be able to understand the whole thing.

(edited for clarity)


No, you are not supposed to assume that only one of them is a son born on a Tuesday. The 13/27 probability comes from the knowledge that one is a son born on a Tuesday, but you don't know which child this refers to - the other child being a son born on a Tuesday is covered in the 13/27 probability.


No, the 13 was arrived at by excluding the possibility that the second child was a boy on Tuesday. (Read the article again and see.)

Once you include that possibility it's back to 14/28.


When considering the case that the older son was the one born on a Tuesday, that gives 14/28 possibilities. One of those 14 is the case that both were born on Tuesday.

When considering the case that the younger son was the one born on a Tuesday, that gives 14/28 possibilities. One of those 14 is the case that both were born on a Tuesday.

But woops, we've already covered the case that both were born on a Tuesday in our first count. Removing the duplication gives the 13/27.

You then combine those two probabilities.

The 13 wasn't arrived at by excluding the possibility that the second child was a boy on Tuesday, it was arrived at by discounting the case that they were both born on Tuesday as it had already been covered in the previous sub-calculation.


No, you do not remove the duplication.

If you have two kids there are 4 possibilities, not 3: BG, GB, BB, GG

BG seems to be the same as GB, except that it's not. And it's not in this case either.


Looking at your simple example of genders using the same process the article uses to enumerate the combination of gender and days:

Assuming Child 1 is a Boy:

Child 2 can be: Boy, Girl

Assuming Child 1 is a Girl:

Child 2 can be: Boy, Girl

Assuming Child 2 is a Boy:

Child 1 can be: Boy, Girl

Assuming Child 2 is a Girl:

Child 1 can be: Boy, Girl

Combining those gives us the following combinations:

  Child 1 | Child 2
  --------+--------
  Boy     | Boy
  Boy     | Girl
  Girl    | Boy
  Girl    | Girl
  Boy     | Boy
  Girl    | Boy
  Boy     | Girl
  Girl    | Girl

Clearly there is duplication in there; we need to remove the exact duplicates before we get to the 4 possibilities you listed.

Now looking at the problem in the article again, but using a 2 day week for brevity:

Assuming Child 1 is a Boy on Tues:

Child 2 can be: Boy on Mon, Boy on Tues, Girl on Mon, Girl on Tues

Assuming Child 2 is a Boy on Tues:

Child 1 can be: Boy on Mon, Boy on Tues, Girl on Mon, Girl on Tues

Combining those gives us the following combinations:

  Child 1      | Child 2
  -------------+-------------
  Boy on Tues  | Boy on Mon
  Boy on Tues  | Boy on Tues
  Boy on Tues  | Girl on Mon
  Boy on Tues  | Girl on Tues
  Boy on Mon   | Boy on Tues
  Boy on Tues  | Boy on Tues
  Girl on Mon  | Boy on Tues
  Girl on Tues | Boy on Tues

In exactly the same way, removing exact duplications gives us 3/7.
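Removing exact duplications amounts to collecting the listed (Child 1, Child 2) pairs in a set. This sketch (with a hypothetical helper `tuesday_boy_ratio`, day index 1 standing for Tuesday) builds both "assuming" lists for a week of any length and counts what is left:

```python
from fractions import Fraction
from itertools import product


def tuesday_boy_ratio(days):
    """Build both 'assuming' lists, drop exact duplicates, return counts and P(both boys)."""
    tb = ("B", 1)                                  # boy born on day 1 (Tuesday)
    child = list(product("BG", range(days)))       # every (sex, day) for one child
    listed = [(tb, c) for c in child] + [(c, tb) for c in child]
    distinct = set(listed)                         # removing exact duplications
    boys = sum(1 for a, b in distinct if a[0] == b[0] == "B")
    return len(listed), len(distinct), Fraction(boys, len(distinct))


print(tuesday_boy_ratio(2))  # (8, 7, Fraction(3, 7))
print(tuesday_boy_ratio(7))  # (28, 27, Fraction(13, 27))
```

For the 7-day week, the 28 listed rows collapse to 27 distinct ones, 13 of which are two boys.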

There is no ordering over the Tuesdays. "The older child being born on Tuesday and the younger child being born on Tuesday" is exactly the same as "The younger child being born on Tuesday and the older child being born on Tuesday". Just like "The older child being a boy and the younger child being a boy" is the same as "The younger child being a boy and the older child being a boy".


Except you are not actually supposed to remove the duplicates!! It is a very common mistake, but it's simply incorrect, it leads to incorrect results.

Also, why are you numbering the kids as child 1/2? There is no such distinction made. If you changed your list so that the fixed child is always listed first, and removed duplicates you would have 4 possibilities.


The removal of duplication and the distinguishing between GB and BG are two completely different things.

If you're not meant to remove the duplicates, then why doesn't your enumeration of two gendered children run: BB, BG, GB, GG, BB, GB, BG, GG?

I number them merely to distinguish them. The "a child has 50% chance of being a boy, and 50% chance of being a girl" applies to a single child, so you need to enumerate their states independently. To do that you need to be able to distinguish between them.

It is this fact, that each child's possible states should be treated independently, that means you can't combine BG and GB, not any aversion to removing duplication. Because the children are independent entities, BG represents the case where one (nominally child 1) is a boy while the other (nominally child 2) is a girl, and GB represents the case where child 1 is a girl and child 2 is a boy.

If the list were changed so that the fixed child is always listed first, we wouldn't be enumerating all the possibilities for each child and then combining them. That would be falling into exactly the same mistake you are keen to avoid.


I would be interested in your reasoning why G,B and B,G can (rightly) be considered distinct, and B,B and B,B can (rightly) be considered the same enumeration and thus one discounted, but BT,BT and BT,BT (*) should still be considered distinct when it is clearly the same situation as B,B?

(*) Where BT is "Boy born on a Tuesday"


I think I've spotted where your misunderstanding is.

BG is only not the same as GB if there is some other information available - which was born first, what their names are, hair colour, etc., because then you'd be saying something like

Boy born first, Girl born second

Girl born first, Boy born second

and those are two distinct possibilities. The point is that they are only distinct if you have this extra information, which we don't. We have no way of differentiating between the two children except for gender, and therefore BG and GB both just say 'one male child and one female child'. You might assume that the order of the two letters specifies the order in which the children were born, but that's a false assumption because it's not stated anywhere.

If you include the order, there are four possibilities: a) Boy first, Boy second, b) Boy first, Girl second, c) Girl first, Boy second, d) Girl first, Girl second. If you don't, there are three possibilities: a) two boys, b) two girls, c) one boy and one girl. The unspecified information about order is the subtle but important difference.


No, they are correct in saying that GB and BG are distinct, even with no other information.

It is not order that is important, but considering each child as a distinct entity.

The 50% chance of being a boy and 50% chance of being a girl applies to a single independent child. When enumerating the possible combinations we need to first enumerate the possibilities for each child, and then combine these two enumerations into our overall enumeration.

To help us distinguish between the two children, let's call one Sam and the other Alex. There's no ordering over them, it's just to help us tell which one we're talking about.

Sam can be: Boy, Girl

Alex can be: Boy, Girl

Combining those gives us:

Sam is a Boy, Alex is a Boy

Sam is a Boy, Alex is a Girl

Sam is a Girl, Alex is a Boy

Sam is a Girl, Alex is a Girl

Or, to use a shorter notation, BB, BG, GB, GG.

Using oldest and youngest is just another handy way of distinguishing between the two children. Even if they were nameless, faceless children with no distinguishing characteristics other than gender, we would still need to consider them separately, each as their own entity with their own enumeration of possible genders.

Where the parent commenter is wrong is saying that (correctly) considering GB and BG as distinct combinations is the same as considering "Oldest Boy born on Tuesday and Youngest Boy born on Tuesday" distinct from "Youngest Boy born on Tuesday and Oldest Boy born on Tuesday". They are not. When you stop thinking about ordering and instead think about the enumerated states of each entity (child) involved it becomes clear both are saying the same information. They are not distinct, but the same combination phrased differently.

The parent commenter is basically saying BB should be distinct from BB just because the first one talks about Sam first and the second one talks about Alex first. This is clearly wrong.


Hmmm I think we're saying more or less the same thing. The point I wanted to make is that

"I have a son and a daughter"

is the same as

"I have a daughter and a son"

unless you qualify the statement with further information; names (Alex and Sam from your post) would be an example of that further information. My feeling was that ars has implicitly 'filled in the blanks' somewhere, treating the two children as distinct when in fact they have to be interchangeable for the purposes of the original (13/27) calculation.

It's the lack (or not) of such details that alters how the probability is calculated.

I admit though that I'm still chasing myself in circles trying to understand the whole thing so take this reply with a pinch of salt ;-)


[deleted]


OK, after rereading the article, I'm wrong about this - that's not what the author is doing. Post deleted to not spread confusion myself...


"I have two children and one is a son born on a Tuesday."

If it were about the phrasing you could assume the other child is a daughter - if he were to continue ... "The other is also a son, born on a Friday" it would just sound strange.

Perhaps it is all down to his turn of phrase :)


more than that, the modification is inconsistently applied; we're supposed to magically know that only one is born on Tuesday, but not that only one is a son.

edit: accidental downvote, sorry.

edit2: but see my top-level reply to OP.


For the people who don't believe this, here's a Python program:

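    # all (sex1, sex2, day1, day2) combinations, day 1 standing for Tuesday,
    # keeping only families with at least one boy born on a Tuesday
    x = [(c1,c2,d1,d2)
            for c1 in ["B","G"]
            for c2 in ["B","G"]
            for d1 in [0,1,2,3,4,5,6]
            for d2 in [0,1,2,3,4,5,6]
            if (c1 == "B" and d1 == 1) or (c2 == "B" and d2 == 1)]

    # both boys, out of all surviving combinations (prints 13/27)
    num = sum(1 for (c1,c2,d1,d2) in x if c1 == "B" and c2 == "B")
    denom = sum(1 for _ in x)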

    print("%s/%s" % (num,denom))


If you generalize "born on Tuesday" to a generic attribute, with probability p (equal for boys and girls) it becomes easier to see what's going on. For brevity, let's say people with the attribute are positive and people without are negative.

There are seven (ordered) possibilities involving at least one positive boy, and we'll subdivide them into two subgroups: B+B- B-B+ B+G- G-B+ / B+B+ B+G+ G+B+. The important thing to note is that within each group the outcomes are equiprobable.

If p is near 1, then the first group has almost zero probability and we have ~1/3 chance of two boys. If p is near 0 then the second group has almost zero probability and we're left with ~2/4 chance of two boys.

Intuitively, if p is near 1 then the statement "I have at least one positive boy" is almost equivalent to "Both my children are positive, and I have at least one boy" (group 2, 33% chance). If p is near 0, then the statement is almost equivalent to "I have exactly one positive boy" (group 1, 50% chance).
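The two limits can be checked exactly. A minimal sketch, assuming sex and the attribute are independent with P(boy) = 1/2 and P(positive) = p; `p_two_boys` is a hypothetical name, and it sums the weights over all 16 sex/attribute combinations, keeping the seven that contain at least one positive boy:

```python
from fractions import Fraction
from itertools import product


def p_two_boys(p):
    """P(two boys | at least one positive boy) when P(positive) = p."""
    num = den = Fraction(0)
    # Each child is (sex, attribute): B/G with 1/2 each, +/- with p and 1 - p.
    for (s1, a1), (s2, a2) in product(product("BG", "+-"), repeat=2):
        weight = Fraction(1, 4)
        for a in (a1, a2):
            weight *= p if a == "+" else 1 - p
        if (s1, a1) == ("B", "+") or (s2, a2) == ("B", "+"):
            den += weight          # one of the seven cases listed above
            if s1 == s2 == "B":
                num += weight      # ...and both children are boys
    return num / den


for p in (Fraction(1, 1000), Fraction(1, 7), Fraction(999, 1000)):
    print(p, p_two_boys(p), float(p_two_boys(p)))
```

With p = 1/7 (born on a Tuesday) it returns 13/27; as p shrinks toward 0 the value climbs toward 1/2, and as p grows toward 1 it falls toward 1/3.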


When in doubt, use brute force:

    1BT 2BM    1BM 2BT
    1BT 2BT    1BT 2BT <-- Counted twice
    1BT 2BW    1BW 2BT
    1BT 2BR    1BR 2BT
    1BT 2BF    1BF 2BT
    1BT 2BS    1BS 2BT
    1BT 2BU    1BU 2BT
    1BT 2GM    1GM 2BT
    1BT 2GT    1GT 2BT
    1BT 2GW    1GW 2BT
    1BT 2GR    1GR 2BT
    1BT 2GF    1GF 2BT
    1BT 2GS    1GS 2BT
    1BT 2GU    1GU 2BT
I think the issue people are having is that they want to distinguish between the two instances of '1BT 2BT', but it doesn't work like that. Let's say I numbered each of the possibilities written above 1 through 28. If I handed you a piece of paper with '1BT 2BT' written on it, could you tell me which number that corresponded to? No - it would be one of two numbers - so the numbering is additional information not given in the problem statement.


Alternatively, suppose we label the combinations of child-day BM, GM, BT, GT ... BU, GU (14 items). We have a bucket with 2 of each of these combinations, as there are 2 kids (total 28 items). The problem states that we have taken one "BT" item out of the bucket (so 27 items left), and are asking how many "B*" items are left in the bucket (13).

The confusion arises because one can consider the same problem with replacement, putting the BT item back in the bucket before picking a second time. That's not obvious if you are not thinking in terms of enumeration.


What would the question have to be to make the following the answer?

Suppose we label the combinations of child-day BM, GM, BT, GT ... BU, GU (14 items). We have two buckets which both contain a copy of each combination (total 28, 14 in each bucket). The problem states that we have taken one "BT" item out of one bucket, and are asking how many "B" items are left in the other bucket (7 out of 14).


That would be two parents with one child each: one says his only son was born on a Tuesday, and you're asked the probability that the other parent has a son.


Having 2 each of BM, GM, etc. is implicitly labeling 1BM, 2BM, 1GM, 2GM, etc. If you sampled with replacement, you have to admit the possibility that you draw the same BT twice. It's a stretch to suggest that the statement is ambiguous in this way, as it would imply that the father could have the same child twice.


Trying to figure out why it seems wrong at first: it seems my first reaction would be emotional, that the odds of having a boy should be 0.5. However it is exactly because the total probability of a boy has to be .5 that the probability of a second boy has to be less than 1/2. The fact that you can pair "boy" with any enumerable attribute that can bring the probability down to 1/3 is ... funny.


Sometimes I hate these types of problems - though they're quite fun to argue about - because though they seem to be simple, straightforward enumeration problems, the "right" answer depends on assumptions about how the information that you've got is obtained, and what the other possible pieces of information you could have obtained are.

Here's the thing: nobody needs to see how you obtained your answer, we're all smart enough to enumerate things, so don't bother; similarly, leave out the equations, they're simple enough, and that's not where anyone falls over. If you want to argue productively about this type of problem, you need to explain why you enumerate the possibilities the way you do, not how.

And actually, I think the article goes through this problem pretty well, as mentioned there it hinges on two questions:

1) In this "random sampling", was "girl" a possible answer? Or did we restrict the sample pool to only people with boys, and only let them reveal that they had a boy?

2) Similarly, was any day other than "Tuesday" a possible answer?

Once people agree what the "obvious" answers to these questions are, disagreements tend to evaporate very quickly, and the enumerations/equations solve themselves. Unfortunately these assumptions are almost never spelled out in the problem, which is why these damn problems keep confusing people...


"You meet a man on the street and he says, “I have two children and one is a son born on a Tuesday.” What is the probability that the other child is also a son?"

The fact that one of the sons is born on a Tuesday, is blonde, has freckles, or got an A+ on his math paper is irrelevant to the gender of the other child. The author's reasoning misses the entire point of the Monty Hall paradox, where Monty Hall deliberately opens a door he knows does _not_ hide the car, thereby immediately modifying the probabilities of the remaining door.

In this "Tuesday Boy Problem" - no such selection takes place. There is no paradox. There is a 50% chance that the other child is a boy and a 1/7 chance that they were born on a tuesday (or a monday, wednesday, etc...).

Note - The wikipedia article on this topic does a much better job discussing the ambiguity involved in asking the question - http://en.wikipedia.org/wiki/Boy_or_Girl_paradox.

The sad part of this is that if you had _selected_ a family in which at least one son was born on a Tuesday, then you would have modified the probability of the other child being a son; but no such selection was done here, so there is no effect on the chance of the other child being a son.


There is a shorthand and implied understanding in mathematical word problems (else they would be formulas!). As soon as you ask "what is the probability" you are implying either a sampling/generative process, or a subjective (e.g. Bayesian) probability framework. I think it's clear this is the former case.

The generative model implied for each child is: pick uniformly from boy/girl, and pick uniformly from Mon-Sun. The generative model for the father is: generate two children.

This generative process has a well-defined outcome distribution and it is completely reasonable to ask "what is the probability of generating an outcome with two boys, given that you generated an outcome with a Tuesday boy?"


  A small simulation in Python

  from random import randint
  oneBoy=0
  twoBoys=0

  # 0 = boy, 1 = girl
  for i in range(10000):
      s1 = randint(0,1)
      s2 = randint(0,1)
      if s1 * s2 == 0:  # Here is at least one boy
          oneBoy = oneBoy+1
          if s1 == s2:  # Here are two boys
              twoBoys = twoBoys+1
  print("Two Boys")
  print(float(twoBoys)/float(oneBoy))

  # As expected the result is near 1/3.

  oneBoyT=0
  twoBoysT=0
  # days 0=Monday, 1=Tuesday ...
  # (0,1) means (sex,day) = a boy born on Tuesday

  for i in range(1000000):
      s1 = randint(0,1)
      d1 = randint(0,6)
      s2 = randint(0,1)
      d2 = randint (0,6)
      if (s1,d1)==(0,1) or (s2,d2)==(0,1):
          oneBoyT = oneBoyT+1
          if s1 == s2:
              twoBoysT = twoBoysT +1
  print("Two Boys, given a Tuesday boy")
  print(float(twoBoysT)/float(oneBoyT))

  # The simulation gives a result near 1/3, 
  # this is a hint to prove that 13/27 is incorrect
  # under the assumption that the population sex and
  # day are independent, and with probability 1/2 and 1/7

  # if you want to convince me otherwise, I'll be 
  # glad to see the code to generate the population
  # and the estimated probability.


Wait. What? The author says there can only be one BB but that there can be both GB and BG?

What he's doing here is saying that birth order functionally doesn't matter if the sibling is a boy, but it does matter if it's a girl. How is this correct?

If you keep comparing apples to apple you get:

  Older Boy / Boy
  Boy / Younger Boy
  Older Girl / Boy
  Boy / Younger Girl
And we're back to a 50% chance that the other child is a boy.


The Monty Hall problem is based on the idea that the host will always do the same thing in response to your choice: he can always open a door with a goat behind it. It feels odd to think about it in terms of what has already happened, but it seems more reasonable to state it in terms of something that will happen.

Say I will flip 2 coins, and if I get zero heads I will flip again. If the first coin is a head, the second is either a head or a tail; but if the first is a tail, you know the second is a head, or I would have flipped again. Thus there are 3 options, one of which is HH.

Assume you use the same approach with the Tuesday boy problem: the first child can be BMTWTFSS or GMTWTFSS, and the second can be BMTWTFSS or GMTWTFSS, but if I don't get a BT from the first or second try I will pick again. Thus we have BT + (BMTWTFSS or GMTWTFSS), OR (BMTWTFSS or GMTWTFSS) + BT, minus one BT,BT that would otherwise be counted twice. That gives 14 + 13 options, with 7 + 6 being BB, which works out to 13/27.
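That "pick again" procedure is rejection sampling, and it is easy to simulate. A sketch, assuming a fair 50/50 sex split and uniform weekdays, with day 1 standing for Tuesday; the function name is hypothetical and the seed only makes runs repeatable:

```python
import random


def resample_until_tuesday_boy(trials, seed=0):
    """Draw (sex, day) for two children, redrawing the pair until at least
    one is a boy born on day 1 (Tuesday); return the fraction with two boys."""
    rng = random.Random(seed)
    two_boys = 0
    for _ in range(trials):
        while True:
            kids = [(rng.choice("BG"), rng.randrange(7)) for _ in range(2)]
            if ("B", 1) in kids:
                break  # the man can now make his statement; otherwise pick again
        two_boys += all(sex == "B" for sex, _ in kids)
    return two_boys / trials


print(resample_until_tuesday_boy(100_000))  # should land near 13/27, about 0.481
```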


Altogether now! Here's the chart:

    B  G
  B BB BG
  G GB GG

Satisfied?


Since he has two children, you know first off that these are all equally likely:

BB BG GB GG

Now, since you know one child is a boy, that eliminates the fourth option. So now we have these possibilities:

BB BG GB

In only one of these is the other child a boy, so the chance is 1/3.


This explanation makes a ton of sense. I get it now. Thank you.


Also I don't think the exact day of birth being Tuesday makes one bit of difference to the gender. For example, if you were to say the known boy was born crying, and the chance of this is 70%, it doesn't affect the gender of the other children. It's just useless trivia.


That's not true. It does affect the probability of the other child being a boy just like being born on Tuesday does.
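As a worked check, here is the calculation for a generic attribute with probability p, assuming it is independent of sex and of the other child (p = 1/7 for born on a Tuesday, p = 0.7 for the born-crying example):

```latex
\begin{aligned}
P(\text{BB and} \ge 1 \text{ positive boy}) &= \tfrac14\left(1-(1-p)^2\right) = \tfrac14\,p\,(2-p)\\
P(\text{one boy, one girl; boy positive}) &= \tfrac12\,p\\
P(\text{BB} \mid \ge 1 \text{ positive boy}) &= \frac{\tfrac14\,p\,(2-p)}{\tfrac14\,p\,(2-p) + \tfrac12\,p} = \frac{2-p}{4-p}\\
p = \tfrac17:&\quad \frac{13/7}{27/7} = \frac{13}{27} \approx 0.481\\
p = 0.7:&\quad \frac{1.3}{3.3} = \frac{13}{33} \approx 0.394
\end{aligned}
```

So a 70% attribute moves the answer to about 39%, between the 1/3 you get from "at least one boy" alone and 1/2.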


> I have two children. Child A is a boy, born on a Tuesday, what is the probability that child B is a boy?

We have no knowledge of child B so the probability is even for it being a boy or a girl.

> I have two children, at least one of them is a boy who is born on a Tuesday. What is the probability that I have two boys?

We have incomplete knowledge of both children, we are constraining a probability space in 2 dimensions (i.e. the 2 children) rather than refining it down to 1 dimension as before. In this case the probability is 13/27 as these tables (http://news.ycombinator.com/item?id=3290349) nicely show.

When the father says "one is a son born on a tuesday" this can be read as him selecting a child and then asking you about the second or as giving you some information that can apply to either child and then asking you about both.

In my opinion the language used leads much more easily to the first interpretation, this is the reason surely for the confusion.


The crux of the issue is much better illustrated by the simplified problem where if a man tells you one of his kids is a boy, the other child is a boy in 1/3 cases. That's intuitive looking at the possibilities:

BG GB BB

Where this analysis gets confusing is when you say that if you meet the man with one of his kids at random and it's a boy, the probability changes to 1/2. The reason is in that case one of the possible outcomes was indeed GG, even if the child you met was a boy. The difference is how much knowledge you had about the situation prior to making your measurement of the outcome.

If that doesn't make your head spin even a little you're not really human.

I was able to make peace with these problems when I read about the Two Envelopes Paradox and realized that it's not really possible to pick an integer uniformly at random, because any way of picking a random integer essentially limits the domain you pick from to a finite one. If you can wrap your head around that, it might help you save your sanity.


I finally understand it.

Here are my events (taking 2 children as given):

  B = I have a boy
  BT = I have a boy and he was born on Tuesday
  2B = I have 2 boys
  B+G = I have a boy and a girl
Using Bayes' Theorem:

  Pr(2B|BT) = Pr(BT|2B)Pr(2B)/c
  Pr(B+G|BT) = Pr(BT|B+G)Pr(B+G)/c
  where c = Pr(BT) = Pr(BT|2B)Pr(2B) + Pr(BT|B+G)Pr(B+G)

  Pr(BT|2B) = 1/7 + 1/7 - 1/49 = 13/49
  Pr(2B) = 1/4
  Pr(BT|B+G) = 1/7
  Pr(B+G) = 1/2
So

  c = 13/49 * 1/4 + 1/7 * 1/2 = 13/196 + 14/196 = 27/196
  Pr(2B|BT) = (13/49 * 1/4) / (27/196) = 13/27
  Pr(B+G|BT) = (1/7 * 1/2) / (27/196) = 14/27
The main reason Pr(2B|BT) is close to 50% is that Pr(2B|BT) is proportional to Pr(BT|2B)Pr(2B), while Pr(B+G|BT) is proportional to Pr(BT|B+G)Pr(B+G). Pr(B+G) is twice Pr(2B), but Pr(BT|2B) is about 2 * Pr(BT|B+G), so the two factors of two nearly cancel.

It's much more likely you have a boy born on Tuesday given that you have 2 boys (rather than you have a girl and a boy) so it's more likely that you have 2 boys given that we know you have a boy born on Tuesday.
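The calculation above can be checked with exact fractions. This Python sketch just mirrors the definitions in the comment (the variable names are illustrative), assuming each child is independently a boy with probability 1/2 and born on a Tuesday with probability 1/7.

```python
from fractions import Fraction

# Priors over two-child families (two children taken as given).
p_2b = Fraction(1, 4)    # Pr(2B): both children are boys
p_bg = Fraction(1, 2)    # Pr(B+G): one boy and one girl
day = Fraction(1, 7)     # chance that a given child was born on a Tuesday

# Likelihoods of BT = "I have a boy born on Tuesday".
lik_2b = day + day - day * day   # inclusion-exclusion over the two boys: 13/49
lik_bg = day                     # only the one boy can be the Tuesday boy

c = lik_2b * p_2b + lik_bg * p_bg   # Pr(BT), the normalising constant
post_2b = lik_2b * p_2b / c
post_bg = lik_bg * p_bg / c

print(lik_2b, c, post_2b, post_bg)  # 13/49 27/196 13/27 14/27
```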


P(two boys, at least one born on Tuesday) = 1/4 * (1 - (6/7)^2) = 1/4 * 13/49. The 1/4 is the probability of two boys, and 1 - (6/7)^2 is the probability that at least one of them was born on Tuesday.

P(one boy, one girl, boy born on Tuesday) = 1/2 * 1/7. The 1/2 is the probability of having one boy and one girl, and the 1/7 is the probability that the boy was born on Tuesday.

The man can make his statement if either of the above is true, and the two events are disjoint, so their sum is the probability that he has two children, one of them a boy born on Tuesday.

Dividing the first by that sum gives the probability that both are boys. Plug in the numbers, (13/196) / (13/196 + 14/196), and you indeed get the 13/27 the article claims.
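The same 13/27 falls out of brute-force counting. A minimal Python sketch, assuming each child's sex and weekday are uniform and independent, so all 14 * 14 = 196 ordered (sex, day) combinations are equally likely; labelling Tuesday as day 1 is arbitrary.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # weekdays are 0..6; which index means Tuesday is arbitrary

# One child is a (sex, weekday) pair: 14 equally likely kinds of child.
children = list(product("BG", range(7)))
# An ordered (elder, younger) family: 196 equally likely combinations.
families = list(product(children, repeat=2))

def is_tuesday_boy(child):
    return child == ("B", TUESDAY)

# Families for which "one is a boy born on Tuesday" is true (at least one).
qualifying = [f for f in families if any(is_tuesday_boy(c) for c in f)]
two_boys = [f for f in qualifying if all(sex == "B" for sex, _ in f)]

print(len(families), len(qualifying), len(two_boys))  # 196 27 13
print(Fraction(len(two_boys), len(qualifying)))       # 13/27
```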


To answer the question posed by the title: no, I didn't think I understood Monty Hall, but thanks to this article I'm pretty sure I do now. This is a good explanation and analysis.


I understand thanks to a comment I found here: http://helives.blogspot.com/2010/07/tuesday-child-puzzle.htm...

====================================

Unfortunately, these questions you ask are ambiguous, and it is the failure to recognize how they are ambiguous that causes the results to seem unexpected. Consider two versions of what led up to the first statement:

Case #1: A father is chosen at random. He is given a slip of paper as he is led onto a stage. The paper says "Pick one of your children. Tell the audience the number of children you have, the chosen child's gender, and the day of the week on which it was born."

Case #2: A father is chosen at random from all fathers who have two children, including one boy born on a Tuesday. He is also ushered onto a stage and given a slip of paper that instructs him to tell the audience the criteria used to select him.

Now shift scenes. You are in the audience when a man is ushered onto the stage. He looks at a slip of paper, thinks a moment, and says "I have two children and one of them is a boy born on Tuesday." What is the probability that he has two boys?

The answer to the question depends on which case applies to the man you listened to. In Case #1, it is 1/2. In Case #2, it is 13/27. Your simulation only covered the second case. To get the first, after you have two children, flip a coin to see which one the father will tell about. If it is not a Tuesday Boy, don’t keep that trial even if the other child is a Tuesday Boy. You will find that the 27 cases where you have a Tuesday Boy reduce to 14 (just over half, since one father didn’t need to flip the coin), the 13 where you also have two boys reduce to 7, and the answer is exactly 1/2.

If you simulate the simpler problem, where you don’t worry about the day of the week, the answers are 1/2 and 1/3 for the two cases, respectively. The reason 13/27 seems unintuitive is that the fact that a Tuesday Boy was REQUIRED in the second case is not intuitively obvious from the statement "one of them is a boy born on Tuesday." In fact, as you point out, the puzzle could equally well be named after either of your two children, who probably have two different names. You choose one, just like the father in Case #1, so the better answer to your question is 1/2, not 13/27. It is still ambiguous, but there is no valid reason to assume that Case #2 applies.
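Both cases can be enumerated exactly rather than simulated. In this Python sketch (same uniform, independent assumptions; the helper names are hypothetical), Case #2 keeps every family with at least one Tuesday boy, while Case #1 weights each (family, coin flip) pair equally and keeps it only if the child the father describes is a Tuesday boy. Each kept pair carries half a family's weight, so the 28 pairs correspond to the 14 cases mentioned above.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # arbitrary label for Tuesday among weekdays 0..6
children = list(product("BG", range(7)))      # 14 equally likely (sex, day) kinds
families = list(product(children, repeat=2))  # 196 equally likely ordered pairs

def tuesday_boy(child):
    return child == ("B", TUESDAY)

def both_boys(family):
    return family[0][0] == "B" and family[1][0] == "B"

# Case #2: the father was selected *because* at least one child is a Tuesday boy.
case2 = [f for f in families if any(tuesday_boy(c) for c in f)]
p_case2 = Fraction(sum(both_boys(f) for f in case2), len(case2))

# Case #1: the father flips a fair coin to pick a child and describes that child;
# keep only the (family, coin) outcomes in which the described child is a Tuesday boy.
case1 = [f for f in families for pick in (0, 1) if tuesday_boy(f[pick])]
p_case1 = Fraction(sum(both_boys(f) for f in case1), len(case1))

print(len(case2), p_case2)  # 27 13/27
print(len(case1), p_case1)  # 28 1/2
```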


p(2s|T) = p(2s & T)/p(T)

p(2s & T) = 1/4 * 13/49

p(T) = 1/4 * 13/49 + 1/2 * 1/7

p(2s|T) = (13/49)/(13/49 + 14/49)   (numerator and denominator multiplied by 4)

p(2s|T) = 13/27


If you get that far down in the comments, you should also read why I think all solutions given here and by the author are wrong: http://news.ycombinator.com/item?id=3291009


This post would have been much better if the author had used more precise language. I still don't know whether the assertion being made by the father is that at least one of his children is a son born on Tuesday, or exactly one is.


How would the solution change if the problem was stated as follows: 'You meet a man on the street and he says, “I have two children and one is a son born in March.” What is the probability that the other child is also a son?'


Or, "...one is a son born in an even-numbered month"
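The month variants follow from the same algebra. If the extra detail about the boy has probability p, the calculation gives (2 - p)/(4 - p): 13/27 for p = 1/7, 1/3 for p = 1 (just "a son"), and close to 1/2 as p shrinks. A Python sketch, assuming sexes are 50/50 and independent of the detail, and, as a simplification, that every month is equally likely (so "March" is p = 1/12 and "an even-numbered month" is p = 1/2):

```python
from fractions import Fraction

def p_two_boys(p):
    """P(two boys | at least one child is a boy with a detail of probability p).

    Assumes each child is a boy with probability 1/2, independently, and that
    the detail (weekday, month, ...) is independent of sex.
    """
    p = Fraction(p)
    num = Fraction(1, 4) * (1 - (1 - p) ** 2)  # BB, and at least one boy has the detail
    den = num + Fraction(1, 2) * p             # plus BG/GB where the boy has it
    return num / den

for label, p in [("Tuesday", Fraction(1, 7)),
                 ("March", Fraction(1, 12)),
                 ("even-numbered month", Fraction(1, 2)),
                 ("no detail, just a son", Fraction(1))]:
    print(label, p_two_boys(p))
# Tuesday 13/27
# March 23/47
# even-numbered month 3/7
# no detail, just a son 1/3
```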


I have two children. One of my children is a son born on a Tuesday. Of course, my wife is still pregnant with the 2nd, but she will be thrilled to know there is a better than average chance she is having a girl!


Scott Aaronson has a really good article about the anthropic principle and conditional probability.

http://www.scottaaronson.com/democritus/lec17.html


"If you surveyed all such families, you would find that roughly 13/27 of them have two boys." roughly 13/27? How rough are we talking, maybe it's roughly 13/26 instead.


The degree of "roughness" depends on the number of families surveyed. The more there are the closer to 13/27 you'll get.
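To see how the roughness shrinks, here is a hypothetical Monte Carlo survey in Python. It assumes uniform, independent sexes and weekdays and uses a fixed seed so runs are repeatable; the estimates wander for small surveys and should settle near 13/27 (about 0.4815) as the number of families grows.

```python
import random

TUESDAY = 1  # arbitrary weekday index

def survey(n_families, seed=0):
    """Fraction of two-boy families among simulated two-child families
    that have at least one boy born on a Tuesday."""
    rng = random.Random(seed)
    qualifying = two_boys = 0
    for _ in range(n_families):
        kids = [(rng.choice("BG"), rng.randrange(7)) for _ in range(2)]
        if ("B", TUESDAY) in kids:
            qualifying += 1
            if kids[0][0] == "B" and kids[1][0] == "B":
                two_boys += 1
    return two_boys / qualifying

for n in (1_000, 10_000, 200_000):
    print(n, round(survey(n), 4), "target", round(13 / 27, 4))
```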


> Now suppose we know simply that a man has two children and that one of them is a son. [snip] It follows that the sexes of his two children, ordered from oldest to youngest, are either BB, BG or GB. Since these cases are equally likely, and since only one of them involves having two boys, we would say the probability that the man has two boys is 1/3.

Errr, except these cases aren't equally likely, because the boy we know about could be either of the boys in the BB scenario, but only one child in either of the other two. So the probability is still 1/2.


Since the constraint simply says that at least one child is a boy, we don't need to distinguish between two types of BB.

This is a curious misconception, by the way. The typical failure mode I see on these questions is people having difficulty accepting BG and GB as different possibilities.


If we are determined to look at order of birth (i.e. have both BG and GB), then we should consider all cases by order of birth. Let 'B' represent the boy we know about and 'b' or 'g' a child we don't, with the first character representing the first child and the second the second child. Then we have:

Bb, bB, Bg, gB

50%.


You need to look at it in a Bayesian sense.

Consider all the possibilities and then look at the outcomes they produce.

BB BG GB GG

These are all equally likely, right? They each have P = 1/4. If we took 1000 fathers who had two children each, and lined them up, we would expect 1/4 of them to have children matching each of the above pairings. That means 250 of each. With me so far?

Now, we know that we are dealing with a father who has at least one son. If we went along the line and asked each father 'Do you have at least one son: Yes/No?', then 750 would answer yes, and 250 (the GG fathers) would answer no.

If a random father comes up to us and says 'I have at least one son', we know that he is from those 750; we have selected a subset to deal with. Of those 750, only 250 have a second son, so the probability is 250/750 or 1/3.


You're assuming a different selection model, in which a father is twice as likely to volunteer the information if he has two boys: "Pick a child at random, then if it's a boy, say: I have a boy [blablabla], what is the other?"

However, you're doing this on an already selected sample by discounting all the girl-girl pairs. The selection rule in your logic then becomes a two-step "If you have at least one boy, then pick a child at random, then if it's a boy, say: [blablabla]"

Given that, your analysis is correct.

But, seeing as it is a wildly different selection model from what everyone else seems to work with, you should be explicit about it.


Imagine a similar problem but with red and blue poker chips. Say, for example, that I have a bag and I pull out two chips, one at a time.

In this problem, would you still try to distinguish the two identical red poker chips using your logic?


Okay. To respond to both of you: I have put this into code just to check, and I think you are drawing a different inference from the question than I am.

Under your inference, the man wouldn't have mentioned anything unless he had at least one male child (in which case you can say the GG scenario is gone, but GB, BG and BB are equally probable).

Under my inference, the man just told me the sex of one child chosen at random (in which case BB is twice as probable as GB or BG, since he could equally have said 'I have at least one girl').

Does that sound reasonable to you? I have it in Ruby form if you are interested :)

The blogpost linked to by someone above (http://blog.tanyakhovanova.com/?p=221) uses this explanation, which is very different (to me) than the one in the main link, and I can see why this gets to 1/3:

"A father of two children is picked at random. If he has two daughters he is sent home and another one picked at random until a father is found who has at least one son."

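The procedure in that quote is rejection sampling. A Python sketch (hypothetical function names, 50/50 sexes assumed) comparing it with the other reading, where any father names the sex of a randomly chosen child; with this many trials the first estimate should land near 1/3 and the second near 1/2.

```python
import random

def pick_father(rng):
    """The blog's procedure: send home fathers of two daughters until one has a son."""
    while True:
        kids = [rng.choice("BG") for _ in range(2)]
        if "B" in kids:
            return kids

def random_child_report(rng):
    """Other reading: any father names the sex of one of his children, chosen at random."""
    kids = [rng.choice("BG") for _ in range(2)]
    return kids, rng.choice(kids)

rng = random.Random(42)
N = 100_000

fathers = [pick_father(rng) for _ in range(N)]
p_rejection = sum(kids == ["B", "B"] for kids in fathers) / N

reports = [random_child_report(rng) for _ in range(N)]
boy_reports = [kids for kids, sex in reports if sex == "B"]
p_random_child = sum(kids == ["B", "B"] for kids in boy_reports) / len(boy_reports)

print(f"rejection sampling: {p_rejection:.3f}, random child: {p_random_child:.3f}")
```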

"Under your inference, the man wouldnt have mentioned anything unless he had at least one male child. (in which case you can say the GG scenario is gone, but GB BG and BB are equally probable)"

No. I'm just assuming he has two children, and randomly mentions something about one of them. The GG scenario is only eliminated after he makes his statement, because we then know he has at least one boy.


Ah, then I do think you have an error of logic.

Put it this way. Before he says it, we have GG, GB, BG, BB.

after he says "I have a child that is [MALE OR FEMALE]" we have (where the capital letter is the child whose sex has been mentioned, and the lowercase letter is the other child):

Gg, gG, Gb, gB, Bg, bG, Bb, bB

So if he has said the sex is male, then we have four combinations left:

gB, Bg, Bb, bB.

Note that we first go to more scenarios (8), depending on which child is mentioned, before we go to fewer. Actually, the order of the children is something you can and should ignore; however, as you are holding on to it, I show it this way.


I'm afraid this is wrong - you shouldn't distinguish between Bb and bB. In this problem, they are not different states, so counting them messes up your probability calculation.

Someone had a nice link higher up: http://mikeschiraldi.blogspot.com/2011/11/tuesday-boy-proble...


The last MythBusters episode put Monty Hall to the test, and it passed. I have to say it was the first time I truly understood why the Monty Hall solution works.


I think you missed a few words: I have only 1 son who was born on a Tuesday. Basing this on ambiguity of phrasing is not novel or interesting.


I have a son born on xDay. What is the probability that my next kid will be a son?

I don't get Monty Hall :(


The Tuesday Birthday problem comes from not having information about the order, which leads to extra possibilities that you don't intuitively consider. In the simple example, if one child is a boy, the probability of the other being a boy is 1/3, because with the information you have, the possible sets are BB, BG or GB. But if the younger child is a boy, then the GB possibility is removed, so it's 1/2 for the older child: BB or BG.

In your case, your younger kid is a son born on xDay. If you assume you will have another kid, the chances of it being a son are still 50%. It only gets interesting if either kid could have been the son.

In Monty Hall, the thing you don't intuitively consider is that Monty is providing additional information (he knows where the car is and always opens a goat door). You think of it as 50/50 because Monty ruled out a goat and the car has to be behind one of the two remaining doors, but actually the door you chose had a 1/3 chance of hiding the car, leaving a 2/3 chance that it is behind one of the other two doors. Monty then eliminated one of those two doors, which still leaves a 2/3 chance of the car being behind the remaining one, so you should switch to that door. In other words, first you divide the doors into a 1/3-chance group of one door and a 2/3-chance group of two doors; then Monty turns the second group into a 2/3-chance group of just one door.
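A quick Monte Carlo check of the standard Monty Hall rules in Python, assuming Monty always knows where the car is and always opens a goat door the player didn't pick. Staying should win about 1/3 of the time and switching about 2/3.

```python
import random

def play(switch, rng):
    """Play one round; return True if the player ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty knowingly opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d not in (pick, car)])
    if switch:
        # Move to the one door that is neither the original pick nor the opened one.
        pick = next(d for d in doors if d not in (pick, opened))
    return pick == car

rng = random.Random(1)
N = 100_000
stay_rate = sum(play(False, rng) for _ in range(N)) / N
switch_rate = sum(play(True, rng) for _ in range(N)) / N
print(f"stay: {stay_rate:.3f}, switch: {switch_rate:.3f}")
```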


Spoiler: Tuesday was the name of his horse.


ANYONE WHO SAYS THE ANSWER IS OTHER THAN 1/2:

Let's simplify the question:

If a man says "I have two children, one was born on a Tuesday," what is the probability that they are both born on a Tuesday?

Is the answer to this 0 or 1/7 in your opinion? (Or something different).


You can follow a similar process to work out the probability as for the original question. For any two children, the possibilities are:

Child 1 born Mon, Tue, Wed, Thu, Fri, Sat, Sun

Child 2 born Mon, Tue, Wed, Thu, Fri, Sat, Sun

We know that one of the children was born on a Tuesday, but we haven't said whether it's Child 1 or Child 2 (and that lack of information is key to understanding the original problem), so our unknown child could also be either Child 1 or Child 2.

With that in mind, 'unknown child' has 14 possible options except that we know one of those options (our known Tuesday child) is already taken - we don't know whether it's Child 1 or Child 2 but that doesn't matter, we just know that one of them is already taken. That leaves 13 possibilities for unknown child, only one of which is 'born on Tuesday' (since we've removed the other 'born on Tuesday' option), so I would say the answer is 1/13.
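The 1/13 can be confirmed by listing all 49 equally likely (elder, younger) weekday pairs, assuming birthdays are uniform and independent. This Python sketch also shows the contrasting question, where the elder child is known to be the Tuesday one, which gives 1/7.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # weekdays are 0..6; the Tuesday index is arbitrary
pairs = list(product(range(7), repeat=2))  # (elder, younger): 49 equally likely

# "One was born on a Tuesday", read as "at least one".
at_least_one = [p for p in pairs if TUESDAY in p]
both = [p for p in at_least_one if p == (TUESDAY, TUESDAY)]
print(len(at_least_one), Fraction(len(both), len(at_least_one)))  # 13 1/13

# "The elder was born on a Tuesday": a different, one-child condition.
elder = [p for p in pairs if p[0] == TUESDAY]
print(len(elder), Fraction(sum(p[1] == TUESDAY for p in elder), len(elder)))  # 7 1/7
```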


This is ridiculous. Let's do some queries on the world:

From all women: (some 3 billion)

the ones who have exactly two living children: x matches

the ones for whom it is true to say "One of my children was born on Tuesday." (This is true or false for every mother in x; y mothers remain, i.e. those who can answer "true" to that question.)

the ones that can say "both my children were born on Tuesday": z matches

are you telling me that sizeof z is 1/13th the sizeof y???

(because for me it is either 1/7 that size or exactly 0 if the correct meaning of "one of" is "exactly one of and not both of")


Wait, I think I get it. (Sorry, too late to update).

This is indeed ingenious. The key is the step from x to y. The women in x who can say "one of my children was born on Tuesday" are the ones who can say "EITHER my first OR my second child was born on Tuesday"; the initial constraint (when you meet the person) is thus: "women who can say I have exactly 2 children and it is true to say EITHER my first OR my second child was born on a Tuesday"; obviously this is far more than 1/7th. The additional constraint "BOTH of them were" then picks out a smaller fraction than 1/7th.

It's not 1/7th the first time and 1/7th the second time, because the question wasn't "of women who can say they have exactly two children, the number who can say 'my elder child was born on a Tuesday'" and then "of these the women who can say 'my younger child was also born on Tuesday". This would indeed be 1/7th each time.

Instead, y is "EITHER my elder OR my younger child was born on Tuesday". This obviously results in a set that is greater than 1/7th of all women with two children, and, consequently, it is no surprise that there is a correspondingly smaller than 1/7th probability that both were.


Bingo :-) I was just racking my brain for a different way to explain it but luckily you got there first!

It's a real mind-bender.


1/13



