The basic formulation (which the author argues is not subtle enough to be true) is better thought of step by step.
Suppose a man comes up to you and says "I have two children."
At this point you build a set of possible realities, your model. There are four possibilities: {BB, BG, GB, GG}. This space fully describes a model in which there are two distinct children, each with a gender. Additionally, by assuming independence and equal likelihood, you can assign a probability to each outcome: {BB:1/4, BG:1/4, GB:1/4, GG:1/4}.
"At least one of them is a boy."
At this point, you update your realities by removing the one firmly contradicted by the new evidence. Your new space is {BB, BG, GB}, and when you renormalize the probabilities you get {BB:1/3, BG:1/3, GB:1/3}, which leads to the idea that the probability at this point that the man has two boys is 1/3.
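The elimination-and-renormalization step can be written out as a minimal sketch. The variable names here are mine, and it assumes the same equal-likelihood prior as above:

```python
from fractions import Fraction
from itertools import product

# All ordered (older, younger) outcomes, each with prior probability 1/4.
prior = {"".join(kids): Fraction(1, 4) for kids in product("BG", repeat=2)}

# Keep only the outcomes consistent with "at least one of them is a boy".
consistent = {k: p for k, p in prior.items() if "B" in k}

# Renormalize so the surviving probabilities sum to 1.
total = sum(consistent.values())
posterior = {k: p / total for k, p in consistent.items()}

for outcome, p in posterior.items():
    print(outcome, p)
```

Each of the three surviving outcomes ends up with the same weight, so BB gets one share out of three.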
The author suggests however that during that second step, you should also take into account the possibility that this guy is lying or that the fact that he's proffering this information actually changes the likelihoods of those four scenarios in a way different from just multiplying one of them by 0. So perhaps the likelihood of hearing "At least one of them is a boy" is reflected like this:
{BB:0.35, BG:0.32, GB:0.32, GG:0.01}
And your new belief in each of these realities reflects that like so (renormalized):
{BB:0.35, BG:0.32, GB:0.32, GG:0.01}
So now I feel even more confident that he has two boys.
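For what it's worth, here is a rough sketch of that softer update as a Bayes step. It treats the illustrative numbers above as relative likelihoods of hearing the statement under each reality (that reading is my assumption), and `bayes_update` is just a name I made up:

```python
from fractions import Fraction

def bayes_update(prior, likelihood):
    """Multiply prior by likelihood per outcome, then renormalize."""
    unnormalized = {k: prior[k] * likelihood[k] for k in prior}
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}

prior = {k: Fraction(1, 4) for k in ("BB", "BG", "GB", "GG")}

# Illustrative weights from the comment, not measured values.
likelihood = {
    "BB": Fraction(35, 100),
    "BG": Fraction(32, 100),
    "GB": Fraction(32, 100),
    "GG": Fraction(1, 100),
}

posterior = bayes_update(prior, likelihood)
print({k: float(v) for k, v in posterior.items()})
```

Because the prior is flat and the weights already sum to 1, the posterior comes out equal to the weights themselves, which is why the two sets of numbers in the comment look identical. Setting the GG weight to 0 and the other three equal recovers the plain 1/3 answer.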
Thanks for your explanation. I think his first renormalization process is wrong, because once we know there is a boy, the problem space is reduced to "What's the probability of a boy?" which is 1/2. It has nothing to do with probabilities involving the known child.
I think the point is that it depends on what prompt the man is implicitly answering.
If you ask parents of two children whether they have a boy, 3/4 of them will say they do, but if you ask the same parents to name the gender of one of their children chosen at random, 1/2 of them will say "boy." Those two different scenarios lead to the 1/3 and 1/2 odds respectively.
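A small exact calculation of the two prompts, as a sketch. The assumption for the second prompt is that the parent picks one child uniformly at random and reports that child's gender:

```python
from fractions import Fraction

families = ["BB", "BG", "GB", "GG"]
prior = Fraction(1, 4)  # each family type equally likely

# Prompt 1: "Do you have a boy?" The parent says yes if any child is a boy.
p_yes = sum(prior for f in families if "B" in f)
p_bb_given_yes = prior / p_yes

# Prompt 2: the parent names the gender of one child picked at random.
# P(says "boy" | family) is the fraction of boys in that family.
p_says_boy = sum(prior * Fraction(f.count("B"), 2) for f in families)
p_bb_given_says_boy = (prior * 1) / p_says_boy  # a BB parent always says "boy"

print(p_yes, p_bb_given_yes)
print(p_says_boy, p_bb_given_says_boy)
```

The same four family types sit underneath both results; only the rule that produces the statement differs.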
The problem space is not reduced to the gender of the unspecified child. The important distinction is that "One of my children is a boy" is a statement about both children, not just one of them. Compare that statement to "My first-born child is a boy," and it may make more sense.
I appreciate your reply. But I'm still not clear on it. Because of this: "At least one of them is a boy." As I see it, this statement contains the following pieces of information:
1. There are two children.
2. One of them is a boy.
The question is... what's the probability of there being two boys? Considering the information we've got, there are two possible scenarios remaining 1. [B, B] 2. [B, G]. So we have P = 0.5. No?
Three scenarios remain: [B,B], [B,G], [G,B], so the answer is 1/3.
Maybe you were thinking that the order doesn't matter. In that case, what was the probability of getting a boy and a girl, in any order? 1/4 + 1/4 = 1/2. So that's still twice as likely as getting two boys, and that ratio (2:1) will still hold after eliminating [G,G]. You again get 1/3.
Some people find it easier to picture it in terms of frequencies. Imagine 1000 families. What fraction of them have two boys, among those that have at least one boy?
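If you'd rather let a computer imagine the families, here is a quick simulation sketch. It assumes each child is independently a boy or a girl with equal chance; the function name, seed, and sample size are arbitrary choices of mine:

```python
import random

def fraction_two_boys(n_families: int = 100_000, seed: int = 0) -> float:
    """Among simulated two-child families with at least one boy,
    return the fraction that have two boys."""
    rng = random.Random(seed)
    with_boy = 0
    two_boys = 0
    for _ in range(n_families):
        kids = (rng.choice("BG"), rng.choice("BG"))
        if "B" in kids:
            with_boy += 1
            if kids == ("B", "B"):
                two_boys += 1
    return two_boys / with_boy

result = fraction_two_boys()
print(f"{result:.3f}")
```

The printed value should land close to 1/3 rather than 1/2, up to sampling noise.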
ced, yes i was thinking that the order doesn't matter. I think this is what it comes down to. Do you think that order matters? If so, why?
yes i do find it easier to picture in terms of frequencies. in this case, take 1000 families which fulfill the criteria of "2 children with at least 1 boy". what is the probability that a family will have 2 boys? we have not sampled randomly. we have sampled according to the "2 children with at least 1 boy" criteria. we are not dealing with two random variables. one variable is fixed and we sampled according to it. now we are working with one independent random variable within that sample. that random variable has P = 1/2.
is there a flaw in my logic? if there is, please highlight it. i think the main confusion is:
1. we have sampled according to particular criteria.
2. we need to calculate a probability within that sample.
NOT the population that sample was taken from.
In a sampling of 1000 families, the expected count of each kind of family is as follows:
2xB : 250
1xB, 1xG: 500
2xG : 250
Sampling this population ignoring any family that has no boys leads to the probabilities
2xB : 1/3
1xB, 1xG: 2/3
You're still looking at the same probabilities; the models agree.
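The same arithmetic on the expected counts, as a short sketch (the labels mirror the ones used above, and order is ignored):

```python
from fractions import Fraction

# Expected counts out of 1000 families, ignoring birth order.
expected = {"2xB": 250, "1xB, 1xG": 500, "2xG": 250}

# Ignore any family that has no boys.
kept = {kind: n for kind, n in expected.items() if kind != "2xG"}

total = sum(kept.values())
shares = {kind: Fraction(n, total) for kind, n in kept.items()}

for kind, share in shares.items():
    print(kind, share)
```

Dropping the order does not make the two remaining kinds equally likely, since the mixed families were twice as common to begin with.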
I don't fully understand your two-random-variables formulation. I think the confusion you're getting at is that there is an assumption that the chance of any given birth being male is theta = 0.5. The question, however, is not
"I have two children, at least one is a boy, what is the probability that my next child is a boy?"
It instead has to do with binomial probabilities on the space of a few repeated trials under parameter theta. The distribution is no longer flat.
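To make the binomial view concrete, here is a sketch that builds the distribution over the number of boys among n children and conditions on "at least one boy." The function names are mine, and theta = 1/2 is the assumption from above:

```python
from fractions import Fraction
from math import comb

def boys_distribution(n, theta=Fraction(1, 2)):
    """P(exactly k boys among n children) for k = 0..n."""
    return {k: comb(n, k) * theta**k * (1 - theta) ** (n - k)
            for k in range(n + 1)}

def p_all_boys_given_at_least_one(n, theta=Fraction(1, 2)):
    dist = boys_distribution(n, theta)
    p_at_least_one_boy = 1 - dist[0]  # excludes only the all-girl family
    return dist[n] / p_at_least_one_boy

print(boys_distribution(2))
print(p_all_boys_given_at_least_one(2))
print(p_all_boys_given_at_least_one(3))
```

For n = 2 the distribution is 1/4, 1/2, 1/4 over zero, one, and two boys, so removing the zero-boy case leaves 1/3 for all boys. The conditional probability shrinks quickly as n grows.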
Here's a more stark example of a similar form.
"I have 300 children, and at least 1 is a boy. What are the odds that I have no girls?"
ok, you stated this very well. It's now clear where the confusion arises: "Sampling this population ignoring any family that has no boys." Yes, with this interpretation the answer is 1/3, but it's contrary to my interpretation.
Actually, for anyone who is interested, see "Boy or Girl paradox". There is literature on this which discusses the different interpretations.
That sampling arises because being part of the population which has at least one boy is *necessary and sufficient* to (truthfully) make the statement that forms the paradox.
The alternative interpretation of the paradox arises when the wording of the paradox is construed to identify one of the children as male or female. In this case (stating something like "my first child is male"), being part of the population (x \in {BB, BG}) is necessary and sufficient and leads to the 1/2 probability of having two boys.
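The two readings can be put side by side with a small enumeration. This is only a sketch: `p_two_boys` is a hypothetical helper, and each ordered pair (first child, second child) is assumed equally likely:

```python
from fractions import Fraction
from itertools import product

OUTCOMES = ["".join(kids) for kids in product("BG", repeat=2)]

def p_two_boys(can_make_statement):
    """P(BB) among the families that can truthfully make the statement."""
    eligible = [o for o in OUTCOMES if can_make_statement(o)]
    return Fraction(eligible.count("BB"), len(eligible))

# "At least one of my children is a boy": no child is identified.
unidentified = p_two_boys(lambda o: "B" in o)

# "My first child is a boy": one specific child is identified.
identified = p_two_boys(lambda o: o[0] == "B")

print(unidentified, identified)
```

The only thing that changes between the two calls is which families are eligible to make the statement: {BB, BG, GB} in the first case and {BB, BG} in the second.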
In short, the question becomes whether you believe the child is identified in the wording of the question. Honestly, the author of the paradox goes pretty far out of their way to say "at least one of the children is male" avoiding that identification.