Nobody Understands Probability (jsteinhardt.wordpress.com)
84 points by kmod on Sept 13, 2010 | 60 comments



For those who didn't make it all the way down:

People often intuitively think of probabilities as a fact about the world, when in reality probabilities are a fact about our model of the world.


Not an exact quote, but E. T. Jaynes: "If I am ignorant about a phenomenon, that is a fact about my state of mind, not a fact about the phenomenon."


There are at least two interpretations of probability. Firstly the epistemological, as you say, represents our lack of knowledge about the world. Secondly the aleatory truly represents the phenomenon of chance in the world.

The best interpretation of quantum theory for example (as I understand it) takes the latter view that randomness is genuinely physically manifested, and does not simply represent our inability to model reality.


I think you're confusing concepts. The original point is about the difference between the map and the territory. And, because I'm (finally) systematically going through Eliezer's sequences: http://wiki.lesswrong.com/wiki/Map_and_Territory_(sequence)

Your second paragraph is about a particular map: quantum theory. Quantum theory has probabilities in it. The dominant interpretation of quantum theory is that the probabilities accurately represent what happens in the universe; they are not artifacts for us to correct. But there is still a difference between our map (quantum theory) and the territory (the universe itself).

Put another way: quantum theory is a map with uncertainty baked into it. But this uncertainty has been accurately mapped.


No I don't think so. Unlike statistical physics, where probabilities are simply a mathematical technique for dealing with uncertainty, quantum mechanics actually postulates that randomness is inherent to the universe.

If you disagree with me, please describe how your concept of "maps and territories" applies to the StatPhys/QM distinction.


Suppose I accurately map the coastline, and every relevant part of the coastline is depicted in my map. But the map is not the same as the coastline itself.

If my coastline has some feature that blips in and out of existence in a predictable way, I can integrate that into my map. My map then has uncertainty in it. That uncertainty is an accurate reflection of the coastline itself - but there is still a distinction between the map and the coastline.

I don't disagree with your second sentence. But there is still a difference between our theory of quantum mechanics and the universe itself.


  The best interpretation of quantum theory [..] takes the [..] view that
  randomness is genuinely physically manifested
We could argue endlessly about whether that is 'the best' interpretation (which ethical assumptions does your 'the best' presuppose?) but fortunately it doesn't matter. As Mermin (from the famous Ashcroft and Mermin book on Solid State Physics) famously quipped: you can 'shut up and calculate'. The usefulness of the model does not depend on its interpretations (although the interpretations are certainly important with respect to scientific progress).


Well indeed, and this doesn't just apply to quantum mechanics but to the OP's quotation from the article too:

People often intuitively think of probabilities as a fact about the world, when in reality probabilities are a fact about our model of the world.


Obligatory quote: "Essentially, all models are wrong, but some are useful"

- George Box


The odds of nobody understanding probability are near-zero...


I figured there was a good chance someone would say that.


This may arguably be one of the few probabilities in the natural world which does == 0.


This essay by Yudkowsky is also helpful.

http://yudkowsky.net/rational/bayes


> However, the answer is not, in fact, 1/3. Why is this?

This seems like a canard to me.

Here is my defense of 1/3 as a correct answer: http://gist.github.com/578386

> Is Bayes’ theorem wrong?

> No, the answer comes from an unfortunate namespace collision in the word “given”. The man “gave” us the information that he has at least one male child. By this we mean that he asserted the statement “I have at least one male child.” Now our issue is when we confuse this with being “given” that the man has at least one male child, in the sense that we should restrict to the set of universes in which the man has at least one male child. This is a very different statement than the previous one. For instance, it rules out universes where the man has two girls, but is lying to us.

No, we are assuming that the givens are facts that are true.

> Even if we decide to ignore the possibility that the man is lying, we should note that most universes where the man has at least one son don’t even involve him informing us of this fact, and so it may be the case that proportionally more universes where the man has two boys involve him telling us “I have at least one male child”, relative to the proportion of such universes where the man has one boy and one girl. In this case the probability that he has two boys would end up being greater than 1/3.

No, we don't have to consider universes where the man has at least one male child but does not inform us of this fact. We have a set of givens that are assumed to be true, and based on those givens and the rules of logic, we can make justifiable statements of probabilities.


You unfortunately fall right into the trap that he's warning about: just because you can enumerate the possible outcomes doesn't mean that they are all equally likely (cf: the Monty Hall problem).

The conclusion that BB/BG/GB are all equally likely follows from the assumption that the man would definitely state that he has a male child if and only if one of those conditions were true. But what if instead we add the fact that the man would only say that he has a son if he has no daughters? This isn't contradictory with anything else in the problem, but now the answer is clearly "P(BB) = 1", which makes it hard to state that certainly "P(BB) = 1/3" when there is a consistent interpretation of the problem that gives a different answer.
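
To make the dependence on the man's behaviour concrete, here is a minimal Python sketch. It assumes a uniform prior over the four ordered families, and the two policy tables are hypothetical illustrations of the cases described above, not anything stated in the article:

```python
from fractions import Fraction

# Uniform prior over ordered (older, younger) families.
PRIOR = {fam: Fraction(1, 4) for fam in ("BB", "BG", "GB", "GG")}

def p_two_boys(says_boy):
    """P(BB | man says "at least one boy") for a given reporting policy.

    says_boy maps each family to P(he makes the statement | family).
    """
    joint = {fam: PRIOR[fam] * says_boy[fam] for fam in PRIOR}
    return joint["BB"] / sum(joint.values())

# Hypothetical policy 1: he makes the statement whenever it is true.
always = {"BB": 1, "BG": 1, "GB": 1, "GG": 0}
# Hypothetical policy 2: he only mentions a son when he has no daughters.
only_if_no_daughters = {"BB": 1, "BG": 0, "GB": 0, "GG": 0}

print(p_two_boys(always))                # 1/3
print(p_two_boys(only_if_no_daughters))  # 1
```

Same statement, same families, different answers: the only thing that changed is the assumed chance of hearing the statement in each case.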

You should reread the way he set up the problem again: he makes the subtle distinction between things that are given to us by the omniscient problem writer and things that are given to us by characters inside the problem, for whom we have to apply Bayes Law an additional time.


I think you are confusing the likelihood of something happening with the estimate based on a given set of facts. The man doesn't have quantum children that are 1/3 boy or girl. So trivially P(the gender of the man's children is the gender of the man's children) = 1. They don't change gender depending on your guess. The only thing that changes is the trueness of your guess of the other child's gender (it changes when you change the guess)

The way to get clear here is to simulate meeting the man 1000 times and count how many times you would have guessed right if you said he has two boys. This yields 1/3 because you don't even guess when he has 2 girls.
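
A rough Python sketch of that simulation, under two assumptions that are mine rather than the article's: each child is independently a boy or a girl with probability 1/2, and the man speaks up whenever he has at least one boy. The function name and the seed are arbitrary:

```python
import random

def simulate(meetings=1000, seed=0):
    """Fraction of "two boys" guesses that are right, over simulated meetings."""
    rng = random.Random(seed)
    guesses = correct = 0
    while guesses < meetings:
        kids = [rng.choice("BG") for _ in range(2)]
        if "B" not in kids:
            continue  # two girls: he never makes the statement, so no guess
        guesses += 1
        correct += kids == ["B", "B"]
    return correct / guesses

print(simulate(100_000))
```

With enough meetings the printed fraction should settle near 1/3. Change the rule for when he speaks up and the number moves.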


It's a canard, but still informative. I do wish he hadn't claimed that 1/3rd is wrong by Bayesian statistics because the frequentist approach, with the same interpretation of the problem, yields exactly the same results.

It's still a valuable example of how to represent unreliable measurement processes in your model, and the importance of doing so.


1/3 isn't correct, and I find the OP's explanation to be overly complex.

The set of possibilities for two genders of two children is GG, GB, and BB. In your possibilities, GB and BG are exactly the same set (order doesn't matter in a set, only membership), so you don't have 4 possibilities, you have 3 total. Since the guy asserted that one of them is a boy, you can rule out the GG possibility. This leaves only GB and BB as possible results, both of which have a 1/2 chance of being the correct one. The guy never makes a claim that the first child or the second child is the boy (but, this doesn't change the possibility that he has two boys, it just changes which one you remove from the possibilities based on the provided information).

I'm not sure that it's that people don't understand statistics (although I'm not in a position to confirm or deny that), it's that people don't understand set theory. At least if you're going to use this "genders of two children" as example.


> The set of possibilities for two genders of two children is GG, GB, and BB.

Yes, but there are two equally probable paths to arrive at (set-theoretic) GB. Each of these paths is as probable as each of the remaining paths, BB and GG. There are 4 possible paths, and (set-theoretic) GB is the result of 2 of them.

Your application of set theory is inappropriate given 2 independent events.


The question isn't what is the probability of any one of the paths, the question is about the probability of the final result.

A fork in the road that joins up again gives each fork equal probability of reaching the destination.


You can't understand the final result in isolation. Out of all men with two children, the probability that they will have two daughters is 1/4, the probability that they will have two sons is 1/4. This leaves heterogeneous offspring occurring 1/2 the time.

If the man is able to make the statement "I have two kids, at least one's a boy", this puts him among the 3/4 of all men with one or two sons. The probability of a man with two sons cannot jump from 1/4 to 3/8 (half of 3/4), as you assert earlier.

It's unintuitive, but it's more obvious when you negate the statement: "I have two children, but I do not have two daughters."


Perhaps this would offer an equally appealing explanation: if the man said, "I have two children but at least my firstborn is not a girl..." then the intuitively appealing response of 1/2 becomes roughly correct.


But we're not trying to find out if this man falls into the 3/4 of all men with one or two sons, we're trying to determine if this man's two children are both boys when we know that one of them is.


1/3 would be correct if you set aside the issues about 'given' information addressed in the post. And I think the post has a valid point, that we have to be careful to appropriately understand biases and uncertainties in our data sources. But your reading of it, I think, demonstrates the real weakness of this part of the OP's discussion, namely that he's started with, and trying to expand from, a basic, 'classic' problem that people notoriously have difficulty agreeing on (e.g. http://en.wikipedia.org/wiki/Maryln_vos_Savant#.22Two_boys.2...). So as a pedagogical exercise, or as an attempt to get his point across, he has to trip over all the misunderstandings that cause people to think in the simple, classic case, that the answer is 1/2.

I think he would have been better off expanding the classic coin example, only changing the story such that a naive research assistant is tasked with flipping the coin n times and bringing us the results. Then we could start by finding the probability that the coin is biased taking the research assistant at his word, and then we could reason about the probability that the research assistant either doctored the sequence when he felt they weren't 'random' enough, or the probability that he just pulled an 'HTTH' sequence out of his head without ever touching a coin.


http://en.wikipedia.org/wiki/Boy_or_Girl_paradox has interesting content in the sections Second Question and Ambiguous Problem Statements.

I'll accept that I'm wrong and that the answer is 1/3 and that I just don't understand it and that I'm supporting the point that "no one understands probability, especially me", but there has been little explanation why the birth order is even a factor. Even that Wikipedia entry labels the four possibilities with "older child" and "younger child", which seems to be extra, unneeded information when the term "at least" is used to describe the number of boys independent of order.


Birth order per se doesn't matter; probability mass does. Birth order is an easy way to show that you're twice as likely to get a boy and a girl as you are to get two boys. But you can ignore birth order and say instead "let k be a binomially distributed random variable with n=2 and p=1/2. If you know k>=1 what is p(k=2)?". Then the answer is a straight comparison of p(k=2) to p(k=1), where we note that '2 choose 1' is 2, and '2 choose 2' is 1. Note that ordering per se isn't entering into this, but to someone without probability background, it's a bit obtuse. Describing the probability space in terms of birth order is not strictly necessary but helps a lot of people grasp the concept.
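
For anyone who wants to check the arithmetic, a short Python sketch of the binomial version (exact fractions, n=2 and p=1/2 as above; the helper name is just for illustration):

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 2, Fraction(1, 2)
p_k1 = binom_pmf(1, n, p)   # one boy:  C(2,1) / 4 = 1/2
p_k2 = binom_pmf(2, n, p)   # two boys: C(2,2) / 4 = 1/4
p_two_given_some = p_k2 / (p_k1 + p_k2)
print(p_two_given_some)     # 1/3
```

No birth order appears anywhere; the factor of 2 comes entirely from '2 choose 1'.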


i think it's like this.... interpretation 1.. we select a random FAMILY from the set of all families with two children. one of the children happens to be a boy. {BB, BG, GB} are possibilities. P(BB) = 1/3

interpretation 2.. we select a random CHILD from the set of all families with two children. one of the children happens to be a boy. in this case we have the following possible combinations [B1B2, G1B3, B4G2]. we know it's a boy, so what's the probability that B1 or B2 was selected? 1/4+1/4 = 1/2.
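
a quick Python enumeration of both interpretations, as a sketch. it assumes the four ordered families are equally likely and, for interpretation 2, that either child is equally likely to be the one selected:

```python
from fractions import Fraction
from itertools import product

families = list(product("BG", repeat=2))  # (older, younger), each with prob 1/4

# Interpretation 1: pick a family at random, keep it if it has at least one boy.
with_boy = [f for f in families if "B" in f]
p_family = Fraction(sum(f == ("B", "B") for f in with_boy), len(with_boy))

# Interpretation 2: pick a child at random, keep the draw if that child is a boy.
draws = [(f, i) for f in families for i in range(2)]  # 8 equally likely draws
boy_draws = [(f, i) for f, i in draws if f[i] == "B"]
p_child = Fraction(sum(f == ("B", "B") for f, _ in boy_draws), len(boy_draws))

print(p_family, p_child)  # 1/3 1/2
```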


Think of it in terms of a set of tuples and not a set.

{(B,G), (G,B) , (B,B) , (G,G)}


You picked that 0.5 out of thin air, though. You forgot about BG.


I didn't "forget" about it, I contend that case is already handled by GB; "one child is a boy and one child is a girl" is the same as "one child is a girl and one child is a boy" and can be expressed as either GB or BG, making GB and BG equivalent.


GB and BG take up more area in probability space.

If you have a thousand paths in front of you, one leading to a fortune, one to a potion and the rest to a pit of death, you can't say the 998 paths are together equal to the other two just because they all end at the same place. It's much more likely you'll hit the pit of death.

It's much more likely to have boy/girl children even though BG and GB look the same at the end. You get as much probability current down each of the wires BG and GB as you do down the GG and BB wires, making twice as much in total for BG/GB combined.

If there are four buildings, red, green, gray and gray, and you pick one at random you are more likely to end up in a gray building. You don't get to say that, because you can class the two gray buildings as the same, there are three colours and the chances are 1/3rd for each colour.


> If there are four buildings, red, green, gray and gray, and you pick one at random you are more likely to end up in a gray building.

I guess that's where my confusion lies. I see that the chance of picking a grey building in the above is 1/2, but I assert that the set of possibilities for the children is "two girls", "different genders" and "two boys". But yours is a good explanation; I see that the "different genders" state has a double chance of occurring compared to the other two states.

The Wikipedia article I cited above is interesting in that in one explanation, it bases the different results on interpretation of the givens. I see the problem stated as about a specific man's state, not the chances that a randomly chosen man/family with two children has two boys. In my case, this was a misdirection in trying to understand this.


No, choosing to not care for order changes the way probabilities have to be calculated. I suppose there was enough discussion about that on this thread already.

I guess it is helpful to think about two coin flips instead of boys and girls.

Not sure how it is being taught in the US. In Germany we use urn models and learn formulas for drawing balls from urns in different variations. Like returning the balls or not returning the balls, or caring for order or not. All these cases have different formulas attached...
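
For reference, the four standard urn counts fit in a few lines of Python. This is only a sketch of the textbook counting formulas; the function and parameter names are made up for illustration:

```python
from math import comb, perm

def urn_draws(n, k, replace, ordered):
    """Number of distinct outcomes when drawing k balls from n distinct balls."""
    if replace and ordered:
        return n ** k
    if not replace and ordered:
        return perm(n, k)          # n! / (n - k)!
    if not replace and not ordered:
        return comb(n, k)          # n! / (k! (n - k)!)
    return comb(n + k - 1, k)      # with replacement, order ignored

# Two children as two draws, with replacement, from an urn holding {B, G}:
print(urn_draws(2, 2, replace=True, ordered=True))   # 4
print(urn_draws(2, 2, replace=True, ordered=False))  # 3
```

The catch is that the 4 ordered outcomes are equally likely while the 3 unordered ones are not, which is exactly the GB/BG dispute in this thread.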


I thought on this a bit. And I do believe you are correct. Because as you note, he did in fact condition the space such that 1/3 is the correct answer and is stretching.

He goes on to say "Now this means that if we want to claim that the probability that the man has two boys is 1/3, what we are really claiming is that he is equally likely to inform us that he has at least one boy, in all situations where it is true, independent of the actual gender distribution of his children. I would argue that this is quite unlikely, as if he has a boy and a girl, then he could equally well have told us that he has at least one girl, whereas he couldn’t tell us that if he has only boys".

I see this point but I think this example does not lend itself to explaining the notion of defining a good prior because it is such a simple scenario - it biases responses to focus on the absurdity and unnecessarily fretful complexification. And it opens the door to all sorts of subjective objections so that nothing can actually be said. For example I can say that given this guy is talking in riddles he likes being vague and will always give me just enough true information to keep things sane but interesting. Or given he is being wishy washy then he is a deviant trying to mess up my day, very likely he has 3+ kids. A more muddled scenario would have been better for similar reasons to why humans are the hardest to animate 'correctly' - no preconceptions to distract from the presentation.

Otherwise the maths is straightforward: we have been given a space to condition on: he has two kids and one of them is a boy. P((B,B)) = 1/3 since (G,G) is eliminated. Giving a probability of 1/2 makes sense only when we are not given the key piece of information that at least one is a boy. The information we get tells us how we can construct our probability (sub)space.

Consider for example I met this strange man for the first time and see that he has a son and for some reason I know he has two kids. Then the probability of two sons is 1/2. This can be seen as the fact that knowing the gender of the kid I saw says nothing of the other kid's. If we whip out some probability spaces this can be seen by thinking in terms of {(M,F), (F,M), (F,F), (M,M)} X {1,2}. Where (M,F,1) is I met the older kid who is his son first. Or (M,M,2) is I met the younger son first.

Some time ago I wrote a simple finite probability space toy in F# to help in building probability intuitions by using simple spaces. I'll use it here because I think code can be clear.

    let Z = cartesg ["MF";"MF"; "12"]
    let S = definePSpaceSimp Z
    let pboy = conditional  (fun (x:string) -> x.[0] = 'M') S

    let p2b =(conditional (fun (x:string) -> x.Contains("MM")) pboy).Space |> pboy.Pr

    >
    val S : ProbabilitySpace<string> =
      {Space = set ["FF1"; "FF2"; "FM1"; "FM2"; "MF1"; "MF2"; "MM1"; "MM2"];
       Pf = <fun:definePSpaceSimp@43>;
       Pr = <fun:definePSpace@41>;}
    val pboy : ProbabilitySpace<string> =
      {Space = set ["MF1"; "MF2"; "MM1"; "MM2"];
       Pf = <fun:filtrSpace@30>;
       Pr = <fun:definePSpace@41>;}
    val p2b : PositiveReals = 0.5
Thus we see the probability of the compound event: 2 boys given seen 1 boy is 0.5. This is because based on the information I got, the compound event I can define is very different from being given information - 2 kids, at least one boy: {(M,F), (M,M), (F,M)}
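
For readers who don't use F#, here is a rough Python rendering of the same calculation. It is a sketch, not a port of the toy library: `pr` and `conditional` are stand-in helpers written for this example, and the space is assumed to be uniform.

```python
from fractions import Fraction
from itertools import product

def pr(event, space):
    """Probability of `event` (a predicate) in a uniform finite space."""
    return Fraction(sum(1 for x in space if event(x)), len(space))

def conditional(event, space):
    """Restrict a uniform space to the outcomes where `event` holds."""
    return [x for x in space if event(x)]

# Outcome "MF1" reads: met a son first, his sibling is a daughter,
# and the child I met is the older one (2 would mean the younger one).
S = ["".join(t) for t in product("MF", "MF", "12")]

pboy = conditional(lambda x: x[0] == "M", S)   # I met a son
p2b = pr(lambda x: x.startswith("MM"), pboy)   # ...and the other is a son too
print(sorted(pboy), p2b)
```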

>EDIT: I find it interesting that a mathematical argument is downvoted. I would appreciate any flaws to be pointed out along with a downvote so I might correct my reasoning. Currently, I do not think I made any mistakes.


Can't read your code, but aren't (F,M,1) and (M,F,2) impossible? Not sure what exactly you are calculating.


(F,M,1) means met daughter (who is older) first. (M, F, 2) is met son who is younger than his sister first. This is the space of possible ways to meet his two kids ordered by age.


I see - I thought the first child you meet is always a boy in your example. So I am still not sure what you are calculating, but at least I understand the tuple notation :-)


I was trying to make a point that two problems that look the same actually lead to different results based on what information you are given. For example, being told someone has 2 kids and at least one is a girl then the space for this is S = {(M,F), (F,M), (F,F)}. However if I were to run into someone with two kids and saw a daughter then the information I have doesn't allow me to construct the space S above if I wanted to find the chance of say, 2 girls while accounting for age.

Instead I must consider S X {1,2} because instead of knowing that there is at least one daughter I just know that I have met one daughter whose age (whether older) is unknown to me. I know nothing of her sibling. So I am calculating the possible ways I could have met her and then the chance of 2 daughters conditioned on that space. This is a very tricky differentiation. Distinctions like this and thinking about what to condition on is what makes probability so tricky.


I must admit, I don't understand the distinction. What does age have to do with it? The information seems to be the same in both cases: a man with two kids, at least one of them a girl.

Of course thinking about age is a legit way to calculate the probability of the second kid being a girl. It is just unnecessarily complicated.


The distinction is in how the information was presented, and there is a difference. Ignore the ages for a moment and assume I met the younger one: having met one girl, my space is {(F,F), (F,M)}, versus {(F,F), (F,M), (M,F)} for "at least one girl" while accounting for age. For the second, age matters less than who is older, and accounting for all combinations is why I present it that way. The distinction is tricky, but problems like these are often covered in the conditional probabilities section of most probability theory texts.


If you're lucky enough to attend UIUC, I highly recommend taking ECE 413 to get a thorough introduction to these concepts. It's unfortunate that none of the class materials are online since it goes well above and beyond what is taught in most undergraduate CS courses on statistics. Taking it was hard, but it made me a much better engineer.

Edit: I suppose this applies to anybody in college. Take the hard statistics course that goes over this stuff. It's really valuable, and pretty hard to pick up on your own.


A more thorough introduction to this topic is "Probability Theory: The Logic of Science" by E. T. Jaynes (http://www-biba.inrialpes.fr/Jaynes/prob.html).


This article is not particularly clear. It doesn't have a clear discussion of Bayesian versus frequentist interpretations of probability or inferential statements that are conditioned on the unobserved true parameter versus the observed data. It's hard to understand the subtlety of probability without understanding p(theta), p(x), p(theta|x) and p(x|theta).


I've read several pieces on Bayesian stats, and I've done some nontrivial statistics before. It still confuses me that p(data) != 1. I kinda wish the author had gone into detail about how to calculate the probability of an already-observed event.


You're confusing p(data) with p(data|data) which is, trivially, equal to 1.

p(data) is better formulated as p(data|F) where F codifies your assumptions about the possible generative probability models that you're building your likelihood function from. Or, similarly, F codifies your understanding of the world and the possible things that could occur within it.

This makes p(data|F) a perfect normalizing constant for the numerator of Bayes' Theorem since the numerator implies a choice of a specific model in the family F, but p(data|F) averages over all possible models/worlds/parameter choices (contained in F).
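
A toy Python illustration, with everything in it assumed for the sake of the example: F contains just two coin models ("fair" and a hypothetical "biased" coin with 90% heads), each given prior weight 1/2, and the data are three heads.

```python
from fractions import Fraction

# Hypothetical family F: two coin models, equally plausible a priori.
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
p_heads = {"fair": Fraction(1, 2), "biased": Fraction(9, 10)}

def likelihood(data, model):
    """p(data | model) for a string of independent flips such as "HHT"."""
    ph = p_heads[model]
    result = Fraction(1)
    for flip in data:
        result *= ph if flip == "H" else 1 - ph
    return result

data = "HHH"
# p(data | F): average the likelihood over every model in F.
evidence = sum(prior[m] * likelihood(data, m) for m in prior)
posterior = {m: prior[m] * likelihood(data, m) / evidence for m in prior}
print(evidence, posterior["biased"])
```

The evidence comes out well below 1 even though the data have already been observed, because it measures how strongly F predicted those flips, not whether they happened.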


It sounds like he didn't get to it before running out of steam. Which is unfortunate, because that's really the reason I kept reading.


'Let’s consider an example. Suppose that a man comes up to you and says "I have two children. At least one of them is a boy." What is the probability that they are both boys?'.

Am I missing something or in his attempt to solve the problem, does he implicitly assume statistical dependence?

If statistical independence is assumed, with P(Boy) = P(Girl) = 1/2, then the answer to the problem is very simple. P(Boy | Boy) = P(Boy) = 1/2.

Maybe I just don't understand probability :(


The basic formation (which the author argues is not subtle enough to be true) is better thought of step by step.

Suppose a man comes up to you and says "I have two children"

At this point you build a set of possible realities, your model. There are four possibilities: {BB, BG, GB, GG}. This space fully describes a model whereupon there are two distinct children with genders. Additionally, via assumption of independence and equal likelihood, you can assign probabilities to each observation, {BB:1/4, BG:1/4, GB:1/4, GG:1/4}.

"At least one of them is a boy."

At this point, you update your realities by removing the one firmly contradicted by the new evidence. Your new space is {BB, BG, GB} and when you renormalize the probabilities you get {BB:1/3, BG:1/3, GB:1/3} which leads to the idea that the probability at this point that the man has two boys is 1/3rd.

The author suggests however that during that second step, you should also take into account the possibility that this guy is lying or that the fact that he's proffering this information actually changes the likelihoods of those four scenarios in a way different from just multiplying one of them by 0. So perhaps the likelihood of hearing "At least one of them is a boy" is reflected like this:

{BB:0.35, BG:0.32, GB:0.32, GG:0.01}

And your new belief in each of these realities reflects that like so (renormalized)

{BB:0.35, BG:0.32, GB:0.32, GG:0.01}

So now I feel even more confident that he has two boys.
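
Spelled out as a Python sketch, using the made-up likelihoods from above (they are illustrative numbers, not estimates of anything):

```python
from fractions import Fraction

families = ("BB", "BG", "GB", "GG")
prior = {f: Fraction(1, 4) for f in families}
# Illustrative likelihoods P(he says "at least one is a boy" | family),
# taken from the numbers above; GG is nonzero because he might be lying.
likelihood = {"BB": Fraction(35, 100), "BG": Fraction(32, 100),
              "GB": Fraction(32, 100), "GG": Fraction(1, 100)}

unnormalized = {f: prior[f] * likelihood[f] for f in families}
total = sum(unnormalized.values())
posterior = {f: unnormalized[f] / total for f in families}
print({f: float(p) for f, p in posterior.items()})
```

Because the prior is flat, the posterior is just the likelihood renormalized, which is why the numbers do not change here. With a non-uniform prior they would.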


Thanks for your explanation. I think his first renormalization process is wrong. Because once we know there is a boy, the problem space is reduced to "What's the probability of a boy?" which is 1/2. It has nothing to do with probabilities involving the known child.


I think the point is that it depends on what prompt the man is implicitly answering.

If you ask parents of two children if they have a boy 3/4 of them will say they do, but if you ask the same parents to randomly name the gender of one of their children 1/2 of them will say they have a boy. Those two different scenarios lead to the 1/3 and 1/2 odds respectively.


The problem space is not reduced to the gender of the unspecified child. The important distinction is that "One of my children is a boy" is a statement about both children, not just one of them. Compare that statement to "My first-born child is a boy," and it may make more sense.


I appreciate your reply. But I'm still not clear on it. Because of this: "At least one of them is a boy." As I see it, this statement contains the following pieces of information: 1. There are two children. 2. One of them is a boy. The question is... what's the probability of there being two boys? Considering the information we've got, there are two possible scenarios remaining 1. [B, B] 2. [B, G]. So we have P = 0.5. No?


Three scenarios remain: [B,B], [B,G], [G,B], so the answer is 1/3.

Maybe you were thinking that the order doesn't matter. In that case, what was the probability of getting a boy and a girl, in any order? 1/4 + 1/4 = 1/2. So that's still twice as likely as getting two boys, and that ratio (2:1) will still hold after eliminating [G,G]. You again get 1/3.

Some people find it easier to picture it in terms of frequencies. Imagine 1000 families. What fraction of them have two boys, among those that have at least one boy?


ced, yes i was thinking that the order doesn't matter. I think this is what it comes down to. Do you think that order matters? If so, why?

yes i do find it easier to picture in terms of frequencies. in this case, take 1000 families which fulfill the criteria of "2 children with at least 1 boy". what is the probability that a family will have 2 boys? we have not sampled randomly. we have sampled according to the "2 children with at least 1 boy" criteria. we are not dealing with two random variables. one variable is fixed and we sampled according to it. now we are working with one independent random variable within that sample. that random variable has P = 1/2.

is there a flaw in my logic? if there is, please highlight it. i think the main confusion is: 1. we have sampled according to particular criteria. 2. we need to calculate a probability within that sample. NOT the population that sample was taken from.


In a sampling of 1000 families, the expected count of each kind of family is as follows:

  2xB : 250
  1xB, 1xG: 500
  2xG : 250
Sampling this population ignoring any family that has no boys leads to the probabilities

  2xB : 1/3
  1xB, 1xG: 2/3rds
You're still looking at the same probabilities; the models agree.

I don't fully understand your two random variables formulation. I think the confusion you're getting at is that there is an assumption that the chance of any given birth being male is theta = 0.5. The question however is not

"I have two children, at least one is a boy, what is the probability that my next child is a boy?"

It instead has to do with binomial probabilities on the space of a few repeated trials under parameter theta. The distribution is no longer flat.

Here's a more stark example of a similar form.

"I have 300 children, and at least 1 is a boy. What are the odds that I have no girls?"


ok, you stated this very well. It's now clear where the confusion arises: "Sampling this population ignoring any family that has no boys." Yes, with this interpretation the answer is 1/3, but it's contrary to my interpretation.

Actually, for anyone who is interested, see "Boy or Girl paradox". There is literature on this which discusses the different interpretations.


That sampling arises because being part of the population which has at least one boy is *necessary and sufficient* to (truthfully) make the statement that forms the paradox.

The alternative interpretation of the paradox arises when the wording of the paradox is construed to identify one of the children as male or female. In this case (stating something like "my first child is male"), being part of the population (x \in {BB, BG}) is necessary and sufficient and leads to the 1/2 probability of having two boys.

In short, the question becomes whether you believe the child is identified in the wording of the question. Honestly, the author of the paradox goes pretty far out of their way to say "at least one of the children is male" avoiding that identification.


since we are talking about probability-theory, thought folks here might find gnedenko pretty interesting: [ http://www-history.mcs.st-andrews.ac.uk/Biographies/Gnedenko... ]


"Nobody"? That's not likely...


Never use absolutes?



