Han Solo and Bayesian Priors (statslife.org.uk)
76 points by rck on March 28, 2015 | 17 comments



You also need to consider the survivorship bias of a story from a long time ago in a galaxy far, far away.

All the other potential stories where the protagonist predictably wins the Darwin Award would be too uninteresting to reach us over such an immense time and distance.


C3PO's estimation would have been a lot more useful if he'd calculated their odds.

Also his calculation suggests there's a strong open data regime in the empire, which is a nice thought.


The odds may have been based on simulation rather than "real data".


Ah, but in that case I find it hard to believe a robot wouldn't take piloting skills into account.

For example, when wind energy developers study wildlife impact, they tend to include avian "avoidance behavior" when modeling the risk of collision.


I like the concept, it's a cool way of introducing Bayesian probabilities!

The conclusion strikes me as overly confident, though. It implies that if we had 100 Han Solos, we could be very confident that about 3/4 of them would make it through. This comes about because it uses this:

We're going to say that C3PO has records of two people surviving and 7,440 people ending their trip through the asteroid field in a glorious explosion.

as data for Han's odds of making it. But 100 out-of-shape people dying on the ascent of Mount Everest does not tell us anything about the odds for someone who is very fit.

I would rather model that there is no one true probability of making it through - it's going to be dependent on the pilot. Mediocre pilots might have odds in the neighborhood of 1/3720, but there is presumably a lot of variance depending on skill, and my prior belief is that Han would wind up in the upper end of the distribution.
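To make that concrete, here is a rough sketch of how a pooled rate can hide very different per-pilot odds. The two-group split, the 1% elite share, and the 2% elite survival rate are all made-up assumptions for illustration; only the 3,720-to-1 figure comes from the discussion.

```python
# Hypothetical two-group population. Every number except the pooled odds
# is an assumption chosen for illustration.
pooled_rate = 1 / 3721   # "3,720 to 1" against, expressed as a probability
elite_share = 0.01       # assumed fraction of pilots who are elite
elite_rate = 0.02        # assumed survival probability for an elite pilot

# Solve pooled_rate = (1 - elite_share) * ordinary_rate + elite_share * elite_rate
ordinary_rate = (pooled_rate - elite_share * elite_rate) / (1 - elite_share)

print(f"ordinary pilot survival: {ordinary_rate:.6f}")
print(f"elite pilot survival:    {elite_rate:.6f}")
print(f"elite vs pooled:         {elite_rate / pooled_rate:.1f}x")
```

Under these assumptions the pooled figure describes neither group well, which is the sense in which the records alone say little about a pilot in the upper tail.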


How is your model different from the blog post? Yours sounds like a paraphrase.

Do you disagree that Han is more likely to survive more difficult challenges than easier ones?


The blog post has the right idea, but the wrong equations. If we take C3PO's statement to mean:

We're going to say that C3PO has records of two people surviving and 7,440 people ending their trip through the asteroid field in a glorious explosion.

Then the big question is how skill-dependent the challenge is. Is it like swimming across the English Channel, or is it random like surviving Niagara Falls in a barrel? There exists a distribution of survival rates as a function of skill level; we just don't know what it is. We could assume a sigmoid form. C3PO's statement gives us some constraint on what the parameters of the sigmoid are, but not enough. We could make some further assumptions and get an answer, but the results will depend strongly on the specific assumptions. In other words: we don't know the answer. https://xkcd.com/384/
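A minimal sketch of that sensitivity, under assumptions that are mine rather than anything C3PO said: skill is standard normal across pilots, survival probability is sigmoid(intercept + slope * skill), and Han sits three standard deviations above average. For each assumed slope, the intercept is calibrated so the population-wide survival rate matches 2 survivors out of 7,442 attempts.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def population_rate(intercept, slope, n=2001, lo=-6.0, hi=6.0):
    """Average survival probability over skill ~ Normal(0, 1), via a Riemann sum."""
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        z = lo + i * step
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += sigmoid(intercept + slope * z) * density * step
    return total

def calibrate_intercept(slope, target):
    """Bisect for the intercept that makes the population rate equal target."""
    lo, hi = -60.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if population_rate(mid, slope) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

target = 2 / 7442   # two survivors among 7,442 recorded attempts
han_skill = 3.0     # assumption: Han is three standard deviations above average

def han_survival(slope):
    """Han's survival probability for a given (assumed) skill slope."""
    return sigmoid(calibrate_intercept(slope, target) + slope * han_skill)

for slope in (0.0, 1.0, 3.0):
    print(f"slope={slope}: P(Han survives) = {han_survival(slope):.4f}")
```

Every slope reproduces the same recorded survival rate, yet Han's own odds change by orders of magnitude as the slope grows, so the records by themselves do not pin the answer down.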


I often find myself wondering if the major problem with any given statistical analysis is whether Bayesian inference ought to have been used at all.


The 75% result (chance to survive) seems about right, but the confidence in this result seems completely unnatural. The sharp peak in the posterior plot suggests that I have nailed down the exact probability to +/- 2%. No way.
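For what it's worth, the narrowness can be checked with the closed-form mean and standard deviation of a Beta distribution. The parameterization below is an assumption pieced together from the numbers quoted in this thread: C3PO's records as Beta(2, 7440), and a prior of Beta(20000, 1) standing in for the 20,000:1 belief in Han.

```python
import math

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

# Assumed parameters (see lead-in); a conjugate update just adds the counts.
post_a = 2 + 20000
post_b = 7440 + 1

mean, sd = beta_mean_sd(post_a, post_b)
print(f"posterior mean = {mean:.3f}, sd = {sd:.4f}")
```

The spread comes out well under one percentage point, because the two Beta distributions together act like roughly 27,000 observations. The confidence is baked into the pseudo-counts rather than earned from data about Han.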


Thought experiment: imagine there is a second robot, C4P1, on one of the TIE fighters. He independently comes up with an estimate of 10,000:1 for Han's death. Are we now going to slide our estimate all the way down? Or, what if C3PO's line was revised to say 1,000,000:1? Would the author now say, 'welp, I was pretty confident at 20,000:1, but now that C3PO's number is so much bigger than mine, I guess Han's gonna die'?


I think this is the kind of analysis you need instead:

http://tvtropes.org/pmwiki/pmwiki.php/Main/PlotArmor



Does the author have something backwards or do I?

P(RateOfSuccess|Successes) = Beta(α,β)

"In Bayesian terms, C3PO's estimate of the true rate of success given observed data is referred to as the likelihood."

BUT we know likelihood(rateofsuccess|data) = probability(data|rate of success).

I am confused.
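One possible reading that connects the two: under a flat prior, the posterior is proportional to the likelihood, so p(data|rate) viewed as a function of the rate and normalized to integrate to 1 is itself a Beta density, namely Beta(successes + 1, failures + 1). A quick numerical check with toy counts (2 successes and 5 failures, not C3PO's numbers):

```python
import math

def binomial_kernel(theta, successes, failures):
    """p(data | theta) as a function of theta, up to a constant factor."""
    return theta ** successes * (1.0 - theta) ** failures

def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm) * theta ** (a - 1) * (1.0 - theta) ** (b - 1)

successes, failures = 2, 5   # toy counts chosen for illustration

# Normalize the kernel over theta in [0, 1] with a midpoint rule.
n = 10000
grid = [(i + 0.5) / n for i in range(n)]
area = sum(binomial_kernel(t, successes, failures) for t in grid) / n

theta = 0.3
normalized = binomial_kernel(theta, successes, failures) / area
print(normalized, beta_pdf(theta, successes + 1, failures + 1))
```

The two printed values agree to numerical precision. Whether that is what the author meant by calling the Beta a "likelihood" is a guess on my part.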


"Likelihood" can refer to both p(param|data) and p(data|param). See [1] for an example of the first, and [2] for the second.

[1] http://en.wikipedia.org/wiki/Likelihood_function

[2] http://en.wikipedia.org/wiki/Bayes%27_theorem#Events


One factor seems missing: the deviation of Han's skills from those of the average pilot. Does C3PO take it into account in his likelihood? Do we take it into account in our probability of success? Although I do get the point, it seems there's often a simplification of that sort in this kind of Bayesian reasoning.


Someone needs to do this with Game of Thrones characters. Everything I thought about important characters' survival chances turned upside down as I watched it.




