Philosophy and the practice of Bayesian statistics [pdf] (columbia.edu)
98 points by mitmads on March 27, 2013 | 28 comments



I never thought I'd see a 31-page paper by Andrew Gelman on the front page of Hacker News. And certainly not a paper coauthored with a well-known frequentist!

I was lucky enough to work with Prof. Gelman as his research assistant while I was in school - I can't even begin to tell you how prolific and brilliant that man is. His name may not be well known outside academic circles, but I'd go as far as to say that he's the most important Bayesian statistician since Thomas Bayes.

He used to be a contributor to FiveThirtyEight, back before the Times picked it up. I used to explain FiveThirtyEight as 'one of the six blogs Andrew Gelman writes for'. Now, I explain Andrew Gelman as 'a former contributor to Nate Silver's blog'. How times have changed!

Gelman's approach to statistics is more wholly Bayesian than most people with a moderate level of statistical training are likely to be familiar with. It was from Gelman that I learned why I never need to perform an F-test[0]; at the same time, it was from Gelman that I learned some of the potential pitfalls of pure Bayesian reasoning[1] (and how to address them).

When people ask me where to get started with statistics, both of the books I recommend are Gelman's: Teaching Statistics: A Bag of Tricks and Data Analysis Using Regression and Multilevel/Hierarchical Models.

Both have tremendously off-putting titles, but they're actually incredibly accessible. Gelman is great at many things, but picking sexy titles is not one of them.

If you're interested in understanding the concepts behind this paper, I'd start there.

[0] http://andrewgelman.com/2009/05/18/noooooooooooooo/

[1] The linked paper provides a good analysis


Shalizi is no slouch either - his notebooks are fascinating.

http://vserver1.cscs.lsa.umich.edu/~crshalizi/notabene/


Thank you so much for sharing this.

Tidbits like this are _the_ reason I frequent Hacker News.


You're welcome. You might try his blog, too - his views on subjects like the "wisdom of crowds" and IQ are intellectual fun.

http://vserver1.cscs.lsa.umich.edu/~crshalizi/weblog/315.htm...


Gelman really is an unbelievable Bayesian. I don't think I'd say the most important since Bayes himself (Jeffreys and Jaynes strike me as crucial to the Bayes movement, along with the computational folks, like Geman & Geman and Metropolis & Hastings), but certainly one of the most important living Bayesians.

Gelman, Carlin, Stern, and Rubin's Bayesian Data Analysis is by and large the most useful Bayes book out there. I keep it around just for reference sometimes, because of how detailed it is.


If you are interested in the day-to-day practice of Bayesian methods (and you love Python), check out our open-source project/book Bayesian Methods for Hackers:

https://github.com/CamDavidsonPilon/Probabilistic-Programmin...

We aim to empower the non-mathematician with really cool tools and methods to solve otherwise very difficult problems. Plus it's all open source, and every plot/diagram is reproducible and extendable.


As an aside, I just want to thank you for making the project/book text available as IPython notebooks. I haven't seen a mathematical writeup as beautiful and interactive as the chapters that you've put out. I've only had time to go through a couple of them, but it really is a treat.

Also, I've learned so much more about how people use python to do analysis and all sorts of other things through ipynb files than reviewing traditional python libraries/code. I wish more people would publish using them.


Wow, this is fantastic! Thank you for one of the sweetest projects I've seen this year.


Your book seems very interesting. I await the other chapters eagerly.


Haven't had time to read the whole article yet, but these two paragraphs from the conclusion (pp. 24-25) are excellent:

"In our hypothetico-deductive view of data analysis, we build a statistical model out of available parts and drive it as far as it can take us, and then a little farther. When the model breaks down, we dissect it and figure out what went wrong. For Bayesian models, the most useful way of figuring out how the model breaks down is through posterior predictive checks, creating simulations of the data and comparing them to the actual data. The comparison can often be done visually; see Gelman et al. (2004, Chapter 6) for a range of examples. Once we have an idea about where the problem lies, we can tinker with the model, or perhaps try a radically new design. Either way, we are using deductive reasoning as a tool to get the most out of a model, and we test the model – it is falsifiable, and when it is consequentially falsified, we alter or abandon it. None of this is especially subjective, or at least no more so than any other kind of scientific inquiry, which likewise requires choices as to the problem to study, the data to use, the models to employ, etc. – but these choices are by no means arbitrary whims, uncontrolled by objective conditions.

"Conversely, a problem with the inductive philosophy of Bayesian statistics – in which science ‘learns’ by updating the probabilities that various competing models are true – is that it assumes that the true model (or, at least, the models among which we will choose or over which we will average) is one of the possibilities being considered. This does not fit our own experiences of learning by finding that a model does not fit and needing to expand beyond the existing class of models to fix the problem."

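To make the first of those paragraphs concrete, here is a rough sketch of a posterior predictive check in Python/numpy. Nothing below comes from the paper itself; the model, the data, and the test statistic are all invented for illustration. We fit a normal model to data that is actually skewed, replicate datasets from an approximate posterior, and ask whether the replications can reproduce the observed skewness:

    # Minimal posterior predictive check (illustrative sketch).
    # Model: y ~ Normal(mu, sigma); the "real" data is skewed on purpose.
    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.lognormal(mean=0.0, sigma=1.0, size=100)  # skewed data
    n = len(y)

    # Crude posterior draws for (sigma, mu) under the normal model,
    # using the standard noninformative-prior results.
    n_sims = 1000
    sigma_draws = y.std(ddof=1) * np.sqrt((n - 1) / rng.chisquare(n - 1, n_sims))
    mu_draws = rng.normal(y.mean(), sigma_draws / np.sqrt(n))

    # Replicate datasets and compare a statistic the normal model
    # handles badly: sample skewness.
    def skew(x):
        return ((x - x.mean()) ** 3).mean() / x.std() ** 3

    t_obs = skew(y)
    t_rep = np.array([skew(rng.normal(m, s, n))
                      for m, s in zip(mu_draws, sigma_draws)])
    ppp = (t_rep >= t_obs).mean()  # posterior predictive p-value
    print(f"observed skewness {t_obs:.2f}, ppp = {ppp:.3f}")

A ppp near 0 or 1 flags an aspect of the data the model cannot reproduce - exactly the "dissect it and figure out what went wrong" step.
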
And section 4, which discusses issues that arise in Bayesian statistics when working with multiple candidate models, is interesting and agrees with my limited experience, especially 4.3: "Why not just compare the posterior probabilities of different models?"

PS (to the submitter): when submitting a 30-page paper, it might be helpful to mention what part of the paper you'd like to discuss. It makes it easier to get started.


As someone who didn't study statistics in college, a paper like this is right in that uncomfortable no-man's land between what I understand and what I'm interested in - it seems to tie into several subjects I have layman's interest in.

For instance, there is the controversy over how useful models are - are they worthwhile goals we can actually draw conclusions from, or are they simply shortcuts on our way to a more reductionist understanding of a phenomenon? Is "emergence" a meaningful concept or an empty one? Is "systems thinking" a valid concept or just a lack of discipline in the effort to understand things in a reductionist manner?

People here seem to hate Stephen Wolfram, but his writing has made some concepts approachable to me that I might not have grasped otherwise - for instance, that computational irreducibility means that even if the world is entirely reductionist, we still can't deduce the reductionist reality/inputs from an output. And so models are useful even though they are wrong. This is also a point that Paul Krugman often makes about economic models - people who disregard models on the grounds that they are wrong just don't grasp their value, he argues.

Most of what I've learned about Bayesianism is what I've read from the first few articles over at lesswrong.com - but I noticed pretty early on that I was uncomfortable using probability as a description of what I believed to be true. It seems the general point of this paper is that Bayesianism is useful for deductive techniques - as a tool in a toolset to support a frequentist view? - but not so much as an expression of a subjectivist philosophy. I appreciated this point:

"Beyond the philosophical difficulties, there are technical problems with methods that purport to determine the posterior probability of models, most notably that in models with continuous parameters, aspects of the model that have essentially no effect on posterior inferences within a model can have huge effects on the comparison of posterior probability among models."

More generally, the paper seems to be making the point that using the Bayesian philosophy to address models is improper in general, since the premise of Bayesianism is to update beliefs based on evidence/data, while we know that belief-in-a-model is pointless since models are wrong. But past that point I got pretty lost.


> I noticed pretty early on that I was uncomfortable using probability as a description of what I believed to be true.

If it can ease your discomfort: what else? In colloquial language, we already do have pretty good descriptors of subjective beliefs, such as "I don't think so", "I'm damned sure", "maybe"… It is only natural to call the quantitative version of those "probabilities" —at least to me.

As for the frequency properties of seemingly "random" phenomena, they're a property of the real world, ready for study. I'd have much more discomfort calling those "probabilities".


I agree that colloquially speaking, it aligns most with what we believe. What I have trouble with is that you and I could assign different probabilities (via beliefs) to an external event that has only one probability of happening. Just in that sentence I used the word "probability" correctly both times, yet it has two different conflicting meanings.


Then just use different words. Break it down to "degree of subjective belief" and "frequency property" if you have to.

Still, I prefer to use the shorthand "probability" to mean the former, for two reasons: first, "degree of subjective belief" is what most lay people will understand when we say "probability". Second, even scientists have this meaning messing with their intuitions, even if, when they write papers, they really do mean "frequency property". That can be a major obstacle for science. Let me quote Edwin T. Jaynes (long, but worth it):

Those who cling to a belief in the existence of "physical probabilities" may react to the above arguments by pointing to quantum theory, in which physical probabilities appear to express the most fundamental laws of physics. Therefore let us explain why this is another case of circular reasoning. We need to understand that present quantum theory uses entirely different standards of logic than does the rest of science.

In biology or medicine, if we note that an effect E (for example, muscle contraction, phototropism, digestion of protein) does not occur unless a condition C (nerve impulse, light, pepsin) is present, it seems natural to infer that C is a necessary causative agent for E. Most of what is known in all fields of science has resulted from following up this kind of reasoning. But suppose that condition C does not always lead to effect E; what further inferences should a scientist draw? At this point the reasoning formats of biology and quantum theory diverge sharply.

In the biological sciences one takes it for granted that in addition to C there must be some other causative factor F, not yet identified. One searches for it, tracking down the assumed cause by a process of elimination of possibilities that is sometimes extremely tedious. But persistence pays off; over and over again medically important and intellectually impressive success has been achieved, the conjectured unknown causative factor being finally identified as a definite chemical compound. Most enzymes, vitamins, viruses, and other biologically active substances owe their discovery to this reasoning process.

In quantum theory, one does not reason in this way. Consider, for example, the photoelectric effect (we shine light on a metal surface and find that electrons are ejected from it). The experimental fact is that the electrons do not appear unless light is present. So light must be a causative factor. But light does not always produce ejected electrons; even though the light from a unimode laser is present with absolutely steady amplitude, the electrons appear only at particular times that are not determined by any known parameters of the light. Why then do we not draw the obvious inference, that in addition to the light there must be a second causative factor, still unidentified, and the physicist's job is to search for it?

What is done in quantum theory today is just the opposite; when no cause is apparent one simply postulates that no cause exists —ergo, the laws of physics are indeterministic and can be expressed only in probability form. The central dogma is that the light determines, not whether a photoelectron will appear, but only the probability that it will appear. The mathematical formalism of present quantum theory —incomplete in the same way that our present knowledge is incomplete— does not even provide the vocabulary in which one could ask a question about the real cause of an event.

Biologists have a mechanistic picture of the world because, being trained to believe in causes, they continue to use the full power of their brains to search for them —and so they find them. Quantum physicists have only probability laws because for two generations we have been indoctrinated not to believe in causes —and so we have stopped looking for them. Indeed, any attempt to search for the causes of microphenomena is met with scorn and a charge of professional incompetence and `obsolete mechanistic materialism'. Therefore, to explain the indeterminacy in current quantum theory we need not suppose there is any indeterminacy in Nature; the mental attitude of quantum physicists is already sufficient to guarantee it.

This one has been quite an eye opener, making me doubt even Many Worlds, which for one still doesn't explain the Born statistics. Still, thanks to Eliezer's Quantum Physics sequence, I'm now convinced that to the best of Science's knowledge (and despite what many physicists say) the laws of physics are most probably deterministic. Which would instantly solve the conflict by rendering "probability" nonsensical when applied to physical phenomena.


That's a lot to unpack - let me make sure I have this right.

Quantum theorists see that light is necessary but not sufficient for some sort of electron behavior. And I'm guessing that it's very hard to find an additional contributory/necessary cause that, when combined with light, would reliably predict the behavior.

So they use probability to communicate their findings.

If that additional contributory/necessary cause exists (but is just really hard to discover, or perhaps light itself is just highly contributory but not necessary/sufficient) then the probability is an effective way to communicate degree-of-belief, and helps to combine information about what they know with what they don't know. In other words, they are using probability to communicate partial findings.

If that additional contributory/necessary cause doesn't exist, then the probability is a fixed, physical part of the science. In other words, they're communicating full findings; that light actually creates probability of an effect.

And that confusion about "probability" is making people believe that the second case is true when they haven't necessarily disproved the first case yet? Which then serves as a curiosity-stopper and makes the science "mysterious". And this messes with scientific intuitions.

Forgive me if I'm way off, I've only read the first few articles of the first couple lesswrong sequences.


I thought I'd pop in and mention this set of lectures (now a book, huh) from Scott Aaronson on quantum mechanics and quantum computing: http://www.scottaaronson.com/democritus/ Particularly Lecture 9 goes into the basic mathematics of quantum mechanics, and explains why it's more appropriate to consider it an extension to probability theory using complex numbers and call them amplitudes, rather than classic probabilities. Not being a physicist myself I don't know if this is really helpful in clearing up professional scientists' intuitions, but it's helpful for me at least in grasping the basics.

You're on the right track by noting that quantum theorists can use probability to communicate their findings. A single classical probability like 0.7 is a summary, not the whole picture. For the whole picture, you need a log of observations with time stamps, which is a tremendous amount of data because, taken to the extreme, it's a balancing act between maximizing the data you know about yourself and your surroundings and minimizing the number of Planck-time steps needed to collect the data. Even with more realistic amounts of data it's still a lot to pack around, so you can summarize it by constructing a probability distribution, which is more convenient mathematically, and you can summarize that probability distribution into a handful of numbers if you need to, because those are even easier to pack around. (Like the single probability 0.7, or the two numbers μ and σ that characterize a Normal distribution, etc.)

So if you think of probability as just summarizing your past observations and constraining your expectations for future observations (because what else could it do?), I think it's easy to see how two people can have different probabilities that represent a belief in a proposition or an abstract system or something else. If both are being honest (I'm packing a lot of assumptions into that word, since this is basically Aumann's agreement theorem) and the probabilities are different, there are a couple of reasons why: either at least one of them has information (observations) the other one does not, or they started out with different prior probabilities. In the former case, they can reach agreement by sharing data. In the latter case, they can eventually reach agreement by obtaining more data or choosing a common prior. With enough successive Bayesian updates, you can arrive at the same conclusion regardless of where you started. (In practice, of course, things are more hairy and not this simplistic, which is what the submitted pdf is addressing in part.)
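
If it helps, here is a minimal sketch of that convergence in Python: a coin-flip setup with conjugate Beta priors, where two observers start from opposite priors and watch the same flips. All the numbers are invented for illustration:

    # Two observers with different Beta priors watching the same coin
    # flips converge as shared data accumulates (numbers invented).
    import numpy as np

    rng = np.random.default_rng(1)
    flips = rng.random(1000) < 0.7          # shared data, true p = 0.7

    priors = {"optimist": (8.0, 2.0),       # Beta(alpha, beta) pseudo-counts
              "skeptic": (2.0, 8.0)}

    for n in (0, 10, 100, 1000):
        heads = int(flips[:n].sum())
        for name, (a, b) in priors.items():
            mean = (a + heads) / (a + b + n)   # Beta posterior mean
            print(f"n={n:4d}  {name}: {mean:.3f}")

After a thousand shared flips the priors are swamped and both posterior means sit next to the empirical frequency.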

I find it hard to grok what it means for something to have a physical frequency property. I can understand it as a statistic of an abstract system, i.e. a probability, but in light of Many Worlds, I don't think a frequency can fundamentally be a property of something in the same way energy can be. But since the energy of a photon is related to the photon's "frequency" via its "wavelength", is energy really fundamental, or is the relation just a convenient means of making the wave mechanics useful? I've read Feynman tell how the whole idea of forces is just a somewhat circular simplification that's not really fundamental, even though sometimes it seems like it is.

While I have a high degree of certainty about the physics accurately describing reality as she is, my meta-belief / confidence about that high certainty isn't nearly as high, simply because I feel a lot of confusion about the ultimate nature of reality.

But at least at a higher level than fundamental physics, it's clear to me that saying a "fair coin" has p=0.5 per unique side is really just summarizing that we don't have enough knowledge of the flipper and the environment it is flipped in to predict the result with certainty, in the way we can predict the output of a logic OR gate with low- and high-voltage inputs with certainty. This is different from the uncertainty about which branch you are in within a Many Worlds universe, where it's more of a physical resolution problem than one of needing more knowledge, similar to not being able to find out what happens between Planck-time steps (if anything happens and if the question means anything).

Ben Goertzel's book Probabilistic Logic Networks is another resource I'd recommend. Sorry if my branching reply is a bit big, but I'm constantly trying to clarify things for myself as well. ;)


I have not read Aaronson's book, but I'm already confident his use of the word "probability" is more confusing than helpful. Okay, the universe is made up of a wacky "configuration space" of "complex amplitudes" (in lay terms, a many-dimensional sea of complex numbers). And there is a direct relationship between complex amplitudes and the degree of subjective belief of the physicist doing such and such experiment: take the norm of the amplitude, square it, compare that to the other relevant squared norms, and voilà, you have a probabilistic prediction for the result of your next experiment. (I sweep a lot under the carpet, but it really works like this.)
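
In code, that recipe is a one-liner (a toy sketch; the amplitudes are made-up numbers, not a real physical calculation):

    # Amplitudes -> probabilities: square the norms, then normalize
    # (toy numbers, not a real physical calculation).
    import numpy as np

    amplitudes = np.array([1 + 1j, 1 - 1j]) / 2   # one amplitude per outcome
    weights = np.abs(amplitudes) ** 2             # squared norms
    probs = weights / weights.sum()               # compare to the others
    print(probs)                                  # [0.5 0.5]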

Something to know, however: assuming you made no mistake, the result of the first experiment tells you nothing about the result of any subsequent trial. Here's an example:

Let us send light through a half-silvered mirror. Have detectors on either side of the mirror to detect the light. Now, you can see that each detector detects half as much light as you originally sent (of course). Things get interesting when you send photons one by one. The setup of the experiment is such that when you make your amplitude calculations, you find that the squared norm of the amplitudes at each detector is the same. Here, this means that when you send a single photon to the mirror, you should expect to see a single detector go off, but you have no idea which one (subjective belief of 1/2 for either side).

But there's more. Once you've made the calculation, you know all you could possibly know about the initial conditions of the experiment (according to current physics). Imagine you sent a photon for the first time, and you saw it going through the mirror. Will that change your expectation for the second trial? Not one bit. You will still bet 1/2 on either side. So this experiment is the perfect coin toss.

Or is it?

Imagine you send twenty photons, and you see they all made it past the mirror. Initially that's about one chance in a million. In that case, you should be strongly tempted to expect the 21st to go through as well. The question is why?

If you are absolutely sure you set up the experiment right, then you shouldn't change your mind, and should still bet 1/2 on either side. But are you so sure of your accuracy that you could set up several million experiments and get all of them right? Probably not. The more likely explanation is that your so-called "half-silvered mirror" is just a piece of glass. Maybe you were tired and picked up the wrong one. Or maybe some graduate student played a joke on you.
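
That temptation is easy to quantify. A hedged sketch of the update (the 1e-4 prior on "I grabbed plain glass" is invented; only the 0.5^20 likelihood comes from the setup above):

    # How 20 straight transmissions shift belief from "fair mirror"
    # to "it's just plain glass" (the 1e-4 prior is invented).
    prior_glass = 1e-4            # prior that the setup is wrong
    like_glass = 1.0              # plain glass transmits every photon
    like_mirror = 0.5 ** 20       # fair mirror: about one in a million

    posterior_glass = prior_glass * like_glass / (
        prior_glass * like_glass + (1 - prior_glass) * like_mirror)
    print(round(posterior_glass, 3))   # ~0.991: suspect the mirror

Even a one-in-ten-thousand prior on a botched setup overwhelms the one-in-a-million chance of twenty straight transmissions through a fair mirror.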

Conversely, an actual coin toss could be perfectly fair: there's a lot going on in a coin toss, and depending on the chosen procedure, you can't expect to have sufficient knowledge of the initial conditions to predict anything but 1/2 for heads, 1/2 for tails, even after seeing the coin land tails three times in a row. There is, even then, some kind of "frequency property" in the experiment. It's not because the laws of coin tossing are somehow indeterministic, but because the way the coin is tossed systematically hides relevant information about the exact initial conditions.

(The formal Aumann agreement theorem is a bit stronger than that: two perfect Bayesians cannot have common knowledge of any disagreement. In practice, this means you can't start from different priors, because a prior embodies background information, which can be shared. I know that there is controversy about "fully uninformed" priors, but in practice, they don't change posterior probabilities very much.)


As an outgrowth of this discussion I tried submitting a question to Ask HN but it went over the character limit, so for want of a better place to put it, maybe someone will react here - anyone want to weigh in on my various interpretations of truth and falsehood?

   +------------+------------------+--------------------+-------------------+---------------+
   |    T/F     |     Fuzzy        |     Frequentist    |     Bayesian      |   Bayesian    |
   |            |                  |                    |   Subjectivist    |  Objectivist  |
   +------------+------------------+--------------------+-------------------+---------------+
   |    0/0     | Ambiguous/Vague. |    I am ignorant.  |  I am uncertain.  | No one can be |
   |            | I am apathetic.  |                    |                   | certain.      |
   +------------+------------------+--------------------+-------------------+---------------+
   |    0/.5    |     N/A          | Don't bother me,   | I am uncertain    | There exists  |
   |            |                  | still testing my   | but it may be     | partial       |
   |            |                  | hypothesis.        | false (partial    | knowledge of  |
   |            |                  |                    | knowledge)        | falseness.    |
   +------------+------------------+--------------------+-------------------+---------------+
   |    0/1     | Completely false | It never happens   | I am certain it   | Everyone      |
   |            |                  | in the physical    | will never happen | should be     |
   |            |                  | world.             |                   | certain it    |
   |            |                  |                    |                   | will never    |
   |            |                  |                    |                   | happen.       |
   +------------+------------------+--------------------+-------------------+---------------+
   |    .5/0    |     N/A          | Don't bother me,   | I am uncertain    | There exists  |
   |            |                  | still testing my   | but it may be     | partial       |
   |            |                  | hypothesis.        | true (partial     | knowledge of  |
   |            |                  |                    | knowledge)        | truthiness.   |
   +------------+------------------+--------------------+-------------------+---------------+
   |   .5/.5    | It is partly     | Intrinsically      | I am 50% certain  | All should be |
   |            | true and partly  | 50/50.  Given 100  | it is true, 50%   | 50% certain   |
   |            | false.  (partial | bottles, half are  | certain it is     | it's true;    |
   |            | truth)  The      | full.  Given 100   | false.  I am 50%  | 50% certain   |
   |            | bottle is half   | of her, half are   | sure the bottle   | it's false.   |
   |            | full.  She is    | pregnant.          | is full.  I am    | All should be |
   |            | partly pregnant. |                    | 50% sure she is   | 50% certain   |
   |            |                  |                    | pregnant.         | bottle is     |
   |            |                  |                    |                   | full.  There  |
   |            |                  |                    |                   | is 50%        |
   |            |                  |                    |                   | certainty she |
   |            |                  |                    |                   | is pregnant.  |
   +------------+------------------+--------------------+-------------------+---------------+
   |    .5/1    |       N/A        | Don't bother me,   |       N/A         |     N/A       |
   |            |                  | my hypothesis is   |                   |               |
   |            |                  | broken.            |                   |               |
   +------------+------------------+--------------------+-------------------+---------------+
   |     1/0    | Completely true  | It always happens  | I am certain it   | Everyone      |
   |            |                  | in the physical    | will always       | should be     |
   |            |                  | world.             | happen.           | certain it    |
   |            |                  |                    |                   | will always   |
   |            |                  |                    |                   | happen.       |
   +------------+------------------+--------------------+-------------------+---------------+
   |    1/.5    |      N/A         | Don't bother me,   |       N/A         |     N/A       |
   |            |                  | my hypothesis is   |                   |               |
   |            |                  | broken.            |                   |               |
   +------------+------------------+--------------------+-------------------+---------------+
   |     1/1    | Equally          | My hypothesis is   |       N/A         |     N/A       |
   |            | confident.       | meaningless.       |                   |               |
   |            | Torn.            |                    |                   |               |
   |            | Ambivalent.      |                    |                   |               |
   +------------+------------------+--------------------+-------------------+---------------+


I'm not sure what to make of this… More precisely, I'm not sure what your fractions on the left actually mean. Personally, I'm tempted to throw "undefined" at each row that does not sum to 1, leaving only "1/0", ".5/.5", and "0/1".

I haven't heard about any distinction between "subjectivist" and "objectivist" Bayesians. I'd say my own position is a little bit of both:

First, background knowledge fully constrains the world view. That means that two rational agents whose knowledge is the same must hold the same beliefs. More practically, two persons having the same relevant knowledge about any given question should believe the same answer (the difficulty here is to sort out what's relevant). Second, agents generally don't have the same background knowledge. Nothing stops us from knowing different relevant information about any given subject. So of course we won't hold the same beliefs.

Fuzzy. Ah, fuzzy… the more I look at it, the less sense it makes. Strictly speaking, there is no such thing as "half truth". A proposition is binary: either true or false. Beliefs on the other hand are not binary: a belief is a proposition plus a probability. If I say the bottle is more than 75% full with probability 84%, then I'm 16% right if it really is only half full. Quite wrong, but not hopelessly so.

With probability functions and utility functions, we can make the same set of decisions we could have made with fuzzy logic, which looks like it mixes the two. I personally prefer the modular approach.


The fractions are supposed to represent partial degrees of truth. 0.5 is just an example, could be 0.4, 0.6 - anything other than absolute true or false. There are examples of probability systems that try to factor in uncertainty, beyond truth belief or false belief. I think it was possibility theory that was talking about a 0.5 value assigned to "the cat is alive", a 0.2 value assigned to "the cat is dead", and a 0.3 uncertainty - then it would be 0.8 plausibility/possibility of alive, but only 0.5 probability.
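
For what it's worth, those numbers also work out under a Dempster-Shafer-style mass assignment (a sketch, not a claim about which theory you were reading; the masses are just your example restated):

    # Dempster-Shafer-style belief/plausibility for the cat example
    # (sketch; the masses are the numbers from the comment above).
    mass = {
        frozenset({"alive"}): 0.5,          # evidence specifically for alive
        frozenset({"dead"}): 0.2,           # evidence specifically for dead
        frozenset({"alive", "dead"}): 0.3,  # uncommitted / uncertainty
    }

    def belief(h, mass):
        # total mass of sets wholly inside the hypothesis
        return sum(m for s, m in mass.items() if s <= h)

    def plausibility(h, mass):
        # total mass of sets compatible with the hypothesis
        return sum(m for s, m in mass.items() if s & h)

    h = frozenset({"alive"})
    print(belief(h, mass), plausibility(h, mass))   # 0.5 0.8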

The main distinction I saw about subjectivist vs objectivist is that the objectivist still communicates in terms of degree of belief, but the belief is that it is based on objective empirical data, and so it's not an expression of personal belief - more a measure of belief for the entire system or knowledge base. More relevant for things like machine learning than for updating one's own personal beliefs. (Maybe this just assumes common priors.)

I've gone round and round on fuzzy because some people talk about fuzzy as if it is degrees-of-belief, like the Bayesian interpretation. But I think it's different - I mean, if I'm partly correct in describing a concept, you wouldn't say that I am right 50 times out of 100, or that you're 50% certain I'm right, right? You'd say I'm half-right.

I'll have to think about probability combined with utility function. I'm not sure how to make it square with fuzzy math. It seems you'd be 16% right whether the bottle was half-full, quarter-full, or empty.


Okay, so I got your fractions mostly right.

Well, when you use a computer to make a probabilistic calculation, you have to feed it at some point with the relevant information, or it won't know what to make of the data. The data alone is not enough. And if you are absolutely certain your program is (i) correct, and (ii) fed with all the relevant information you know about, then you should definitely believe the result. (Of course, this absolute certainty is not attainable, so any sufficiently surprising result should lead you to think that something went wrong.) Assuming common priors, on the other hand, seems unreasonable, unless we're only talking about "fully uninformed" priors such as those based on Kolmogorov complexity.

Yes, I would say you're half right. The temperature example given by Wikipedia is really good. The "degree of coldness" is a worthy notion. What bothers me is that fuzzy logicians don't seem to run probability distributions over those degrees of truth. I mean, I can surely make up a probability distribution on the outside temperature for tomorrow at 10AM. Assuming that temperature maps to degrees of truth of the sentence "it's cold" (say, 100% true below 10°C, 0% true beyond 20°C, and a linear interpolation in between), then I naturally come up with a probability distribution over the degrees of truth (I should get a Dirac spike at both extremities).
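
That construction is easy to simulate. A Monte Carlo sketch in Python (the Normal(12, 4) forecast is invented; the membership function is the one described above):

    # Pushing a temperature forecast through a fuzzy membership function
    # to get a distribution over degrees of truth of "it's cold".
    # (The Normal(12, 4) forecast is invented for illustration.)
    import numpy as np

    rng = np.random.default_rng(2)
    temps = rng.normal(12.0, 4.0, 100_000)   # tomorrow at 10AM

    # Membership: 1 below 10°C, 0 above 20°C, linear in between.
    truth = np.clip((20.0 - temps) / 10.0, 0.0, 1.0)

    print(round((truth == 1.0).mean(), 3))   # spike at "fully cold"
    print(round((truth == 0.0).mean(), 3))   # spike at "not cold at all"
    print(round(truth.mean(), 3))            # expected degree of coldness

The two point masses at 0 and 1 are the Dirac spikes; everything in between is a genuine distribution over degrees of truth.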

Your last one is exactly right. I'm only 16% right anywhere from empty up to (but excluding) 75% full. That's the punishment I get for not producing a theory of the fullness of the bottle which describes my actual beliefs about it.

I assume that your ultimate goal in life is to make decisions, or to create a machine that will make decisions. To make sound decisions, you need to assess their consequences. Take the example of the bottle: I need a full bottle of water to hike today. Let's assume for the sake of the argument that it is either empty or full. If it's empty, I need to re-fill it, or I would faint from thirst (not good). If it's full, checking is only a nuisance (not good either). Now the question is, how can I minimize my discomfort, on average?

Well, I need two things: first, a probability distribution. Let's say I'm pretty sure I filled the bottle yesterday; call it a 90% chance. Second, a utility function. Baseline: the bottle is full and I did not bother checking it - utility zero. Checking costs me 1 utility point, and fainting from thirst costs me 50 points (let's assume my brain is actually capable of performing those utility assessments). Should I check the bottle?

Oh yes. Not checking means a 10% chance of losing 50 points, or 5 on average. Checking means a certainty of losing only one point, which costs much less. Conclusion: check your water supply before you go hiking. And re-check just to be sure.
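
In code, the whole cost-benefit calculation is a few lines (the numbers are copied from the paragraphs above):

    # Expected-cost comparison for checking the bottle (numbers from above).
    p_full = 0.9
    cost_check = 1.0
    cost_faint = 50.0

    exp_cost_check = cost_check                 # always pay the nuisance
    exp_cost_skip = (1 - p_full) * cost_faint   # 10% chance of fainting
    print(exp_cost_check, exp_cost_skip)        # 1.0 vs 5.0 -> check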

Now this problem didn't call for fuzzy logic. We can however come up with a full probability distribution over the quantity of water in the bottle instead of just two possibilities. From there, fuzzy logic should be able to naturally step in. But frankly, I prefer to run a separate (dis)utility function over the state of thirst that the lack of water will provoke (from light thirst to actually fainting), and combine it with my probability distribution to make my decision. (Though at that point, I'd rather just check than fry my brain trying to solve this cost-benefit calculation.)


BTW, I just saw something about this regarding fuzzy uncertainty.

They made a distinction between vagueness and ambiguity, both of which might make it difficult to assign a probability distribution among the possible values.

Ambiguity is when the boundaries of the sets are not clearly defined. In the thermometer example, it would be not knowing exactly what temperatures are meant by "cold". Or maybe like the Supreme Court definition of porn: you know it when you see it.

Vagueness is when there are clear definitions, but you're not sure how well your data fits into those definitions. For the temperature example, it would be what a crappy thermometer would tell you. It'd roughly correlate (it wouldn't return boiling if it's freezing), but it'd be pretty inaccurate.

In true/false values, maybe 0/.5 and .5/0 could be construed as ambiguous, while 1/.5 and .5/1 could be construed as vague.


Fuzzy set theory makes sense again when you realize it's not about classifying uncertainty or partial truth at all, but merely denoting partial set membership.

Goertzel's et aliorum PLN book and framework that I mentioned before use fuzzy values in exactly that way actually. There's a brief high-level overview on the differences between PLN and other approaches to uncertain inference early on in the book. In PLN, all of these four sentences are uniquely distinguished, whereas in a fuzzy framework it's not so clear what's going on beyond the first sentence:

"Jim belongs to degree 0.6 to the fuzzy set of tall people.", "Jim shares 0.6 of the properties shared by people belonging to the set of tall people (where the different properties may be weighted).", "Jim has a 0.6 chance of being judged as belonging to the set of tall people, once more information about Jim is obtained (where this may be weighted as to the degree of membership that is expected to be estimated once the additional information is obtained).", and "Jim has an overall 0.6 amount of tallness, defined as a weighted average of extensional and intensional information."


Yes, it does make sense.


Frequentist here. This paper makes me hate Bayesians a little less. The reason is that a general thrust of the paper (since I have only had time to give it a once-over) seems to be that just because you are a Bayesian doesn't mean you have to get rid of model adequacy checks. Not having model adequacy checks is why I think Bayesians run around with a magic wand saying, "poof! there's an optimal model." After proving a theoretical optimality they never check to see if the real world data supports their arguments. So I'm glad to see a prominent Bayesian saying that you don't have to throw model checking out the window.

On a secondary note, I have to lament the use of philosophy in a math paper. I realize that many prominent mathematicians are/were also philosophers and that the two subjects are somehow linked at some level. But really I think that putting philosophy in a math paper is an excuse to use more big words and sound smart. Most of us would like to have a set of formulas to apply and not worry about - forgive me for what I'm about to say - fuzzy non-science like philosophy and the implications it might have on our cold hard numbers.

Can each 31-page paper that combines math with philosophy come with a 5-page companion paper that leaves out the philosophy and just has the applicable math stuff?


I think your objection misses the fundamental point of the paper which is that blindly applying formulae (in this case Bayesian ones) without considering the part these computations play in the whole scientific process leads to bad science. Specifically, it leads to assuming that the correct model is already among those being considered. The authors say this is often not the case and assuming so often leads to stagnation in the relevant discipline.

This isn't an article that hands the readers a set of rote instructions, it's a warning that rote application is bad science because good science doesn't merely compare existing models with the data, good science proposes new and better models. This is fundamentally an article on the philosophy of science so leaving out the philosophy would make the article pointless.

In general, people who are good at symbolic manipulation are often in search of a methodology that allows them to only do symbolic manipulation and relieves them of the difficult task of doing semantics as well. The article is saying that there's no free lunch here - we have to do the semantics as well, we have to understand why models don't perform well and come up with the creative insights necessary to replace them with new ones.


The author's point that the most successful forms of Bayesian statistics accord much better with sophisticated forms of hypothetico-deductivism is reminiscent of the epistemology of normative value(s) which furnish a provisional lens for the analysis of the systemization of statistical transparency.

OK, half of that sentence is from an academic bullshit generator. I won't tell you which half.

Unfair, I know. That paper is clearly not meant for the general public, but still, learn how to communicate.


I... uh.... How did this get here?



