
> I noticed pretty early on that I had a discomfort in using probability as a description of what I believed to be true.

If it can ease your discomfort: what else? In colloquial language, we already do have pretty good descriptors of subjective beliefs, such as "I don't think so", "I'm damned sure", "maybe"… It is only natural to call the quantitative version of those "probabilities" —at least to me.

As for the frequency properties of seemingly "random" phenomena, they're a property of the real world, ready for study. I'd have much more discomfort calling those "probabilities".




I agree that colloquially speaking, it aligns most with what we believe. What I have trouble with is that you and I could assign different probabilities (via beliefs) to an external event that has only one probability of happening. Just in that sentence I used the word "probability" correctly both times, yet it has two different conflicting meanings.


Then just use different words. Break it down to "degree of subjective belief" and "frequency property" if you have to.

Still, I prefer to use the shorthand "probability" to mean the former, for two reasons: first, "degree of subjective belief" is what most lay people will understand when we say "probability". Second, even scientists have this meaning interfering with their intuitions, even if, when they write papers, they really do mean "frequency property". That can be a major obstacle for science. Let me quote Edwin T. Jaynes (long, but worth it):

Those who cling to a belief in the existence of "physical probabilities" may react to the above arguments by pointing to quantum theory, in which physical probabilities appear to express the most fundamental laws of physics. Therefore let us explain why this is another case of circular reasoning. We need to understand that present quantum theory uses entirely different standards of logic than does the rest of science.

In biology or medicine, if we note that an effect E (for example, muscle contraction, phototropism, digestion of protein) does not occur unless a condition C (nerve impulse, light, pepsin) is present, it seems natural to infer that C is a necessary causative agent for E. Most of what is known in all fields of science has resulted from following up this kind of reasoning. But suppose that condition C does not always lead to effect E; what further inferences should a scientist draw? At this point the reasoning formats of biology and quantum theory diverge sharply.

In the biological sciences one takes it for granted that in addition to C there must be some other causative factor F, not yet identified. One searches for it, tracking down the assumed cause by a process of elimination of possibilities that is sometimes extremely tedious. But persistence pays off; over and over again medically important and intellectually impressive success has been achieved, the conjectured unknown causative factor being finally identified as a definite chemical compound. Most enzymes, vitamins, viruses, and other biologically active substances owe their discovery to this reasoning process.

In quantum theory, one does not reason in this way. Consider, for example, the photoelectric effect (we shine light on a metal surface and find that electrons are ejected from it). The experimental fact is that the electrons do not appear unless light is present. So light must be a causative factor. But light does not always produce ejected electrons; even though the light from a unimode laser is present with absolutely steady amplitude, the electrons appear only at particular times that are not determined by any known parameters of the light. Why then do we not draw the obvious inference, that in addition to the light there must be a second causative factor, still unidentified, and the physicist's job is to search for it?

What is done in quantum theory today is just the opposite; when no cause is apparent one simply postulates that no cause exists —ergo, the laws of physics are indeterministic and can be expressed only in probability form. The central dogma is that the light determines, not whether a photoelectron will appear, but only the probability that it will appear. The mathematical formalism of present quantum theory —incomplete in the same way that our present knowledge is incomplete— does not even provide the vocabulary in which one could ask a question about the real cause of an event.

Biologists have a mechanistic picture of the world because, being trained to believe in causes, they continue to use the full power of their brains to search for them —and so they find them. Quantum physicists have only probability laws because for two generations we have been indoctrinated not to believe in causes —and so we have stopped looking for them. Indeed, any attempt to search for the causes of microphenomena is met with scorn and a charge of professional incompetence and "obsolete mechanistic materialism". Therefore, to explain the indeterminacy in current quantum theory we need not suppose there is any indeterminacy in Nature; the mental attitude of quantum physicists is already sufficient to guarantee it.

This one has been quite an eye opener, making me doubt even Many Worlds, which for one still doesn't explain the Born statistics. Still, thanks to Eliezer's Quantum Physics sequence, I'm now convinced that to the best of Science's knowledge (and despite what many physicists say) the laws of physics are most probably deterministic. Which would instantly solve the conflict by rendering "probability" nonsensical when applied to physical phenomena.


That's a lot to unpack - let me make sure I have this right.

Quantum theorists see that light is necessary but not sufficient for some sort of electron behavior. And I'm guessing that it's very hard to find an additional contributory/necessary cause that, when combined with light, would reliably predict the behavior.

So they use probability to communicate their findings.

If that additional contributory/necessary cause exists (but is just really hard to discover, or perhaps light itself is just highly contributory but not necessary/sufficient) then the probability is an effective way to communicate degree-of-belief, and helps to combine information about what they know with what they don't know. In other words, they are using probability to communicate partial findings.

If that additional contributory/necessary cause doesn't exist, then the probability is a fixed, physical part of the science. In other words, they're communicating full findings; that light actually creates probability of an effect.

And that confusion about "probability" is making people believe that the second case is true when they haven't necessarily disproved the first case yet? Which then serves as a curiosity-stopper and makes the science "mysterious". And this messes with scientific intuitions.

Forgive me if I'm way off, I've only read the first few articles of the first couple lesswrong sequences.


I thought I'd pop in and mention this set of lectures (now a book, huh) from Scott Aaronson on quantum mechanics and quantum computing: http://www.scottaaronson.com/democritus/ Lecture 9 in particular goes into the basic mathematics of quantum mechanics, and explains why it's more appropriate to consider it an extension of probability theory using complex numbers, and to call them amplitudes rather than classical probabilities. Not being a physicist myself, I don't know if this is really helpful in clearing up professional scientists' intuitions, but it's helpful for me at least in grasping the basics.

You're on the right track by noting that quantum theorists can use probability to communicate their findings. A single classical probability like 0.7 is a summary, not the whole picture. For the whole picture, you'd need a log of observations with time stamps, which is a tremendous amount of data: taken to the extreme, it's a balancing act between maximizing the data you collect about yourself and your surroundings and minimizing the number of Planck-unit time steps it takes to collect it. Even with more realistic amounts of data it's still a lot to pack around, so you can summarize it by constructing a probability distribution, which is more convenient mathematically, and you can summarize that probability distribution into a handful of numbers if you need to, because those are even easier to pack around. (Like the single probability 0.7, or the two numbers u and s that characterize a Normal distribution, etc.)
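
If it helps to see that compression in code, here's a minimal sketch (the 0.7 rate and the Normal parameters are just illustrative stand-ins, nothing from the paper):

    import numpy as np

    # Hypothetical raw log: one bit per trial, did the detector fire?
    rng = np.random.default_rng(0)
    observations = rng.random(10_000) < 0.7

    # Summary 1: a single probability, the fraction of successes.
    p = observations.mean()              # ~0.7

    # Summary 2: for a real-valued quantity, two numbers (u, s)
    # characterizing a Normal distribution stand in for the whole log.
    measurements = rng.normal(loc=3.2, scale=0.4, size=10_000)
    u, s = measurements.mean(), measurements.std()

    print(p, u, s)   # a handful of numbers instead of the full log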

So if you think of probability as just summarizing your past observations and constraining your expectations for future observations (because what else could it do?), I think it's easy to see how two people can have different probabilities that represent a belief in a proposition or an abstract system or something else. If both are being honest (I'm packing a lot of assumptions into that word, since this is basically Aumann's agreement theorem) and the probabilities are different, there are a couple of reasons why: either at least one of them has information (observations) the other one does not, or they started out with different prior probabilities. In the former case, they can reach agreement by sharing data. In the latter case, they can eventually reach agreement by obtaining more data or choosing a common prior. With enough successive Bayesian updates, you can arrive at the same conclusion regardless of where you started. (In practice, of course, things are more hairy and not this simplistic, which is what the submitted pdf is addressing in part.)
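
Here's a toy version of that convergence, using a Beta-Binomial model purely for illustration (the priors and the 0.7 rate are made up):

    import numpy as np

    # Two observers with different Beta priors over the same unknown rate.
    rng = np.random.default_rng(1)
    true_rate = 0.7
    shared_data = rng.random(500) < true_rate      # observations both see

    priors = {"optimist": (8.0, 2.0), "skeptic": (2.0, 8.0)}
    for name, (a, b) in priors.items():
        a += shared_data.sum()                     # successes update alpha
        b += (~shared_data).sum()                  # failures update beta
        print(name, "posterior mean:", a / (a + b))
    # Both posterior means end up near 0.7 once the shared data dominates.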

I find it hard to grok what it means for something to have a physical frequency property. I can understand it as a statistic of an abstract system, i.e. a probability, but in light of Many Worlds, I don't think a frequency can fundamentally be a property of something in the same way energy can be. But since the energy of a photon is related to the photon's "frequency" via its "wavelength", is energy really fundamental, or is the relation just a convenient means of making the wave mechanics useful? I've read Feynman tell how the whole idea of forces is just a somewhat circular simplification that's not really fundamental, even though sometimes it seems like it is. While I have a high degree of certainty about the physics accurately describing reality as she is, my meta-belief / confidence about that high certainty isn't nearly as high, simply because I feel a lot of confusion about the ultimate nature of reality. But at least at a higher level than fundamental physics, it's clear to me that saying a "fair coin" has p=0.5 per unique side is really just summarizing that we don't have enough knowledge of the flipper and the environment it is flipped in to predict the result with certainty, in the way we can predict the output of a logic OR gate with low- and high-voltage inputs with certainty. This is different from the uncertainty about which branch you are in within a Many Worlds universe, where it's more of a physical resolution problem rather than one of needing more knowledge, similar to not being able to find out what happens between Planck-unit time steps (if anything happens and if the question means anything).

Ben Goertzel's book Probabilistic Logic Networks is another resource I'd recommend. Sorry if my branching reply is a bit big, but I'm constantly trying to clarify things for myself as well. ;)


I have not read Aaronson's book, but I'm already confident his use of the word "probability" is more confusing than helpful. Okay, the universe is made up of a wacky "configuration space" of "complex amplitudes" (in lay terms, a many-dimensional sea of complex numbers). And there is a direct relationship between complex amplitudes and the degree of subjective belief of the physicist doing such and such experiment: take the norm of the amplitude, square it, compare that to the other relevant squared norms, and voilà, you have a probabilistic prediction for the result of your next experiment. (I sweep a lot under the carpet, but it really works like this.)

Something to know, however: assuming you made no mistake, the result of the first experiment tells you nothing about the result of any subsequent trials. Here's an example:

Let us send light through a half-silvered mirror, with detectors on either side of the mirror to detect the light. You can see that each detector detects half as much light as you originally sent (of course). Things get interesting when you send photons one by one. The setup of the experiment is such that when you make your amplitude calculations, you find that the squared norm of the amplitudes at each detector is the same. Here, this means that when you send a single photon to the mirror, you should expect to see a single detector go off, but you have no idea which one (subjective belief of 1/2 for either side).
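
In code, the amplitude-to-belief step for this setup looks roughly like this (the particular complex phases are arbitrary; only the squared norms matter here):

    import numpy as np

    # Toy 50/50 beam splitter: equal-magnitude complex amplitudes for
    # "transmitted" and "reflected" (the phases are illustrative only).
    amplitudes = np.array([1 / np.sqrt(2), 1j / np.sqrt(2)])

    # Take the norm, square it, compare to the other outcomes.
    probs = np.abs(amplitudes) ** 2
    probs /= probs.sum()
    print(probs)        # [0.5 0.5] -> bet 1/2 on either detector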

But there's more. Once you've made the calculation, you know all you could possibly know about the initial conditions of the experiment (according to current physics). Imagine you send a photon for the first time, and you see it going through the mirror. Will that change your expectation for the second trial? Not one bit. You will still bet 1/2 on either side. So this experiment is the perfect coin toss.

Or is it?

Imagine you send twenty photons, and you see they all made it past the mirror. Initially that's one chance in a million. In that case, you should be strongly tempted to expect the 21st to go through as well. The question is why?

If you are absolutely sure you set up the experiment right, then you shouldn't change your mind, and should still bet 1/2 on either side. But are you so sure of your accuracy that you could set up several million experiments and get all of them right? Probably not. The more likely explanation is that your so-called "half-silvered mirror" is just a piece of glass. Maybe you were tired and picked up the wrong one. Or maybe some graduate student played a joke on you.
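
A back-of-the-envelope Bayes calculation, with a made-up prior on having grabbed the wrong optic:

    # Prior: say 1 chance in 1000 that the "half-silvered mirror" is
    # actually plain glass (this prior is entirely made up).
    p_glass = 1e-3
    p_mirror = 1 - p_glass

    p_data_given_glass = 1.0            # glass transmits every photon
    p_data_given_mirror = 0.5 ** 20     # fair splitter: ~1 in a million

    posterior_glass = p_data_given_glass * p_glass / (
        p_data_given_glass * p_glass + p_data_given_mirror * p_mirror)
    print(posterior_glass)   # ~0.999 -> expect the 21st photon to go through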

Conversely, an actual coin toss could be perfectly fair: there's a lot going on in a coin toss, and depending on the chosen procedure, you can't expect to have sufficient knowledge of the initial conditions to predict anything but 1/2 for heads, 1/2 for tails, even after seeing the coin land tails three times in a row. There is, even then, some kind of "frequency property" in the experiment. It's not because the laws of coin tossing are somehow indeterministic, but because the way the coin is tossed systematically hides relevant information about the exact initial conditions.

(The formal Aumann agreement theorem is a bit stronger than that: two perfect Bayesians cannot have common knowledge of any disagreement. In practice, this means you can't start from different priors, because a prior embodies background information, which can be shared. I know that there is controversy about "fully uninformed" priors, but in practice, they don't change posterior probabilities very much.)


As an outgrowth of this discussion I tried submitting a question to Ask HN but it went over the character limit, so for want of a better place to put it, maybe someone will react here - anyone want to weigh in on my various interpretations of truth and falsehood?

   +------------+------------------+--------------------+-------------------+---------------+
   |    T/F     |     Fuzzy        |     Frequentist    |     Bayesian      |   Bayesian    |
   |            |                  |                    |   Subjectivist    |  Objectivist  |
   +----------------------------------------------------------------------------------------+
   |    0/0     | Ambiguous/Vague. |    I am ignorant.  |  I am uncertain.  | No one can be |
   |            | I am apathetic.  |                    |                   | certain.      |
   +----------------------------------------------------------------------------------------+
   |    0/.5    |     N/A          | Don't bother me,   | I am uncertain    | There exists  |
   |            |                  | still testing my   | but it may be     | partial       |
   |            |                  | hypothesis.        | false (partial    | knowledge of  |
   |            |                  |                    | knowledge)        | falseness.    |
   +----------------------------------------------------------------------------------------+
   |    0/1     | Completely false | It never happens   | I am certain it   | Everyone      |
   |            |                  | in the physical    | will never happen | should be     |
   |            |                  | world.             |                   | certain it    |
   |            |                  |                    |                   | will never    |
   |            |                  |                    |                   | happen.       |
   +----------------------------------------------------------------------------------------+
   |    .5/0    |     N/A          | Don't bother me,   | I am uncertain    | There exists  |
   |            |                  | still testing my   | but it may be     | partial       |
   |            |                  | hypothesis.        | true (partial     | knowledge of  |
   |            |                  |                    | knowledge)        | truthiness.   |
   +----------------------------------------------------------------------------------------+
   |   .5/.5    | It is partly     | Intrinsically      | I am 50% certain  | All should be |
   |            | true and partly  | 50/50.  Given 100  | it is true, 50%   | 50% certain   |
   |            | false.  (partial | bottles, half are  | certain it is     | it's true;    |
   |            | truth)  The      | full.  Given 100   | false.  I am 50%  | 50% certain   |
   |            | bottle is half   | of her, half are   | sure the bottle   | it's false.   |
   |            | full.  She is    | pregnant.          | is full.  I am    | All should be |
   |            | partly pregnant. |                    | 50% sure she is   | 50% certain   |
   |            |                  |                    | pregnant.         | bottle is     |
   |            |                  |                    |                   | full.  There  |
   |            |                  |                    |                   | is 50%        |
   |            |                  |                    |                   | certainty she |
   |            |                  |                    |                   | is pregnant.  |
   +----------------------------------------------------------------------------------------+
   |    .5/1    |       N/A        | Don't bother me,   |       N/A         |     N/A       |
   |            |                  | my hypothesis is   |                   |               |
   |            |                  | broken.            |                   |               |
   +----------------------------------------------------------------------------------------+
   |     1/0    | Completely true  | It always happens  | I am certain it   | Everyone      |
   |            |                  | in the physical    | will always       | should be     |
   |            |                  | world.             | happen.           | certain it    |
   |            |                  |                    |                   | will always   |
   |            |                  |                    |                   | happen.       |
   +----------------------------------------------------------------------------------------+
   |    1/.5    |      N/A         | Don't bother me,   |       N/A         |     N/A       |
   |            |                  | my hypothesis is   |                   |               |
   |            |                  | broken.            |                   |               |
   +----------------------------------------------------------------------------------------+
   |     1/1    | Equally          | My hypothesis is   |       N/A         |     N/A       |
   |            | confident.       | meaningless.       |                   |               |
   |            | Torn.            |                    |                   |               |
   |            | Ambivalent.      |                    |                   |               |
   +----------------------------------------------------------------------------------------+


I'm not sure what to make of this… More precisely, I'm not sure what your fractions on the left actually mean. Personally, I'm tempted to throw "undefined" at each row that does not sum to 1, leaving only "1/0", ".5/.5", and "0/1".

I haven't heard about any distinction between "subjectivist" and "objectivist" Bayesians. I'd say my own position is a little bit of both:

First, background knowledge fully constrains the world view. That means that two rational agents whose knowledge is the same must hold the same beliefs. More practically, two people having the same relevant knowledge about any given question should believe the same answer (the difficulty here is to sort out what's relevant). Second, agents generally don't have the same background knowledge. Nothing stops us from knowing different relevant information about any given subject. So of course we won't hold the same beliefs.

Fuzzy. Ah, fuzzy… the more I look at it, the less sense it makes. Strictly speaking, there is no such thing as "half truth". A proposition is binary: either true or false. Beliefs on the other hand are not binary: a belief is a proposition plus a probability. If I say the bottle is more than 75% full with probability 84%, then I'm 16% right if it really is only half full. Quite wrong, but not hopelessly so.

With a probability function and a utility function, we can make the same set of decisions we could have made with fuzzy logic, which looks like it mixes the two. I personally prefer the modular approach.


The fractions are supposed to represent partial degrees of truth. 0.5 is just an example; it could be 0.4, 0.6 - anything other than absolutely true or false. There are examples of probability systems that try to factor in uncertainty beyond belief in truth or belief in falsehood. I think it was possibility theory that talked about a 0.5 value assigned to "the cat is alive", a 0.2 value assigned to "the cat is dead", and a 0.3 uncertainty - then it would be 0.8 plausibility/possibility of alive, but only 0.5 probability.
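
The bookkeeping behind those numbers, roughly (just restating the 0.5/0.2/0.3 split in code):

    # The cat example above, restated: 0.5 committed to "alive",
    # 0.2 committed to "dead", 0.3 left uncommitted.
    belief_alive, belief_dead, uncommitted = 0.5, 0.2, 0.3

    # Plausibility of "alive" counts everything not committed against it.
    plausibility_alive = belief_alive + uncommitted    # 0.8
    print(plausibility_alive, belief_alive)            # 0.8 plausible vs 0.5 believed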

The main distinction I saw between subjectivist and objectivist is that the objectivist still communicates in terms of degree of belief, but the belief is based on objective empirical data, so it's not an expression of personal belief - more a measure of belief for the entire system or knowledge base. More relevant for things like machine learning rather than updating one's own personal beliefs. (Maybe this just assumes common priors.)

I've gone round and round on fuzzy because some people talk about fuzzy as if it is degrees-of-belief, like the bayesian interpretation. But I think it's different - I mean, if I'm partly correct on describing a concept, you wouldn't say that I am right 50 times out of 100, or that you're 50% certain I'm right, right? You'd say I'm half-right.

I'll have to think about probability combined with utility function. I'm not sure how to make it square with fuzzy math. It seems you'd be 16% right whether the bottle was half-full, quarter-full, or empty.


Okay, so I got your fractions mostly right.

Well, when you use a computer to make a probabilistic calculation, you have at some point to feed it the relevant information, or it won't know what to make of the data. The data alone is not enough. And if you are absolutely certain your program is (i) correct, and (ii) fed all the relevant information you know about, then you should definitely believe the result. (Of course, this absolute certainty is not attainable, so any sufficiently surprising result should lead you to think that something went wrong.) Assuming common priors, on the other hand, seems unreasonable, unless we're only talking about "fully uninformed" priors such as those based on Kolmogorov complexity.

Yes, I would say you're half right. The temperature example given by Wikipedia is really good. The "degree of coldness" is a worthy notion. What bothers me is that fuzzy logicians don't seem to run probability distributions over those degrees of truth. I mean, I can surely make up a probability distribution for the outside temperature tomorrow at 10AM. Assuming that temperature maps to a degree of truth for the sentence "it's cold" (say, 100% true below 10°C, 0% true above 20°C, and a linear interpolation in between), then I naturally come up with a probability distribution over the degrees of truth (I should get a Dirac spike at both extremities).
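
Here's a quick sketch of that combination, with a made-up Normal forecast standing in for my actual beliefs about tomorrow's temperature:

    import numpy as np

    # Fuzzy membership for "it's cold": 100% true below 10°C, 0% true
    # above 20°C, linear in between (the mapping described above).
    def coldness(t):
        return np.clip((20.0 - t) / 10.0, 0.0, 1.0)

    # An illustrative probabilistic forecast for tomorrow at 10AM.
    rng = np.random.default_rng(2)
    temps = rng.normal(loc=12.0, scale=5.0, size=100_000)

    degrees = coldness(temps)
    # Probability mass piles up at 0 and 1 (the "Dirac spikes"), and the
    # rest spreads over the intermediate degrees of truth.
    print("P(fully cold)      =", (degrees == 1.0).mean())
    print("P(not cold at all) =", (degrees == 0.0).mean())
    print("mean degree of coldness =", degrees.mean())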

Your last one is exactly right. I'm only 16% right anywhere from empty to 75% full (exclusive). That's the punishment I get for not producing a theory of the fullness of the bottle which describes my actual beliefs about it.

I assume that your ultimate goal in life is to make decisions, or to create a machine that will make decisions. To make sound decisions, you need to assess their consequences. Take the example of the bottle: I need a full bottle of water to hike today. Let's assume for the sake of the argument that it is either empty or full. If it's empty, I need to refill it, or I would faint from thirst (not good). If it's full, checking is only a nuisance (not good either). Now the question is, how can I minimize my discomfort, on average?

Well, I need two things: first, a probability distribution. Let's say I'm pretty sure I filled the bottle yesterday; call it a 90% chance. Second, a utility function. Baseline: the bottle is full and I did not bother checking it; utility zero. Checking costs me 1 utility point, and fainting from thirst costs me 50 points (let's assume my brain is actually capable of performing those utility assessments). Should I check the bottle?

Oh yes. Not checking means a 10% chance of losing 50 points, or 5 on average. Checking means a certainty of losing only one point, which costs much less. Conclusion: check your water supply before you go hiking. And re-check just to be sure.
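
In code, with the made-up utilities from above:

    # Expected-cost comparison, using the numbers from the example.
    p_full = 0.9

    cost_check = 1                                   # always pay the nuisance
    cost_skip = (1 - p_full) * 50 + p_full * 0       # expected cost = 5

    print("check:", cost_check, "skip:", cost_skip)  # 1 < 5 -> check the bottle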

Now this problem didn't call for fuzzy logic. We can however come up with a full probability distribution over the quantity of water in the bottle instead of just two possibilities. From there, fuzzy logic should be able to step in naturally. But frankly, I prefer to run a separate (dis)utility function over the state of thirst that the lack of water will provoke (from light thirst to actually fainting), and combine it with my probability distribution to make my decision. (Though at that point, I'd rather just check than fry my brain trying to solve this cost-benefit calculation.)


BTW, I just saw something about this regarding fuzzy uncertainty.

They made a distinction between vagueness and ambiguity. Both of which might make it difficult to assign a probability distribution among the possible values.

Ambiguous is when the boundaries of the sets are not clearly defined. In the thermometer example, it would be not knowing exactly what temperatures are meant by "cold". Or maybe like the Supreme Court definition of porn: you know it when you see it.

Vague is when there are clear definitions, but you're not sure how well your data fits into those definitions. For the temperature example, it would be what a crappy thermometer would tell you. It'd roughly correlate (it wouldn't return boiling if it's freezing), but it'd be pretty inaccurate.

In true/false values, maybe 0/.5 and .5/0 could be construed as ambiguous, while 1/.5 and .5/1 could be construed as vague.


Fuzzy set theory makes sense again when you realize it's not about classifying uncertainty or partial truth at all, but merely denoting partial set membership.

Goertzel et al.'s PLN book and framework that I mentioned before use fuzzy values in exactly that way, actually. There's a brief high-level overview of the differences between PLN and other approaches to uncertain inference early on in the book. In PLN, all four of these sentences are uniquely distinguished, whereas in a fuzzy framework it's not so clear what's going on beyond the first sentence:

"Jim belongs to degree 0.6 to the fuzzy set of tall people.", "Jim shares 0.6 of the properties shared by people belonging to the set of tall people (where the different properties may be weighted).", "Jim has a 0.6 chance of being judged as belonging to the set of tall people, once more information about Jim is obtained (where this may be weighted as to the degree of membership that is expected to be estimated once the additional information is obtained).", and "Jim has an overall 0.6 amount of tallness, defined as a weighted average of extensional and intensional information."


Yes, it does make sense.




