A primer on causal emergence

raymondh · on June 11, 2017

I really enjoy seeing thought-provoking posts like this on Hacker News.

The practical insight is that complex systems have some level of scale where causality experiments yield the most fruit, and that this effect is measurable.

The most interesting parts are the two justifications for why this may be true. (1) "the determinism can increase and the degeneracy can decrease at the higher scale (the causal relationships can be stronger)" (2) "Higher-scale relationships can have more information because they are performing error-correction."

shadowmint · on June 11, 2017

> Measuring causal emergence is like you're looking at the causal structure of a system with a camera (the theory) and as you focus the camera (look at different scales) the causal structure snaps into focus. Notably, it doesn’t have to be “in focus" at the lowest possible scale, the microscale.

Talk about abstract metaphors that have no meaning.

The core of this argument seems to be:

1) Given a fixed state of a system, you can modify it by apply certain operators on the system.

2) You can model the 'causal structure' by observing changes as you randomly apply operators.

3) High level systems at a macro scale have a greater information density than the sum of their parts.

Ie. In a nutshell, you can have high level (ie. real world) systems that display behaviour that is not just hard to predict from changes to low level systems... but actually impossible to predict from them.

Which is to say, basically asserting that you cannot predict the behaviour of macro systems from microscale systems; eg. you cannot predict the behaviour of a molecule based on its quantum state / make up (clearly false) and you cannot predict the behaviour of say, a person deciding what to have to for lunch based on their quantum state/

...but not that you can't because it's hard.

You can't because its not possible.

Am I misunderstanding?

I think that sounds completely crack pot to me.

SCHiM · on June 11, 2017

I'm not a physicist or computer scientist. But I do think I recognize what the author is trying to explain when looking at code sometimes, or even differences in core mathematical operators.

When looking at code and 'reading' it, a programmer is (possibly without realizing it) running the snippet of code inside their heads. It's impossible, I think, to understand a loop without running a little part of it in your own head. In a very real way, you're 'executing' the code in your head. A little part of your program is running right at that moment when you read what you or someone else wrote. In philosophy this is called the idea of multiple realizability, or functional isomorphism.

The point of this is, that the outcome of the loop/code/w\e cannot be determined by looking _just_ at each and every operation or even keyword or character. The code must be _ran_ in your head to know what it does. As that point you're not looking at the 'parts' but at the 'whole'. Simply reading and comprehending the different statements but not simulating the code in your head is what a person does that cannot program in that language, they understand the words 'do', 'for', 'while', but cannot connect and parse and simulate their meaning. The result is obviously that that person wouldn't be able to understand what a loop or program was doing. They _cannot_ understand so long they don't simulate (= zoom out).

Perhaps this is akin what the author and the critics are confused about. It seems that it's perfectly reasonable to start your simulation at the parts, and then scale the simulation up and predict or determine things about the whole. This works as long as you realize that along the way you've started abstracting away the details, compartmentalizing the particles and states, and that you're _actually_ looking at larger parts of the whole, instead of predicting things about the whole by looking at it's parts.

TheOtherHobbes · on June 11, 2017

It's not crack pot at all.

Given an understanding of quantum chemistry and the periodic table, would a reductionist approach be able to predict Hacker News?

You certainly can predict the behaviour - or at least a statistical envelope for the behaviour - of a single molecule.

But once you get beyond a certain scale, information starts being processed and abstracted in the relationships that emerge between larger assemblies.

Naive reductionism has no tools for modelling that information or predicting how it might appear or develop.

shadowmint · on June 11, 2017

> abstracted in the relationships that emerge between larger assemblies.

Yes, but that's the point.

Obviously it's hard to make predictions a macro level purely from studying it at a super micro level and without looking at parts of the interactions it might even be impossible; but that's not what's being asserted here.

What's being asserted here, is, in scott aaronson's words:

> In their new work, Hoel and others claim to make the amazing discovery that scientific reductionism is false—or, more precisely, that there can exist “causal information” in macroscopic systems, information relevant for predicting the systems’ future behavior, that’s not reducible to causal information about the systems’ microscopic building blocks.

Think about that for a second.

You're saying, there's a kind a of 'meta information' in complex systems, that cannot be reduced into information about its constituent parts or how they interact.

For example, 'what you might pick for lunch' cannot be represented as a information about your blood, body, atoms, stomach.

It's a stupid assertion; if you assert that a system of any complexity cannot be predicted by the behavior of its constituent parts, you're basically saying that 'nothing makes sense'. It's patently false. It's just a flip off to people building probabilistic models like, oh hey, don't bother, that doesn't work.

It's just not true.

What's mathematically interesting is how they've devised the paper.

If you can show, mathematically, that you have more information for predicting future state from considering a macro state than all micro states, that's a pretty interesting result; but it's also exactly the point scott demolishes in his blog post.

...and the rebuttal?

> Doing a series of A/B tests to capture the effects of the macroscale states doesn’t correspond to doing a series of A/B tests to capture the effects of microscale states. A randomized trial at the macroscale of medical treatments to see their effect on tumors won’t correspond to an underlying set of microscale randomized trials, because many different microstates make up the macrostates.

Which is where we started; ie. the assertion that the behaviour of microscale effects doesn't reflect macro scale effects.

...but we know that it does. We don't invent new drugs by going off and randomly trying crap; we model the molecules and predict the macro scale effects they'll have.

What he's asserting here is quite literally, demonstrably false.

chadcmulligan · on June 11, 2017

Not sure I follow it all either but one thing I think I gleaned is given a series of n measurements, the measurements at the macro scale provide more information than the same number of measurements at the micro scale. Thinking about the switch analogy they use this seams feasible. Though I think that may only be a part of what's being said.

olejorgenb · on June 11, 2017

Yeah, I've read through the critique (http://www.scottaaronson.com/blog/?p=3294) that triggered to linked blog post and the main disagreement seems to be about this.

From the critique:

"In their new work, Hoel and others claim to make the amazing discovery that scientific reductionism is false—or, more precisely, that there can exist “causal information” in macroscopic systems, information relevant for predicting the systems’ future behavior, that’s not reducible to causal information about the systems’ microscopic building blocks."

I think it would've helped cleared thing up if Hoel actually addressed and clarified the above quote.

Instead (From the blog post):

"Why does causal emergence matter?

The theory does imply that universal reductionism is false when it comes to thinking about causation, and that sometimes higher scales really do have more causal influence (or information) than whatever underlies them. This is common sense in our day-to-day lives, but in the intellectual world it’s very controversial."

At this point I'd really like the definition of "universal reductionism"..

An illuminating example from the critique (possibly taken from Hoels paper). (Basically an example of your 1-3 steps)

"For here is the argument from the Entropy paper, for the existence of macroscopic causality that’s not reducible to causality in the underlying components. Suppose I have a system with 8 possible states (called “microstates”), which I label 1 through 8. And suppose the system evolves as follows: if it starts out in states 1 through 7, then it goes to state 1. If, on the other hand, it starts in state 8, then it stays in state 8. In such a case, it seems reasonable to “coarse-grain” the system, by lumping together initial states 1 through 7 into a single “macrostate,” call it A, and letting the initial state 8 comprise a second macrostate, call it B.

We now ask: how much information does knowing the system’s initial state tell you about its final state? If we’re talking about microstates, and we let the system start out in a uniform distribution over microstates 1 through 8, then 7/8 of the time the system goes to state 1. So there’s just not much information about the final state to be predicted—specifically, only 7/8×log2(8/7) + 1/8×log2(8) ≈ 0.54 bits of entropy—which, in this case, is also the mutual information between the initial and final microstates. If, on the other hand, we’re talking about macrostates, and we let the system start in a uniform distribution over macrostates A and B, then A goes to A and B goes to B. So knowing the initial macrostate gives us 1 full bit of information about the final state, which is more than the ~0.54 bits that looking at the microstate gave us! Ergo reductionism is false."

From this I think we all can agree that finding out how a system with A and B as states is much simpler than figuring out the low level 9 state system. But how that somehow disproved "reductionism" is not clear. I'm not sure that this is type of compression is "controversial in the intellectual world" either?

That doesn't mean the study of "Causal emergence" isn't worth studying of course.

mannykannot · on June 11, 2017

Thanks for this comment, and especially the example, which appears to get to the crux of the matter (though I am not in a position to be sure.)

In that example, does not the different result come, in an unsurprising way, from the fact that a uniform distribution over the 8 microstates is not the same as a uniform distribution over the two macrostates?

In a separate note[1], Hoel seems to be claiming that the issue is the definition of 'black-boxing', and if I am following the small example there, his definition allegedly allows him to use a uniform distribution over four states, or a uniform distribution over three states, depending on whether the S1 microstate is declared as being 'black-boxed'. Furthermore, this seems to differ from your example in that here, S1 does not become a macrostate, it is completely ignored once it is declared to be black-boxed. I do not see how Hoel is comparing apples to apples.

[1] http://www.erikphoel.com/uploads/1/7/8/8/17883727/black-boxi...

westoncb · on June 11, 2017

I read Scott Aaranson's initial criticism of Hoel's causal emergence paper (which is pretty funny because of things unrelated causal emergence really: "Higher-level causation exists (but I wish it didn’t)": http://www.scottaaronson.com/blog/?p=3294)

I've skimmed the linked article, and will in all likelihood go back to it—but I wonder about some stuff from the conclusion:

> It also provides some insight about the structure of science itself, and why it’s hierarchical (biology above chemistry, chemistry above physics). This might be because scientists naturally gravitate to where the information about causal structure is greatest, which is where they are rewarded in terms of information for their experiments the most, and this won't always be the ultimate microscale.

I don't see how more information existing at higher levels would explain the hierarchical structure of the sciences: saying there's more information at the higher levels is the reason would imply that e.g. we found biology to be more valuable than physics, whereas the actual situation seems to be that we value these levels equally. Maybe that's just a phrasing issues. In any case, it seems simpler that we organize the sciences hierarchically because the human brain organizes information that way.

I also don't see how there being more information at certain levels is necessarily useful: isn't the quality of the information as important or more important than the quantity? But I guess if the it's specifically 'causal' information, there's an implication (at least for the sciences) of ideal quality...

kalu · on June 11, 2017

I think you are looking at it in a linear way. But maybe the relationship between scale and causal information is nonlinear. Causal information might oscillate as the scale increases. So that physics, chemistry, and bioligy are concentrated around scales that are associated with peaks in causal information.

dogruck · on June 11, 2017

Both questions are great. The primer feels fuzzy (by declared design) but that doesn't quench.

nabla9 · on June 11, 2017

>What’s causal emergence? It’s when the higher scale of a system has more information associated with its causal structure than the underlying lower scale.

I don't understand why the writer invents new (grand) terms to discuss phenomena that is already widely studied.

Information as a function of scale in a system is well known phenomenon in physics and complex systems theory. Systems different in microscopic scale can behave similarly in macroscopic scale. Renormalization group formalizes this emergent principle of universality. Multiscale information theory is generalization of it.

https://en.wikipedia.org/wiki/Renormalization_group

https://arxiv.org/abs/1409.4708

maxhodges0 · on June 11, 2017

I think you need an additional primer on the topic of: why I should care about "casual emergence". What's the point? How can it be put to use? Started reading your article but couldn't get into it. Is it just a pointless philosophical notion?

mannykannot · on June 11, 2017

I think the third paragraph suggests the point is that it is allegedly a measure for whether a purely reductionist approach is a useful way to explain a given thing or phenomenon. For example, while I have no doubt that the history of life on earth could, in principle, be explained in terms of physics, I find that a Darwinian theory of evolution is a more useful way of looking at it.

Aaronson's argument seems to be that the resulting measure is an unsurprising outcome. I am not sure whether that proves it is irrelevant, but I think the onus is on its proponents to show that this measure goes beyond being trivially true, and is actually useful.

In the conclusion of the article, Hoel seems to be making a stronger claim: "The theory does imply that universal reductionism is false when it comes to thinking about causation, and that sometimes higher scales really do have more causal influence (or information) than whatever underlies them." There seems to be a hint of the motte-and-bailey strategy here, with Hoel defending the narrower claim but expecting us to accept the broader one.

I have not figured out how this measure is actually calculated without having a reductionist measure of the system's information content, so I wonder if it has been created to be used as a pawn in some philosophical argument, perhaps such as the nature of consciousness.

shadowmint · on June 11, 2017

If you're really keen, read Scott's article: http://www.scottaaronson.com/blog/?p=3294

About half way down is a section that starts "But of course this still leaves the question: what is in the mathematical part of Hoel’s Entropy paper? What exactly is it that the advocates of causal emergence claim provides a new argument against reductionism?"

...but basically, I wouldn't bother. I regret the time I wasted deciphering what it was on about already.

visarga · on June 11, 2017

"casual emergence" != "causal emergence"

cousin_it · on June 11, 2017

> Maybe, as the physicist Yakir Aharonov has advocated, our universe has not only a special, low-entropy initial state at the Big Bang, but also a “postselected final state,” toward which the outcomes of quantum measurements get mysteriously "pulled"—an effect that might show up in experiments as ever-so-slight deviations from the Born rule.

Me, me! I know a kind of postselection effect that can be explained on a napkin (though nobody knows if it's actually true). As a bonus, it can affect not just the Born probabilities, but the probabilities of anything you choose, even things that already happened. Here's how it works.

The idea is a variation on anthropic reasoning, originally due to Bostrom (http://www.anthropic-principle.com/preprints/cau/paradoxes.h...) If there's a completely fair quantum coin, and many people over many generations decide to have kids iff the coin comes up heads, then the coin might appear biased to us for anthropic reasons (more people in the heads-world than in the tails-world). You can influence all sorts of things this way, like Bostrom's example of Adam and Eve deciding to have kids iff a wounded deer passes by their cave to provide them with food. (That's if anthropic probabilities work in a certain intuitive way. If they work in the other intuitive way, you get other troubling paradoxes in the same vein. All imaginable options lead to weirdness AFAIK.)

A few years back I spent a long time on such problems, and came up with a simple experiment about "spooky mental powers" that doesn't even involve creating new observers. It's completely non-anthropic and could be reproduced in a lab now, but the person inside the experiment will be deeply troubled. Here's how it goes:

You're part of a ten-day experiment. Every night you get an injection that makes you forget what day it is. Every day you must pick between two envelopes, a red one and a blue one. One envelope contains $1000, the other contains nothing. At the end of the experiment, you go home with all the money you've made over the ten days. The kicker is how the envelopes get filled. On the first day, the experimenters flip a coin to choose whether the red or the blue one will contain $1000. On every subsequent day, they put the money in the envelope that you didn't pick on the first day.

So here's the troubling thing. Imagine you're the kind of person who always picks the red envelope on principle. Just by having that preference, while you're inside the experiment, you're forcing the red envelope in front of you to be empty with high probability! Since your mental states over different days are indistinguishable to you, you can choose any randomized strategy of picking the envelope, and see the result of that strategy as if it already happened. In effect, you're sitting in a room with two envelopes, whose contents right now depend not just on what you'll choose right now, but on what randomized strategy you'll use to choose right now. If that's not freaky, I don't know what is.

Going back to Aaronson's original point, the world as it looks to us might easily contain postselection and other weird things. Reducing everything to microstates is a valid way to look at the universe, but you aren't a microstate. You are an observer, a big complicated pattern that exists in many copies throughout the microstate, and the decisions of some copies might affect the probabilities observed by other copies at other times. The effects of such weirdness are small in practice, but unavoidable if you want a correct probabilistic theory of everything you observe (or a theory of decision-making for programs that can be copied, which is how I arrived at the problem).

visarga · on June 11, 2017

A problem similar to the one you're describing (having a number of options and limited number of attempts to maximize gains) is known as the "Multi-armed bandit" and has been studied in A/B testing, optimization and reinforcement learning.

https://en.wikipedia.org/wiki/Multi-armed_bandit

> In probability theory , the multi armed bandit problem (sometimes called the K - or N armed bandit problem) is a problem in which a gambler at a row of slot machines (sometimes known as "one armed bandits") has to decide which machines to play, how many times to play each machine and in which order to play them. When played, each machine provides a random reward from a probability distribution specific to that machine. The objective of the gambler is to maximize the sum of rewards earned through a sequence of lever pulls.

In essence it is a problem of balancing exploration and exploitation (the so called exploration/exploitation tradeoff), which is at the heart of how agents learn.

omginternets · on June 11, 2017

>Since your mental states over different days are indistinguishable to you, you can choose any randomized strategy of picking the envelope, and see the result of that strategy as if it already happened. In effect, you're sitting in a room with two envelopes, whose contents right now depend not just on what you'll choose right now, but on what randomized strategy you'll use to choose right now.

You lost me, and I'm not sure where... could you elaborate?

cousin_it · on June 11, 2017

Imagine two people going through this experiment, Alice and Bob.

When Alice wakes up, she decides to pick red, because she always picks red which is her favorite color. Lo and behold, the red envelope is empty with >90% probability (because if Alice picked it on the first day, it will be empty on the next 9, and Alice doesn't know which day it is).

When Bob wakes up, he decides to flip a coin, because that's how he always makes decisions. The coin tells him to pick red. Lo and behold, the red envelope is full with 50% probability. Even though Alice and Bob chose the same color.

Nerdy explanation: if you maximize expected utility in the Von Neumann and Morgenstern sense, you can prove a theorem saying there's always an optimal strategy that's deterministic. You shouldn't need randomization, even if the world is random. In my experiment, the classical assumptions don't apply (specifically the axiom of independence), so the theorem becomes false and you need randomization to get the best result. The point of the experiment is showing that the axiom of independence isn't reasonable when you have copies, so vNM utility maximization needs to be modified. (For cases like this one, the right fix is modeling utility maximization as a single player game with imperfect information, chance moves, and behavioral strategies.) Bostrom's anthropic problems also show the same weirdness, but my contribution is making a non-anthropic scenario with fixed number of observers that still shows the weirdness.