Idea averaging is not an example of "the wisdom of crowds". It's a manifestation of groupthink.
More specifically, it's an example of how a group of people will tend to agree on whatever is least objectionable on average. This makes it less likely that an unconventional idea will be accepted.
I was watching the election results on my computer in 2012. By 8 or 9pm, most of the rural-area votes in Florida had been counted, while the urban votes were still coming in. Most of the votes left to count would have been for Obama, who was just over 50% at that point. Yet at that moment on the prediction market Intrade, the contract paying $1 on an Obama win was trading at $0.22. Holy cow! I tried hard to get as many thousands of dollars in as I could, because I figured very hopeful conservatives had been betting on Romney, probably while watching Fox News, so they didn't learn for a long time that he had lost Florida. Unfortunately, I couldn't get money into the market in time: because of the laws, funds had to be held in escrow or something for 24 hours before a person could bet. I just did a search, and there is a discussion of how lopsided the bets on Romney were.[1]
Since most unconventional ideas tend to fail (because people have thought for a long time about what would be good, and over time those things have often become conventional), it is a useful heuristic to go with conservative, conventional ideas. I think this totally fits in with "wisdom of crowds".
Of course, this also means you won't generate that once-in-a-million super great unconventional idea. But then again, I think that's the point. If you set out to generate the great unconventional idea, the base rate of failure is so high that you shouldn't have the hubris to believe you can actually pull it off; most often you should admit that what you think is a great unconventional idea probably can't do any better than whatever conventional wisdom would choose.
I think my view on this is unpopular because everyone wants to believe in the underdog idea -- the idea that was spurned by conventional wisdom but proved itself regardless and made someone famous for executing upon it.
That's great and all, but in the same way that putting all of your money on a single stock is a bad strategy, failing to average across ideas embraced by conventional wisdom is also (often, but not always) a bad strategy.
On a different note, though, there is a lot of value in your comment, even in connecting it back to the "wisdom of crowds" stuff. One of the issues with "wisdom of crowds" is that the aggregated average tends to be better only when there is sufficient variance of opinion in the crowd: that way, there is no systematic bias in everyone's thinking that the average would inherit. The more variation, the more each participant's particular way of being wrong contributes to "canceling out" the errors, leaving the average in a good position.
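To make the error-canceling point concrete, here's a toy simulation in Python (all numbers are made up): a diverse crowd whose members are each wrong in their own way averages out near the truth, while a crowd sharing a systematic bias does not.

    import numpy as np

    rng = np.random.default_rng(0)
    truth = 100.0
    n = 1000

    # Diverse crowd: each person is wrong in their own way (zero-mean noise).
    diverse = truth + rng.normal(0, 20, n)

    # Homogeneous crowd: everyone shares the same systematic +15 bias.
    biased = truth + 15 + rng.normal(0, 20, n)

    print(np.mean(diverse) - truth)  # near 0: independent errors cancel
    print(np.mean(biased) - truth)   # near +15: a shared bias survives averaging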
To me it suggests that when unconventional ideas do hit it big, it is because they are highly related to some source of demonstrable bias in conventional wisdom, and by removing that bias, such ideas can actually improve upon whatever is the current state of conventional wisdom.
But most of the time, it seems we only hit upon these bias-based improvements by accident, because tons of people keep trying (and failing, often to their personal detriment) to commit to unconventional ideas that aren't articulated in terms of any demonstrable bias. It ends up being an idea lottery to see who happens to hit the "jackpot" idea. That's what makes conventional wisdom appealing as an alternative: steady expected return with low risk. But it's also why it will never lead you to the next big thing.
Ironically, then, conventional wisdom is only a useful heuristic because of all the unconventional ideas individuals do attempt. So the answer for keeping humanity moving forward is: yes, go ahead and try that unconventional idea, or you will contribute to systematic bias.
I would expect the "wisdom of the crowds" to produce a different result if, when asking each person how many peas are in a bottle, each person were first told the current running average or median. I think it would achieve a better result if everyone made their own guess independent of the "crowd". Otherwise, the random bias from the first 20 people's guesses will percolate, because everyone guessing afterward will have their guess anchored by the group opinion they were told.
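And here's a rough sketch of the anchoring effect itself (again Python; the 0.5 anchoring weight is a pure guess on my part):

    import numpy as np

    rng = np.random.default_rng(0)
    true_count = 1000   # peas in the bottle
    n_guessers = 200

    # Independent guesses: noisy but centered on the truth.
    independent = true_count + rng.normal(0, 300, n_guessers)

    # Anchored guesses: each person pulls their private estimate toward
    # the running average of everyone who guessed before them.
    anchored = []
    for _ in range(n_guessers):
        private = true_count + rng.normal(0, 300)
        if anchored:
            private = 0.5 * private + 0.5 * np.mean(anchored)
        anchored.append(private)

    print(abs(np.mean(independent) - true_count))  # error shrinks like 1/sqrt(n)
    print(abs(np.mean(anchored) - true_count))     # noise from the first guesses persists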
Idea averaging is a wonderful thing, and you should probably do it way more than you already do, and it should probably be your default mode when looking to choose between ideas.
A lot of this is spelled out in the literature about why ensemble methods tend to be superior to individual methods in statistical inference, which has a lot of ties to why averages of non-expert opinions tend to be as accurate or even more accurate than small pools of expert opinions.
In the limit, when you reallocate credibility to a set of ideas in a manner that is fully derived from your prior beliefs and the evidence you have, it becomes exactly Bayesian reasoning. A posterior distribution is exactly an "idea average" where each idea (each potential outcome) gets as much (and no more) credibility as it deserves, according to the prior and the probability model at hand.
There is also the recent stuff under the marketing buzzword "Wisdom of Crowds" e.g. < http://onlinelibrary.wiley.com/doi/10.1111/j.1551-6709.2011.... > or the Wikipedia article on it too (and many sources that point out potential problems and corner cases that are also important).
In general though, I think I have to disagree with the article's premise. Unless two distinct ideas truly are mutually exclusive (most things aren't), then it's better to have a diversified portfolio over the space of all the ideas than to put all your eggs in one basket, and averaging ideas is sort of a humble heuristic that seems to work in a lot of areas.
This is a very confused post. A posterior is not an "average", it's a probability distribution. That distribution may have a mean value, but that's not a meaningful quantity in general.
For example, you're chasing someone and you come to a fork in the road. You know they must have taken the left or the right branch, so your posterior distribution on their location has two modes. It also has a mean value, in the middle of the forest between the two branches, but unless you have specific evidence they've abandoned the road, this is a very low-probability state.
The type of averaging prescribed by Bayesian decision theory is averaging of utilities, not of beliefs or actions. You take the action having the highest expected utility, where the expectation is taken over your posterior distribution. Assuming you actually want to catch your suspect, the expected utility of following either the left or the right fork will be much higher than the utility of dropping the pursuit to poke around in the middle of the forest.
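Here's a toy version of that calculation in Python (the posterior and utility numbers are invented purely for illustration):

    # Posterior belief about which branch the suspect took.
    p_left, p_right = 0.5, 0.5

    # Utility of each action given where the suspect actually is
    # (1.0 = caught, 0.0 = lost); the forest is nearly useless either way.
    utility = {
        "follow_left":   {"left": 1.0,  "right": 0.0},
        "follow_right":  {"left": 0.0,  "right": 1.0},
        "search_forest": {"left": 0.05, "right": 0.05},
    }

    for action, u in utility.items():
        expected = p_left * u["left"] + p_right * u["right"]
        print(action, expected)

    # follow_left 0.5, follow_right 0.5, search_forest 0.05:
    # acting at the "mean location" is dominated by committing to a branch.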
I think you are mistaken. Yes, a posterior is a probability distribution, which is exactly a weighting of each possible outcome in accordance with its probability. I didn't say anything about collapsing the distribution down to a single point estimate like the posterior mean, the MAP estimate, or any other single statistic. I am saying that I hold in my head a bunch of beliefs all at the same time (thus they are "blended" or "averaged" together), each in accordance with the amount of credibility it deserves. I think it's totally fine to speak about this as a type of uncollapsed "averaging" process; indeed, when you use hierarchical models and supply a metaprior distinguishing between two alternative models, we widely call such things model averaging even in the statistics literature. It seems like a rather misguided nitpick to insist that the English word "average" cannot be invoked unless it specifically coincides with the statistics word "mean".
Further, you're simply wrong about Bayesianism being about averages of utilities. The best account of Bayesian probability as a mapping of plausibilities to the unit interval is in Jaynes' The Logic of Science, but there is also a brief account of it in David Mumford's essay The Dawning of the Age of Stochasticity and e.g. in the introduction of Bayesian estimation supersedes the t-test by Kruschke [1] (where he explicitly describes it in terms of reallocation of credibility). Further, a la Jaynes, I think the right way to understand probability at all is in a mind-projection fallacy sense of the term: it describes your state of ignorance about the uncertain item.
It's not about utilities of actions -- utilities can be modeled with probabilities if the utility of certain actions is uncertain, but that is different from the base concept: probability is about which outcomes are more plausible, not about which outcomes are more valued.
No, it's you who are mistaken. The posterior is not remotely like idea averaging.
Idea averaging is about taking multiple ideas and merging them into one. The Bayesian posterior is merely assigning probabilities to each individual idea. I.e., idea averaging is "knife + spoon = spoon with sharp edge". Bayesian posterior is "50% knife is best, 50% spoon is best => evidence => 5% knife is best, 95% spoon is best".
The latter is what the article is saying we should do: don't build a new idea incorporating everyone's contributions; instead, just figure out which individual idea is the best and do it.
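For concreteness, here's that update as a minimal Python sketch (the likelihood values are hypothetical, chosen just to land on the 5%/95% numbers above):

    # Prior credibility over two mutually exclusive hypotheses.
    prior = {"knife_is_best": 0.5, "spoon_is_best": 0.5}

    # Hypothetical likelihood of the observed evidence under each hypothesis.
    likelihood = {"knife_is_best": 0.01, "spoon_is_best": 0.19}

    evidence = sum(prior[h] * likelihood[h] for h in prior)
    posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}
    print(posterior)  # {'knife_is_best': 0.05, 'spoon_is_best': 0.95}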
I still think the other comment was mistaken. At a given moment just before drawing a sample from some random variable X, that random variable can have a lot of different possible outcomes. What do I believe X is? I believe X is every different possibility in X's sample space. I simultaneously believe X is a lot of different things. But the degree of belief for each thing is governed by the probability for that outcome being the one I draw. In this sense, my cumulative belief about X is an average of all those different beliefs. That's not the same thing as saying that I have to carry around just one belief that is equal to the posterior mean (or any other point estimate).
> idea averaging is "knife + spoon = spoon with sharp edge". Bayesian posterior is "50% knife is best, 50% spoon is best => evidence => 5% knife is best, 95% spoon is best".
There's a lot wrong with that quote from your comment. First, you're conflating two very different things. One is whether two ideas are mutually exclusive (e.g. the product is fundamentally non-functional as a spoon with a sharp edge, so blending the two ideas is not in the set of feasible solutions). I already granted in my first comment that genuine mutual exclusivity can be a reason to avoid idea averaging -- but the case for it is overstated: true mutual exclusivity like that doesn't happen nearly as often as people argue it does.
The second thing, and the more important one, is resource allocation: you could break up your budget to pursue the knife solution and the spoon solution independently. What fraction of your budget should you allocate to the knife part? That is, what if 'knife' and 'spoon' are two distinct models you are considering, and instead of doing crappy model selection to choose merely one of them, you want to produce a model that is a superposition of the two -- what metaprior probability should you assign to the 'knife' model succeeding, versus the 'spoon' model?
Maybe something more concrete would be helpful: just think of a Gaussian Mixture Model. You have many different models, and you're not choosing just one. You are blending them (and people call this model averaging), with each component model's prior probability governing its contribution to the overall outcome.
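A small sketch of what I mean, in Python with numpy/scipy (the weights and component parameters are arbitrary); note that the "average" you get out is itself a full distribution, not a point:

    import numpy as np
    from scipy.stats import norm

    # Two component models with prior (mixture) weights -- arbitrary numbers.
    weights = np.array([0.3, 0.7])
    means   = np.array([-2.0, 3.0])
    sds     = np.array([1.0, 1.5])

    x = np.linspace(-6.0, 8.0, 400)

    # The blended density is a weighted average of the component densities.
    mixture_pdf = sum(w * norm.pdf(x, m, s)
                      for w, m, s in zip(weights, means, sds))

    # Sanity check: the average of distributions is itself a distribution.
    print(mixture_pdf.sum() * (x[1] - x[0]))  # ~1.0, it integrates to one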
Maybe we are just talking past one another, but I really think the way you keep emphasizing the idea "posterior is a distribution" makes me think you continue to misunderstand me.
Another way to put it is that you are averaging over a space of distributions -- sort of like the Dirichlet distribution. The thing you end up with, the thing that is the result of computing an average, is itself a probability distribution, that comes as an average over a bunch of other probability distributions.
In short, when the posterior is arrived at as the average of a bunch of other distributions, as in Bayesian Model Averaging, it is absolutely fine to say that the posterior is itself an average (it is the average of a bunch of other distributions). So a posterior can be both a distribution and also an average.
This is why I don't agree with your claim, "The posterior is not remotely like idea averaging." It's exactly like averaging when you're talking about model averaging and I think model averaging is a very useful and good analogue to idea averaging in cases where there is not genuine mutual exclusivity between ideas.
You are misunderstanding both BMA and Gaussian mixtures. The posterior on the world state is a distribution over Models x Model Params.
To make predictions about the world, you compute an expected value (average) over models. The explicit assumption of both GMM and BMA is that only one Gaussian or only one model is correct -- you just don't know which one, and therefore need to average over your uncertainty to take all the possibilities into account.
Idea averaging (as described in the article) is about averaging over world states, not models.
> The explicit assumption of both GMM and BMA is that only one Gaussian or only one model is correct -- you just don't know which one, and therefore need to average over your uncertainty to take all the possibilities into account.
This is so wrong I don't even know where to start. In fact, this is basically the notion of frequentism! One of the very most fundamental ideas of Bayesian reasoning is that there is no one true set of parameters nor is there one true model. There is only your state of knowledge about the space of all possible parameter sets or all possible models. I'm very surprised to see you, of all people, claiming this. Even a cursory Google search of BMA fundamentals disconfirms what you are saying, e.g. [1]
> Madigan and Raftery (1994) note that averaging over all the models in this fashion provides better average predictive ability, as measured by a logarithmic scoring rule, than using any single model Mj, conditional on M. Considerable empirical evidence now exists to support this theoretical claim...
> This is so wrong I don't even know where to start. In fact, this is basically the notion of frequentism!
Frequentism doesn't even allow you to represent your belief with a probability distribution.
> One of the very most fundamental ideas of Bayesian reasoning is that there is no one true set of parameters nor is there one true model. There is only your state of knowledge about the space of all possible parameter sets or all possible models.
Bayesian reasoning says there is one true set of parameters/model, you just don't know which one it is. The posterior distribution allows you to represent relative degrees of belief and figure out which model/parameter is more likely to be true.
Assuming you gather enough data, your posterior distribution will eventually approximate a delta function centered around that one true model. This is also what happens when you do BMA or gaussian mixtures.
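A quick sketch of that convergence in Python (an illustrative three-model grid for a coin's bias; worked in log space to avoid underflow):

    import numpy as np

    rng = np.random.default_rng(1)
    true_p = 0.7                        # the one "true" parameter
    models = np.array([0.3, 0.5, 0.7])  # three candidate models of the bias

    for n in [10, 100, 1000]:
        k = (rng.random(n) < true_p).sum()   # heads observed in n flips
        loglike = k * np.log(models) + (n - k) * np.log(1 - models)
        loglike -= loglike.max()             # stabilize before exponentiating
        post = np.exp(loglike)
        post /= post.sum()                   # flat prior over the three models
        print(n, np.round(post, 4))

    # As n grows, the posterior piles up on the 0.7 model,
    # approaching a delta function at the true one.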
The fact that model averaging provides better predictive ability doesn't contradict what I said.
> Assuming you gather enough data, your posterior distribution will eventually approximate a delta function centered around that one true model. This is also what happens when you do BMA or gaussian mixtures.
What happens when your data is actually distributed according to a mixture of Gaussians (cf. Pearson's crabs)?
In a situation like that, you have a space of models M and a space of populations P. The state space is M x P, and the model generates the population (probabilistically). In Pearson's crab example, some models have one or more Gaussians that an individual crab can come from.
The generative model (first choose a Gaussian, then draw from it) is the specific model. As you gather a sufficiently large sample from the population P, you'll eventually converge to a delta function on the space M.
So in Pearson's crab example, given enough data, you'll eventually converge either to the model with two Gaussian crab species, or to a three-Gaussian-species model, or to a single-species Weibull model (assuming you put all three of those models into your prior).
I read the original (andrewxhill's) post as making a point analogous to the "fork in the road" example: it's fine to maintain a full posterior on ideas, but if you need to choose, you should put in the effort to choose the best idea, not blindly try to merge them into a single idea that might combine the worst of both worlds. The latter would be like searching the forest between the two roads instead of just choosing a branch. Under this reading the point seems quite sensible to me.
You seem to have read the post quite differently, in a way that causes it to seem totally wrong.
I actually do think it's just a fact that "average" in both colloquial and mathematical usage means something like "to collapse multiple values to a single typical value" (even Bayesian model averaging collapses a distribution over probabilities into a single probability). But even if it were genuinely ambiguous what andrewxhill meant, you generally get a lot more out of a post by choosing the reading that allows it to be insightful over the reading that causes it to be nonsensical.
First, model averaging is a bit different from how you describe it here (at least on my reading, though maybe you mean "single probability" differently than I read it):
> even Bayesian model averaging collapses a distribution over probabilities into a single probability
As I mentioned in the child comment to the other reply, model averaging creates an average value, but the type of that value is a distribution. That is, you have a distribution over distributions (each coming from a different model), and the effect of averaging does not reduce you down to a point estimate; rather, it reduces you down to just one distribution.
This is why it's totally fair to say a posterior can be an average. It is the average of a bunch of other distributions. I think if I had said it that way in my first comment, it would have removed some confusion.
But it is important, because the criticism that "a posterior isn't an average" is very wrong. A posterior most certainly can be the average of some other stuff, if that other stuff was itself a bunch of distributions -- and that's exactly what I am trying to talk about.
But to your other point, about the 'sensible' vs. 'not sensible' readings, I mostly agree. However, the problem is: who gets to decide when two ideas fall into the "knife-vs-spoon-clearly-exclusive" category, and when it's more gray than that -- when the choice is not so black and white and there is no need to over-commit to just one approach?
The reason the OP post strikes me as problematic is that it seems like a matter of opinion, or in the worst case a matter of bureaucratic/dictatorial mandate, as to when ideas are of the type that can be averaged vs. when they are not.
I'd generally like people to be more humble about it and tend to believe that conventional wisdom and model averaging are better, at least as a first heuristic, than deeply committing to just one thing. That way there might be less urgency to rush into the claim that some debate is "knife-vs-spoon".
I sort of see the whole "knife-vs-spoon" thing as a kind of Godwin's law of brainstorming. Once you invoke the "knife-vs-spoon-so-we-can't-average" claim, it's like game over and all useful intellectual discussion dies and everyone just either picks Team Knife or Team Spoon and then the political battles start. Unless it's really mutually exclusive, I'd rather that doesn't happen.
Averaging different economists' predictions makes sense. They each provide an x-y curve. Those curves average together in a mathematical way.
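For example, a trivial sketch in Python (the forecast numbers are made up):

    import numpy as np

    # Hypothetical GDP-growth forecasts (percent) from three economists,
    # each an x-y curve over the same eight quarters.
    forecasts = np.array([
        [2.1, 2.0, 1.9, 1.8, 1.9, 2.0, 2.1, 2.2],
        [1.5, 1.6, 1.8, 2.0, 2.1, 2.1, 2.0, 1.9],
        [2.8, 2.6, 2.3, 2.1, 2.0, 1.9, 1.9, 2.0],
    ])

    consensus = forecasts.mean(axis=0)  # pointwise average of the curves
    print(np.round(consensus, 2))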
Product design ideas don't average together in this way. If one person wants to design a knife and another wants to design a spoon, and you average them together, you get a spoon with a sharp blade on one side, or maybe the blade is on the handle of the spoon -- either way, bad idea! Or, best case, maybe you end up with a Swiss Army knife, in which case it's harder to clean, etc. Adding the knife feature sacrifices the simplicity of the spoon, which is extremely valuable.
First, I did mention that when two things truly are mutually exclusive, it can be a reason for picking only one. Two designs that are entirely non-functional when combined could count as mutually exclusive. Even two designs that could be successful each on their own, but such that neither one could be created with 1/2 of the total budget, could be considered mutually exclusive.
I think you're attacking a straw man, though. There are lots of cases where two ideas are not totally mutually exclusive, especially in design. And there are lots of times when the budget will allow you to explore building both designs in parallel and "invest in both" (thus diversifying your investment by not allocating the whole budget to only one project).
I agree that when there is real mutual exclusivity, you have to confidently choose one way to invest. I just think this is rarer than stated: lots of times you aren't actually forced to make that type of bifurcating decision, so in general you should be more eager to compromise and more humble about not knowing which of the ideas will work -- and thus more willing to average over them, or to divide up the budget and try multiple things.
It's hard for me to understand exactly what it is that you're saying. You're trying to layer too many insights into your sentences, which makes your comment confusing.
Not sure... what part of it interests you? Teamwork? Innovation? Decision making?
I was putting together a "best read of 2015" type post but never finished it. There were a couple on my list that I think are related and maybe you'd find interesting.
Higher-level company organization stuff: I thought these two books gave at times contradictory and at times very complementary thoughts on what made teams work. In reality, neither was exactly about making teams work, though :)
The rotating image carousel seems like an example of idea averaging.
- "What should we put on our homepage?"
- A bunch of people have different ideas.
- "I know, we can put them all up with a carousel!"
Even Apple has succumbed to this, which is perhaps not surprising: carousels often come out of collaborative cultures (everyone wins!), and collaboration has been a point of emphasis for Tim Cook.
Avoiding carousels requires that a single person is willing to make a bunch of colleagues unhappy by picking only one thing at a time to feature.
I think something the article missed is that you should probably test your ideas (cheaply, if possible), because facts are better than opinions (apparently drunken ones, in the author's case).
> More specifically, it's an example of how a group of people will tend to agree on whatever is least objectionable on average. This makes it less likely that an unconventional idea will be accepted.
This concept has been studied for a long time in the social psychology literature. For a survey of related topics: http://chicagounbound.uchicago.edu/cgi/viewcontent.cgi?artic... (see specifically "Hidden Profiles and Common Knowledge").