They are almost orthogonal concepts in some regards. Bayesian models (and in particular Bayesian networks or graphical models) and neural networks are about different things: the former try hard to capture uncertainty and causality, while the latter are all about non-linearity.
For example, Pyro provides tons of facilities for augmenting Bayesian models with neural networks.
It makes a lot of sense from a modeling perspective to model the big picture with a Bayesian model (generally a graphical model) and then use neural networks for some of the components. You capture the overall causal structure while still getting really precise predictions. A deep Markov model is a good example.
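To make that concrete, here is a minimal sketch of a deep-Markov-model-style generative model in Pyro. This is not Pyro's official DMM example; the network shapes, dimensions, and names are made up, and a real version would add an amortized guide and train with SVI.

```python
# Minimal deep-Markov-model sketch: a latent chain z_0 -> z_1 -> ... with
# neural networks parameterizing the transition and emission Gaussians.
# Sizes and names here are illustrative, not from Pyro's own example.
import torch
import torch.nn as nn
import pyro
import pyro.distributions as dist

z_dim, x_dim, hidden = 2, 3, 16

# Each network maps a latent state to (loc, log_scale) of a Gaussian.
transition = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2 * z_dim))
emission = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2 * x_dim))

def deep_markov_model(xs):
    # Register the networks so Pyro's optimizers see their parameters.
    pyro.module("transition", transition)
    pyro.module("emission", emission)
    z = torch.zeros(z_dim)  # initial latent state
    for t, x in enumerate(xs):
        loc, log_scale = transition(z).split(z_dim)
        z = pyro.sample(f"z_{t}", dist.Normal(loc, log_scale.exp()).to_event(1))
        loc_x, log_scale_x = emission(z).split(x_dim)
        pyro.sample(f"x_{t}", dist.Normal(loc_x, log_scale_x.exp()).to_event(1), obs=x)
```

The graphical-model part is the latent Markov chain with Gaussian conditionals; the neural networks only parameterize the means and scales of those conditionals, which is exactly the "big picture plus neural components" split described above.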
There are tons of unexplored ideas combining both, and in general I think this is the future of deep learning and one component towards AGI.
I actually really think that the way Bayesian probability factors in subjective probability is key, in that even if an algorithm spits out a result, it is still subject to human interpretation as well. I think some kind of composite decision support with both purely objective results (e.g. neural networks or other models that are purely machine based) as well as subjective beliefs could be really interesting and I still haven't seen much that does this.
I think maybe reinforcement learning where human feedback becomes part of the loop is about as close as I could think of. But that is different than factoring in human input to probability calculations.
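As one hypothetical illustration of what "factoring human input into the probability calculation" could look like: treat the model's output as evidence and the human's subjective belief as the prior, and combine them with Bayes' rule in odds form. The function, the numbers, and the assumption that the model's probability is calibrated against a known base rate are all invented for the sketch, not taken from any existing system.

```python
# Hypothetical composite: a subjective prior updated by a model's output,
# treated as a likelihood ratio against the base rate it was calibrated on.
def combine(subjective_prior, model_prob, model_base_rate=0.5):
    # Likelihood ratio implied by the model, relative to its base rate.
    lr = (model_prob / (1 - model_prob)) / (model_base_rate / (1 - model_base_rate))
    prior_odds = subjective_prior / (1 - subjective_prior)
    posterior_odds = prior_odds * lr
    return posterior_odds / (1 + posterior_odds)

# A domain expert believes the event is unlikely (10%), a purely data-driven
# model says 80%: the composite lands in between.
print(combine(subjective_prior=0.10, model_prob=0.80))  # ~0.31
```

The appeal of the odds form is that the subjective and objective contributions stay visible as separate factors, which fits the decision-support framing above.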
Indeed. Mathematically speaking, a graphical model merely formalizes conditional independence: in a chain A → B → C, for instance, the factorization p(a, b, c) = p(a) p(b | a) p(c | b) says that A and C are independent given B. Their advantage is that statistical interpretation; the same graph-dependent structure is also what makes inference algorithms (like belief propagation) harder to parallelize on GPUs.
Belief propagation is hard to parallelize because, in its basic form, it’s a sequence of serial computations. If the graph is a tree, though, then it’s actually pretty easy to parallelize. The only issue is you have to wait for all the incoming messages from your children to arrive before you push your message upward. Once you reach the root, you can fully parallelize the downward pass (in a tree).
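Here is a toy version of that tree case, just to make the two passes explicit (the little four-node tree and the potentials are invented for the example): each upward message waits on all of a node's children, and the downward pass from the root can then fan out.

```python
# Exact sum-product on a small tree of binary variables (own illustration,
# not from a library). Upward pass to the root, then downward pass back.
import numpy as np

# Tree: 0 is the root with children 1 and 2; node 2 has child 3.
children = {0: [1, 2], 1: [], 2: [3], 3: []}
parent = {1: 0, 2: 0, 3: 2}

unary = {v: np.array([1.0, 2.0]) for v in children}   # node potentials
pairwise = np.array([[3.0, 1.0],                      # one shared edge
                     [1.0, 3.0]])                     # potential (symmetric)

up = {}  # up[v][x_parent] = message from v to its parent

def upward(v):
    msg = unary[v].copy()
    for c in children[v]:
        msg *= upward(c)          # must wait on every child before sending up
    if v in parent:
        up[v] = pairwise @ msg    # sum over x_v for each parent value
        return up[v]
    return msg                    # at the root: unnormalized marginal

upward(0)                         # fills up[] for every non-root node

down = {0: np.ones(2)}            # down[v] = message arriving from v's parent

def downward(v):
    for c in children[v]:
        # everything at v except c's own upward contribution
        other = unary[v] * down[v]
        for s in children[v]:
            if s != c:
                other *= up[s]
        down[c] = pairwise @ other
        downward(c)

downward(0)

for v in children:
    belief = unary[v] * down[v]
    for c in children[v]:
        belief *= up[c]
    print(v, belief / belief.sum())   # exact marginal p(x_v)
```

Each message is just a sum over the sender's states weighted by the edge potential, which is why separate subtrees can be processed independently once their own children are done.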
The other main issue is that in graphs with multiple paths to a single node, you can't quite do this forward-backward pass and get the _exact_ answer. You can approximate it by just passing messages around the cycles that arise, but you're no longer guaranteed to converge to the right marginal probabilities. This is called Loopy Belief Propagation. Additionally, there's no true ordering of nodes to pass messages in: there's no natural sequence like leaves-to-root-and-back once you can go around and around in circles.
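And a matching sketch of the loopy case on a three-node cycle (again with invented potentials): with no leaves-to-root order, you just sweep over every directed edge repeatedly and hope the messages settle, then read off approximate marginals.

```python
# Loopy BP on a 3-node cycle of binary variables (own illustration).
import numpy as np
from itertools import product

edges = [(0, 1), (1, 2), (2, 0)]              # a cycle, so exact BP breaks
unary = {v: np.array([1.0, 2.0]) for v in range(3)}
pairwise = np.array([[3.0, 1.0],
                     [1.0, 3.0]])

# One message per directed edge, initialized uniform.
msgs = {(u, v): np.ones(2) for u, v in edges + [(v, u) for u, v in edges]}
neighbors = {v: [u for u, w in msgs if w == v] for v in range(3)}

for _ in range(50):                            # fixed-point iteration
    new = {}
    for (u, v) in msgs:
        prod_in = unary[u].copy()
        for w in neighbors[u]:
            if w != v:
                prod_in *= msgs[(w, u)]
        m = pairwise @ prod_in                 # sum over x_u
        new[(u, v)] = m / m.sum()              # normalize for stability
    msgs = new

for v in range(3):
    belief = unary[v].copy()
    for u in neighbors[v]:
        belief *= msgs[(u, v)]
    print(v, belief / belief.sum())            # approximate marginal

# Brute-force exact marginal for comparison, feasible only on toy problems.
states = list(product([0, 1], repeat=3))
p = np.array([np.prod([unary[v][s[v]] for v in range(3)]) *
              np.prod([pairwise[s[u], s[v]] for u, v in edges]) for s in states])
p /= p.sum()
print("exact p(x_0=1):", sum(pi for s, pi in zip(states, p) if s[0] == 1))
```

The brute-force check at the end only works because there are eight joint states; in a real problem you can't verify the answer that way, which is exactly why the lack of convergence guarantees matters.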
Surprisingly, it still works reasonably well in a lot of cases. BP and its various approximate versions are super interesting. The original algorithms really only work on low-dimensional discrete spaces, or on models with analytic solutions for integrating from conditional/joint to marginal distributions. However, there's been some really cool work in the last 5-ish years on particle-based approximations that handle more complicated/continuous spaces.
That's the synthesis we've come to now, after great effort. It was far from clear in the 1980s and 1990s, when Bayesian networks and neural networks looked a lot more like competitors.