> Then, in the late 1980s--spurred by the early work of Judea Pearl, a professor of computer science at UCLA, and breakthrough mathematical equations by Danish researchers--AI researchers discovered that Bayesian networks offered an efficient way to deal with the lack or ambiguity of information that has hampered previous systems.
The "mathematical equations by Danish researchers", for those interested, are most likely this paper:
Lauritzen, S.L. and Spiegelhalter, D.J. (1988), Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. Journal of the Royal Statistical Society: Series B (Methodological), 50: 157-194. https://doi.org/10.1111/j.2517-6161.1988.tb01721.x
I like how in the mid-90s neural networks were almost a write-off. "But the neural nets won't help predict the unforeseen. You can't train a neural net to identify an incoming missile or plane because you could never get sufficient data to train the system."
They are almost orthogonal concepts in some regards. Bayesian models (in particular Bayesian networks, or graphical models) and neural networks are about different things: the former try hard to capture uncertainty and causality, while the latter are all about non-linearity.
For example, Pyro provides tons of facilities for augmenting Bayesian models with neural networks.
It makes a lot of sense from a modeling perspective to model the big picture with a Bayesian model (generally a graphical model) and then use neural networks for some of its components. You capture the overall causal structure while still getting really precise predictions. A deep Markov model, for example.
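To make the pattern concrete, here is a minimal sketch using Pyro's public API (the latent dimension, network shape, and noise scale are made up for illustration): a latent variable with an explicit prior forms the graphical-model skeleton, and a neural network parameterizes the likelihood.

```python
import torch
import torch.nn as nn
import pyro
import pyro.distributions as dist

# Hypothetical sizes: a 2-D latent variable and a 10-D observation.
decoder = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 10))

def model(x):
    pyro.module("decoder", decoder)  # register the net's parameters with Pyro
    with pyro.plate("data", x.shape[0]):
        # Graphical-model part: a latent variable with a standard-normal prior.
        z = pyro.sample("z", dist.Normal(torch.zeros(x.shape[0], 2),
                                         torch.ones(x.shape[0], 2)).to_event(1))
        # Neural part: the network maps the latent to the likelihood's mean.
        loc = decoder(z)
        pyro.sample("obs", dist.Normal(loc, 0.1).to_event(1), obs=x)
```

You would pair this with a guide and pyro.infer.SVI to do inference; Pyro's deep Markov model example follows the same recipe, just with neural networks inside the transition and emission distributions of a state-space model.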
There are tons of unexplored ideas combining both, and in general I think this is the future of deep learning and one component towards AGI.
I actually really think that the way Bayesian probability factors in subjective probability is key: even if an algorithm spits out a result, that result is still subject to human interpretation. Some kind of composite decision support that combines purely objective results (e.g., from neural networks or other purely machine-based models) with subjective beliefs could be really interesting, and I still haven't seen much that does this.
Reinforcement learning where human feedback becomes part of the loop is about as close as I can think of. But that is different from factoring human input into the probability calculations themselves.
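The mechanical part of that combination is just Bayes' theorem. A toy sketch (all numbers hypothetical), where a model supplies the likelihoods and two analysts supply different subjective priors:

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | E) from a subjective prior P(H) and model-derived likelihoods."""
    num = prior * p_e_given_h
    return num / (num + (1 - prior) * p_e_given_not_h)

# The model scores the evidence as 10x likelier under H than under not-H;
# a sceptical analyst (prior 0.05) and a neutral one (prior 0.5) disagree.
for prior in (0.05, 0.5):
    print(prior, posterior(prior, p_e_given_h=0.8, p_e_given_not_h=0.08))
```

The same objective model output yields posteriors of roughly 0.34 and 0.91 under the two priors, which is exactly the kind of subjective-plus-objective composite being described.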
Indeed. Mathematically speaking, a graphical model merely formalizes conditional independence. Its advantage is a clear statistical interpretation, but that same structure is also a factor that makes algorithms like belief propagation harder to parallelize on GPUs.
Belief propagation is hard to parallelize because, in its basic form, it's a sequence of serial computations. If the graph is a tree, though, it's actually pretty easy to parallelize. The only constraint is that a node has to wait for all the incoming messages from its children before it pushes its own message upward. Once you reach the root, you can fully parallelize the downward pass (in a tree).
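A minimal sum-product sketch on a made-up three-node tree (binary variables, arbitrary potentials) shows the two passes: the upward pass is serialized by the wait-for-children constraint, while the downward pass fans out over the children independently.

```python
import numpy as np

# Hypothetical structure: node 0 is the root, nodes 1 and 2 are its leaves.
children = {0: [1, 2], 1: [], 2: []}

K = 2                                            # states per variable
unary = {v: np.ones(K) for v in children}        # node potentials psi_v
unary[1] = np.array([0.9, 0.1])                  # soft evidence at node 1
pairwise = np.array([[0.8, 0.2],                 # phi[x_child, x_parent],
                     [0.2, 0.8]])                # shared by every edge here

up, down = {}, {0: np.ones(K)}                   # messages keyed by the child

def upward(v):
    # Post-order: a node can only send upward once all its children have.
    for c in children[v]:
        upward(c)
    belief = unary[v].copy()
    for c in children[v]:
        belief *= up[c]
    up[v] = pairwise.T @ belief                  # sum out the child's state

def downward(v):
    # Pre-order: once down[v] is known, every child of v can be
    # processed independently -- this pass parallelizes cleanly.
    for c in children[v]:
        others = unary[v] * down[v]
        for s in children[v]:
            if s != c:
                others *= up[s]
        down[c] = pairwise @ others              # sum out the parent's state
        downward(c)

for c in children[0]:
    upward(c)
downward(0)

marginal = unary[1] * down[1]                    # exact marginal at node 1
print(marginal / marginal.sum())
```

In a deeper tree, all nodes at the same depth can be processed in parallel within each pass, which is where what parallelism there is comes from.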
The other main issue is that in graphs with multiple paths to a single node, you can't do this forward-backward pass and get the _exact_ answer. You can approximate it by just passing messages around the cycles that arise, but you're no longer guaranteed to converge to the right marginal probability. This is called Loopy Belief Propagation. Additionally, there's no true ordering of the nodes for a passing order: when you can go around and around in circles, there's no natural sequence like the one from leaves to root and back.
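A toy version of loopy BP on a 3-cycle (hypothetical potentials) makes that visible: all messages are just updated synchronously and iterated, with no ordering and no exactness guarantee.

```python
import numpy as np

# A 3-cycle of binary variables: 0 - 1 - 2 - 0 (made-up potentials).
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
K = 2
unary = {0: np.array([0.7, 0.3]), 1: np.ones(K), 2: np.ones(K)}
phi = np.array([[0.9, 0.1],
                [0.1, 0.9]])                # symmetric edge potential

# Every directed message starts uniform; there is no natural schedule.
msg = {(i, j): np.ones(K) / K for i in neighbors for j in neighbors[i]}

for _ in range(50):                         # synchronous, parallelisable updates
    new = {}
    for (i, j) in msg:
        b = unary[i].copy()
        for k in neighbors[i]:
            if k != j:
                b *= msg[(k, i)]
        m = phi.T @ b                       # sum out x_i
        new[(i, j)] = m / m.sum()           # normalise for numerical stability
    msg = new

belief = unary[2].copy()
for k in neighbors[2]:
    belief *= msg[(k, 2)]
print(belief / belief.sum())               # approximate marginal at node 2
```

On a single loop like this the messages do settle to a fixed point, but the marginals they produce are only approximate, unlike the tree case above.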
Surprisingly, it works reasonably well in a lot of cases anyway. BP and the various approximate versions are super interesting. The original algorithms really only work on low-dimensional discrete spaces, or on problems with analytic solutions for integrating conditional/joint distributions down to marginals. However, there's been some really cool work in the last five or so years on particle-based approximations that handle more complicated/continuous spaces.
That's the synthesis we've come to now, after great effort. It was far from clear in the 1980s and 1990s, when Bayesian networks and neural networks looked a lot more like competitors.
There's been enough progress in approximate Bayesian methods that many things can be done thousands of times faster than back then, as well. The reputation of Bayesian methods as being slow is undeserved nowadays.
Can someone point me to any examples where Bayesian neural networks are successfully used for any practical applications? Like where they are better than regular non-Bayesian NNs? By better I mean better accuracy.
"Bayesian network" is a synonym for directed graphical model.
Any time you see graphical models, they're usually BNs. Undirected graphical models are very closely related too (any directed model can be converted to an undirected one by moralization, though some of its conditional-independence structure is lost in the translation), but they're usually not referred to as BNs.
They’re used all over the place. One school of causal inference is heavily steeped in BNs/DAGs. This shouldn’t be surprising because the creator of BNs, Judea Pearl, is heavily involved in causal inference now.
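For anyone who hasn't seen one: a BN is just a DAG with a conditional distribution per node, and the joint factorizes along the edges. A tiny sketch using the textbook rain/sprinkler/wet-grass network (numbers made up), computing a posterior by brute-force enumeration:

```python
from itertools import product

# Conditional tables for the DAG  Rain -> Sprinkler -> WetGrass <- Rain.
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},   # P(S | R)
               False: {True: 0.4, False: 0.6}}
P_wet = {(True, True): 0.99, (True, False): 0.9,  # P(W=true | S, R)
         (False, True): 0.8, (False, False): 0.0}

# P(R | W=true) by enumerating the joint P(R) * P(S|R) * P(W|S,R).
num = {True: 0.0, False: 0.0}
for r, s in product([True, False], repeat=2):
    num[r] += P_rain[r] * P_sprinkler[r][s] * P_wet[(s, r)]
z = num[True] + num[False]
print({r: p / z for r, p in num.items()})         # posterior over Rain
```

Real systems replace the enumeration with message passing or sampling, but the model specification is just these conditional tables hanging off a DAG.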
Not a NN, just a simple BN: a risk assessment application. More specifically, estimating the financial risk of climate change-related events for the mining sector. Link: https://www.mdpi.com/2412-3811/4/3/38
Interesting example, but it's funny how parts of the abstract read like an intro probability class:
> The framework estimates the climate change risks in economic terms by modeling the main activities that a mining company performs, in a probabilistic model, using Bayes’ theorem. The model permits incorporating inherent uncertainty via fuzzy logic and is implemented in two versatile ways: as a discrete Bayesian network or as a conditional linear Gaussian network. This innovative quantitative methodology produces probabilistic outcomes in monetary values estimated either as percentage of annual loss revenue or net loss/gains value.
The "mathematical equations by Danish researchers", for those interested, are most likely this paper:
Lauritzen, S.L. and Spiegelhalter, D.J. (1988), Local Computations with Probabilities on Graphical Structures and Their Application to Expert Systems. Journal of the Royal Statistical Society: Series B (Methodological), 50: 157-194. https://doi.org/10.1111/j.2517-6161.1988.tb01721.x
Direct PDF link to the Lauritzen & Spiegelhalter paper cited above: https://www.eecis.udel.edu/~shatkay/Course/papers/Lauritzen1...