The reason Julia is a good fit is that it has both strong numerics support (arrays, matrices, autodiff, samplers, etc.) and strong metaprogramming support. Probabilistic programming systems really need both to feel natural and well integrated while still supporting good inference algorithms.
A PPS in an environment without those two things is at a disadvantage.
I am pretty new to all of this, but I get the impression that Stan and PyMC3 are the leaders in this area, and I don't see them as having great metaprogramming support. Maybe I am wrong? Are they currently hitting limitations in that regard? Or is this about a particular area of probabilistic programming, such as non-parametrics?
I think you’re right! I haven’t explored these systems enough to comment with certainty.
I think Stan and PyMC3 both focus on optimized implementations of Hamiltonian Monte Carlo (HMC) - a Markov chain Monte Carlo algorithm which requires that the log probability densities of the target variables be differentiable with respect to their sample spaces. I think Stan provides other algorithms as well, and possibly PyMC3 too, but both systems are best known for their HMC implementations (as is NumPyro, which has a highly optimized version of HMC).
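For concreteness, here is a minimal sketch of the kind of model these systems target, written in Turing.jl since the thread is about Julia (the model and variable names are illustrative, not from any of these libraries): every latent variable is continuous, so the joint log density is differentiable and HMC/NUTS applies.

```julia
# Minimal sketch (Turing.jl, illustrative names): a regression whose log
# joint is differentiable in all latent variables, so NUTS/HMC applies.
using Turing

@model function linreg(x, y)
    σ ~ truncated(Normal(0, 1); lower=0)  # continuous, constrained support
    β ~ Normal(0, 1)                      # continuous, unconstrained
    for i in eachindex(y)
        y[i] ~ Normal(β * x[i], σ)        # observed data
    end
end

x = randn(100)
chain = sample(linreg(x, 2.0 .* x .+ 0.5 .* randn(100)), NUTS(), 1_000)
```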
This is certainly not all probabilistic programs - but many popular models do tend to fall into this category (where you can use HMC for inference). In both cases, I'm also unsure whether these systems are "universal" in the sense that you can express any stochastic computable function which halts with probability 1. Similar to Turing completeness: if the system does not let you express control flow whose bounds are only known at runtime, or disallows stochastic bounds, it's not universal in this sense. This is not typically a bad thing, because most classical models don't require this feature, but it does tend to separate probabilistic programming frameworks.
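To make the "stochastic bounds" point concrete, here is a tiny sketch in plain Julia (the function is hypothetical): the loop's bound is itself random, so there is no static bound on the program's runtime, yet it halts with probability 1 for any p > 0.

```julia
# Sketch: a stochastic loop bound. The number of iterations is random, so
# there is no static bound on runtime, yet the program halts with
# probability 1 for any p > 0. A "universal" PPL must be able to express
# programs like this one.
using Distributions

function geometric(p)
    n = 0
    while !rand(Bernoulli(p))   # keep flipping until the first success
        n += 1
    end
    return n                    # distributed as Geometric(p)
end

geometric(0.3)
```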
As a shameless plug, my own library doesn't rely on AST metaprogramming (e.g. macros) but instead on dynamic compiler interception - which is metaprogramming of a different sort.
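To illustrate the idea (this is not the commenter's library, just a generic sketch using Cassette.jl, one Julia package for this kind of compiler-level interception): every call made while executing a function can be observed, with no macro at the model-definition site.

```julia
# Sketch of dynamic compiler interception with Cassette.jl (not the
# commenter's library): `overdub` recursively recompiles the called code so
# that a context can observe every inner function call - no macros needed.
using Cassette

Cassette.@context TraceCtx

# Print every call executed under the context.
Cassette.prehook(::TraceCtx, f, args...) = println("call: ", f, args)

Cassette.overdub(TraceCtx(), () -> sum(abs2, [1.0, 2.0, 3.0]))
```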
I think systems built around effect handlers, like Pyro and Edward2, occupy a nice corner of the design space: they are less reliant on metaprogramming, fairly composable, and offer nicer UIs.
The downside is an API that is trickier to write and debug on the inference developer's side. Do link your library; I try to evaluate every PPS I encounter.
I agree. Turing.jl, one of the major PPLs in Julia, is also based around effect handlers and does not rely on metaprogramming for the inference part. This allows inference algorithms to be composed and makes it easier to overload functions for specific behaviour.
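For a flavour of the effect-handler idea, here is a hypothetical, stripped-down sketch in plain Julia (not Turing's or Pyro's actual API; `draw`, `with_handler`, and the handler protocol are all made up for illustration): sampling is an "effect" that handlers on a stack may intercept, so behaviours like conditioning compose without touching the model code.

```julia
# Hypothetical sketch of effect handlers (not Turing's or Pyro's real API).
# `draw` is the effect; handlers on a stack may intercept it, so behaviours
# like conditioning compose without rewriting the model.
using Distributions

const HANDLERS = Any[]          # innermost handler last

function draw(name::Symbol, dist)
    for h in Iterators.reverse(HANDLERS)
        r = h(name, dist)
        r === nothing || return r      # a handler claimed this effect
    end
    return rand(dist)                  # default behaviour: just sample
end

function with_handler(f, h)
    push!(HANDLERS, h)
    try
        return f()
    finally
        pop!(HANDLERS)
    end
end

# A model is an ordinary function that calls `draw`.
model() = draw(:y, Normal(draw(:x, Normal(0, 1)), 1))

# A conditioning handler: pin :x to an observed value, pass everything else.
condition_x = (name, dist) -> name === :x ? 1.5 : nothing

with_handler(model, condition_x)
```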
PyMC3 does a lot of AST manipulation. The more metaprogramming support a language has, the easier and more at home this feels. The easier it is to inspect and modify the AST, the smaller the disconnect between the modeling language and the host language.
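In Julia, for example, the transform a PPL front end needs is an ordinary macro over the host language's own AST (a hypothetical sketch; `draw` and `@mini_model` are made-up names): `x ~ Normal(0, 1)` is rewritten at parse time into an explicit runtime call.

```julia
# Hypothetical sketch of the AST rewriting a PPL front end performs:
# `x ~ D` is rewritten into a call to a (made-up) `draw` runtime function.
using Distributions

draw(name, dist) = rand(dist)    # stand-in runtime for the sketch

rewrite(x) = x
function rewrite(ex::Expr)
    if ex.head == :call && ex.args[1] == :~ && ex.args[2] isa Symbol
        lhs, rhs = ex.args[2], rewrite(ex.args[3])
        return :($lhs = draw($(QuoteNode(lhs)), $rhs))
    end
    return Expr(ex.head, map(rewrite, ex.args)...)
end

macro mini_model(ex)
    esc(rewrite(ex))
end

@mini_model begin
    x ~ Normal(0, 1)
    y ~ Normal(x, 1)    # `x` here is the value drawn above
end
```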
I don't know quite enough about the field, but it's possible they're the leaders (like Python) because they're just "good enough" and they were the best options widely available when the field was adopted and took off?
One of the arguments for Python is the exceptional support for automatic differentiation and GPU computing through deep learning libraries. Most Python-based PPLs focus on static models with differentiable log joints, allowing the application of HMC or variational inference. Unfortunately, support for efficient automatic differentiation libraries in Julia is still in its infancy. But I hope that with some more work by the community and the Turing team, this will change sooner rather than later.
I thought that with libraries like Zygote there is already some really nice stuff in Julia. I'd say it's still early days for good autodiff libraries in general, and I think we still haven't really explored what they can do.
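For instance, Zygote differentiates ordinary Julia functions directly, with no special graph-building API:

```julia
# Zygote differentiates plain Julia code via source-to-source AD.
using Zygote

f(x) = 3x^2 + 2x + 1
gradient(f, 5.0)    # returns (32.0,) since f'(x) = 6x + 2
```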
The reality is that most of a modelling task is preprocessing your data before it can be passed to a probabilistic model, and postprocessing to make decisions using it. That code is usually written in R or Python, so there is strong pressure for your library to be in that language as well.
And being rough around the edges is an ok price to pay for not losing an ecosystem.
My take is that most modern PPLs have language bindings in JavaScript/Python/R because they are explicitly courting analysts/data-scientists/applied-statisticians, or they are taking advantage of modern technical stacks implementing auto-diff and co-processor routines.
Most pre- and post-processing (with the exception of visualization) should probably be part of your model!
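As a sketch of that idea (Turing.jl again, with illustrative names): a standardization step can live inside the model itself, so the "preprocessing" is part of the generative story and is applied identically at inference and prediction time.

```julia
# Sketch (illustrative names): the standardization "preprocessing" step
# lives inside the model rather than in a separate script.
using Turing, Statistics

@model function std_regression(x, y)
    xs = (x .- mean(x)) ./ std(x)          # preprocessing inside the model
    α ~ Normal(0, 10)
    β ~ Normal(0, 10)
    σ ~ truncated(Normal(0, 5); lower=0)
    for i in eachindex(y)
        y[i] ~ Normal(α + β * xs[i], σ)
    end
end
```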