Hacker News
PPX: Probabilistic Programming EXecution Protocol and API Based on Flatbuffers (iris-hep.org)
44 points by ArtWomb on Dec 11, 2019 | 7 comments



This is only tangentially related to PPX; it's more of a general question about probabilistic programming. Every time I hear about probabilistic programming I get excited. Then I try to read more about it, and all I see everywhere is integral signs flying by. Is there any tutorial that starts with a real-world problem, or at least something one can believe a real human might want to solve (maybe if we squint a bit), and then goes through how that problem can be formulated as something one can solve with PP?

I had the same feeling with artificial neural networks. I had read a lot about perceptrons and how they are, in a sense, universal approximators, but it didn't really click until I saw a tutorial that used a convolutional network to classify handwritten digits. I have never needed to write a program to recognise handwritten digits, but I could imagine how that would be useful. Also, without the tool in question (neural networks) it would have been very hard to achieve the same result.

In short: what is the MNIST-digits equivalent toy problem of probabilistic programming?


As far as Python PPLs go, I have found Pyro's documentation and tutorials to be quite stellar. Built on a PyTorch backend (and NumPyro now has support for JAX), it's by far my favorite PPL to use.

For your bread-and-butter PPL starter, I'd go for variational autoencoders (VAEs). Fun to visualize and tweak: http://pyro.ai/examples/vae.html

If that's too much going on, try this basic tutorial on inference: http://pyro.ai/examples/intro_part_ii.html

The full set of examples is here: http://pyro.ai/examples/


>>> what is the mnist digits equivalent toy problem of probabilistic programming?

I wish it were that simple! Unfortunately, the only way to really get an intuitive feel for computational probability is going through the exercises.

Before even starting PyMC3, I'd systematically go through some classic problems in probability theory, such as the birthday paradox, the Sleeping Beauty problem, the Three Prisoners problem, etc., and implement them directly in code, without using any dependencies.
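To make that concrete, here's a minimal dependency-free sketch of the birthday paradox as a Monte Carlo simulation in Python (the function name and trial count are my own, purely for illustration):

```python
import random

def birthday_collision_prob(n_people, n_trials=20_000, seed=0):
    """Estimate P(at least two of n_people share a birthday) by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Draw a birthday (0..364) for each person; a collision means
        # the set of distinct birthdays is smaller than the group.
        birthdays = [rng.randrange(365) for _ in range(n_people)]
        if len(set(birthdays)) < n_people:
            hits += 1
    return hits / n_trials

# The classic result: with just 23 people the probability already
# exceeds 50%, which is what the simulation should show.
print(birthday_collision_prob(23))
```

A nice exercise is to then derive the exact answer, 1 - 365!/((365-n)! * 365^n), and check the simulation against it.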

Here, for example, is Monty Hall in Go. You can quickly see how trivial it is to extend the game from 3 doors to 99+ doors, and how a derivation might follow:

https://play.golang.org/p/RGkJ9dToda9
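For readers who don't want to click through, here's a rough pure-Python equivalent (the playground version above is in Go). It leans on the observation that, when the host opens every remaining goat door but one, switching wins exactly when the initial pick was wrong:

```python
import random

def monty_hall(n_doors=3, switch=True, n_trials=20_000, seed=0):
    """Estimate the contestant's win probability by simulation.

    Assumes the n-door variant in which the host opens all remaining
    goat doors except one, so the switcher wins iff the first pick
    missed the car.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        car = rng.randrange(n_doors)
        pick = rng.randrange(n_doors)
        wins += (pick != car) if switch else (pick == car)
    return wins / n_trials

print(monty_hall(switch=True))               # ~ 2/3
print(monty_hall(switch=False))              # ~ 1/3
print(monty_hall(n_doors=100, switch=True))  # ~ 99/100
```

The 100-door run makes the counterintuitive 3-door answer obvious: switching wins with probability (n-1)/n.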

Only after you get a feel for going from probability spaces to events to random variables can you begin to appreciate what probabilistic programming provides: taking human error out of inference.

Best of luck ;)


If you're genuinely eager to learn probabilistic programming, then by far the best resource I've found is a book called Statistical Rethinking by Richard McElreath. His (almost finished) draft of the second edition is up here: http://xcelab.net/rmpubs/sr2/statisticalrethinking2_08dec19....

Dr. McElreath also posts his lectures on YouTube. The R code in the book and in his lectures uses a library/package he wrote, which provides a wrapper that simplifies building Stan models. The code has also been translated to PyMC3 in Python.

Code, slides, and lecture videos are all referenced here: https://github.com/rmcelreath/statrethinking_winter2019

It might look like a huge amount of content, but this course leads you very gently through the key concepts, keeping the mathematics to a minimum. Don't be put off if you don't know the R language: the concepts matter more than the programming language, and the code examples are kept simple.

If you make it through Statistical Rethinking, then you might consider picking up Doing Bayesian Data Analysis by John Kruschke (a.k.a. "the puppies book"). I've found DBDA heavier going than SR, but Kruschke takes a different approach from McElreath, which can be useful if you get stuck on a concept, need more detail, or just want a different angle on the subject.


But you haven't answered the OP's question: why would one invest several months in learning PP without seeing any clear examples of how it could be useful?


I got started when a colleague recommended this series of three hands-on blog posts with PyMC3: https://twiecki.io/blog/2013/08/12/bayesian-glms-1/ which builds up from normal linear regression, to non-quadratic error terms, to hierarchical modeling.
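To give a flavor of what the first post in that series covers, here's a hand-rolled toy version (not PyMC3, and with made-up synthetic data): a random-walk Metropolis sampler for the intercept and slope of a linear regression, with flat priors and known noise. All names and tuning constants here are illustrative choices of mine:

```python
import math
import random

def log_post(theta, xs, ys, sigma=1.0):
    """Log-posterior with flat priors: just the Gaussian log-likelihood."""
    a, b = theta
    return -sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (2 * sigma ** 2)

def metropolis(xs, ys, n_steps=20_000, step=0.1, seed=0):
    """Random-walk Metropolis over (intercept, slope)."""
    rng = random.Random(seed)
    theta = (0.0, 0.0)
    lp = log_post(theta, xs, ys)
    samples = []
    for _ in range(n_steps):
        prop = (theta[0] + rng.gauss(0, step), theta[1] + rng.gauss(0, step))
        lp_prop = log_post(prop, xs, ys)
        # Accept with probability min(1, exp(lp_prop - lp)).
        if math.log(rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        samples.append(theta)
    return samples[n_steps // 2:]  # crude burn-in: drop the first half

# Made-up synthetic data: y = 1 + 2x + unit Gaussian noise.
data_rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [1 + 2 * x + data_rng.gauss(0, 1) for x in xs]

samples = metropolis(xs, ys)
a_hat = sum(s[0] for s in samples) / len(samples)
b_hat = sum(s[1] for s in samples) / len(samples)
print(f"intercept ~ {a_hat:.2f}, slope ~ {b_hat:.2f}")
```

The posterior means should land near the true values of 1 and 2. The whole point of a PPL like PyMC3 is that you write only the model (the `log_post` part, declaratively) and get a far better sampler than this for free.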

(Disclosure: I use probabilistic programming at Triplebyte, and have previously written a bit about it here: https://triplebyte.com/blog/bayesian-inference-for-hiring-en... , though that post is really just at the level of Naive Bayes, to be easier to understand.)
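For anyone unfamiliar with the Naive Bayes level of inference mentioned above, it boils down to one application of Bayes' rule under a conditional-independence assumption. A minimal sketch with made-up numbers (the function and all figures are purely illustrative, not from the linked post):

```python
def naive_bayes_posterior(prior, likelihoods, evidence):
    """P(class=1 | features) assuming features are independent given the class.

    prior:       P(class=1)
    likelihoods: per-feature pairs (P(feature | class=1), P(feature | class=0))
    evidence:    observed boolean value for each feature
    """
    p1, p0 = prior, 1 - prior
    for (l1, l0), seen in zip(likelihoods, evidence):
        p1 *= l1 if seen else 1 - l1
        p0 *= l0 if seen else 1 - l0
    return p1 / (p1 + p0)  # normalize over the two classes

# Two binary signals, both observed positive, with a 50/50 prior.
# Exact answer: (0.5*0.8*0.7) / (0.5*0.8*0.7 + 0.5*0.3*0.4) = 0.28/0.34
print(naive_bayes_posterior(0.5, [(0.8, 0.3), (0.7, 0.4)], [True, True]))
```

A full PPL relaxes exactly these simplifications: dependent features, continuous latent variables, and models too complex for closed-form posteriors.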


Alex Lew gave a great talk on using PP for data cleaning tasks: https://www.youtube.com/watch?v=MiiWzJE0fEA



