Probabilistic Programming & Bayesian Methods for Hackers (camdavidsonpilon.github.io)
186 points by luu on Sept 9, 2013 | 23 comments



The "understanding-first" approach is so right, especially with a topic like Bayesian methods where the logic makes _so much sense_, and yet is so easy to miss by brute-force memorization of equations. IPython notebook is a great choice. I've only skimmed the first couple of chapters but cloned and will definitely spend some time with this.

This seems inspired by the awesome Yudkowsky article that comes from a similar (introductory, not programming-specific) place: http://yudkowsky.net/rational/bayes/

For anyone who hasn't used the IPython notebook and is interested in scientific computing in Python: you need to check it out. The ability to mix prose and live Python, with effortless plotting, storable in git, shareable via links or nbviewer, is just magic. That probably seems mundane, but for things like exploratory data analysis across a team, it's a game-changer. Another staple in this stack is Pandas ( http://pandas.pydata.org/ )
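
For a concrete (made-up) example of the kind of exploratory cell this enables, assuming pandas and matplotlib are installed:

  import pandas as pd

  # a toy DataFrame; real work would load data with pd.read_csv
  df = pd.DataFrame({"day": range(10),
                     "visits": [3, 5, 4, 8, 7, 9, 12, 11, 15, 14]})
  df.describe()                 # summary statistics render inline as a table
  df.plot(x="day", y="visits")  # with %matplotlib inline, the plot shows up
                                # right below the cell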


You should see the kinds of things Mathematica notebooks can do.


Ooh, have any examples?


The most powerful stuff revolves around the Dynamic symbol[1]. If you use this, for example:

  p = {0, 0};  (* give p an initial point *)
  Dynamic[Graphics[Disk[p]]]
Then if, somewhere else, you change the value of p, either manually or programmatically, you will see the disk move around. (In link [1], open up the "Neat Examples" dropdown.)

Basically, with Dynamic you are saying "I want this updated as soon as any of its components are updated." And the system takes care of the rest. Furthermore, the dynamic content can be completely arbitrary. Mathematica's Manipulate symbol[2] is essentially a wrapper around Dynamic. See [3] for countless examples built using Manipulate.

To be clear, these aren't "generated programs." All this dynamism resides comfortably and natively inside notebooks. (Mathematica also allows data that persists across sessions, through DynamicModule).

See the last example in [4], combine that with the fact that you can have arbitrary expressions anywhere, and you can start to get a sense of the power the system gives you. For example, you can create an ad-hoc tool for making diagrams or whatever, and then plop that tool right in the middle of a piece of source code, inline.

As another example of the power of Mathematica's notebooks, links [1] [2] and [4] are HTML exports of Mathematica notebooks. Note all of the fancy formatting.

But wait, there's more: Mathematica notebooks are themselves fundamentally Mathematica expressions (essentially M-expressions; Mathematica is in part a Lisp on top of symbolic semantics). Thus you can construct/alter them programmatically, etc. In other words, Mathematica is a ridiculously homoiconic system, not only in the language sense but also in the broader systemic sense. As an example, [5] is the Mathematica expression behind the notebook of [4].

[1] http://reference.wolfram.com/mathematica/ref/Dynamic.html

[2] http://reference.wolfram.com/mathematica/ref/Manipulate.html

[3] http://demonstrations.wolfram.com/

[4] http://reference.wolfram.com/mathematica/ref/DynamicSetting....

[5] http://pastebin.com/58GYSCFy


For those who want a more solid take on machine learning, and who still remember their math and probability/statistics (i.e., advanced undergrads or new grad students), the best texts seem to be:

The Elements of Statistical Learning by Hastie, Tibshirani and Friedman, available for free online.

Pattern Recognition and Machine Learning by Chris Bishop. Very Bayesian.

Machine Learning: A Probabilistic Perspective by Kevin Murphy. Also Bayesian, although not as Bayesian as Bishop. The most recent of the three, and therefore it covers a few topics not covered elsewhere, like deep learning and conditional random fields. The first few printings are full of errors and confusing passages; it should be better before too long.

Did I miss any?


The best and most cohesive introductory book is "Learning from Data" by Yaser Abu-Mostafa, accompanied by great video lectures:

http://amlbook.com/

It differs from other books in that all the material is treated from the unified perspective of statistical learning theory and VC dimension; as a result, the book feels less like a hodgepodge of unrelated techniques and more like an introduction to a coherent field.

Hastie and Tibshirani also have a new, mathematically less demanding book out:

http://www.amazon.com/Introduction-Statistical-Learning-Appl...


I would add to that two classics with free PDFs available from the authors' websites:

1. David Barber: Bayesian Reasoning and Machine Learning

2. David MacKay: Information Theory, Inference, and Learning Algorithms

[1] http://www.cs.ucl.ac.uk/staff/d.barber/brml/

[2] http://www.inference.phy.cam.ac.uk/mackay/itila/

Both require a willingness to immerse in the mathematics. But the maths isn't hard, even for someone like me who hasn't formally studied it since I was 15.



pg's own A Plan for Spam is also an excellent introduction to Bayes for binary classification.

http://www.paulgraham.com/spam.html
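
The heart of that essay is a simple combination rule: each token gets an estimated spam probability, and the per-token probabilities are combined assuming independence. A minimal sketch (the token probabilities here are made-up values):

  from functools import reduce
  from operator import mul

  def combined_spam_prob(token_probs):
      # Graham-style combination of individual token probabilities
      prod = reduce(mul, token_probs)
      inv_prod = reduce(mul, (1 - p for p in token_probs))
      return prod / (prod + inv_prod)

  print(combined_spam_prob([0.99, 0.95, 0.20]))  # ~0.998: spammy tokens dominate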


I also like "Think Bayes" even though it is "just" a book. http://www.greenteapress.com/thinkbayes/


I've been working with both books, and Think Bayes is more accessible (and it's also free). I recommend going through it before getting to PyMC.

One huge reason is that the author has implemented everything you use in plain Python, in a way that lets you read his code to more fully understand what's happening. I'm on chapter 5 or 6 and he hasn't even touched on MCMC yet, which is most welcome.


The book looks awesome, thanks.


The title is a bit confusing, as probabilistic programming is a research field in itself that the book doesn't seem to touch upon. See http://probabilistic-programming.org/wiki/Home


I don't see any confusion at all. The first few paragraphs of the link say it's based on PyMC, which itself appears in your link under "Existing probabilistic programming systems". So it's a book that's a practical guide to using one of the systems you reference.


As a researcher with MCMC interests, I agree with the grandparent post. Used as technical jargon, "probabilistic programming" tends to mean: specify a model using a programming language; a compiler then works out how to do inference in that model and writes the inference code for you.

PyMC is a toolkit that makes it easier to write inference code for a wide range of models, but isn't as automatic as the field of probabilistic programming promises.

As the linked site says, it lists more than just probabilistic programming systems, and PyMC falls into the latter of the categories it lists:

Below we have compiled a list of probabilistic programming systems including languages, implementations/compilers, as well as software libraries for constructing probabilistic models and toolkits for building probabilistic inference algorithms.


PyMC follows the library approach to probabilistic programming, rather than inventing yet another application-specific language that only a niche of developers will be willing to spend time learning.

Despite not introducing a new language syntax or DSL, PyMC is still probabilistic programming in the sense that you have Python variables that represent random variables with prior distributions, and you then use those to derive new random variables through deterministic Python expressions or functions. Finally, you plug in an inference engine to invert the execution order and derive a posterior distribution over the unobserved variables.
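
A minimal sketch of that workflow, using the PyMC 2.x API the book is built on (the data and priors here are made up for illustration):

  import pymc as pm

  data = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]       # hypothetical coin flips

  p = pm.Uniform("p", lower=0, upper=1)        # prior on the coin's bias

  @pm.deterministic
  def q(p=p):                                  # a derived random variable,
      return 1 - p                             # via an ordinary Python function

  obs = pm.Bernoulli("obs", p, value=data, observed=True)  # tie in the data

  mcmc = pm.MCMC([p, q, obs])                  # plug in the inference engine
  mcmc.sample(20000, burn=5000)                # run it to get the posterior
  print(p.stats()["mean"])                     # posterior mean of the bias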


I tend to agree with Ian that it's confusing to conflate probabilistic programming and libraries that support Bayesian inference. In PyMC, variables represent random variables, but these variables can't be used with Python constructs like conditionals and loops. Python is used to construct a DAG, which is then executed.

I think a better definition of a probabilistic programming language is: a language where you can replace any variable of type T with Random<T>. The line isn't entirely clear, but library approaches in languages like Python don't fit, since they can't handle control flow. BUGS/JAGS/Stan might qualify, although they are very limited declarative languages; their motivation is primarily to have a compact modeling syntax, not a real programming language.

There's no need for probabilistic programming languages to be some esoteric DSL. You can convert languages like Python or Matlab into a probabilistic programming language with a lightweight compiler transformation: http://www.mit.edu/~wingated/papers/lightweight_pp.pdf. Actually doing inference efficiently, however, remains as challenging as ever.
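
A minimal sketch of the trace-based idea behind that paper (not the paper's actual implementation): give every random choice a name, record the choices in a trace, and do inference by re-running the ordinary program while reusing or perturbing recorded choices. Here `flip` is a hypothetical helper:

  import random

  trace = {}  # address -> recorded random choice

  def flip(addr, p=0.5):
      # reuse the choice recorded for this address, if any;
      # otherwise sample fresh and record it
      if addr not in trace:
          trace[addr] = random.random() < p
      return trace[addr]

  def model():
      # ordinary Python control flow: the number of random
      # choices depends on earlier choices
      n = 0
      while flip(("geom", n)):
          n += 1
      return n

  print(model())  # re-running against the same trace reproduces this value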


>it's confusing to conflate probabilistic programming and libraries that support Bayesian inference

But it's a generic term, so you could say the same about functional programming or logic programming, both of which can be done in Python even if there are more advanced or integrated systems elsewhere. I don't really think most people care, besides perhaps PL researchers, at which portion of the stack things are happening or being optimised; if you are using the relevant mathematics and statistics, that's what you are doing. I think people are playing semantics to say it only means one thing when it's obviously used in a general way and sometimes in a specific way.

The bottom line is that the guy who wrote the book thinks it's probabilistic programming, ogrisel does, I do, and the people who run http://probabilistic-programming.org/wiki/Home seem to be referring to it as probabilistic programming as well. I don't buy Ian's argument that it's part of some latter type of category on the site; PyMC is directly linked in a section titled "Existing probabilistic programming systems". They use "as well as" to link the two groups, so either the first group is "systems" and the rest are still "probabilistic programming" just without "systems", or they are all "probabilistic programming systems" if "as well as" is operating in that way.

The arguments against this seem to be splitting hairs and playing semantics far too much, when n-grams regularly have more than one meaning. Indeed, it's amusing to see probabilistic people arguing for one interpretation rather than saying that there could be more than one and that it depends on context (an NLP program trying to disambiguate the meaning of a given n-gram would look at other words present, topic models for the document, et cetera).


This is fantastic, both in terms of content and in terms of delivery.

I've been toying with ideas for some kind of textbook-killing general publishing platform for a while, but it's not something I'll implement (in the next 5 years, anyway). Obviously people are doing this kind of thing already, but this is certainly the closest implementation I've seen to the ideas I've been thinking about.


I read a little of this book when it was on Hacker News a few months ago, but only half a chapter. Has anyone read through the whole book, and if so, do you have a review?


Looking forward to reading Chapters X1 and X2 on machine learning. Thanks for all the work.


To run this in IPython (once IPython is installed and the notebooks are cloned), run in your terminal:

  ipython notebook Chapter1_Introduction/Chapter1_Introduction.ipynb


You listen to Programming Throwdown too, huh?



