Show HN: Prolly – DSL to express and query probabilities in code (github.com/iamwilhelm)
47 points by iamwil on March 1, 2015 | hide | past | favorite | 14 comments



Probability distributions can be nicely modelled as a monad. See, for example, https://github.com/jliszka/probability-monad or http://www.cs.tufts.edu/~nr/pubs/pmonad-abstract.html

I think this goes beyond what Prolly offers, and so might provide some inspiration for extending it in a consistent manner.
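For readers unfamiliar with the idea, here is a minimal sketch of a probability monad in plain Ruby (illustrative only; the linked libraries' APIs differ): a distribution is a list of value/probability pairs, and `flat_map` chains a dependent choice through each outcome while multiplying the weights.

```ruby
# Minimal probability monad: a distribution is a list of [value, probability]
# pairs; flat_map threads a dependent choice through each outcome and
# multiplies the weights. (Illustrative sketch, not the linked libraries' API.)
class Dist
  attr_reader :pairs

  def initialize(pairs)
    @pairs = pairs
  end

  def self.uniform(values)
    p = 1.0 / values.size
    new(values.map { |v| [v, p] })
  end

  # "return"/"unit": a distribution concentrated on a single value.
  def self.point(value)
    new([[value, 1.0]])
  end

  # Monadic bind: feed each outcome into a distribution-returning block,
  # accumulating probability mass for identical results.
  def flat_map
    combined = Hash.new(0.0)
    @pairs.each do |v, p|
      yield(v).pairs.each { |w, q| combined[w] += p * q }
    end
    Dist.new(combined.to_a)
  end

  # Probability that the outcome is truthy.
  def prob
    @pairs.sum { |v, p| v ? p : 0.0 }
  end
end

die = Dist.uniform((1..6).to_a)
# P(two dice sum to 7) = 6/36 = 1/6
seven = die.flat_map { |a| die.flat_map { |b| Dist.point(a + b == 7) } }
seven.prob  # => 1/6, about 0.1667
```

The payoff of the monadic interface is that dependent sampling reads like ordinary nested iteration, while the weights are bookkept automatically.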


Oleg's Hansei is also well worth checking out

http://okmij.org/ftp/kakuritu/


It does go beyond what Prolly offers. I hadn't heard of probability distributions being modeled as monads. I'll have to look into it. Thanks!


Wahoo! This is neat, and in Ruby!

However, I have some issues here. First, I don't know that I like the interface. Adding to the class instead of an instance of the class irks me. I also find the documentation hard to follow - the examples are too limited, and others are poorly formatted.

But it's Ruby, so I have no excuse not to try to clean it up some. I'll tinker and see what I can come up with in a PR.

Related, can anyone point me to something digestible that will bring me up to speed on the finer points of the mathematics?

Also, curious what kinds of things this is used for in the wild. I don't know of anything that operates based on probabilities, so some examples would be helpful.


OP here, I used "adding to class" because I had assumed people would only use a single probability space at a time, and it would shorten the syntax since you didn't need to instantiate the space.

However, it's possible to make an instance of Ps and use the same interface, though I'd need to change a method or two to support it. It wouldn't be a hard change. Do you envision using more than one probability space at once?

As for the docs, lemme know in an issue or PR what's too hard to follow. I tried to make it clear, but I welcome other eyes on it.

As for mathematics, perhaps Naive Bayes is a good place to start? http://suanpalm3.kmutnb.ac.th/teacher/FileDL/choochart822554...

Naive Bayes Classifier is an example of something expressed all in probabilities. I only implemented a decision tree learner using Prolly, but am planning to implement more things using it in the near future.
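As a concrete illustration of "expressed all in probabilities", a tiny Naive Bayes classifier can be written straight from the definition P(class | features) ∝ P(class) · Π P(feature | class). This is a sketch in plain Ruby over hash-shaped examples, not Prolly's actual API:

```ruby
# Minimal Naive Bayes over hash-shaped examples (illustrative; not Prolly's API).
# Scores each class by prior * product of per-feature likelihoods.
def classify(examples, sample)
  classes = examples.group_by { |ex| ex[:label] }
  scores = classes.map do |label, rows|
    prior = rows.size.to_f / examples.size
    likelihood = sample.reduce(1.0) do |acc, (key, value)|
      # Laplace smoothing so an unseen feature value doesn't zero the product.
      matches = rows.count { |r| r[key] == value }
      acc * (matches + 1.0) / (rows.size + 2.0)
    end
    [label, prior * likelihood]
  end
  scores.max_by { |_, score| score }.first
end

examples = [
  { outlook: :sunny, windy: false, label: :play },
  { outlook: :sunny, windy: true,  label: :stay },
  { outlook: :rainy, windy: true,  label: :stay },
  { outlook: :sunny, windy: false, label: :play },
]
classify(examples, { outlook: :sunny, windy: false })  # => :play
```

Note that every quantity in the algorithm is a probability estimated from counts over prior examples, which is why a probability DSL maps onto it so naturally.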


I would definitely rather instantiate and build that way. I can dream up some examples where I might want to work with more than one type of probability. It would be good design to facilitate that, for sure.

I'm not up on the actual functionality - any reason you couldn't just work with an arbitrary collection of objects? It would be nice to make an AR query, wrap it in a Prolly object, and work that way.

So, I'm pondering. When/where would I want to rely on probabilities instead of raw data? Everything I come up with seems like it would just work better with actual data values than deriving probabilities. But, that might just be my mathematical ignorance at play.

Cool gem, thanks for creating!


Sure, I'll change it around to have the option to use instantiated PSpace.

As for relying on probabilities, check out Naive Bayes. Hidden Markov Models also rely on probabilities, rather than raw counts.

Thanks for the feedback!


You could do what I’ve done with MIME::Types, which is make the class-level methods work against an instance. Most people use MIME::Types as the class-level methods, but it is possible to instantiate and use it otherwise.


By "work against an instance", do you mean that you instantiate the instance in the class-level methods? So at most, the class-level methods are just wrappers around the instance methods, emulating a singleton?


Yes. Sorry I didn’t see this earlier, but that’s exactly it.


In the wild I think you'd be more likely to use something like PyMC or Stan, which work like Prolly but support arbitrarily complex models.

Still, love the idea.

To learn more about probability theory and Bayes, "Probability Demystified" is pretty good. (The Demystified series is McGraw-Hill's take on For Dummies.) To learn more about probabilistic programming, try http://camdavidsonpilon.github.io/Probabilistic-Programming-.... Or Google for MCMC.

I dunno how you'd use it in everyday programming, but it works really well for any sort of prediction. One of the examples in the "Probabilistic Programming and Bayesian Methods for Hackers" book is ranking a bunch of Reddit posts based on ups/downs while taking into account that fewer total votes means more uncertainty.
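The gist of that ranking trick can be sketched in a few lines (my own simplification, not the book's code): model each post's upvote rate as a Beta(ups+1, downs+1) posterior and sort by an approximate lower bound on it, mean minus 1.65 posterior standard deviations, so posts with few votes get penalized for their uncertainty.

```ruby
# Conservative ranking sketch: treat each post's upvote rate as a
# Beta(ups + 1, downs + 1) posterior and rank by an approximate one-sided
# lower bound (mean - 1.65 standard deviations). Few votes => wide
# posterior => heavy penalty. (Simplified illustration, not the book's code.)
def lower_bound(ups, downs)
  a = ups + 1.0
  b = downs + 1.0
  mean = a / (a + b)
  var  = (a * b) / ((a + b)**2 * (a + b + 1))
  mean - 1.65 * Math.sqrt(var)
end

posts = [
  { title: "A", ups: 1,  downs: 0  },  # 100% positive, but only one vote
  { title: "B", ups: 90, downs: 10 },  # 90% positive across 100 votes
]
ranked = posts.sort_by { |p| -lower_bound(p[:ups], p[:downs]) }
ranked.first[:title]  # => "B": its lower bound beats A's despite the lower rate
```

Sorting by raw upvote fraction would rank A first; ranking by the posterior lower bound encodes "fewer total votes equals more uncertainty" directly.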


Yeah, this is a new library. I don't expect people to use it in the wild just yet.

As for how to use it in everyday programming, I think machine learning techniques and algorithms could probably be made easier. Beyond that: just as regular expressions let you do a "fuzzy match" on a string, I wanted something that could do a "fuzzy match" on hashes, so I could branch based on a classification rather than an exact match using "==". Something like a spam filter would be a common application; your Reddit example is another. Basically, making decisions not just with the information on hand, but also based on prior examples.
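A toy version of that "fuzzy match on hashes" idea, matching on shared key/value pairs instead of requiring `==` (illustrative sketch of the concept, not Prolly's API):

```ruby
# Toy "fuzzy match" on hashes: rather than testing mail == some known hash,
# pick the stored example sharing the most key/value pairs and branch on its
# label. (Illustrative sketch of the idea, not Prolly's API.)
def fuzzy_match(known, sample)
  # Hash#to_a gives [key, value] pairs; & keeps the pairs both hashes share.
  known.max_by { |ex| (ex.to_a & sample.to_a).size }[:label]
end

known = [
  { subject_has_free: true,  many_links: true,  label: :spam },
  { subject_has_free: false, many_links: false, label: :ham  },
]
mail = { subject_has_free: true, many_links: true }
fuzzy_match(known, mail)  # => :spam, even though mail == neither example
```

A real implementation would weight the match probabilistically (as Naive Bayes does) rather than just counting shared pairs, but the branching-on-classification shape is the same.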


Super cool idea! I love that the interface is so simple and intuitive; I think that makes machine learning accessible to a much longer tail of problems that people would not previously have considered.


Thanks! I'd tried a couple of times to write this, but had some false starts. I tried to make it less verbose, and it's as good as I could get it for now.



