Hacker News
Human-level concept learning through probabilistic program induction (sciencemag.org)
96 points by novalis78 on Dec 14, 2015 | 20 comments



Isn't it the case that human one-shot classification can rely on a pre-existing corpus of concepts that spans a much richer and broader domain?

For example, someone who looks at a Segway won't just see the pixels of a two-wheeled vehicle and notice that its axle arrangement is unlike a bike's; they may also have read articles about it, and know that a two-wheeled vehicle without pedals or a mechanism for the rider to stabilize it implies the device has a self-stabilizing function.

Thus, a human being could group a Segway, or a unicycle-style Segway-type device, with BB-8-style self-balancing gyro balls that people can ride as chairs. But to do so, they are relying on reasoning about how the device must work, not just image recognition.

I mean, I welcome all of these advances in recognition tasks, but I still feel like we're producing "Fast Brain" AI, not "Slow Brain" AI.


I was thinking this too. I'd be curious if one could even provide a single clear and unambiguous example of something people learn via "one-shot" classification.


From personal experience, touching a red hot stove. No further classification was needed.


Bayesian program learning looks really promising, largely because it requires less data than deep neural networks to learn.

A couple things to note:

1) One of the authors of the paper, Russ Salakhutdinov, has worked alongside Geoff Hinton to produce major advances in deep learning.

2) While the title suggests a broad class of problems, for the moment their work has been on images.


I agree, but I'd expand on that and say that BPL, and probabilistic programming in general, benefits hugely from being interpretable, which makes it useful in ways that deep learning is not. On top of that, the integration of causal concepts allows the program to generalize to completely novel situations in ways that it's unclear other techniques can match. For example, can we teach a neural net that if the sun didn't rise tomorrow people would still go to work at 9am? (As long as they weren't too freaked out, of course :-) )


This paper introduces the Bayesian program learning (BPL) framework, capable of learning a large class of visual concepts from just a single example and generalizing in ways that are mostly indistinguishable from people.

This is one of the most exciting and readable papers I've come across.

Does anyone know if the code is available anywhere? Can we reproduce their results? I can think of a dozen applications for such an ability.


The link to the code is in the paper as well. After all the media hype I thought I should read the paper, and like you, was surprised to find it very readable. Although I think to really understand the mechanism I am going to have to read the code, because without that this just looks like a good parlor trick (the permutation in stroke output is a fun but minor piece of code, yet a big part of the media breathlessness).

There should be a flurry of activity as practitioners take these concepts and start applying them to other fields, such as static code analysis. Much of the magic seems to be in the choice of atoms that you feed into the algorithm.


The take-home here is actually that by modelling the physical process of writing you get a more accurate model. It requires fewer examples partly because of pre-training, and partly because of physics hard-coded into the model structure. It's not just the atoms you feed in: the entire algorithm is designed around drawing glyphs.
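
To make that concrete, here's a rough Python sketch of the general shape of such a model (my own illustration; the names and structure are made up, and the real BPL code is far more involved): a character type is a small program over stroke primitives, and each handwritten exemplar is that program re-executed with motor noise.

    import random

    PRIMITIVES = ["line", "arc", "hook"]  # pre-learned stroke sub-parts

    def sample_character_program():
        """Sample a character type: strokes plus attachment relations."""
        n_strokes = random.randint(1, 3)
        strokes = [random.choice(PRIMITIVES) for _ in range(n_strokes)]
        relations = [random.choice(["start", "end", "along", "independent"])
                     for _ in strokes]
        return strokes, relations

    def render_token(program, motor_noise=0.1):
        """Execute the program with per-token noise to get one exemplar.

        A real renderer would produce an ink trajectory; this just perturbs
        a symbolic description to show where the variability enters.
        """
        strokes, relations = program
        return [(s, r, random.gauss(0.0, motor_noise))
                for s, r in zip(strokes, relations)]

    program = sample_character_program()                 # the "concept"
    tokens = [render_token(program) for _ in range(5)]   # five noisy exemplars

The one-shot trick is then inverting this: given a single exemplar, infer which program likely produced it, and reuse that program to classify or generate new exemplars.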

It's not entirely clear how you'd apply these concepts to new problems. Certainly in many cases you could come up with more detailed models of the processes involved. But in others, like text understanding, it's not at all clear how you'd make models more sophisticated.


Sorry, I should have included the link in my post, since I searched for and found it:

https://github.com/brendenlake/BPL.git




IMO something like this is the future of AI. I've been trying to crack it for a while as a hobby. I also use vision tasks as a testbed, but unfortunately my algorithms get stuck in local maxima or in coarse representations of images.

I'm not sure this paper's algorithm is that powerful either. It seems to start with a squiggly-line grammar that is already close to the final written, text-like characters it is trying to learn. This considerably narrows the model space and makes the problem easier.

It would be more impressive if it started with an SVG-like grammar and learned to draw any kind of image. This is the approach I try to take anyway, even if I can't seem to get very far.

The problem is that the search space is so large, and so irregular with respect to output, that you can't brute-force it. You need to somehow cluster similar generative rules of different complexity together, then have your algorithm search the model space so that it tries simpler rules first, and only then generates more complex rules, prioritizing those known to produce output similar to the simpler ones.

Basically you have to cluster your generative rules into a taxonomy that your algorithm can navigate from the top. It's exponentially inefficient to try to generate "dog" until you have recognized that the simpler "animal" is a good approximation.
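
Something like this, say (a hedged Python sketch; the taxonomy, scores, and threshold are all made up for illustration):

    import heapq

    TAXONOMY = {
        "shape":   ["animal", "vehicle"],
        "animal":  ["dog", "cat"],
        "vehicle": ["bike", "segway"],
    }
    COMPLEXITY = {"shape": 1, "animal": 2, "vehicle": 2,
                  "dog": 3, "cat": 3, "bike": 3, "segway": 3}

    def fit_score(rule, image):
        # Stand-in for "how well does this generative rule explain the image?"
        return {"shape": 0.3, "animal": 0.8, "vehicle": 0.2,
                "dog": 0.9, "cat": 0.4, "bike": 0.1, "segway": 0.1}[rule]

    def coarse_to_fine_search(image, root="shape", threshold=0.5):
        frontier = [(COMPLEXITY[root], root)]  # min-heap: simplest rules first
        best = (0.0, None)
        while frontier:
            _, rule = heapq.heappop(frontier)
            score = fit_score(rule, image)
            best = max(best, (score, rule))
            # Always expand the root; otherwise refine only promising rules.
            if score >= threshold or rule == root:
                for child in TAXONOMY.get(rule, []):
                    heapq.heappush(frontier, (COMPLEXITY[child], child))
        return best

    print(coarse_to_fine_search(image=None))  # (0.9, 'dog'), reached via 'animal'

The point is that "dog" only gets tried because its cheaper parent "animal" already scored well; "bike" and "segway" are never evaluated at all.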

I also think that in the ultimate solution, the output leaf statements of the grammar will be parametrized like a normal programming language, so that, for example, the algorithm can generate a color and a radius and then generate 100 circles referencing that single color and radius, representing repeated patterns without having to learn each one's size and color individually when they are clearly homogeneous across the bunch. The Bayesian, Occam's-razor solution to a bunch of similar things you don't have a category for yet is shared parameters. Shared parameters are how you learn from very few examples: you don't have to learn a full new category to make good predictions; the algorithm can simply notice the homogeneity in a part of a scene and extrapolate immediately.
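
A minimal sketch of that shared-parameter idea (hypothetical names, just to illustrate the description-length argument):

    import random

    def sample_scene(n_circles=100):
        color = random.choice(["red", "green", "blue"])  # sampled once, shared
        radius = random.uniform(1.0, 10.0)               # sampled once, shared
        # Each circle stores only its position; color and radius are references
        # to the shared parameters, not per-circle unknowns to be learned.
        positions = [(random.uniform(0, 100), random.uniform(0, 100))
                     for _ in range(n_circles)]
        return {"color": color, "radius": radius, "positions": positions}

Explaining 100 homogeneous circles then costs one color, one radius, and 100 positions, rather than 100 of each; noticing that homogeneity is what lets the model extrapolate from very little data.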


Err... Nothing is the "future" of AI. Not even deep learning. If you knew what the future of AI was, you'd have your PhD and faculty position in no time.


Bayesian generative models rest on a firmer logical and mathematical foundation than deep learning's "it works but we don't know why". Bayesian models can be proven optimal under more flexible assumptions than deep learning, and they are more in line with human-like common-sense reasoning.

Not sure how to make them computationally tractable, however.

We'll see, I guess...


Can someone who understands this explain it? I tried to read the article, and it felt like there wasn't enough math there to explain what they were trying to do exactly.


For papers in 'general interest' or biological journals, the mathematical details necessary to understand what was actually done usually go in Supplementary Materials, rather than the paper itself:

https://www.sciencemag.org/content/suppl/2015/12/09/350.6266...


Bayesian Program Learning is probably the most powerful of the technologies that come with fully probabilistic programming languages. It is an instance of a technique known as "probabilistic generative modelling", in which one designs a model that generates samples from some simple rules, then inverts it with inference algorithms like MCMC or the Particle Cascade. You can learn more from a pretty good online book called Probabilistic Models of Cognition: https://probmods.org/, and also here: http://www.robots.ox.ac.uk/~fwood/anglican/ and here: http://probabilistic-programming.org/wiki/Home.
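
For intuition, here's a tiny self-contained Python version of that recipe (a toy model of my own, unrelated to the BPL code): a generative model with one latent parameter, inverted with random-walk Metropolis-Hastings, the simplest MCMC algorithm.

    import math
    import random

    data = [4.8, 5.1, 5.3, 4.9]  # observations, modelled as Normal(mu, 1)

    def log_joint(mu):
        log_prior = -0.5 * mu * mu                        # mu ~ Normal(0, 1)
        log_lik = sum(-0.5 * (x - mu) ** 2 for x in data)
        return log_prior + log_lik

    mu, samples = 0.0, []
    for _ in range(10000):
        proposal = mu + random.gauss(0.0, 0.5)            # random-walk proposal
        log_accept = log_joint(proposal) - log_joint(mu)
        if log_accept >= 0 or random.random() < math.exp(log_accept):
            mu = proposal                                 # accept the move
        samples.append(mu)

    # Posterior mean should come out near 4.0 for this prior and data.
    print(sum(samples[1000:]) / len(samples[1000:]))

Roughly speaking, probabilistic programming languages like Anglican or Church (used in probmods) let you write just the model part and get the inference part for free.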


Very cool, thanks!


Seems like a good direction, and more intuitive to me. Very classy that it's all on GitHub!


Getting a certificate error on Chrome for Android.



