Stanford Class on Deep Multi-Task and Meta-Learning (stanford.edu)
198 points by hamsterbooster on June 10, 2020 | 37 comments



The field of meta-learning is still very immature though. I can see why you would already want to start a scholarly discourse on the topic, but I am not sure how useful these techniques are for the students involved. They are still very ad hoc and often unprincipled.

This is a good article on the topic: https://arxiv.org/abs/1902.03477


This is Stanford, they aren't teaching the next generation of applied ML practitioners -- they are teaching the next generation of theorists. This is a perfect class for that and getting their students ahead of everyone else. I'm jealous of them.


Assuming 330 is undergraduate level, they're teaching the full gamut of practitioners, theorists, and future drop outs.


300 series are advanced graduate classes. As an undergrad the sight of a 300 is horrifying.


It's always amusing how different institutions number their classes. At the Claremont Colleges, I took a class, Math 103 (Fundamentals of Mathematics), which was an upper-division course geared towards preparing students for analysis, abstract algebra, etc. Because I started college having completed my math coursework through linear algebra and differential equations, this was the lowest-numbered math course on my transcript from Claremont, and when I started grad school for a teaching credential a couple of decades later, the program director thought it was a remedial math course.


Recent Pomona College alum here checking in to say that the course numbering system has not changed (though Math 103 is now Intro to Combinatorics), and anecdotally it's still a point of confusion for those who go on to the grad schools the Colleges feed into.

I've never before paid course numbers too much mind, but it does surprise me there's not yet some widespread standard to help graduate admissions officers, graduate advisors, and grad students themselves when determining prerequisite eligibility.


Yeah, I was looking for the class to see what the number was and it doesn't appear to exist anymore. I remember being amused that at Mudd, Calculus was Math 1a/b back in my day. It appears to have been renumbered a bit higher since then, and now they only offer one semester of calculus (back in the 80s it was radical that Mudd did the calculus sequence in two rather than three semesters, although I noticed that our local high school offers a third-year high school calculus class covering multivariable calculus).


It's surprising to me - 300+ level courses were part of my undergrad required coursework - of course I wasn't studying at Stanford. Are course numbers standardized across academia or unique to an institution?


Course numbers are not standardized, although there are common numbering schemes. There are some uncommon ones such as MIT's, which uses a number and a dot instead of a subject name. And I've never known what institution the typical "CS 101" numbering scheme applies to.


Afaik they're unique, and not monotonically increasing in difficulty either. Here's Stanford's numbering system for reference: https://cs.stanford.edu/academics/courses


This class is available on SCPD, so it's open to (almost) anyone willing to pay >$5k.


Undergrad courses are 100-level, graduate courses are 200-level, advanced graduate courses are 300-level.


Odd way to spell Philosophy Grad. To each their own I suppose.


Another paper on the related topic of metric learning arguing that metric learning hasn't actually made any progress and is piggybacking on progress elsewhere: https://arxiv.org/abs/2003.08505


This is a pretty good paper, & they bring up many reasonable points, but I think it's important to distinguish deep metric learning from more traditional formal ML methods for metric learning - there is plenty of progress being made in the context of scalable & provable metric learning algorithms that are robust to noise/corruption & missing data.

Recommend work & talks by Anna Gilbert for anyone interested. Entertaining & good at distilling technical content. Here is her most recent one, but there are other good ones on youtube. https://www.youtube.com/watch?v=Sb1ZhtsZjyM


>This is a good article on the topic: https://arxiv.org/abs/1902.03477

Not to nitpick, but that article is a year old and the field is moving at light speed


There have not really been a lot of major breakthroughs in meta-learning in the last year, as far as I am aware. The paper is basically saying there was not a lot of progress in the 3 years before that either.

All in all, nobody really has a clue on how to do meta-learning right (or I am not aware of their work). There is progress being made on benchmarks, but some argue that progress is not really tackling the real issue at hand, i.e. learning to learn. Moreover, the current common benchmarks are not really good at untangling the progress in deep meta-learning from the progress in deep learning in general.
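
For anyone not familiar with how those benchmarks are set up, here is a rough, hypothetical sketch of an N-way K-shot episode sampler, the evaluation unit behind common few-shot benchmarks (the function name and dataset format are made up for illustration). It also hints at the untangling problem: a stronger generic feature extractor raises the episode score even if the "learning to learn" component is untouched.

    import random
    from collections import defaultdict

    def sample_episode(dataset, n_way=5, k_shot=1, q_queries=15):
        """Hypothetical N-way K-shot episode sampler. `dataset` is assumed
        to be a list of (example, class_label) pairs with enough examples
        per class."""
        by_class = defaultdict(list)
        for x, y in dataset:
            by_class[y].append(x)
        # Pick N classes, then K support and Q query examples per class.
        classes = random.sample(list(by_class), n_way)
        support, query = [], []
        for new_label, c in enumerate(classes):
            examples = random.sample(by_class[c], k_shot + q_queries)
            support += [(x, new_label) for x in examples[:k_shot]]
            query += [(x, new_label) for x in examples[k_shot:]]
        return support, query

    # A meta-learner adapts on `support` and is scored on `query`; a better
    # generic backbone lifts that score without any change to the
    # "learning to learn" part.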


Isn't GPT-3 exactly the kind of meta learner you're thinking of?


I would say it is exactly the opposite. :)

It is showing how you can get drastically better at deep meta-learning by being better at deep learning. But it does not really show how you can be better at deep meta-learning outside of the improvements in deep learning.

You can take any deep meta-learning algorithm, take the deep part in it, apply the improvements in deep learning from the last year and claim that you have improved on the deep meta-learning problem this year. Well yes, but actually also no.
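
To make that concrete, here is a minimal second-order MAML-style sketch (MAML being one well-known deep meta-learning algorithm, nothing specific to GPT-3), assuming PyTorch; the tiny network, task format and hyperparameters are made up for illustration. The "deep part" is just the forward function, and swapping in a better backbone there improves the numbers without touching the meta-learning logic around it.

    import torch
    import torch.nn.functional as F

    def forward(params, x):
        # "The deep part": a tiny two-layer net. Improvements in deep
        # learning (better backbones, normalization, etc.) slot in here.
        w1, b1, w2, b2 = params
        return torch.relu(x @ w1 + b1) @ w2 + b2

    def maml_step(params, tasks, inner_lr=0.01, outer_lr=0.001):
        """One MAML-style meta-update over a batch of tasks (sketch only).
        Each task is ((x_support, y_support), (x_query, y_query))."""
        meta_grads = [torch.zeros_like(p) for p in params]
        for (x_s, y_s), (x_q, y_q) in tasks:
            # Inner loop: adapt the weights to this task's support set.
            grads = torch.autograd.grad(F.mse_loss(forward(params, x_s), y_s),
                                        params, create_graph=True)
            adapted = [p - inner_lr * g for p, g in zip(params, grads)]
            # Outer loop: evaluate the adapted weights on the query set and
            # backpropagate through the adaptation step itself.
            outer = torch.autograd.grad(F.mse_loss(forward(adapted, x_q), y_q),
                                        params)
            meta_grads = [m + g for m, g in zip(meta_grads, outer)]
        # Meta-update of the shared initialization.
        return [(p - outer_lr * g / len(tasks)).detach().requires_grad_()
                for p, g in zip(params, meta_grads)]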

It's like trying to find a new antibiotic, and the solution is throwing more existing antibiotics into the same pill. Well yes, it works, but it is also not exactly the problem.

Don't get me wrong though, GPT-3 is amazing work.


Asking as someone who does not work with AI but has taken a couple of courses in ML:

What new things will I be able to do after this course?

(in a practical sense, the technical description on the course page I can read myself)


At a glance, it looks like ongoing research on multi-task and meta-learning will be discussed: some new tricks for NN architectures, a few new training methods, and some theoretical framing of the area. I will be looking forward to seeing the whole series, and maybe sharing some notes or summaries of the videos.


Good to see Chelsea Finn end up at Stanford. I had breakfast with her and her parents in 2013 when she was an undergraduate at MIT and it was fun to hear what options she was thinking about for her career.

I took a look at the course outline, and except for AutoML, it appears to be a one stop shop for learning multi-task and meta-learning. I just bookmarked the lectures on youtube.


Humans observe an object once, such as a cup for drinking water, and we immediately grasp its "cupness". We can identify infinite varieties of cups despite variations in morphology, design, utility and context, simply based on a single learning instance. This absence of any neural theory of inference is at the crux of the problem ;)

Shortcut Learning in Deep Neural Networks

https://arxiv.org/abs/2004.07780


> Humans observe an object once, such as a cup for drinking water, and we immediately grasp its "cupness"

Adults do (i.e. an agent with a pretrained holistic model of its entire observed physical context). By reducing the phenomenon to the single observation, you're conveniently ignoring the early childhood phases spent exploring shapes/3d geometry that enable this very ability of inference. This isn't fair, because for humans the line between training phase and trained model is very blurry, whereas a statistical model is trained once the weights are set and done.

Brute-forcing through 2d projections of 3d objects (further denormalized through camera artifacts etc.) until something sticks in a convoluted (heh) composition of an arbitrarily initialized set of nodes and connections is obviously far different from the physical exploration kids do. Comparing the models resulting from the latter with the former is, in a word, absurd.

Through exploration, humans develop a model of physics itself, from which the nature of cupness can be inferred (which is, in fact, the magic term).

Deep learning alone won't get us there, but it'll probably give us the components that enable us to simulate this intricate process happening in kids' brains.

In fact, I'm pretty sure that that's what a lot of the smart people researching general intelligence are working on (because that's what I would do, excuse my hubris).


Good discussion! I'll just respond here, but plenty of thought-provoking points all around ;)

I think what I was getting at was the often-observed result that progress in AI research roughly tracks hardware developments. Look at AlphaGo to AlphaZero to MuZero: training time for self-play increases, but parallelism in the tensor units of the hardware is an order of magnitude faster. It's great for problem domains like autonomous vehicles, contactless payments in retail stores and fraud detection in the data center. But what about generalizability? What about the black box communicating how it has learned? Will it be suitable for next-gen applications like robots designed to assist humans in space expansion?

I attended an event in NYC around the creative use of AI by a new breed of emerging artists like Mario Kliegmann from Germany. ArtBreeder can train a GAN on a single input sample and generate paintings in the style of Fragonard or Picasso or Rothko. And someone made a remark along the lines of: "if this had existed in the 1960s, we wouldn't have needed Warhol to invent Pop Art!". But in reality, Andy Warhol experimented with a wide variety of media and techniques, from film to "oxidation art". And it struck me that that was the truly creative part of the process, one that arises from a place other than rational optimization on a single task or even multiple known tasks.


This is a very insightful comment. I wonder if artificial intelligence can learn anything from how the brain develops from child to adult, by actually pruning connections as well as creating/reinforcing new ones.

Well, that's what partly machine learning already does, right? :)


> Simply based on a single learning instance.

Did not get this part. I have a limited sample of two kids, but I would say it takes at least a year before humans understand "cupness".


I assume the parent is referring to kids past some level of neurological development and, of course, it's not necessarily as simple as one and done.

But, in general, deep learning requires far more examples to train image recognition, and even then it's relatively fragile. (Not that humans can't be fooled, but having models of the world in our brains helps a lot. No, that's probably not a flying pig even though it looks like one.)


> deep learning requires far more examples to train image recognition

Than kids? Who have video input 10h/day for years (~1B images) and can also choose their examples actively?

There are many ways in which deep approaches differ from kids (understatement of the year?), but to say that kids don’t see a lot of data seems not quite right. They’ve got a huge “world model” to draw on by the time they are good at one- or few-shot learning.
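
As a rough back-of-the-envelope check of that ~1B figure (the frame rate and number of years are made-up assumptions):

    # 10 waking hours/day of visual input, treated as ~30 "frames" per
    # second, over 3 years (all numbers are illustrative assumptions).
    hours_per_day, fps, years = 10, 30, 3
    frames = hours_per_day * 3600 * fps * 365 * years
    print(f"{frames:,}")  # 1,182,600,000 -- on the order of a billion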


I can go to Flickr and download 3 million photos labelled "cup". How many cups have you seen in your whole life? Probably fewer.


I think the point against these "few shot" or "one shot" learning models is that humans typically have a lot more data than they are being credited for.

If I pick up a cup and move it around, pour water into it, turn it over so the fluid falls out, do it again, try some other object, compare them... how many data samples did I just receive in my 1-minute endeavor? The answer is: a lot.

Depending on how continuous (or not) space/time is and how fast your brain processes things... you're potentially being exposed to some extremely large discrete sampling of an infinite set.

How many actual discrete samples? Who knows; you can probably come up with some bounding estimates with some work, but that assumes we understand the brain better than we do. In addition, there's a reasonable amount of information that you likely inherit genetically, so think about those massive discrete samplings over your ancestors' lives, and the summarization and compression you get through evolution over many thousands of years. The fact I have fingers, thumbs, and walk upright is a solution set found across a massive problem space and I get that out-of-the-womb (tm).

I don't believe modern ANN models are quite right and I suspect there's quite a bit we don't understand about our own intelligence that isn't or may never be captured here (hey, I want to be irreplaceable) but we have to consider the comparison more holistically.

One of the fundamental issues I believe DNNs fail at is that they train on high-level problems and typically don't couple together across learning processes (it's not quite practical yet). As a human, when I learn how to do something I incorporate and connect it with future learning processes to help me learn faster or gain novel insights. Computationally this would be connecting stupid amounts of DNNs together in all sorts of ways (GANs are one similar line of thought here, but I believe this would be a rich field to explore).

Personally, I'm completely against endeavors to search for a real AGI. I find it hard to believe once such a goal was obtained, it could possibly be good for my long term survival but hey, I'm a bit greedy in that respect.


Exactly this. Show a human a thing for which it doesn't already have 10K hours of "training" and they won't do so well.

Monads might be a good example of a thing for which humans don't have a massive headstart on ML due to "years of experience manipulating monad-like things" (as we do for most physical and social concepts). And I don't think anyone one-shot learns a monad.


> How many cups have you seen in your whole life? Probably fewer.

How many different cups are there in that dataset?

How many hours have I spent looking at cups? Anecdotally, my 2-year-old spends a huge amount of time playing with cups.


Thinking about it from the wrong direction. We have lifetimes of images that are "not cups" so when we get some cups flagged to us it's differentiated against a huge non-cup baseline.


A pretty interesting podcast from Quanta Magazine suggests that it might not be a simple matter of training set size: https://www.quantamagazine.org/where-we-see-shapes-ai-sees-t...


One could argue we are either born with some "pre-trained model/architecture" or we spend the first 10 years of our lives "pre-training" (with the first 11 months training the fastest).
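
In deep learning terms, that "born pre-trained, then adapt" framing looks roughly like ordinary transfer learning; a minimal sketch assuming PyTorch/torchvision, with the dataset and class count as placeholders:

    import torch
    import torchvision

    # "Born pre-trained": start from a backbone trained on a large corpus.
    model = torchvision.models.resnet18(pretrained=True)

    # Freeze the inherited "world model" ...
    for param in model.parameters():
        param.requires_grad = False

    # ... and learn only a small task-specific head from a handful of
    # examples (n_classes is a placeholder for the new task).
    n_classes = 5
    model.fc = torch.nn.Linear(model.fc.in_features, n_classes)
    optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01)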


This looks super cool! Prof. Finn has been doing a lot of interesting research in RL and meta-learning for several years, it's great to get a chance to learn this material directly from her.



