I don't see how your endearing enthusiasm is supported by the paper you reference.
It's a paper, so I won't do it justice by tl;dr'ing it in three sentences, but in short:
a) One-shot/meta-learning is not a new thing; the paper references work by Seb. Thrun from 1998 [1]. Hardly a six-month-old revolution that's taking the world by storm.
b) There are serious signs that they are overfitting like crazy, and
c) their approach requires few examples, but those examples must be presented hundreds of thousands of times before performance improves. That's still nowhere near the speed or flexibility of human learning.
Also, did you notice they had to come up with a separate label encoding scheme, because "learning the weights of a classifier using large one-hot vectors becomes increasingly difficult with scale" [2]? Note that this is a DeepMind paper. If something doesn't scale for them, you can bet it doesn't scale, period.
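To make the scaling point concrete, here is a minimal sketch contrasting a plain one-hot label with a factored "string of characters" label code. The 5-symbol alphabet, the 5-character length, and the function names `one_hot` and `char_code` are my own illustrative assumptions, not the paper's code: a one-hot vector needs one dimension per class, while the factored code covers 5**5 = 3125 classes with only 25 dimensions.

```python
def one_hot(index: int, num_classes: int) -> list[float]:
    """Plain one-hot label: vector length grows linearly with num_classes."""
    if not 0 <= index < num_classes:
        raise ValueError("index out of range")
    v = [0.0] * num_classes
    v[index] = 1.0
    return v


def char_code(index: int, alphabet_size: int = 5, length: int = 5) -> list[float]:
    """Hypothetical factored label (illustrative assumption, not the paper's code):
    write `index` as `length` base-`alphabet_size` digits ("characters"),
    one-hot encode each digit, and concatenate the results.
    Covers alphabet_size**length classes with alphabet_size*length dimensions."""
    if not 0 <= index < alphabet_size ** length:
        raise ValueError("index out of range")
    v = [0.0] * (alphabet_size * length)
    for pos in range(length):
        digit = index % alphabet_size          # current "character"
        v[pos * alphabet_size + digit] = 1.0   # one-hot within its slot
        index //= alphabet_size
    return v


if __name__ == "__main__":
    n = 3125
    print(len(one_hot(0, n)))     # one dimension per class
    print(len(char_code(n - 1)))  # alphabet_size * length dimensions
```

The trade-off is that the factored code shares structure between labels (two classes can have the same character in the same slot), which is presumably why it scales better but also why it can introduce the kind of "interference" mentioned in [2].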
So I'm not seeing how this heralds the one-shot/meta-learning revolution that I think you're saying it does.
___________
[1] Their reference is: Thrun, Sebastian. Lifelong learning algorithms. In Learning to Learn, pp. 181–209. Springer, 1998.
[2] Things are bad enough that they use this novel encoding even though it does not guarantee that a class won't be shared across different episodes, which must have caused some "interference". This is a really bad sign.