Very impressive, but it also has some limitations: the softmax output that shows the confidence is often unreliable (tested empirically). Here's an example of a misclassification with a score of 99%: https://imgur.com/a/wnr8K
I imagine that’s because it’s difficult for the model to decide which features are important. Granting my human bias toward looking for an intuitive explanation, the “circle” in this example looks most like the circle in the picture it identified as most similar. I imagine salience assignment is difficult without either more examples or injected prior knowledge.
This work is surely interesting, but don't let the sophisticated formulation fool you: meta-learning is not the best-performing option for few-shot classification. ProtoNets and other simple matching strategies achieve far better performance.
I looked up your claim. The ProtoNets paper by Snell et al. reports 49.42% 1-shot and 68.20% 5-shot accuracy on miniImagenet, while the new Reptile paper reports 48.21% and 66.00% respectively.
I wouldn't call Reptile sophisticated; the method actually looks really simple (perform a couple of steps of SGD per task, then use the resulting parameter update as the gradient in the outer loop).
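To show how simple it is, here's a minimal sketch of the outer loop, assuming hypothetical `sample_task` and `task_loss` helpers (my names, not from the paper's code release):

```python
import copy
import torch

def reptile_step(model, sample_task, task_loss,
                 inner_steps=5, inner_lr=1e-2, outer_lr=0.1):
    # Save the current initialization phi.
    phi = copy.deepcopy(model.state_dict())

    # Inner loop: a few steps of plain SGD on one sampled task.
    x, y = sample_task()
    opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        opt.zero_grad()
        task_loss(model(x), y).backward()
        opt.step()

    # Outer update: move phi toward the adapted weights phi_tilde,
    # i.e. treat (phi_tilde - phi) as the "gradient".
    phi_tilde = model.state_dict()
    new_phi = {k: phi[k] + outer_lr * (phi_tilde[k] - phi[k]) for k in phi}
    model.load_state_dict(new_phi)
```

(Sketch only: assumes all state_dict entries are float tensors, e.g. a plain MLP with no batch-norm counters.)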
OK, I admit I hadn't looked up the actual numbers before I posted that comment, but the Snell paper wasn't the last word on that line of work; it was followed by https://arxiv.org/abs/1703.05175 (which doesn't have miniImagenet results). And there may have been further work in that direction that I'm not aware of.
You're right that Reptile is the simplest recent algorithm in the meta-learning literature, but I would argue that's basically my point: they started from somewhere pretty ambitious (let's learn a learner, or at least an SGD update rule) and ended up with learning an initialization that can be updated well with a few steps of SGD.
[EDIT]: I also prefer Matching/ProtoNets-style work as being simpler to deploy, since you don't need to retrain to add new classes. Maybe one day meta-learning will be SoTA, but there are a lot of world-class researchers on it, and the approaches keep tending away from actual meta-learning IMO, so my money is on the matching approach. Though my money is on integrating with data stores in general and not needing to squish everything into weights, so I'm a bit biased here.
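To make the deployment point concrete, here's a rough sketch of ProtoNets-style inference, assuming an already-trained `embed` network (all names here are illustrative, not from either paper):

```python
import torch

def build_prototypes(embed, support):
    # support: {class_name: tensor of support examples for that class}
    # Each class prototype is the mean embedding of its support examples.
    return {c: embed(xs).mean(dim=0) for c, xs in support.items()}

def classify(embed, prototypes, x):
    # Nearest prototype in embedding space (squared Euclidean distance).
    z = embed(x.unsqueeze(0)).squeeze(0)
    return min(prototypes, key=lambda c: torch.sum((z - prototypes[c]) ** 2))

# Adding a new class needs no retraining; just store its prototype:
# prototypes["new_class"] = embed(new_examples).mean(dim=0)
```

The class "knowledge" lives in a plain data store of embeddings rather than in the weights, which is why new classes come for free.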
I think it’s more about learning a variety of tasks. And I like the emphasis on getting at higher-order derivatives with only first-order methods, which as an abstract idea has a variety of applications.
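A back-of-the-envelope version of that idea, following the Taylor-expansion argument in the Reptile paper (notation mine): with two inner SGD steps of size $\alpha$ on minibatch losses $L_1, L_2$,

$$
\phi_1 = \phi - \alpha g_1, \qquad
\phi_2 = \phi_1 - \alpha \nabla L_2(\phi_1), \qquad
g_i = \nabla L_i(\phi),
$$

and expanding the second gradient around $\phi$,

$$
\nabla L_2(\phi_1) \approx g_2 - \alpha H_2 g_1, \qquad H_2 = \nabla^2 L_2(\phi),
$$

the Reptile "gradient" becomes

$$
\frac{\phi - \phi_2}{\alpha} \approx g_1 + g_2 - \alpha H_2 g_1.
$$

The $H_2 g_1$ term is a Hessian-vector product you get from running nothing but plain SGD, and in expectation over minibatches it rewards initializations where the gradients of different batches agree.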