Hacker News
Improving Deep Learning Performance with AutoAugment (googleblog.com)
132 points by rusht on June 19, 2018 | 14 comments



It might be fun to exercise this method across an information-theoretically well-bounded set of shapes or object domains, to try to quantify its limitations in generating useful, independent forms of novelty.

For example, you might use it to formulate a set of wavelets that when combined judiciously would effectively span a well-defined distribution of shapes generated from a small grammar. In so doing, you could quantify the shape variance and identify which augmentation transformations added most value for training (minimally modeling that variance) and which added least.

Maybe you could also combine this with t-SNE to gain some intuition of which 'wavelet' manifested where in the trained net, which resonated most, and in concert with which other wavelets. You could explore this across different CNN sizes and designs, looking for evidence of wavelet ensemble or hierarchy.

With some careful engineering, you could try to force emergent autoencoders to reveal themselves and then explore their interactions.
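
Not from the article, just a rough sketch of the ablation idea a couple of paragraphs up: render shapes from a tiny made-up grammar, then score each candidate transform by how much it grows the effective dimensionality of the pixel data (PCA components needed for 95% variance, as a crude proxy). The grammar, the transforms, and the metric are all illustrative stand-ins rather than wavelets proper:

    # Toy probe: which augmentations add genuinely new variance to a
    # grammar-generated shape set, and which only rescale what's already there?
    import numpy as np
    from scipy.ndimage import rotate
    from sklearn.decomposition import PCA

    SIZE = 32

    def render(shape, s):
        """Render a centered, filled square (half-width s) or circle (radius s)."""
        yy, xx = np.mgrid[:SIZE, :SIZE] - SIZE // 2
        if shape == "square":
            return ((abs(yy) <= s) & (abs(xx) <= s)).astype(float)
        return ((yy ** 2 + xx ** 2) <= s ** 2).astype(float)

    # Two-rule "grammar": shape in {square, circle} x size in {4..10}
    base = [render(sh, s) for sh in ("square", "circle") for s in range(4, 11)]

    def effective_dim(images):
        X = np.array([im.ravel() for im in images])
        return PCA(n_components=0.95).fit(X).n_components_

    transforms = {
        "rotate15": lambda im: rotate(im, 15, reshape=False),  # geometric change
        "dim":      lambda im: 0.7 * im,                       # intensity-only change
    }

    baseline = effective_dim(base)
    for name, t in transforms.items():
        grown = effective_dim(base + [t(im) for im in base]) - baseline
        print(f"{name}: {grown:+d} components vs. baseline {baseline}")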


Since the 1990s at least, augmentation has been one of the most important "tricks of the trade" in NNs, and it may be even more important in the deep learning era.


Yup. :-)



AutoOverfit is more like it.


Overfit to ... your augmentation policy? I think that’s a good thing. :)


I wonder how large your dataset has to be for this to be useful. You can get by with small datasets in some fields (e.g., retraining just the last layer of MobileNet can give good results with 200 annotations); I'd be interested to see how useful this is there.
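
For concreteness, the small-data setup I mean is roughly the following (a minimal modern-Keras sketch; the ./data directory, the 5 classes, and the hyperparameters are placeholders, not anything from the post):

    # Freeze a pretrained MobileNetV2 backbone and train only a new classifier head.
    import tensorflow as tf

    NUM_CLASSES = 5  # placeholder

    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet")
    base.trainable = False  # only the new head below gets trained

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
    x = base(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Hypothetical layout: ./data/<class_name>/*.jpg, a few hundred images total.
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "./data", image_size=(224, 224), batch_size=32)
    model.fit(train_ds, epochs=10)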


This seems like it could dramatically worsen overfitting-like effects for algorithms like CNNs for image processing, where surface statistics of the available data set seem to be more responsible for the learned model than any type of “semantic” understanding.

If you prespecify what data augmentation you would do, like preregistering the details of a clinical trial, you’ll be less susceptible to a spurious result from this.

Things like color-distribution manipulation in particular seem like they could have a very adverse effect that counters any gains from constraining the supervised learner to be “robust” to that color variation.

I’m thinking in the spirit of: < https://arxiv.org/abs/1711.11561 >.


I don't exactly remember the details, but many of the claims in the paper you mentioned seem to have been proven false. If anything, augmentation like this should reduce overfitting.


Could you provide a link to where the Bengio paper was “disproven”?

I would be quite surprised to learn that, especially for the experimental result they have with the low-pass Fourier filter on the training set, and also because the Bengio paper is quite recent.


I agree: as long as you do out-of-sample testing correctly, using this as a benchmark for finding the right transformations to perform should reduce overfitting.

That said, I can see how, done incorrectly, you could easily end up overfitting instead.
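
Concretely, the discipline is just: choose the augmentation policy on a validation split only, and touch the held-out test set exactly once at the end. A toy sketch of that protocol (dataset, candidate policies, and model are arbitrary stand-ins, not AutoAugment's actual search):

    # Select an augmentation policy on a validation split, then evaluate the
    # chosen policy a single time on a held-out test set.
    import numpy as np
    from scipy.ndimage import rotate, shift
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def augment(X, policy):
        imgs = X.reshape(-1, 8, 8)                       # 8x8 digit images
        return np.array([policy(im) for im in imgs]).reshape(len(X), -1)

    policies = {
        "none":   lambda im: im,
        "rotate": lambda im: rotate(im, 10, reshape=False),
        "shift":  lambda im: shift(im, (1, 0)),
    }

    X, y = load_digits(return_X_y=True)
    X_trval, X_test, y_trval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X_trval, y_trval, test_size=0.25, random_state=0)

    scores = {}
    for name, policy in policies.items():
        X_aug = np.vstack([X_tr, augment(X_tr, policy)])
        y_aug = np.concatenate([y_tr, y_tr])
        clf = LogisticRegression(max_iter=2000).fit(X_aug, y_aug)
        scores[name] = clf.score(X_val, y_val)           # selection on validation only

    best = max(scores, key=scores.get)
    X_aug = np.vstack([X_trval, augment(X_trval, policies[best])])
    y_aug = np.concatenate([y_trval, y_trval])
    final = LogisticRegression(max_iter=2000).fit(X_aug, y_aug)
    print(best, final.score(X_test, y_test))             # the one and only test-set look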


An example they give is that it doesn't apply shear transforms to CIFAR-10, because there aren't sheared examples in the (test) data. That makes it less indicative of real-world vision performance, given that human vision is fairly robust to shear.

An opposite sort of strategy is to apply every augmentation that doesn't increase human error rate on a test sample.


Sure, I think the approach could work in some cases, but specifically allowing it to learn transformations of the input color space seems quite problematic: you are essentially giving it more degrees of freedom in surface statistics with which to data-mine a better-looking metric during validation, i.e., to overfit.


Did they share any code?



