Hacker News
Improving Deep Learning Performance with AutoAugment (googleblog.com)
132 points by rusht on June 19, 2018 | 14 comments



It might be fun to exercise this method across an information-theoretically well-bounded set of shapes or object domains, to try to quantify its limitations in generating useful, independent forms of novelty.

For example, you might use it to formulate a set of wavelets that when combined judiciously would effectively span a well-defined distribution of shapes generated from a small grammar. In so doing, you could quantify the shape variance and identify which augmentation transformations added most value for training (minimally modeling that variance) and which added least.

Maybe you could also combine this with t-SNE to gain some intuition of which 'wavelet' manifested where in the trained net, which resonated most, and in concert with which other wavelets. You could explore this across different CNN sizes and designs, looking for evidence of wavelet ensemble or hierarchy.

With some careful engineering, you could try to force emergent autoencoders to reveal themselves and then explore their interactions.
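
Not from the article, just a rough sketch of the ablation idea a couple of paragraphs up: render shapes from a tiny made-up grammar, then score each candidate transform by how much it grows the effective dimensionality of the pixel data (PCA components needed for 95% variance, as a crude proxy). The grammar, the transforms, and the metric are all illustrative stand-ins rather than wavelets proper:

    # Toy probe: which augmentations add genuinely new variance to a
    # grammar-generated shape set, and which only rescale what's already there?
    import numpy as np
    from scipy.ndimage import rotate
    from sklearn.decomposition import PCA

    SIZE = 32

    def render(shape, s):
        """Render a centered, filled square (half-width s) or circle (radius s)."""
        yy, xx = np.mgrid[:SIZE, :SIZE] - SIZE // 2
        if shape == "square":
            return ((abs(yy) <= s) & (abs(xx) <= s)).astype(float)
        return ((yy ** 2 + xx ** 2) <= s ** 2).astype(float)

    # Two-rule "grammar": shape in {square, circle} x size in {4..10}
    base = [render(sh, s) for sh in ("square", "circle") for s in range(4, 11)]

    def effective_dim(images):
        X = np.array([im.ravel() for im in images])
        return PCA(n_components=0.95).fit(X).n_components_

    transforms = {
        "rotate15": lambda im: rotate(im, 15, reshape=False),  # geometric change
        "dim":      lambda im: 0.7 * im,                       # intensity-only change
    }

    baseline = effective_dim(base)
    for name, t in transforms.items():
        grown = effective_dim(base + [t(im) for im in base]) - baseline
        print(f"{name}: {grown:+d} components vs. baseline {baseline}")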


Since the 1990s at least, augmentation has been one of the most important "tricks of the trade" in NNs, and it may be even more important in the deep learning era.


Yup. :-)



AutoOverfit is more like it.


Overfit to ... your augmentation policy? I think that’s a good thing. :)


I wonder how large your dataset has to be for this to be useful. You can get by with small datasets in some fields (e.g., retraining just the last layer of MobileNet can give good results with 200 annotations); I'd be interested to see how useful this is there.
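
For concreteness, the small-data setup I mean is roughly the following (a minimal modern-Keras sketch; the ./data directory, the 5 classes, and the hyperparameters are placeholders, not anything from the post):

    # Freeze a pretrained MobileNetV2 backbone and train only a new classifier head.
    import tensorflow as tf

    NUM_CLASSES = 5  # placeholder

    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet")
    base.trainable = False  # only the new head below gets trained

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
    x = base(x, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Hypothetical layout: ./data/<class_name>/*.jpg, a few hundred images total.
    train_ds = tf.keras.utils.image_dataset_from_directory(
        "./data", image_size=(224, 224), batch_size=32)
    model.fit(train_ds, epochs=10)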


This seems like it could dramatically worsen overfitting-like effects for algorithms like CNNs for image processing, where surface statistics of the available data set seem to be more responsible for the learned model than any type of “semantic” understanding.

If you prespecify what data augmentation you would do, like preregistering the details of a clinical trial, you’ll be less susceptible to a spurious result from this.

Things like color-distribution manipulation in particular seem like they could have a very adverse effect that counters any gains from constraining the supervised learner to be “robust” to that color variation.

I’m thinking in the spirit of: < https://arxiv.org/abs/1711.11561 >.


I don't exactly remember the details, but many of the claims in the paper you mentioned seem to have been proven false. If anything, augmentation like this should reduce overfitting.


Could you provide a link to where the Bengio paper was “disproven”?

I would be quite surprised to learn that, especially for the experimental result they have with the low-pass Fourier filter on the training set, and also because the Bengio paper is quite recent.


I agree: as long as you do out-of-sample testing correctly, using this as a benchmark for finding the right transformations to perform should reduce overfitting.

That said, I can see how, done incorrectly, you could easily end up overfitting instead.
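
Concretely, the discipline is just: choose the augmentation policy on a validation split only, and touch the held-out test set exactly once at the end. A toy sketch of that protocol (dataset, candidate policies, and model are arbitrary stand-ins, not AutoAugment's actual search):

    # Select an augmentation policy on a validation split, then evaluate the
    # chosen policy a single time on a held-out test set.
    import numpy as np
    from scipy.ndimage import rotate, shift
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def augment(X, policy):
        imgs = X.reshape(-1, 8, 8)                       # 8x8 digit images
        return np.array([policy(im) for im in imgs]).reshape(len(X), -1)

    policies = {
        "none":   lambda im: im,
        "rotate": lambda im: rotate(im, 10, reshape=False),
        "shift":  lambda im: shift(im, (1, 0)),
    }

    X, y = load_digits(return_X_y=True)
    X_trval, X_test, y_trval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X_trval, y_trval, test_size=0.25, random_state=0)

    scores = {}
    for name, policy in policies.items():
        X_aug = np.vstack([X_tr, augment(X_tr, policy)])
        y_aug = np.concatenate([y_tr, y_tr])
        clf = LogisticRegression(max_iter=2000).fit(X_aug, y_aug)
        scores[name] = clf.score(X_val, y_val)           # selection on validation only

    best = max(scores, key=scores.get)
    X_aug = np.vstack([X_trval, augment(X_trval, policies[best])])
    y_aug = np.concatenate([y_trval, y_trval])
    final = LogisticRegression(max_iter=2000).fit(X_aug, y_aug)
    print(best, final.score(X_test, y_test))             # the one and only test-set look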


An example they give is that it doesn't apply shear transforms to CIFAR-10, because there aren't sheared examples in the (test) data. That makes it less indicative of real-world vision performance, given that human vision is fairly robust to shear.

An opposite sort of strategy is to apply every augmentation that doesn't increase human error rate on a test sample.


Sure, I think the approach could work in some cases, but specifically allowing it to learn transformations of the input color space seems quite problematic: you are essentially giving it more degrees of freedom in surface statistics with which to data-mine a better-looking metric during validation, i.e., to overfit.


Did they share any code?



