Adversarial Examples Are Not Bugs, They Are Features (arxiv.org)
76 points by selimonder on May 8, 2019 | 28 comments



Deep convolutional networks, by design, are unable to integrate the contextual and ambient information present in an image (or in preceding images) to inform how they interpret the local features they use. So it's no surprise they struggle with unconstrained images, i.e. images where the ambient context varies.

It's intriguing how much focus there is on adversarial examples. You don't need adversarial examples to make a deep network fail - in a sense that's overkill. Just point the poor deep network at a sequence of images from the real world -- images from a self-driving car, security camera, or webcam. You'll see it make spontaneous errors, no matter how much training data you gave it.

The field will advance when/if practitioners recognize that classifying pixel patterns in isolation isn't sufficient for robust visual perception, and adopt alternative neural network designs that can interpret what they perceive in light of (no pun intended) context and physical expectations.

It worked for our prototype.[0]

[0] https://arxiv.org/abs/1607.06854


Learning multilayer convolutional representations of statistical features is roughly equivalent to taking the first few layers of the visual cortex and stacking them. Creating higher and higher stacks is not going to solve vision.

We are essentially building a frog with better and better visual perception in the hope that it could become a taxi driver. It will become a totally amazing super-frog with super-vision, but it's still just a frog with frog-like visual perception and limits. Using the equivalent of the pre-attentive feature-recognition stage for complex object recognition can fake human-like object recognition when we force it, but it's the wrong approach. We get these catastrophic failures because we hit the limits.

Features seem to exist independently from one another in the early processing stages of human perception. They are not associated with a specific object either. Human perception does not gradually turn features into objects the way we do in deep learning. Properly distinguishing feature integration from detection, and how to do it, is an open question.


And people will marvel at the totally super-froggy things these super-frogs can do, and understand even less why the super-frogs aren't driving taxis already. :-)


They actually are, but self-driving cars use that subsystem as only one component of the whole, and most of it is not a super-frog.


You are making a lot of incorrect statements about brains and vision. I would advise you to study some visual neuroscience.

> Learning multilayer convolutional representations of statistical features is roughly equivalent to taking the first few layers of the visual cortex and stacking them.

No, it isn't roughly equal to the first few layers of visual cortex. The first few layers of visual cortex have substantial feedback connectivity from higher areas, which affects the responses of even the most peripheral parts. (Citations are in our arXiv preprint linked above.) Most of the brain has more feedback connectivity from elsewhere than feedforward ascending connectivity. This qualitatively affects activations.

>We are essentially building a frog...

I suspect frog vision is far more robust than anything we are "essentially building".

> Features seem to exist independently...

Please have a close look at some modern visual neuroscience. Or speak to a good, honest electrophysiologist.


Which citations are you referring to? I would be grateful if you could please be specific.


What do you mean by “ambient”? If you hadn’t finished your comment with the words “our prototype” I would’ve assumed you meant things such as pictures of wolves having snow in them, and that snow being a clue that they are wolves, but I know that you can’t mean that.


When you walk into a grocery store, you assume the fruit isn't plastic. When you walk into a furniture store, you assume it is.

Why? Ambient context.


> When you walk into a grocery store, you assume the fruit isn't plastic. When you walk into a furniture store, you assume it is.

> Why? Ambient context.

That was a really great way to get the point across, especially because I still sometimes think the fruit is real, even when I know the context.


That’s an example, not an explanation. From only that example, I cannot differentiate “ambient context” from “common sense”, which is a phrase that means totally different things to everyone who I’ve seen use it.


Very much agreed with all of this. I've been learning the same lessons working on more robust computer vision for biomedical imaging. I bet unsupervised predictive pretraining could be adapted to (static) 3D image volumes: the z axis replaces the t axis, and you predict the next 2D slice from the previous ones. Hmm...
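
For what it's worth, here is a minimal sketch of that idea (PyTorch assumed; the class name and everything else here is hypothetical, not from the linked paper): stack the previous k slices as channels and regress the next slice.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NextSlicePredictor(nn.Module):
        """Predict slice z of a 3D volume from the k slices below it (z replaces t)."""
        def __init__(self, k=4):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(k, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, 1, 3, padding=1),      # the predicted next slice
            )

        def forward(self, prev_slices):               # (batch, k, H, W) -> (batch, 1, H, W)
            return self.net(prev_slices)

    # Self-supervised pretraining step on a volume of shape (D, H, W):
    # pred = model(volume[z - 4:z].unsqueeze(0))
    # loss = F.mse_loss(pred, volume[z].view(1, 1, *volume.shape[1:]))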

As an aside - from the paper it looks like you worked at Brain Corp a few years back. Any thoughts on them and what they're doing these days? I'll be looking for a job again soon and I see a lot of ads for them.


> classifying pixel patterns in isolation isn't sufficient for robust visual perception

This seems to be only a very small step forward from Minsky's negative result about "perceptrons".


That's because DNNs are only a small step removed from multilayer perceptrons as well. (A few more layers, a bit of internal structure, more advanced nonlinear activation functions, a better training schedule, and much more training data.)

They're not even close to the structural or training-algorithm complexity of natural neural networks yet.


That result was not about multilayer perceptrons, but about single-layer perceptrons. But, whatever.


Multilayer perceptrons share many of the same problems single-layer perceptrons have, such as trouble with high-level structure and generating weird, non-robust features. They are much more nonlinear, though, and thus somewhat more powerful. (I'm being imprecise here, but it is easy to find papers on this ancient tech from before the AI winter.)

A DNN is essentially one of these with more layers than the typical four of an MLP, because we figured out how to propagate errors and training gradients through many layers. (Plus a few important and interesting details.) They are not really qualitatively different in the math they use... The main difference is the use of gated or non-differentiable activation functions, with various ways to compute approximate gradients when faced with them. Convolutional nets in particular are similar to MLPs.
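
To make that concrete, a tiny illustrative sketch (PyTorch assumed; nothing here is from the thread): the MLP and the conv net are built from the same pieces, the conv net just adds depth, weight sharing and local connectivity.

    import torch.nn as nn

    mlp = nn.Sequential(                              # the classic ~4-layer MLP
        nn.Flatten(),
        nn.Linear(32 * 32 * 3, 512), nn.ReLU(),
        nn.Linear(512, 256), nn.ReLU(),
        nn.Linear(256, 10),
    )

    cnn = nn.Sequential(                              # a deeper stack of convolutions
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(128, 10),
    )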


It seems that we are finally at the point where throwing more hardware/data at a dumb algorithm won't give you much better results. This means that there will be space for smarts in AI again. And this is happening at the same time that throwing more money at general-purpose hardware is also ceasing to produce good results, with great opportunities for synergy.

> The field will advance when/if practitioners recognize that classifying pixel patterns in isolation isn't sufficient for robust visual perception

But this, well, it is very clearly sufficient, and we have well-accepted results showing this. It just won't work in practice. That probably means the change will be full of fighting while the old ways still work, and lots of failures and unexpected successes.


> we are finally at the point where throwing more hardware/data at a dumb algorithm won't give you much better results

The recent success of GPT-2 indicates otherwise.


There's a good summary of the paper by the authors here, for people who don't want to digest the pdf: http://gradientscience.org/adv/


>Our discussion and experiments establish adversarial examples as a purely human-centric phenomenon.

Can someone explain to me the difference between this statement and the notion that AI performance in general is completely subjective?

We're trying to train models to do certain things. It doesn't matter if you call them human-centric or not. The important thing is that we have a goal for training. Adversarial examples force models to do other things, i.e. behave in a way that defies the original goal. How is saying "no, misclassifying those bad examples is okay" different from moving the goalposts?

Their process of retraining doesn't make any sense to me either. So what if the model trained on mislabeled data has some degree of accuracy on real data? This just shows that there is some internal symmetry involved.


I find academic papers fairly indigestible both because of their language and verbosity and because PDF is a fairly horrible format for reading on screen.

So thank you.


They get easier to digest after you’ve read a few dozen. Or written a couple.


Very interesting paper, with some surprising insights (I need to read it a couple more times for sure).

The conclusion states:

> Overall, attaining models that are robust and interpretable will require explicitly encoding human priors into the training process.

I feel that is true, though another part of the solution IMO lies in coming up with classifiers that can do more than output a probability alone. I agree that classifiers being sensitive to well-crafted adversarial attacks is something that can't be avoided (and perhaps even shouldn't be avoided at the train-data level), but the problem lies mainly at the output end. As a user, I get no insight from the model into how "sure" it feels about its prediction or whether the inputs deviate from the training set (especially in the useful non-robust feature set). This is especially a problem given that we stick a softmax on almost all neural networks, which tends to over-estimate the probability of the rank-1 prediction, which confuses humans. Most adversarial attacks show [car: 99%, ship: 0.01%, ...] for the original image and [ship: 99%, car: 0.01%, ...] for the perturbed image.

Using interpretability and explanatory tools to inspect models is a good start, though I'd like to see more attention being given to:

- Feedback with regard to whether, and to what extent, a given instance deviates from the training set

- Bayesian constructs w.r.t. uncertainty being incorporated, instead of only probabilities. Work exists that tries to do this already [1,2] with very nice results, though it is not really "mainstream" (a rough sketch of the MC-dropout flavour of this follows below)

[1]: https://alexgkendall.com/computer_vision/bayesian_deep_learn...

[2]: https://eng.uber.com/neural-networks-uncertainty-estimation/
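
For concreteness, here is a rough sketch of one common approach in that line of work, MC dropout (PyTorch assumed; the function name is made up, and this is not taken from the linked posts): leave dropout on at prediction time, sample several forward passes, and report the spread alongside the mean probabilities.

    import torch
    import torch.nn.functional as F

    def mc_dropout_predict(model, x, n_samples=30):
        """Mean class probabilities plus their spread across stochastic forward passes."""
        model.eval()
        for m in model.modules():                     # re-enable only the dropout layers
            if isinstance(m, torch.nn.Dropout):
                m.train()
        with torch.no_grad():
            probs = torch.stack(
                [F.softmax(model(x), dim=-1) for _ in range(n_samples)])
        return probs.mean(dim=0), probs.std(dim=0)    # high std => low confidence

A high spread on a given input is at least a rough signal that it deviates from what the model saw in training, which ties into the first bullet as well.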


DBNNs are actually mainstream; the issue is that they have the same failure modes while also being slow to train.

We just do not know what the high-level structure of a mind looks like; the best we have is some sort of data-compression/entropy model. That's obviously not enough.

An adversarial training model is probably closer (e.g. A3C), but it's not detailed enough either.

Value and policy loss are extremely blunt tools for evaluating an actor or critic, for example.


Totally agree, except I didn't know DBNNs were mainstream. That is, in research they're obviously well known, though I've personally not yet encountered industry settings (companies other than the tech unicorns, that is) that utilize them or even think about these problems. They often end up using the latest well-known architecture (like YOLO) in TensorFlow. That said, we mostly work with retailers and finance/insurance (non-US).

Would be interested to know if your experience differs and in which industries.


One thing I don't agree with is the notion that robustness is human-specified, when they clearly measure the robustness of a given feature by how much perturbation it takes before the classification changes.

Robustness is a systems/statistical notion: the amount, or degrees of freedom, of state perturbation required to change the output, also taking into consideration the magnitude of the change. It is related to, but not the same as, system-theoretic stability. There's nothing human about that definition. Robust features need not be human-derived.

The desired trade-off between robustness and absolute accuracy, precision, or bias is human-specified, but generally the trade-off between these variables is not huge.
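
Purely as an illustration of that statistical notion (PyTorch assumed, everything here hypothetical): measure robustness at an input as the smallest magnitude of a fixed perturbation direction that changes the predicted class. No human labelling of features enters anywhere.

    import torch

    def flip_magnitude(model, x, direction, steps=100, max_eps=1.0):
        """Smallest tested scaling of `direction` that changes the predicted class of x.
        x is a single example with batch dimension 1."""
        base = model(x).argmax(dim=-1).item()
        for i in range(1, steps + 1):
            eps = max_eps * i / steps
            if model(x + eps * direction).argmax(dim=-1).item() != base:
                return eps
        return float("inf")                           # never flipped within max_eps

The chosen direction and step grid are arbitrary here; the point is only that the measurement itself involves no human judgement about which features matter.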


Their definition of robustness relies on a pre-specified choice of which set of perturbations to be "robust" to. They use the letter delta for this set in the paper, IIRC. This is where the "human" bit comes in: to define robustness in the abstract you would have to establish which perturbations "should" change the categorisation and which ones "shouldn't", which they avoid attempting to do.
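
For reference, IIRC the corresponding definition in the paper is roughly that a feature f is γ-robustly useful when

    E_{(x,y)~D} [ inf_{δ ∈ Δ(x)} y · f(x + δ) ] ≥ γ

so the notion is explicitly parameterized by the chosen perturbation set Δ; "robust" only ever means robust with respect to that set.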


One of the first authors here - happy to answer any questions!


Mirage



