
Hi there!

What do you think is currently the best method for fighting adversarial weakness?

Would simply generating an augmented training set that includes adversarial examples be sufficient? (In your case, for example, you would include random poses of the adversarial 3D models in the training set.) Or do you think a totally new architecture or training method is necessary to deal with it?

And if it's the latter, do you think this is a symptom of a more general shortcoming of current architectures, or a localized issue?

Thanks, great work btw!




Another author here - adversarial training is not sufficient to protect against white-box attacks, but it seems to be the best method we have so far (https://arxiv.org/abs/1706.06083).

It seems that all current state-of-the-art architectures are vulnerable to adversarial examples; to the best of my knowledge, there is no image classification network for which researchers have failed to reliably produce adversarial examples.
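
For anyone curious what "reliably produce adversarial examples" looks like in practice, here is a rough sketch of the standard l_infinity PGD attack (assuming a differentiable PyTorch classifier and inputs scaled to [0, 1]; the hyperparameters are typical values, not taken from any particular paper):

    # Minimal PGD (projected gradient descent) l_infinity attack sketch.
    # Illustrative only; assumes `model` is a PyTorch classifier and x is in [0, 1].
    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8/255, step=2/255, iters=40):
        # random start inside the eps-ball around x
        x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
        for _ in range(iters):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            # signed gradient ascent step, then project back into the eps-ball and [0, 1]
            x_adv = x_adv.detach() + step * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        return x_adv.detach()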


The best method for making neural networks robust to l_infinity-bounded iterative adversarial attacks is this one: https://openreview.net/forum?id=S18Su--CW (although it is still under review).
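
For context, the adversarial training mentioned upthread (the arxiv paper, not necessarily this submission's method) amounts to training on the worst-case examples found by an attack like PGD. A minimal sketch, reusing the pgd_attack function from the parent comment and hypothetical model / loader / opt objects:

    # Madry-style adversarial training loop (sketch).
    # Reuses pgd_attack and the imports from the parent comment.
    import torch.nn.functional as F

    def adversarial_train_epoch(model, loader, opt, eps=8/255):
        model.train()
        for x, y in loader:
            # inner maximization: find a strong perturbation for the current model
            x_adv = pgd_attack(model, x, y, eps=eps)
            # outer minimization: update the weights on the adversarial batch
            opt.zero_grad()
            loss = F.cross_entropy(model(x_adv), y)
            loss.backward()
            opt.step()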


Looks like a very nice paper, thanks.


I see. I ask because there's a common observation, which I tend to agree with, that a human (and perhaps an AGI) would never really even run the risk of confusing, say, a turtle with a rifle.

Part of it is missing the forest for the trees. It may be just an artifact of the requirements we place on image classifiers (which are very lax) and the way we train them, not anything fundamental.

Indeed I believe we tend to think with more solid logic, especially when the decision becomes difficult. A DNN will look at the statistics of a feature set and make a judgment based on that. A human can categorically reject certain hypotheses based on definitional requirements: a rifle is a weapon. It must have a barrel to guide the projectile and a muzzle for it to exit. It must have some firing mechanism (usually a trigger). Even if at a glance we get confused about exactly what an image is picturing, we can make quick logical judgments about sub-features that make epsilon-misclassifications almost impossible.

A network that acted that way would need some recursive behavior (to implement variable-time classification efficiently): a recursive "logic module" or "language module" plugged onto the end of the naive feature classifier.
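
Something like this toy sketch is what I have in mind; all the names here (classifier, detect_parts, REQUIRED_PARTS, the part names) are hypothetical, just to illustrate the categorical-rejection idea rather than a real architecture:

    # Toy sketch of "logic on top of pattern recognition": the feature classifier
    # proposes a label, and a hand-written consistency check vetoes it unless the
    # definitional sub-features are also detected.
    REQUIRED_PARTS = {
        "rifle": {"barrel", "muzzle", "trigger"},
        "turtle": {"shell", "head", "legs"},
    }

    def logical_classify(image, classifier, detect_parts):
        label = classifier(image)        # naive statistical feature classifier
        found = detect_parts(image)      # set of sub-features actually detected
        required = REQUIRED_PARTS.get(label, set())
        if not required <= found:        # a definitional requirement is violated
            return "uncertain"           # categorical rejection: think harder
        return label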


You're assuming we couldn't make adversarial examples with white box access to your brain. It's entirely possible that such examples do exist.


We don't need white box access: https://i.imgur.com/mOTHgnfl.jpg


That's an adversarial picture, but it's not nearly as dramatic as the epsilon-adversarial examples, and in particular there are no logical inconsistencies (confusing a cat for guacamole, or a turtle for a rifle). I don't think anyone doubts that some form of camouflage or isomorphic deception is unavoidable.


I don't doubt there are weaknesses, or even a fundamental lack of interpretability for classification spaces that overlap (i.e., have a "morphing sequence"). One example that sticks out from my childhood is this image:

What do you see here?

https://seeklogo.com/images/A/Antarctica-logo-9202568406-see...

For my entire childhood I saw a weird face (without thinking too much about it). This is the logo of a Brazilian beer brand. When I was a teenager I saw an ad with two penguins, and then it clicked.

Optical illusions are well documented too, some classic examples:

http://www.optics4kids.org/osa.o4k/media/optics4kids/womanil...

https://obasandbox.files.wordpress.com/2012/07/carolines-vas...

http://i.telegraph.co.uk/multimedia/archive/01120/blackballs...

But it's quite probable we'd have found any glaring adversarial issues by now. After all, artists can conduct a semi-white-box, mostly black-box adversarial optimization of illusions (I'm sure some process like this is how they came up with the old/young lady illusion). Note, however, that even those are particular errors like incorrect brightness estimation, or near-complete dichotomies (it's not that either the young or the old interpretation is incorrect in any sense).

An epsilon-failure seems much more difficult to come up with. The distinction seems to lie mostly in the ability to apply basic logic on top of pure pattern recognition, greatly sharpening the decision boundaries through recursive thought. Eventually logical features (a quick "proof" of sorts) win: you "prove" that what you're seeing can't actually be a rifle, so it must be a weird turtle.

This logical approach probably has gradations in power. In general it might even be algorithmically undecidable whether an image is logically valid. Generalizations of Escher illusions:

https://upload.wikimedia.org/wikipedia/en/e/e8/Escher_Waterf...

come to mind. In practice we cut off the decision process once we've found the main logical features and connections, so unless the image inspires superficial uncertainty, a deep logical inconsistency could slip by (one that a more intelligent person might catch by universally applying a deeper consistency check).



