
Please correct me if I am interpreting this incorrectly. I read the paper, and it sounds like you retrained the softmax layer of Inception to classify the 3D-printed turtle as a rifle. In that case, you would have overwritten Inception's original representation of what a rifle looks like. Did you test what would happen if you put a picture of an actual rifle in front of the camera? How would the network now classify the rifle?



They're not changing the original network. That would not be very interesting. They're generating objects that fool the correctly trained network.


>given access to the classifier gradient, we can make adversarial examples

It seems like they are finding little "inflection points" in the trained network where a small, well-placed change of input can flip the result to something different. With the rise of "AI for AI", I imagine this is something that could be optimized against.

In the turtle example, it seems that Google's classifier has found that looking for a couple of specific features (mostly the trigger, in this case) identifies a gun better than looking for the entirety of the gun. Perhaps optimizing against these inflection points will force the classifier to build a better understanding of the objects it is classifying and lead to better results in non-adversarial situations.
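One common way to "optimize against" them is adversarial training: generate the fooling inputs during training and require the network to classify them correctly. A rough sketch, reusing the hypothetical fgsm_attack above and assuming a standard PyTorch setup where model, optimizer, and loader are placeholders:

    # Train on perturbed images so that small, well-placed input changes
    # no longer flip the prediction.
    for images, labels in loader:
        adv_images = fgsm_attack(model, images, labels, epsilon=0.01)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(adv_images), labels)
        loss.backward()
        optimizer.step()

In practice this tends to make models more robust to the specific attack used during training, though stronger attacks can still get through.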



