Are There Deep Reasons Underlying the Pathologies of Deep Learning Algorithms? [pdf] (goertzel.org)
53 points by Jach on May 9, 2015 | 16 comments



The author clearly doesn't understand the Szegedy et al. result and isn't really saying anything interesting. Pretty much ALL machine learning image classifiers suffer from the pathologies Szegedy et al. describe, so calling them "deep learning pathologies" is absurdly misleading. Logistic regression has them too, as do almost all algorithms that learn to classify images. Human visual systems can also get confused by sensory input that (to us) looks like some object; we're just much better at integrating evidence from other sources to suppress the mistaken neurons.
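To make that concrete, here's a minimal sketch in plain numpy (synthetic data, made-up sizes) of a fast-gradient-sign style perturbation applied to ordinary logistic regression: a tiny nudge aligned with the weight vector wrecks accuracy even though each individual feature barely moves.

  import numpy as np

  rng = np.random.default_rng(0)

  # Two Gaussian blobs in 100 dimensions, means at +/-0.25 per feature.
  d = 100
  X = np.vstack([rng.normal(-0.25, 1.0, (200, d)),
                 rng.normal(+0.25, 1.0, (200, d))])
  y = np.array([0] * 200 + [1] * 200)

  # Train plain logistic regression by gradient descent.
  w, b = np.zeros(d), 0.0
  for _ in range(500):
      p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
      w -= 0.5 * (X.T @ (p - y)) / len(y)
      b -= 0.5 * np.mean(p - y)

  # A small step against the decision direction (here just sign(w)) flips
  # many predictions, even though each feature moves by only 0.25 while
  # the per-feature noise has standard deviation 1.0.
  eps = 0.25
  X_adv = X + eps * np.sign(w) * np.where(y[:, None] == 1, -1.0, 1.0)

  acc = lambda A: np.mean(((A @ w + b) > 0).astype(int) == y)
  print("clean accuracy:    ", acc(X))      # close to 1.0
  print("perturbed accuracy:", acc(X_adv))  # collapses to around chance

The construction only needs the gradient (here just the weight vector), so it applies to pretty much any classifier that behaves even roughly linearly, which is the point.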


I guess you are referring to http://cs.nyu.edu/~zaremba/docs/understanding.pdf

For those interested, the papers and discussion are part of this weekly collection of AI-related news and resources: https://aiweekly.curated.co


He's also referring to this follow-up paper, which attempts to explain the issue and shows that it exists in other models too: http://arxiv.org/abs/1412.6572


I don't think this paper is very useful.

The author is from the OpenCog group, who have spent years building a structured knowledge base as a predecessor to building an artificial general intelligence. It isn't clear how AGI emerges from this work.

The OpenCog KB is a useful piece of work, but it's interesting to note that word/phrase embedding models (word2vec etc) can give similar or better results on most practical tasks that you'd use OpenCog for.

My view is that Deep Learning techniques are insufficient for an AGI, but are good candidates for component parts in the same way that the human optical system does significant preprocessing before hitting the "intelligent" brain.

Also, things like memory networks specifically address some of the episodic memory issues the author raised.


"I don't think this paper is very useful."

That's probably correct. The author discusses a known problem; those mislabeled images are well known and have been discussed on HN before. It's clear that feature extraction in deep learning sometimes fastens on irrelevant features that somehow work. Some algorithms produce models where data points sit too close to at least one decision boundary in a high-dimensional space, which makes them brittle in the face of small, noise-like changes. I don't know enough about the subject to know how that will be fixed, but there are people working on the problem and they don't seem to be stuck.

Then the author, who is from OpenCog, goes on to claim, without supporting evidence, that OpenCog can somehow fix the problem. The paper proposes an "internal image grammar", but doesn't say much about what that means or how to do it. Trying to decompose images into some symbolic representation has a long, disappointing history. The computational neural network people are getting results without doing that.

I think we're reading a grant proposal here.


Regarding the internal image grammar:

I know some people working on using everyday knowledge to do better image labeling. The idea is that a tree is much more likely to appear in a park than in a kitchen, so you can bias the probable interpretations using that.

I guess that's what they could be talking about. But you need a CNN to get the basic partial label in the first place.
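Very roughly, the biasing could look like this (all numbers and labels below are made up): the CNN supplies per-object scores and a scene prior reweights them.

  # Hypothetical numbers throughout; the point is just "bias the CNN's
  # label scores by what's plausible in the detected scene".
  cnn_scores = {"tree": 0.35, "blender": 0.40, "dog": 0.25}   # pretend CNN output

  # P(label | scene): everyday knowledge, e.g. trees are common in parks.
  scene_prior = {
      "park":    {"tree": 0.60, "blender": 0.05, "dog": 0.35},
      "kitchen": {"tree": 0.05, "blender": 0.70, "dog": 0.25},
  }

  def rerank(scores, scene):
      weighted = {label: p * scene_prior[scene][label] for label, p in scores.items()}
      total = sum(weighted.values())
      return {label: p / total for label, p in weighted.items()}

  print(rerank(cnn_scores, "park"))     # "tree" overtakes "blender"
  print(rerank(cnn_scores, "kitchen"))  # "blender" stays on top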


'Structured knowledge base' is a completely ridiculous and inaccurate way to characterize probabilistic logic networks and the OpenCog component architecture.


PSL isn't magic. It's just a way to do reasoning taking probability and/or confidence into account.

I'd note that Stanford's DeepDive explicitly claims to be a KB building tool, despite it doing probabilistic inference. In their case they do Gibbs sampling instead of PSL, but the similarities are clear.
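For flavor, here's a toy Gibbs sampler over two boolean "facts" linked by one weighted rule (made-up weights, and nothing like DeepDive's actual code), just to show what probabilistic inference over a KB amounts to.

  import math, random

  random.seed(0)
  w_prior = {"A": 0.5, "B": -0.3}   # per-fact log-odds priors
  w_rule = 1.5                      # weight for "A and B tend to co-occur"

  def conditional(var, other_val):
      # log-odds of var=True given the other fact's current value
      logit = w_prior[var] + (w_rule if other_val else 0.0)
      return 1.0 / (1.0 + math.exp(-logit))

  state = {"A": True, "B": False}
  counts = {"A": 0, "B": 0}
  for step in range(10000):
      for var, other in (("A", "B"), ("B", "A")):
          state[var] = random.random() < conditional(var, state[other])
      counts["A"] += state["A"]
      counts["B"] += state["B"]

  # Estimated marginal probability that each fact holds.
  print({k: v / 10000 for k, v in counts.items()})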

Also, people associated with OpenCog (including the author of this paper) often refer to OpenCog knowledge bases. Eg https://opencog.wordpress.com/2008/09/14/progress-update/


It is a deceptive description. I said PLN, not PSL.


Semantic hashing by similarity is one way to create more sensible representations (Re: Proposition 2): http://www.cs.toronto.edu/~rsalakhu/papers/semantic_final.pd...

This would also fit well into memory networks: http://arxiv.org/abs/1410.3916, SDM: http://en.wikipedia.org/wiki/Sparse_distributed_memory or global workspace model: http://en.wikipedia.org/wiki/Global_Workspace_Theory
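Roughly, the lookup side of semantic hashing works like this (sizes are arbitrary, and a random projection stands in for the trained autoencoder's code layer):

  # Compress items to short binary codes and retrieve neighbours by
  # Hamming distance; a random projection stands in for the learned encoder.
  import numpy as np

  rng = np.random.default_rng(0)
  n, d, bits = 1000, 256, 32
  items = rng.normal(size=(n, d))        # pretend feature vectors
  proj = rng.normal(size=(d, bits))      # stand-in for the trained code layer

  codes = (items @ proj > 0).astype(np.uint8)   # n x bits binary codes

  def nearest(query_vec, k=5):
      q = (query_vec @ proj > 0).astype(np.uint8)
      hamming = np.count_nonzero(codes != q, axis=1)
      return np.argsort(hamming)[:k]

  # Similar items land in the same Hamming neighbourhood, so lookup stays
  # cheap even for very large collections.
  print(nearest(items[42]))   # index 42 comes back first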


"So one could view them as just being mathematical pathologies found by computer science geeks with too much time on their hands."

How is this a real paper?


It's a valid point - if the failure cases aren't realistic they're not relevant. I would rather have an interesting paper with a colloquialism or two thrown in than yet another tiny methods variation paper written in Proper Academese. And by the looks of it this is basically a white paper or long, footnoted blog post in PDF form, not a submission to Nature.


I'm not sure I understand how image grammars (whatever those are exactly) suddenly pop up as a solution after such a long introduction. I could not find any evidence or literature about them in relation to learning.

The author is stating a hypothesis but no way to test it. I'm not sure what point he's trying to make.


Well, if you take "sparse autoencoders" (http://web.stanford.edu/class/cs294a/sparseAutoencoder.pdf), they basically give you something like the principal components of the data, so from blobs of raw pixels you get a "dictionary" of lines in different orientations and sizes, like an edge detector. This is similar to how the famous Fourier transform works (http://en.wikipedia.org/wiki/Fourier_transform): it decomposes a raw signal like speech into its base forms or "harmonics". And if you add another layer (to a stacked autoencoder) it will extract higher-level forms (e.g. basic shapes like triangles, ellipses etc.) and so on, with each layer kind of compressing the signal into a more compact summary. At the higher layers you arrive at an abstract "chair" or "cat" representation which is based on all these lower-level forms: shapes, lines and dots.
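A minimal sketch of that first layer, in plain numpy, with an L1 sparsity penalty instead of the KL penalty used in the Stanford notes and random vectors standing in for image patches:

  import numpy as np

  rng = np.random.default_rng(0)
  n, d, k = 2000, 64, 25          # patches, patch size (8x8), dictionary size
  X = rng.normal(size=(n, d))     # stand-in for whitened image patches

  W1 = rng.normal(0, 0.1, (d, k)); b1 = np.zeros(k)
  W2 = rng.normal(0, 0.1, (k, d)); b2 = np.zeros(d)
  lam, lr = 0.1, 0.01

  for step in range(300):
      H = np.maximum(0, X @ W1 + b1)        # sparse hidden code
      X_hat = H @ W2 + b2                   # reconstruction

      dX_hat = 2 * (X_hat - X) / n          # grad of reconstruction error
      dW2 = H.T @ dX_hat; db2 = dX_hat.sum(0)
      dH = dX_hat @ W2.T + lam / n          # plus grad of the L1 sparsity term
      dpre = dH * (H > 0)                   # back through the ReLU
      dW1 = X.T @ dpre; db1 = dpre.sum(0)

      for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
          p -= lr * g

  # Each column of W1 is one learned "dictionary" element; trained on real
  # image patches, these tend to come out looking like oriented edge detectors.
  print("fraction of active hidden units:", (H > 0).mean())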

Then once you've got this dictionary of image "words", the next thing is to infer how these words interact with each other, i.e. build a grammar (this is also called "grammar induction" in natural language processing: http://techtalks.tv/talks/deep-learning-of-recursive-structu...).

By learning a grammar you essentially define a concept of "chair" or "cat" at a higher level of abstraction by determining how these forms relate to their world (i.e. to other forms), e.g. you can determine that "cat sits on a chair" is a legal phrase in the grammar and "chair sits on a cat" is not.

So extracting a grammar (visual or linguistic) from training data amounts to restricting the system to common-sense reasoning, which operates on concepts via the "production rules" of the grammar (http://en.wikipedia.org/wiki/Production_(computer_science)), and that is a basic goal of AI.
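As a toy illustration of the production-rule idea (hand-written rules here; a real system would have to induce them from data):

  # A few rules over categories accept "cat sits on chair" but not the reverse.
  CATEGORY = {"cat": "ANIMAL", "dog": "ANIMAL",
              "chair": "FURNITURE", "table": "FURNITURE"}

  # (relation, subject_category, object_category) -> allowed?
  RULES = {
      ("sits on", "ANIMAL", "FURNITURE"): True,
      ("sits on", "FURNITURE", "ANIMAL"): False,
  }

  def is_legal(subject, relation, obj):
      key = (relation, CATEGORY[subject], CATEGORY[obj])
      return RULES.get(key, False)

  print(is_legal("cat", "sits on", "chair"))   # True
  print(is_legal("chair", "sits on", "cat"))   # False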


Looking at Figure 1, it seems that the classifications "baseball" and "electric guitar" are not that silly (but the 99.6% certainty is). The images shown could probably be generated from a guitar and a baseball by warping and zooming.


So he is basically saying Jeff Hawkins was right all along, at least on SDRs.



