A more accurate title might be "Convolutional Neural Network Architectures" or "Neural Network Architectures for Computer Vision" but still a nice overview!
One thing I'm confused about is that everyone seems to treat "Convolutional Neural Networks" as synonymous with or as being the thing that enabled "Deep Learning", but convolutional neural networks are only for image processing, right? Are many layer ("deep") networks useless outside of image processing? Were there other break-through techniques besides convolutional nets that are necessary for deep networks to work well?
While not strictly necessary, two breakthroughs definitely helped: dropout and greedy layer-wise pretraining.
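For the curious, dropout is simple enough to show in a few lines. A minimal PyTorch sketch (the layer sizes here are arbitrary, just for illustration):

```python
import torch
import torch.nn as nn

# A small fully connected net with dropout between layers.
# Dropout randomly zeroes activations during training, which
# acts as a regularizer; at eval time it is a no-op.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # drop each activation with probability 0.5
    nn.Linear(256, 10),
)

model.train()            # dropout active during training
x = torch.randn(32, 784)
logits = model(x)

model.eval()             # dropout disabled for inference
```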
Also, convolutions are not only used in computer vision. For example, AlphaGo used them (the paper is called "Mastering the Game of Go with Deep Neural Networks and Tree Search"). I would say that convolutions should be useful whenever your data has a spatial aspect to it.
Convolutional neural networks and many-layered networks are useful for things outside of image processing. CNNs are used for acoustic modeling in speech recognition, and character-convolutional layers are used in language modeling. And pretty much all neural networks in use today are many-layered.
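A character-convolutional layer is just a 1D convolution over character embeddings. A minimal sketch (the vocabulary and embedding sizes are made up):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 100 characters in the vocabulary,
# 16-dimensional character embeddings, words up to 20 chars long.
embed = nn.Embedding(num_embeddings=100, embedding_dim=16)
conv = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=3)

chars = torch.randint(0, 100, (8, 20))   # (batch of words, chars per word)
x = embed(chars).transpose(1, 2)         # -> (batch, embed_dim, word_len)
feats = torch.relu(conv(x))              # convolve over character positions
word_vec, _ = feats.max(dim=2)           # max-over-time pooling -> (8, 32)
```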
As mentioned in the article, using convolutional layers in ANNs was an idea from the 1980s, but networks that could be trained on the hardware available at the time were never all that competitive until recently. Once we figured out how to train big/deep networks (use GPUs, have lots of data, maybe use pre-training), CNNs started to perform really well. This created a positive feedback loop: as CNNs started to work better, deeper networks in general got more attention, which got more people into CNNs, and so on.
Are there many-layered deep networks that aren't convolutional neural nets, or are CNNs practically necessary to make deep networks work? Are there specific extra techniques not necessary for CNNs that are necessary to make deep non-convolutional networks work well?
In natural language processing tasks you see a lot of non-CNN architectures. These are usually designed to deal with sequential data, so some kind of "memory" is needed.
Sometimes you see this combined with a CNN. There have been a few question-answering systems that have one or more CNN layers. I don't entirely understand these designs, but presumably the convolutional layers are an attempt to capture local word order. There's a rough sketch of the idea below.
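My loose mental model of those mixed designs, sketched in PyTorch (all the sizes are arbitrary): a convolution over word embeddings picks up local word-order patterns (n-gram-like features), and a recurrent layer provides the "memory" over the whole sequence:

```python
import torch
import torch.nn as nn

embed = nn.Embedding(10000, 128)                      # toy vocab of 10k words
conv = nn.Conv1d(128, 128, kernel_size=3, padding=1)  # local word-order features
lstm = nn.LSTM(input_size=128, hidden_size=256, batch_first=True)

tokens = torch.randint(0, 10000, (4, 50))   # (batch, sentence length)
x = embed(tokens)                           # (4, 50, 128)
x = torch.relu(conv(x.transpose(1, 2))).transpose(1, 2)  # conv over positions
output, (h, c) = lstm(x)                    # h summarizes the sequence
```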
There are lots of techniques that people use to try to make deep networks work well. Mostly these are about making error backpropagation work better. One of the most successful recent innovations is the ResNet architecture (https://arxiv.org/abs/1512.03385), and the related highway networks.
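The core ResNet trick is tiny: add the block's input back to its output, so gradients get a shortcut path during backprop. A simplified sketch (the real blocks also use batch norm; see the paper):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Simplified residual block: output = x + F(x).
    The identity shortcut gives gradients a direct path
    through the network, which is what makes very deep
    stacks trainable."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        out = torch.relu(self.conv1(x))
        out = self.conv2(out)
        return torch.relu(out + x)   # the shortcut connection

block = ResidualBlock(64)
x = torch.randn(1, 64, 32, 32)
y = block(x)   # same shape as x
```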
There are successful deep networks with pure feed-forward, non-convolutional layers. There are also deep stacks of other layer types, like LSTMs and GRUs, which are particularly useful for sequence-to-sequence tasks like machine translation.
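For example, a deep non-convolutional network for sequence data can be as simple as stacking recurrent layers. A sketch (the dimensions are arbitrary):

```python
import torch
import torch.nn as nn

# A 4-layer GRU "encoder" -- deep, but no convolutions anywhere.
# Machine-translation encoders look roughly like this.
gru = nn.GRU(input_size=128, hidden_size=256, num_layers=4, batch_first=True)

seq = torch.randn(2, 30, 128)    # (batch, time steps, features)
outputs, hidden = gru(seq)       # hidden: final state of each of the 4 layers
print(hidden.shape)              # torch.Size([4, 2, 256])
```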