Neural Network Architectures

rustyfe · on Sept 2, 2016

Could anyone recommend a starting point on Neural Networks for the uninitiated? The parts of this I understood were fascinating, but I quickly realized I was looking up every third word, and not really absorbing much.

If I could only read one thing to gain the technical grounding for this history, what should it be?

trophygeek · on Sept 2, 2016

This was the one that flipped the lightbulb for me.

`Hacker's guide to Neural Networks` http://karpathy.github.io/neuralnets/

trophygeek · on Sept 2, 2016

Actually, read that one 2nd. Start here: https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convn...

WhoBeI · on Sept 2, 2016

That one was very good and links to a nice demo http://scs.ryerson.ca/~aharley/vis/conv/flat.html

rustyfe · on Sept 2, 2016

Hey, the premise was I could only read one thing! I kid, much obliged.

trophygeek · on Sept 2, 2016

lol. Fair point.

Read the 2nd one then.

eddiepierce · on Sept 2, 2016

Karpathy's course notes for Stanford's Convolutional Neural Networks for Visual Recognition course are a great intro [1].

They start with linear classification, then move on to neural nets, and then explain convolutional neural nets.

[1]: http://cs231n.github.io

billconan · on Sept 2, 2016

this should get you started:

http://neuralnetworksanddeeplearning.com/

vonnik · on Sept 3, 2016

[Disclosure: I wrote this.]

http://deeplearning4j.org/neuralnet-overview.html

Also, this book is coming:

https://www.amazon.com/Deep-Learning-Practitioners-Adam-Gibs...

j_juggernaut · on Sept 2, 2016

How about the textbook at http://www.deeplearningbook.org/?

It introduces you to some of the underlying principles, which haven't changed much over time. I highly recommend it if you want deeper intuitions about the principles behind CNNs, LSTMs/RNNs, Restricted Boltzmann Machines, etc. Also, Hinton's Coursera lectures, though I'm not sure you can still access them.

Joof · on Sept 3, 2016

It was on Academic Torrents, I believe.

llSourcell · on Sept 2, 2016

https://www.youtube.com/watch?v=h3l4qz76JhQ

ramblenode · on Sept 2, 2016

The other suggestions in this thread are quite good. I'll add "Machine Learning" by Murphy. It's not strictly about neural networks, but it's an ML classic and a rigorous introduction to the subject that will give you a principled understanding of the statistical fundamentals. For actual NN implementation, the Karpathy and Nielsen sources are excellent.

p1esk · on Sept 2, 2016

Definitely not the starting point for someone who has no clue. Murphy is probably the very last thing one would read before becoming an expert and starting to publish ML papers.

imh · on Sept 2, 2016

This course is a great first step in learning about neural nets: https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearni...

gok · on Sept 2, 2016

A more accurate title might be "Convolutional Neural Network Architectures" or "Neural Network Architectures for Computer Vision" but still a nice overview!

AgentME · on Sept 2, 2016

One thing I'm confused about is that everyone seems to treat "Convolutional Neural Networks" as synonymous with "Deep Learning", or as the thing that enabled it, but convolutional neural networks are only for image processing, right? Are many-layer ("deep") networks useless outside of image processing? Were there other breakthrough techniques besides convolutional nets that are necessary for deep networks to work well?

pinouchon · on Sept 2, 2016

While not strictly necessary, these breakthroughs definitely helped: dropout and greedy layer-wise pretraining.
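A minimal sketch of dropout in plain Python, assuming the "inverted" variant common in today's libraries: survivors are rescaled by 1/(1-p) during training, so nothing needs to change at inference. The function name `dropout` and its arguments are illustrative, not any library's API.

```python
import random


def dropout(xs, p, training=True, rng=random):
    """Inverted dropout on a list of activations.

    During training each activation is zeroed with probability p and the
    survivors are divided by (1 - p), so the expected value is unchanged.
    At inference (training=False) the input passes through untouched.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training:
        return list(xs)
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in xs]


rng = random.Random(0)  # seeded for reproducibility
acts = [1.0, 2.0, 3.0, 4.0]
print(dropout(acts, 0.5, rng=rng))         # each entry is 0.0 or doubled
print(dropout(acts, 0.5, training=False))  # [1.0, 2.0, 3.0, 4.0]
```

Rescaling during training (rather than at test time) keeps the inference path a plain identity, which is why this variant is the usual choice in practice.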

Also, convolutions are not only used in computer vision. For example, AlphaGo used them (the paper is called "Mastering the Game of Go with Deep Neural Networks and Tree Search"). In my opinion, convolutions should be useful whenever your data has a spatial aspect to it.

gok · on Sept 2, 2016

Convolutional neural networks and many-layered networks are useful for things outside of image processing. CNNs are used for acoustic modeling in speech recognition, and character-convolutional layers are used in language modeling. And pretty much all neural networks in use anywhere today are many-layered.
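To make "character-convolutional layers" a bit more concrete, here is a hedged sketch of a 'valid' 1D convolution sliding one filter over a sequence of one-hot characters. The two-letter alphabet, the hand-set `ab_detector` filter, and the function names are hypothetical; real models learn many filters over learned character embeddings.

```python
def conv1d_valid(seq, kernel, bias=0.0):
    """'Valid' 1D convolution of a sequence with one filter.

    Like most deep learning libraries, this computes cross-correlation
    (the kernel is not flipped).
    seq:    list of T feature vectors, each of length C
    kernel: list of K weight vectors, each of length C (filter width K)
    Returns T - K + 1 responses, one per window position.
    """
    K = len(kernel)
    out = []
    for t in range(len(seq) - K + 1):
        s = bias
        for k in range(K):
            s += sum(w * x for w, x in zip(kernel[k], seq[t + k]))
        out.append(s)
    return out


ALPHABET = "ab"  # hypothetical two-letter alphabet


def one_hot(ch):
    return [1.0 if ch == a else 0.0 for a in ALPHABET]


seq = [one_hot(ch) for ch in "abba"]
ab_detector = [[1.0, 0.0], [0.0, 1.0]]  # responds most strongly to "a" then "b"
print(conv1d_valid(seq, ab_detector))   # [2.0, 1.0, 0.0]
```

The same filter weights are reused at every position, so the layer spots the pattern wherever it occurs in the string.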

As mentioned in the article, using convolutional layers in ANNs was an idea from the 1980s, but networks that could be trained on the hardware available at the time were never all that competitive until recently. Once we figured out how to train big/deep networks (use GPUs, have lots of data, maybe use pre-training), CNNs started to perform really well. This did make a positive feedback loop: as CNNs started to work better, deeper networks in general started to get more attention, which got more people into CNNs, etc.

AgentME · on Sept 2, 2016

Are there many-layered deep networks that aren't convolutional neural nets, or are CNNs practically necessary to make deep networks work? Are there specific extra techniques not necessary for CNNs that are necessary to make deep non-convolutional networks work well?

nl · on Sept 3, 2016

In natural language processing tasks you see a lot of non-CNN architectures. These usually are designed to be able to deal with sequential data, so some kind of "memory" is needed.
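A toy illustration of the "memory" idea: a scalar Elman-style recurrent cell whose hidden state carries information forward between time steps. The weights `w_x` and `w_h` are hand-picked assumptions for illustration, not learned values.

```python
import math


def rnn_states(inputs, w_x, w_h, b=0.0, h0=0.0):
    """Scalar Elman-style recurrent cell:
        h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
    Returns the hidden state after each input. Because h_t depends on
    h_{t-1}, earlier inputs can still influence later states."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


pulse = [1.0, 0.0, 0.0, 0.0]  # a single input followed by silence
print(rnn_states(pulse, w_x=1.0, w_h=0.9))  # decays but stays positive
print(rnn_states(pulse, w_x=1.0, w_h=0.0))  # no recurrence: forgets immediately
```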

Sometimes you see this combined with a CNN. There have been a few question-answering systems that have one or more CNN layers. I don't entirely understand these designs, but presumably the convolutional layers are an attempt to capture the different orderings of words.

There are lots of techniques people use to try to make deep networks work well. Mostly these are about making error backpropagation work better. One of the most successful recent innovations is the ResNet architecture (https://arxiv.org/abs/1512.03385), along with the related highway networks.
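A toy scalar sketch of why identity shortcuts (as in ResNets) help error signals propagate. It assumes every layer has the same constant local derivative, which real networks do not, and simply applies the chain rule: a plain layer y = F(x) contributes F'(x) to the product, while a residual layer y = x + F(x) contributes 1 + F'(x).

```python
def chain_gradient(num_layers, layer_grad, residual):
    """d(output)/d(input) for a chain of scalar layers, by the chain rule.

    Toy assumption: every layer's own derivative F'(x) equals layer_grad.
    A plain layer y = F(x) multiplies the gradient by F'(x); a residual
    layer y = x + F(x) multiplies it by 1 + F'(x), since the identity
    shortcut contributes a derivative of exactly 1.
    """
    g = 1.0
    for _ in range(num_layers):
        g *= (1.0 + layer_grad) if residual else layer_grad
    return g


print(chain_gradient(20, 0.01, residual=False))  # about 1e-40: the signal vanishes
print(chain_gradient(20, 0.01, residual=True))   # about 1.22: the signal survives
```

Real networks have Jacobian matrices instead of scalars, so this only illustrates the additive shortcut keeping a direct path open; it is not a model of ResNet training.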

gok · on Sept 2, 2016

There are successful deep networks with pure feed-forward, non-convolutional layers. There are also deep stacks of other, more exotic non-convolutional layer types like LSTMs and GRUs, which are particularly useful for sequence-to-sequence tasks like machine translation.
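For reference, a sketch of one step of a standard LSTM cell (input, forget and output gates, no peephole connections), written with scalars for readability. The parameter names in `p` are hypothetical; real cells use vectors and weight matrices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell (standard gates, no peepholes).

    p maps hypothetical names to weights: 'w?' for the input x, 'u?' for the
    previous hidden state, 'b?' for biases, where ? is one of
    i (input gate), f (forget gate), o (output gate), c (candidate).
    """
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])
    g = math.tanh(p["wc"] * x + p["uc"] * h_prev + p["bc"])
    c = f * c_prev + i * g   # keep part of the old memory, write part of the new
    h = o * math.tanh(c)     # expose a gated view of the memory
    return h, c


# Forget gate held open, input gate held shut: the cell keeps its old
# memory even though a strong new candidate value arrives.
params = {f"{kind}{gate}": 0.0 for kind in "wub" for gate in "ifoc"}
params["wc"] = 5.0
params["bf"] = 10.0
params["bi"] = -10.0
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.8, p=params)
print(round(c, 3))  # 0.8
```

Flipping the two gate biases makes the cell overwrite its memory with the new candidate instead; that learned choice between keeping and writing is what plain recurrent cells lack.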

nl · on Sept 3, 2016

LSTMs

Question1101 · on Sept 2, 2016

Aren't data and processing power the most important things with neural networks? Even if I knew how they worked, as a hobbyist without access to the huge amounts of data that companies have, I would have no idea what to do with them that hasn't been done already.

p1esk · on Sept 2, 2016

No. As a researcher, you can make it your goal to find or invent the smallest possible architecture for a given task (in terms of number of parameters, or number of operations). Alternatively, you can try to invent an architecture that learns faster from data (or requires less data to achieve state-of-the-art results).
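One rough way to measure the "size" of an architecture in the two senses above: count the parameters and the multiply-adds of a fully connected network. The 784-100-10 layer sizes are a hypothetical MNIST-sized example, and the function names are illustrative.

```python
def mlp_param_count(layer_sizes):
    """Trainable parameters of a fully connected network:
    each layer has n_in * n_out weights plus n_out biases."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def mlp_mult_adds(layer_sizes):
    """Multiply-accumulate operations for one forward pass of one example."""
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


sizes = [784, 100, 10]          # hypothetical MNIST-sized network
print(mlp_param_count(sizes))   # 79510
print(mlp_mult_adds(sizes))     # 79400
```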

tree_of_item · on Sept 2, 2016

> As a researcher

But the post you replied to specifically said "as a hobbyist", so it doesn't really sound like there's much hope.

Joof · on Sept 3, 2016

Hobbyist researcher? Seems like a more accessible plan anyway; fuck bothering with huge datasets and long training times and just focus on optimizing small architecture.

eegilbert · on Sept 2, 2016

Yes. This is often, perhaps willfully given the incentives, overlooked by computer scientists (and people trained in that discipline). For a complete, cogent argument see:

http://static.googleusercontent.com/media/research.google.co...

nl · on Sept 3, 2016

Yes, you need data. But the amount depends on your task, and there are pretty significant sources of large amounts of data available online.

For example I was at a presentation where a person built a pretty interesting neural model based on 190k clinical records released via Kaggle. In most fields it is surprising how much data is easily accessible.

naveen99 · on Sept 2, 2016

Data is cheap and all around us.

Processing power, yes, but you can get started with a gaming PC.

discardorama · on Sept 3, 2016

If you want to play around, take a gander at Kaggle.

taliesinb · on Sept 2, 2016

> The NiN architecture used spatial MLP layers after each convolution, in order to better combine features before another layer. Again one can think the 1x1 convolutions are against the original principles of LeNet

Why would it be against the original principles of LeNet?

troyastorino · on Sept 2, 2016

As far as I understood from the description, in LeNet the convolutional layer lets you avoid training parameters that will effectively be doing the same thing as a convolution. Adjacent pixels are highly correlated, so convolutions can capture most of the information in groups of adjacent pixels without having to train a fully-connected layer of neurons. Effectively, you're kinda downsampling the image without losing information.
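A sketch of the weight sharing described here: a single-channel 'valid' 2D convolution (technically cross-correlation, which is what most deep learning libraries compute) that reuses the same handful of weights at every position. The 4x4 image and the vertical-edge kernel are made up for illustration.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D convolution of a single-channel image with one kernel.

    As in most deep learning libraries this is cross-correlation (no kernel
    flip). The same kh * kw weights are applied at every position, which is
    the weight sharing that lets a conv layer exploit local correlations
    without a separate weight per pixel.
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]


image = [[0, 0, 1, 1] for _ in range(4)]  # dark left half, bright right half
vertical_edge = [[-1, 1],
                 [-1, 1]]                 # 4 weights, whatever the image size
print(conv2d_valid(image, vertical_edge))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```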

So, if you're using 1x1 convolutions, I think you're basically having a neuron per pixel, so you're forcing your fully-connected layers to learn the spatial correlations of pixels, instead of capturing that information in a convolutional layer. In other words, you're wasting training on capturing spatial correlations of adjacent pixels instead of other correlations.

taliesinb · on Sept 2, 2016

> So, if you're using 1x1 convolutions, I think you're basically having a neuron per pixel, so you're forcing your fully-connected layers to learn the spatial correlations of pixels, instead of capturing that information in a convolutional layer.

Saying "a neuron per pixel" doesn't really mean anything; that way of thinking isn't helpful unless you're looking at small multi-layer perceptrons. The right way to think about things is that you have tensors and layers that compute new tensors from old tensors.

A 1x1 convolution only 'sees' the feature channels of a pixel, and does the same thing to each pixel. So a 1x1 convolution on a grayscale input (e.g. a 1x28x28 tensor in the case of MNIST) does nothing, basically, other than scale and bias every pixel by the same linear function. It doesn't "force the network to learn" anything, it's just totally pointless.

One of the uses of 1x1 convolutions is to collapse the feature dimension when you're deeper in the network (e.g. 100 channels to 10 channels), to reduce the number of channels (and hence parameters) that subsequent layers need to operate on. It's a "channelwise fully connected layer".
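A plain-Python sketch of a 1x1 convolution as a "channelwise fully connected layer", with tensors stored channel-first (C x H x W, as in the 1x28x28 MNIST example) as nested lists. The function name and the toy inputs are hypothetical. The first example is the grayscale case, where the layer reduces to the same scale and bias at every pixel; the second collapses 3 channels to 1.

```python
def conv1x1(x, weights, biases):
    """1x1 convolution on a C_in x H x W tensor stored as nested lists.

    weights[o][c] connects input channel c to output channel o. Every pixel
    gets the same C_out x C_in matrix multiply plus bias, i.e. a fully
    connected layer across channels applied independently at each pixel.
    Returns a C_out x H x W tensor.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [[[sum(weights[o][c] * x[c][r][col] for c in range(c_in)) + biases[o]
              for col in range(w)]
             for r in range(h)]
            for o in range(len(weights))]


# Grayscale (1 channel): just the same scale and bias at every pixel.
gray = [[[1.0, 2.0],
         [3.0, 4.0]]]                                # 1x2x2
print(conv1x1(gray, weights=[[2.0]], biases=[0.5]))  # [[[2.5, 4.5], [6.5, 8.5]]]

# Collapsing 3 channels to 1: each output pixel mixes that pixel's channels only.
rgb_like = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]]               # 3x1x2
print(conv1x1(rgb_like, weights=[[1.0, 1.0, 1.0]], biases=[0.0]))  # [[[9.0, 12.0]]]
```

No output pixel ever looks at a neighbouring pixel, which is why a 1x1 layer mixes features but learns nothing spatial.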

I think what you're thinking of (and perhaps what the author was thinking of) is the practice, prior to convnets, of collapsing the image into a vector and then applying a fully connected layer to it. That indeed doesn't exploit the translation invariance of natural images, requires the net to learn the same features at every required spatial position at great expense, and so on. But that has nothing to do with 1x1 convolutions.
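Rough arithmetic for that contrast, assuming a 1x28x28 (MNIST-sized) input: a 5x5 convolution with 6 filters versus a fully connected layer producing the same number of outputs (6x24x24, with 'valid' padding and stride 1). The layer sizes and function names are illustrative assumptions.

```python
def dense_params(in_shape, n_units):
    """Fully connected layer on a flattened C x H x W input:
    one weight per (input value, unit) pair plus one bias per unit."""
    c, h, w = in_shape
    return c * h * w * n_units + n_units


def conv_params(c_in, c_out, k):
    """k x k convolution: each of the c_out filters has c_in * k * k weights
    and a bias, shared across all positions, so image size doesn't matter."""
    return c_out * (c_in * k * k + 1)


# Same number of outputs (6 x 24 x 24) from a 1 x 28 x 28 input:
print(dense_params((1, 28, 28), 6 * 24 * 24))  # 2712960
print(conv_params(c_in=1, c_out=6, k=5))       # 156
```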

troyastorino · on Sept 2, 2016

Ah yes, you're right, I was thinking of it that way. Thanks a bunch for your clear and thorough explanation, it makes a lot of sense! So if I understand what you're saying, a 1x1 convolutional layer for collapsing 100 channels to 10 channels would take a 100x512x512 tensor and collapse it to a 10x512x512 tensor?

[Also, sorry for attempting to answer your question incorrectly. I was thinking of putting a disclaimer saying I hadn't worked with CNNs and so might be misunderstanding what the convolutions are doing; probably should have haha]

Maybe when the author was saying 'one can think the 1x1 convolutions are against the original principles of LeNet', he was anticipating my kind of confusion? :)

lightcatcher · on Sept 2, 2016

> So if I understand what you're saying, a 1x1 convolutional layer for collapsing 100 channels to 10 channels would take a 100x512x512 tensor and collapse it to a 10x512x512 tensor?

Correct. As I understand it, this would be applying a 1x1 convolution with 10 filters to a 100x512x512 tensor.
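A quick check of that arithmetic, assuming stride 1 and one bias per filter: a 1x1 convolution with 10 filters on a 100x512x512 tensor keeps the 512x512 spatial size, and each filter has one weight per input channel. The helper name is hypothetical.

```python
def conv1x1_summary(in_shape, n_filters):
    """Output shape, parameter count and multiply-adds of a stride-1
    1x1 convolution with n_filters filters on a C x H x W tensor."""
    c, h, w = in_shape
    out_shape = (n_filters, h, w)       # spatial size is unchanged
    params = n_filters * (c + 1)        # c weights + 1 bias per filter
    mult_adds = n_filters * c * h * w   # one c-length dot product per output value
    return out_shape, params, mult_adds


print(conv1x1_summary((100, 512, 512), 10))  # ((10, 512, 512), 1010, 262144000)
```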

partycoder · on Sept 2, 2016

Not directly related, but never a time waste: http://www.scholarpedia.org/article/Encyclopedia:Computation...

phodo · on Sept 2, 2016

Check out: http://lazyprogrammer.me [his courses and books are accessible, good, and relatively cheap]

phodo · on Sept 2, 2016

Oops, that was meant as a reply to "rustyfe" asking for a starting-point resource! Can I edit the response?