1. You've made an off-hand comment in one of your videos that a sequential dense network is a generalization of every other neural network architecture: in theory you could re-create an RNN or a CNN using just Dense layers. But obviously that's not practical.
Why isn't it practical? Is it because the network would have to be too deep, or too wide? Would the optimizer get stuck in a local minimum, or would overfitting be inevitable? Or is it some combination of issues?
What do you think is the best hope for a generalized network architecture, most similar to our brain?
2. On a somewhat related note, do you have strong enough faith in the machine learning algorithms and architectures currently in use (RNNs, CNNs, capsule networks) that, given infinite resources (training time and network size), we would be able to create a meaningful general AI? Or do you think our current approach is merely incremental, and a truly different approach would be required to achieve meaningful AI?
> Why isn't it practical? Is it because the network would have to be too deep, or too wide? Would the optimizer get stuck in a local minimum, or would overfitting be inevitable? Or is it some combination of issues?
Schmidhuber published a paper a few years ago showing near-SoTA performance on computer vision using just a fully connected net. One of our students showed how a convolution is just a weight-tied matrix multiply here: https://medium.com/impactai/cnns-from-different-viewpoints-f...
The issue is that without the weight tying you've got more parameters to regularize (which can decrease performance) and to train (which takes longer). So you should use weight tying where you can, e.g. by using convolutions. The sketch below makes the equivalence concrete.
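Here's a minimal NumPy sketch of that equivalence (my own illustration, not from the Medium post; the helper name build_conv_matrix and the toy 3-tap kernel are made up for the example). It unrolls a 1D convolution into a dense matrix whose rows all contain the same three weights, just shifted along - exactly the tying a Dense layer would have to discover on its own:

    import numpy as np

    def build_conv_matrix(kernel, input_len):
        # Unroll a 1D "valid" convolution into a dense matrix: every row
        # holds the same k weights, shifted one position to the right.
        k = len(kernel)
        out_len = input_len - k + 1
        W = np.zeros((out_len, input_len))
        for i in range(out_len):
            W[i, i:i + k] = kernel
        return W

    x = np.arange(8.0)                    # toy input signal
    kernel = np.array([1.0, 0.0, -1.0])   # toy 3-tap filter

    dense_out = build_conv_matrix(kernel, len(x)) @ x
    # np.convolve flips its kernel, so flip it back to get cross-correlation,
    # which is what the matrix above (and conv layers in practice) compute.
    conv_out = np.convolve(x, kernel[::-1], mode="valid")

    assert np.allclose(dense_out, conv_out)
    print(dense_out)  # 3 tied weights reproduce what a 6x8 dense matrix does

The dense version has 48 free parameters doing the work of 3; scale that up to real image sizes and channel counts and you can see why the untied version is so much harder to regularize and slower to train.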
In general, domain-specific architectures try to find structure in the underlying data and problem, and use that structure to decrease the number of parameters we need. The use of implicit factorizations in the Inception and Xception architectures is a good example.
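For a feel of the savings, here's a back-of-the-envelope comparison (my numbers, chosen purely for illustration) between a full 3x3 convolution and the depthwise-separable factorization Xception uses, i.e. a depthwise 3x3 followed by a pointwise 1x1:

    c_in, c_out, k = 256, 256, 3

    full_conv = k * k * c_in * c_out   # every filter sees every input channel
    depthwise = k * k * c_in           # one spatial filter per channel
    pointwise = c_in * c_out           # 1x1 conv that mixes channels
    separable = depthwise + pointwise

    print(f"full 3x3 conv:       {full_conv:,}")               # 589,824
    print(f"depthwise separable: {separable:,}")                # 67,840
    print(f"reduction:           {full_conv / separable:.1f}x") # ~8.7x

Same receptive field, roughly an order of magnitude fewer parameters - that's the kind of structure these architectures are exploiting.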