>More memory means more parameters, which means you enter the interpolation regime, which means your model behaves well in the real world.
The surprising thing about large neural networks is that the difference in quality between local minima shrinks, making it less and less relevant which one you end up in. The global minimum may also correspond to overfitting, so you probably don't even want to go there.
It is also the case that many such local minima are locally identical in structure, and you can jump between them via permutations of the parameters.
As a simple example, if you have a NN with one input and one output neuron and a single two-neuron hidden layer (with the same activation function), you can swap the weights (and biases) of the two neurons in the hidden layer and the result will be the same. Right?
Is there something to gain by trying to eliminate or exploit such symmetries?
These symmetries come up in lottery tickets. The lottery ticket hypothesis says that within an overparameterized ANN there exists a smaller, sparser subnetwork that performs at least as well as the original one and learns faster.
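To make the idea concrete, here's a minimal numpy sketch of one round of magnitude pruning with "rewinding to init", which is the mechanical core of how winning tickets are usually found. This is not the full iterative procedure from the paper, and the layer shapes and the 80% sparsity level here are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins for one layer's weights at init and after training
W_init = rng.normal(size=(256, 128))
W_trained = W_init + 0.1 * rng.normal(size=W_init.shape)  # pretend "trained" weights

# Magnitude pruning: keep only the largest 20% of the trained weights
sparsity = 0.8
threshold = np.quantile(np.abs(W_trained), sparsity)
mask = (np.abs(W_trained) >= threshold).astype(W_trained.dtype)

# The candidate "winning ticket": surviving connections rewound to their init values,
# which would then be retrained from scratch under the same mask
W_ticket = mask * W_init

print(f"kept {mask.mean():.0%} of the weights")
```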
Re the example: yes, that is correct; a permutation is simply a row-wise shuffled identity matrix, and it doesn't affect the gradients or performance.
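You can check it numerically in a few lines. The sketch below builds the 1-2-1 network from the example, applies the swap as a permutation matrix P (a row-shuffled identity), and confirms the output is unchanged; the specific weights and tanh activation are just arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny network: 1 input -> 2 hidden units (tanh) -> 1 output
W1 = rng.normal(size=(2, 1))   # hidden weights
b1 = rng.normal(size=(2,))     # hidden biases
W2 = rng.normal(size=(1, 2))   # output weights
b2 = rng.normal(size=(1,))     # output bias

def forward(x, W1, b1, W2, b2):
    h = np.tanh(W1 @ x + b1)   # same activation on both hidden units
    return W2 @ h + b2

# Permutation matrix that swaps the two hidden units
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

x = rng.normal(size=(1,))
y_original = forward(x, W1, b1, W2, b2)
y_permuted = forward(x, P @ W1, P @ b1, W2 @ P.T, b2)

print(np.allclose(y_original, y_permuted))  # True: the two networks are identical
```

The cancellation works because P.T @ P is the identity and the elementwise activation commutes with the permutation, so the swapped network computes exactly the same function.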