
As others have mentioned, there are approaches like regularisation and dropout which try to do similar things. What I find interesting is that there are two reasons to do this: to generalise/avoid overfitting, and to reduce resource usage.

It seems like almost all effort is spent on the former, since everyone's aiming for higher accuracy numbers. Are there any widely-used methods to tackle the latter?

For example, I'm imagining a system which is either given measurements of its resource usage (time, memory, etc.) or uses some simple predictive model (e.g. time ~ number of layers * some constant), and works within some resource bound (a rough sketch follows the list below):

- If we're below the bound, expand the model (add neurons, etc.) to allow accuracy increases (note "allow": it's ok to ignore/regularise-to-zero the extra parameters to avoid overfitting)

- If we're above the bound, prune the model (in a way which tries to preserve accuracy)

- Allocate resources to optimise some objective, e.g. reduce variance by pruning the parameters of the best-performing class/predictor/etc. and using those resources to expand the worst performer.
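To make that concrete, here's a minimal sketch of the kind of control loop I mean, in PyTorch. The time budget, growth/shrink factors, and layer sizes are all made up, and it rebuilds the model from scratch; a real version would transfer the existing weights and prune specific units instead.

    # Hypothetical sketch: keep an MLP's forward-pass time under a budget by
    # growing the hidden layer when there's headroom and shrinking it when not.
    import time
    import torch
    import torch.nn as nn

    TIME_BUDGET = 0.005  # seconds per forward pass (made-up bound)

    def make_mlp(hidden):
        return nn.Sequential(nn.Linear(64, hidden), nn.ReLU(), nn.Linear(hidden, 10))

    def avg_forward_time(model, x, reps=20):
        with torch.no_grad():
            start = time.perf_counter()
            for _ in range(reps):
                model(x)
        return (time.perf_counter() - start) / reps

    hidden = 32
    model = make_mlp(hidden)
    x = torch.randn(128, 64)

    for step in range(10):
        t = avg_forward_time(model, x)
        if t < TIME_BUDGET:
            hidden = int(hidden * 1.5)           # below the bound: allow growth
        else:
            hidden = max(8, int(hidden * 0.75))  # above the bound: prune back
        model = make_mlp(hidden)                 # real code would keep/transfer weights
        print(f"step {step}: {t * 1e3:.2f} ms/forward, hidden -> {hidden}")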

The closest thing I know of are artificial economies, but they seem to be more like a selection mechanism (akin to genetic programming) than a direct optimisation procedure (like gradient descent on an ANN).




There are many ways to compress networks: by pruning neurons, by enforcing sparsity, by representing activations and gradients with one bit (or a few bits), and by distillation, where a large network's knowledge is transferred into a smaller one.
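For instance, magnitude pruning fits in a few lines of PyTorch; the layer size and sparsity level here are purely illustrative:

    import torch
    import torch.nn as nn

    layer = nn.Linear(256, 256)
    sparsity = 0.8  # fraction of weights to zero out (illustrative)

    with torch.no_grad():
        w = layer.weight
        k = int(sparsity * w.numel())
        threshold = w.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
        mask = (w.abs() > threshold).float()
        w.mul_(mask)  # keep the mask around to re-apply after each update

    print(f"non-zero weights: {int(mask.sum())} / {mask.numel()}")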


Yes, my question was more about meta-level algorithms for balancing size against performance. Especially adaptive methods, so that we're not just growing up to a limit and stopping, but selectively allocating resources to the parts that need them. Adapting over time would be nice too: "thinking harder" when there are idle resources, but shrinking the results back down under load.


This paper http://dl.acm.org/citation.cfm?id=2830854 offers one approach to efficiency. It uses two networks, running the smaller (more efficient) one first for inference. If that prediction is confident (the probability of one class is much larger than the probability of any other class), there is no need to run the big (expensive) network.
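A rough sketch of that cascade idea (the models and the confidence gap are placeholders, not taken from the paper):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    small = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))    # cheap
    big = nn.Sequential(nn.Linear(64, 512), nn.ReLU(), nn.Linear(512, 10))    # expensive

    def cascade_predict(x, gap=0.5):
        """Return (predicted class, whether the big net was needed) for one input."""
        with torch.no_grad():
            probs = F.softmax(small(x), dim=-1)
            top2 = probs.topk(2)
            if top2.values[0] - top2.values[1] > gap:  # small net is confident enough
                return int(top2.indices[0]), False
            probs = F.softmax(big(x), dim=-1)          # otherwise pay for the big net
            return int(probs.argmax()), True

    print(cascade_predict(torch.randn(64)))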



