There are many ways to compress networks: by pruning neurons, by enforcing sparsity, by representing activations and gradients with one bit (or a few bits), and by knowledge transfer, where a large net is distilled into a smaller one.
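As a concrete illustration of the pruning idea, here's a minimal sketch (plain NumPy, with a made-up layer as the example; real pruning would be followed by fine-tuning) that zeroes out the smallest-magnitude weights and keeps a mask so the sparsity can be reapplied after later updates:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights of a layer.

    weights  -- array of layer weights (hypothetical example layer)
    sparsity -- fraction of weights to remove (0.9 = keep only 10%)
    Returns the pruned weights and the binary mask used.
    """
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = (np.abs(weights) > threshold).astype(weights.dtype)
    return weights * mask, mask

# Toy usage: prune a random 256x128 "layer" to ~90% sparsity.
W = np.random.randn(256, 128).astype(np.float32)
W_pruned, mask = magnitude_prune(W, sparsity=0.9)
print("nonzero fraction:", (W_pruned != 0).mean())  # roughly 0.10
```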
Yes, my question was more about meta-level algorithms for balancing size against performance, especially adaptive methods, so that we're not just growing up to a limit and stopping but selectively allocating resources to the parts that need them. Adapting over time would be nice too: "thinking harder" when there are idle resources, but shrinking the results back down under load.
This paper http://dl.acm.org/citation.cfm?id=2830854 offers a partial solution to being more efficient. It uses two networks: the smaller (cheaper) one runs inference first, and if its result is confident enough (the probability of one class is much larger than the probability of any other class), there is no need to run the big (expensive) network.
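Roughly, that cascade looks something like the sketch below (the stand-in models, the top-two-probability margin test, and the threshold value are my own assumptions for illustration, not details taken from the paper):

```python
import numpy as np

def cascade_predict(x, small_model, big_model, margin=0.5):
    """Run the cheap model first; fall back to the expensive one
    only when the cheap model is not confident enough.

    Confidence here is the gap between the top two class
    probabilities (an assumption; the paper may score it differently).
    """
    probs = small_model(x)                # cheap forward pass
    top2 = np.sort(probs)[-2:]            # two largest probabilities
    if top2[1] - top2[0] >= margin:       # clear winner -> trust the small net
        return int(np.argmax(probs))
    return int(np.argmax(big_model(x)))   # otherwise pay for the big net

# Toy usage with stand-in "models" that just return probability vectors.
small_model = lambda x: np.array([0.05, 0.90, 0.05])  # confident
big_model   = lambda x: np.array([0.40, 0.35, 0.25])
print(cascade_predict(None, small_model, big_model))  # -> 1, big net skipped
```

The nice part is that the expensive network only runs on the hard inputs, so average inference cost drops while accuracy on easy inputs is unchanged.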