> Only log actionable items Easy to say but much harder to implement. For exampl...

lmm · on Nov 4, 2016

Set a threshold, and log only once you hit that threshold.
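As a rough illustration of the idea, here is a minimal single-process sketch. The `ThresholdLogger` class, the window of 100 calls, and the 20% threshold are all made up for the example, not taken from any particular library:

```python
import logging
from collections import deque

logger = logging.getLogger("network")


class ThresholdLogger:
    """Hypothetical helper: log only when the recent failure rate crosses a threshold."""

    def __init__(self, window=100, threshold=0.2):
        # Keep only the outcomes of the last `window` calls (True = success).
        self.results = deque(maxlen=window)
        self.threshold = threshold
        self.alerting = False

    def record(self, ok):
        """Record one call outcome; return True if this call triggered a log line."""
        self.results.append(ok)
        if len(self.results) < self.results.maxlen:
            return False  # not enough data yet to judge a rate
        rate = self.results.count(False) / len(self.results)
        if rate >= self.threshold and not self.alerting:
            # Log once when the threshold is first crossed, not on every failure.
            self.alerting = True
            logger.error("network failure rate %.0f%% over last %d calls",
                         rate * 100, len(self.results))
            return True
        if rate < self.threshold:
            self.alerting = False  # recovered, so re-arm for the next crossing
        return False
```

Note that this only tracks state inside one process, so it says nothing about the failure rate across a fleet of instances.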

softawre · on Nov 4, 2016

And keep track of that state across 20 different instances?

What we do is just log the failure and have a system like New Relic monitor everything, so that it can alert us when we hit a 20% network failure rate.
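To make the division of labour concrete, here is a rough sketch of what the central side might compute. This is not New Relic's API; `FailureRateMonitor`, `report`, and the instance names are hypothetical, and a real system would also window the counts by time:

```python
from collections import defaultdict


class FailureRateMonitor:
    """Hypothetical central aggregator: sums counts reported by every instance."""

    def __init__(self, alert_rate=0.20):
        self.alert_rate = alert_rate
        self.totals = defaultdict(lambda: {"requests": 0, "failures": 0})

    def report(self, instance_id, requests, failures):
        # Each instance just reports what it saw; it keeps no shared state.
        counts = self.totals[instance_id]
        counts["requests"] += requests
        counts["failures"] += failures

    def failure_rate(self):
        requests = sum(c["requests"] for c in self.totals.values())
        failures = sum(c["failures"] for c in self.totals.values())
        return failures / requests if requests else 0.0

    def should_alert(self):
        # The threshold decision lives in one place instead of in 20 instances.
        return self.failure_rate() >= self.alert_rate


monitor = FailureRateMonitor()
monitor.report("web-1", requests=100, failures=10)
monitor.report("web-2", requests=100, failures=30)
print(monitor.should_alert())
```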

lmm · on Nov 4, 2016

Sure - but then the developer-facing "log" is the New Relic interface, and your instances transmit failure information to it via some API. I suppose you could have one program write a plain-text log file and have another program or service parse it to work out how many errors were happening, but you wouldn't do that for any other kind of inter-system communication.
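A small sketch of the difference, with invented field names and a hypothetical `failure_event` helper: the instance emits a structured event that the receiving service can decode directly, rather than a free-text line that someone has to write a parser for.

```python
import json
import time


def failure_event(instance_id, error, now=None):
    """Build a structured failure event (field names are illustrative only)."""
    return json.dumps({
        "type": "network_failure",
        "instance": instance_id,
        "error": error,
        "timestamp": time.time() if now is None else now,
    }, sort_keys=True)


# The receiving side decodes the event instead of pattern-matching log text.
event = json.loads(failure_event("web-3", "timeout"))
print(event["instance"], event["error"])
```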