Little’s Law: An insight on the relation between latency and throughput (flux7.com)
43 points by nkurz on Nov 1, 2014 | hide | past | favorite | 4 comments



To me, the interesting question when building reliable, performant systems is what to do when the request rate (the incoming desired throughput) exceeds the throughput you can actually deliver.

Broadly, you have to choose between quickly failing the additional requests, queueing them up yourself (processing a fixed number of requests at a time), or just attempting to process them as they come in.

- If you don't plan for this eventuality, you'll basically end up with the last option by default.

This will overwhelm your processing step, which will typically cause the latency of all requests you are processing to go up (e.g. because you've exceeded your working RAM set, CPU budget, etc.). Worse, it will go up in unpredictable ways, often with a sharp hockey-stick curve.

- If you queue up requests, you're adding latency to everything right there. Your backend is still cranking away at the same low latency, but your system latency now has the queue dwell time added in. On the plus side, you'll avoid unpredictable performance "cliffs" where everything drops down to nothing. On the downside, if your queue gets too long, you'll end up in the unhappy state of very quickly processing stale requests which no one cares about.

- Simply failing the additional requests protects your system, but exposes error conditions externally. (Of course, so does having a 10s HTTP request time...)

Hybrid approaches can work well, e.g. running a managed queue but culling it (and failing the associated requests) before you get into the "queue of death" scenario. A rough sketch of that hybrid follows below.
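To make the hybrid concrete, here is a minimal Python sketch (not anything from the linked article): a bounded queue that fails fast when it's full and drops requests that have waited past a deadline instead of processing them stale. The queue size and wait limit are illustrative assumptions you'd have to derive from your own capacity testing.

    # Minimal sketch of the hybrid approach: fail fast when the queue is
    # full, and cull stale requests before wasting work on them.
    # MAX_QUEUE and MAX_WAIT_S are assumed values, not from the comment.
    import queue
    import threading
    import time

    MAX_QUEUE = 100     # assumed capacity, found by load testing
    MAX_WAIT_S = 2.0    # requests older than this are considered stale

    work_queue = queue.Queue(maxsize=MAX_QUEUE)

    def submit(request):
        """Admit a request, or fail fast if the queue is full."""
        try:
            work_queue.put_nowait((time.monotonic(), request))
            return True
        except queue.Full:
            return False    # caller should fail the request, e.g. HTTP 503

    def worker(handle):
        """Process queued requests, dropping ones that waited too long."""
        while True:
            enqueued_at, request = work_queue.get()
            if time.monotonic() - enqueued_at > MAX_WAIT_S:
                continue    # stale: fail it rather than do useless work
            handle(request)

    threading.Thread(target=worker, args=(print,), daemon=True).start()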

Aaaaand....every approach apart from "just try to process it" requires you to have an understanding of your backend capacity.

You can test and measure to find a static number, but an additional problem is that the load caused by real-world requests can differ from your test load.

And your backend capacity can be affected by failing components, high load on adjacent components, code deployment, backup, cold cache restart, etc etc.

So it's probably best to have a queueing/failing strategy that takes into account the real-world health and latency of your backend.

Which appears to handle all cases, except that due to the sharp hockey-stick latency in response to overload, by the time you detect a problem it may be too late.
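One way to sketch that kind of latency-aware strategy (purely illustrative, and subject to exactly the "too late" problem above, so the target needs headroom below the knee of the curve): track a moving average of recent backend latency and start shedding new work once it drifts past a target. The target and smoothing factor here are assumptions.

    # Minimal sketch of latency-aware load shedding using an
    # exponentially weighted moving average (EWMA) of backend latency.
    import time

    class LatencyShedder:
        def __init__(self, target_s=0.2, alpha=0.1):
            self.target_s = target_s   # above this, treat backend as overloaded
            self.alpha = alpha         # EWMA smoothing factor
            self.ewma_s = 0.0

        def record(self, latency_s):
            """Update the moving average after each completed request."""
            self.ewma_s = (1 - self.alpha) * self.ewma_s + self.alpha * latency_s

        def should_accept(self):
            """Reject new requests while the backend looks overloaded."""
            return self.ewma_s < self.target_s

    shedder = LatencyShedder()

    def handle_with_shedding(handle, request):
        if not shedder.should_accept():
            return "rejected"          # e.g. HTTP 503 back to the client
        start = time.monotonic()
        result = handle(request)
        shedder.record(time.monotonic() - start)
        return result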

Fun.


Amazing how broadly applicable Little's Law is. (I suppose that's why it's a big-L law.)

We were just recently using it in business school to look at cycle times in assembly line output.


Very versatile -- it can also be applied to drug R&D organizations (one Dr. Jeffrey Low has proposed such an analysis).


The reason for this broad applicability is that it holds for any queueing system. The beauty of the law is that it holds for any distribution of arrivals, occupancy, and latency; just multiplying the averages gets the job done!
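As a concrete (made-up) example of "just multiplying the averages": with an average arrival rate of 200 requests/second and an average latency of 50 ms, Little's Law says about 10 requests are in the system at any moment, regardless of the underlying distributions.

    # Little's Law: L = lambda * W
    # (average items in system = arrival rate * average time in system).
    # The numbers are illustrative only.
    arrival_rate = 200.0   # requests per second (lambda)
    avg_latency = 0.05     # seconds per request (W)

    in_flight = arrival_rate * avg_latency
    print(in_flight)       # 10.0 requests in the system on average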



