Queue theory is almost a distraction because it overcomplicates the situation. The underlying insight is that rates matter. If a system can handle 100 events a second, then at 99 events per second everything is fine and at 101 events per second the system is down. There is a threshold where everything falls apart.
Queue theory encourages people to think in terms of flow rates, and it adds the technically correct de-rates to nominal capacity to account for variance. But normally that is overkill except in highly engineered systems.
The battle in an organisation is convincing people to look at flows at all. In my experience people love metrics that track stock (we have X widgets or can handle Y orders) and not flows (we built A widgets and sold B widgets). Anyone who has training in queue theory knows when it is appropriate to look at flows rather than stocks and that is where the value is. The formulas and variances tend to just scare people away from monitoring flows because they don't understand what is happening.
Of course there are systems where not being able to process everything right when it comes in means that you're "down". It's not a given though and queuing systems are actually perfect for use cases where not being able to process at the speed of incoming requests is completely fine. Eventual consistency is a thing.
I have the same experience though that it seems to be hard for folks to accept that yes, 100k messages in _that_ queue is totally fine. Some process dumped 100k messages to then be processed over the next hour and that's totally fine. That way we don't have to actually spend a small fortune to process them at the speed at which they can be generated.
Of course it depends on the problem at hand. If that particular flow in the system is one that is highly user interactive, then yes, this means 'system down' for your users. Never mind that the system will eventually process it all. I always smile when I look at a system other than ours and can spot the distributed processing with queues in between from how the system behaves in certain circumstances.
If you can only handle 100 events/second and you are getting 101 events/second continuously the queue depth is going to go up by 1 event/second up to infinity, causing your latency to go to infinity.
The moral of that story is: don't let your queue grow to infinity. Drop 1/101 of events and serve the other ~99% with typical latency. Instead of latency going to infinity for everyone, your service gracefully-ish degrades by serving as many requests as it is able to and quickly reporting an error to those it is not. Once load drops below capacity, you are immediately back to a fully functioning state.
Granted, depending on how well behaved your clients are, the number of events each second may increase as the failed requests are retried.
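To make that concrete, here's a toy simulation of the shed-instead-of-buffer policy (the function and all numbers are made up for illustration, not from any real system):

```python
from collections import deque

def run(arrivals_per_tick, capacity, max_queue):
    """Simulate a server that processes `capacity` events per tick and
    sheds anything beyond a bounded queue. Returns (served, shed)."""
    queue = deque()
    served = shed = 0
    for arriving in arrivals_per_tick:
        for _ in range(arriving):
            if len(queue) < max_queue:
                queue.append(1)
            else:
                shed += 1  # fast-fail instead of queueing forever
        for _ in range(min(capacity, len(queue))):
            queue.popleft()
            served += 1
    return served, shed

# 101 events/tick against 100/tick of capacity: the backlog caps at
# max_queue, the overflow is shed, and latency stays bounded.
served, shed = run([101] * 1000, capacity=100, max_queue=150)
```

With an unbounded queue the same workload would end with a backlog of 1000 events and still growing; here the queue depth never exceeds 150.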
So we should solve the problem at the port of Long Beach by randomly dropping cargo into the ocean until we no longer get container ships piling up off the coast?
Or build sufficient back pressure mechanisms so that the cargo doesn't get loaded in the first place, or better still the goods don't get manufactured, and so on.
Send the feedback as far up the source as possible.
The problem there is signal lag. When the backlog at the port is cleared and you signal the factory to start production again, it might take 3 months for the products to start arriving at the port again. If the port gets congested again and you stop production it might take a month for the goods already made and on their way to stop arriving.
But you can also redirect cargo to different ports, or increase the price you charge for servicing cargo. Even if you dump containers you don’t have to do so randomly. Customers could even mark containers as rejectable, or non critical and therefore first to go to a backlog for a discount.
I found the article really interesting. For most IT use cases jobs are generic but thanks for raising the port example.
The signal lag is a huge problem. When you go a few stages through a supply chain, a gradual variation in consumer demand turns into huge whiplashes in demand. This is called "the bullwhip effect" and is a huge problem in supply chain management.
I know you mean it sarcastically, but I am guessing this could solve the problem… the throughput of the ports would increase and the demand shock ameliorated.
I am not talking about infinity. I am not talking about a scenario where you have a constant event stream, which will never vary. You are correct that if you have such a system, then at 101 events/second "you're dead".
I am talking about a bazillion other use cases in which there is variance in the number of events produced at any given time and where the immediate processing of those events is not of utmost importance. "Slap a queue in between" and suddenly you are no longer bound by what your systems can handle, you are only bound by what the queuing system can handle and that is usually much easier to scale (or can already handle this kind of load).
E.g. think about a system needing to synchronize with another external system. Back in the old days (and well I'm sure those systems are still out there today) synchronizing two systems may well have been an overnight job. A nightly sync job. Nowadays many such systems probably try near realtime synchronization but you don't want to make it part of the regular flow either and block the actual operation.
But sometimes there's too many synchronizations that need to happen at once, so a queue builds up over some period and goes down again a bit later. It's still better user experience than a nightly job. E.g. your regular load might be at 10 e/s. During some particularly busy time or specific bulk operations being executed in parallel by multiple people, you might have an influx of 200 e/s for a minute or so.
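That burst-then-drain behaviour is easy to sketch (all numbers here are the illustrative ones from my example, nothing more):

```python
def backlog_over_time(base, burst, burst_secs, capacity, total_secs):
    """Queue depth per second for a burst on top of a steady base load."""
    backlog, depths = 0, []
    for t in range(total_secs):
        arrivals = burst if t < burst_secs else base
        backlog = max(0, backlog + arrivals - capacity)
        depths.append(backlog)
    return depths

# 200 e/s for one minute on a 10 e/s baseline, with 50 e/s of capacity:
depths = backlog_over_time(base=10, burst=200, burst_secs=60,
                           capacity=50, total_secs=400)
peak = max(depths)               # 9000 events queued when the burst ends
drained_at = depths.index(0, 60) # backlog gone again ~225 s later
```

The point is exactly the trade-off above: a few minutes of queue depth instead of provisioning 200 e/s of capacity permanently.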
Assuming you have queues and stuff that can hold the requests in the interim. Some systems will just start dropping the data, or crash and lose data.
If you have a queue or something in the middle then the failure is potentially recoverable or never happens in the first place.
The term here is literally queues, so yes, there's a queue in there. I.e. HornetQ, RabbitMQ etc. I'm talking about distributed systems that are built with queuing in mind and that analyze the specific use case to understand when it is OK to have a queue build up and when it isn't.
That is absolutely not what queuing theory says. In equally simplified terms, queuing theory says that for a system that receives 99 events per second, if you size it for 100 it will probably fail, because 99 is just the mean: when arrivals overshoot it you don't have the capacity to cover the backlog during the undershoot period, because events have piled up.
For a numerical example of queuing theory 101 and why variance is very important rather than something that "scares people away", see [0]. Flow tracking is not enough.
It is my example, I can set the variance to whatever I like. It is a D/D/1 queue in case that isn't abundantly clear.
Which, in practice, is a model of queuing systems that often gets most of the value of queue theory and is easy to explain without jargon. Then apply an 80% fudge factor. Unless the situation needs care.
I don’t know anything about queue theory, but I’m not sure how your point differs from GP? You both said that a system designed for 100 and being run at 99 falls apart at 101?
I didn’t understand GPs comment about flows - perhaps that’s where the dispute is.
Genuine question, looking forward to learning more :)
I am saying that the system should fail in terms of SLA even at 99.
In the link I shared there is a simple worked example of 10.3 capacity/hr for 10 events/hr that piles up 28 events. It also shows the problem can be mitigated by analysing it in queuing/probabilistic terms. In real life it is even more complicated obviously!
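For intuition, here are the standard M/M/1 formulas (Poisson arrivals, exponential service, the textbook idealisation, so the numbers won't match the linked worked example exactly) applied to the same 10 events/hr against 10.3/hr of capacity:

```python
def mm1_stats(arrival_rate, service_rate):
    """Textbook M/M/1 steady-state results: utilisation, mean number
    in system, mean time in system."""
    rho = arrival_rate / service_rate
    assert rho < 1, "queue grows without bound"
    L = rho / (1 - rho)   # mean number in system
    W = L / arrival_rate  # mean time in system (Little's law)
    return rho, L, W

# ~97% utilised; dozens of events in the system on average,
# despite capacity nominally exceeding the arrival rate.
rho, L, W = mm1_stats(10, 10.3)
```

That's the whole point: the mean flow rate says "fine", the variance-aware model says "33 events deep and hours of latency".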
> The battle in an organization is convincing people to look at flow at all.
I guess that's the point... an organization needs enough people to know about queue theory "to look at flow" and understand that queues are ubiquitous and anticipate that these queues experience a latency phase change at some point depending on utilization (and job duration variance). If an organization doesn't have anyone who understands queue theory, then it's probably more likely to have queue-related failures.
Another random software engineer claiming a fundamental theoretical field is irrelevant...
Queuing theory deals with the stochastic behavior of discrete events at a single queue, and possibly multiple queues connected in some structured manner.
Its importance is that it can determine beforehand the stochastic boundaries of the system.
Your whole idea of 99 events and 101 events is such a strawman that I don't know where to start poking holes in it...
For one thing, the rate of something has to be defined over some interval; there is jitter and burstiness. To say something simply has a rate is plainly an impractical concept, as there is nothing in the network that has a steady rate...
Edit:
I stand corrected: the parent does not really miss the meaning of queuing theory. It was just that I had only read the first paragraph.
But things do have an average rate. And in my experience, this is very practical information. Almost daily I plug mean rates into Little's law and get useful numbers out. (Yes, I do verify my predictions.)
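This is the kind of back-of-envelope use I mean (a minimal sketch; the numbers are illustrative):

```python
def littles_law(arrival_rate=None, time_in_system=None, in_system=None):
    """Little's law, L = lambda * W: given any two of the three
    quantities, return the missing one."""
    if in_system is None:
        return arrival_rate * time_in_system
    if time_in_system is None:
        return in_system / arrival_rate
    return in_system / time_in_system

# 50 requests/s with a mean latency of 200 ms -> ~10 requests in flight.
in_flight = littles_law(arrival_rate=50, time_in_system=0.2)
```

Measure any two in production, predict the third, then check the prediction against the real metric.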
I guess that's the OP's point that you can't escape from talking about queues and which you sort of validated by bringing up Little's law.
Queues are literally everywhere around us in the physical world. It needs to be understood at a basic level -- Little's law is good enough and only requires primary school math. If execs aren't able to understand such a ubiquitous and simple concept then I don't know what to say.
Average flow rate is indeed useful but please explain that along with a queue and show whoever needs to know how queues can grow unboundedly if output rate is consistently less than input rate. And if needed explain further the upstream consequences of unbounded queue growth.
> The underlying insight is that rates matter. If a system can handle 100 events a second, then 99 events per second everything is fine and 101 events per second the system is down.
If you want to maintain 99.99989% uptime or are about to spend $10 million, yes. If you want the cheap option then measure (arrival rate / service rate) and try to keep that below 80%. Then if that doesn't work adjust accordingly. That isn't really queue theory because it doesn't consider variance and requires effectively no knowledge of literature, what rho means or why people use M/M/1 queues as a default. Don't have to spell Poisson. But it will usually get you to where you want to be.
Most of the time feeling it out in production is a lot cheaper than keeping an in house specialist. If it isn't then sure, maintain a specialist to monitor the situation. It is a relatively rare role though.
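What "measure and keep it below 80%" can look like with nothing but the counters most systems already export (the function and numbers are made up for illustration):

```python
def rho_from_counters(arrivals, completions, busy_secs, window_secs):
    """Estimate utilisation from plain production counters over a window:
    arrival rate divided by service rate, where service rate is the
    throughput observed while the system was actually busy."""
    arrival_rate = arrivals / window_secs
    service_rate = completions / busy_secs  # capacity while working
    return arrival_rate / service_rate

# 4200 arrivals in a 60 s window, 4200 completed across 42 s of busy
# time: 70/s against 100/s -> rho = 0.7, comfortably under the 80% line.
rho = rho_from_counters(4200, 4200, busy_secs=42, window_secs=60)
```

No variance modelling, no distribution fitting, just two rates and a fudge factor, which is usually where the cheap option ends.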
There's another advantage to feeling it out in production - you're measuring the actual system, not your theoretical model of the system, so all the extra things you've forgotten are automatically included (assuming you measure the right things, of course!)
And keeping your safety margins nice and wide isn't just inefficiency. It's also building in resilience for when something unexpected does happen. Yes, if you need your six-nines performance on a 1% margin then maybe you should try to optimize this to the nth degree, but realistically almost none of us have to sail that close to the wind.
> The battle in an organisation is convincing people to look at flows at all. In my experience people love metrics that track stock (we have X widgets or can handle Y orders) and not flows (we built A widgets and sold B widgets).
This reminds me of an entrepreneurship class I took in college where the professor emphasized the importance of cash flow, something I hadn't considered much and found somewhat counterintuitive.
I'd love to learn more about this perspective on when to focus on the flows over stocks, are there any resources you'd recommend to help me better understand it?
Queue theory is a lot of fun. There aren't a lot of variables needed to characterise a queue (rate in, rate out, variance of both the rates encoded into a probability distribution). But there are a lot of observable variables (queue length, wait times, utilisation of the servers, probability of there being a queue when a new customer enters the system, probability of the queue overflowing, etc). That means the system is highly over-determined, you need a very small number of observed variables to be able to calculate everything.
Basically all you are doing when you "study queue theory" is training your brain to go very quickly from looking at some random metric to thinking "oh, this is a queue with X distribution, I need to work out the arrival rate and service rate then I know everything".
My personal reference is "Fundamentals of Queueing Theory" by Gross, Shortle, Thompson & Harris. But to learn, just pick some interesting queues (ie, queues with variance in the rates) and start calculating (if cars arrive at a light with X mean rate, what variance means that there will sometimes be a queue extending around the street? Would adding another server at this shop have a big or small difference on how long it takes to prepare my order? Based on the number of checkouts at this supermarket, what rate are their planners expecting - is this a busy period?).
No individual idea in queue theory is that powerful, the advantage is really that when you see a queue you don't waste hours trying to link variables in O(n^2) ways and instead work with arrival/service rates. And you realise that the system works by phase changes rather than anything else, which is in a sense obvious but not actually very intuitive if you haven't identified the queue.
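A sketch of that over-determination for the simplest case, M/M/1 (two rates pin down every observable; standard textbook formulas, illustrative numbers):

```python
def mm1_observables(lam, mu):
    """From just the arrival rate and service rate, an M/M/1 model
    determines every other steady-state metric of interest."""
    rho = lam / mu
    return {
        "utilisation": rho,
        "p_queue_on_arrival": rho,           # P(server busy on arrival)
        "mean_in_system": rho / (1 - rho),
        "mean_queue_length": rho**2 / (1 - rho),
        "mean_wait": rho / (mu - lam),       # time waiting before service
        "p_empty": 1 - rho,
    }

obs = mm1_observables(lam=8, mu=10)  # an 80%-utilised single server
```

Which is why one glance at a couple of metrics is often enough to reconstruct the whole queue.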
The main question posed to the business is the following:
Do you want to provision your system for the worst-case-scenario volume, and what is the worst case scenario you are willing to pay to insure your system against?
It's an easy question, but I don't see how queueing theory can provide answers to it. It is at its root a subjective business decision.
There are secondary non-obvious effects of caching and queueing.
Cold start is an important scenario that lots of people overlook - “we won’t restart it during busy times”, etc. however if you have a bug or outage then that might be precisely when you restart. During a cold start you will often find your system behaves as it does at the tails of the performance curve, either because you are getting 100% cache misses, or because all your queues are full due to a client pile-on or queued upstream work, or some combination.
The article described back pressure; I can’t emphasize that enough as part of any resilient queuing system. Chains of queues are only as performant as their weakest link.
Queueing theory is the mathematical study of waiting lines, or queues. We use queueing theory in our software development, for purposes such as project management kanban boards, inter-process communication message queues, and devops continuous deployment pipelines.
The notes don’t explain what problems queue theory solves, i.e. why would I want to use it, as a programmer. The Wikipedia article’s second sentence is a bit more helpful in that respect: "A queueing model is constructed so that queue lengths and waiting time can be predicted."
We typically track many things about the activities in the queue, and we want to summarize the results by choosing a shortlist of the most relevant ones for our projects.
Dμ = Delivery service rate. Devops teams may say "deployment frequency" or "we ship X times per day".
Rτ = Restore lead time. Site reliability engineers may say "time to restore service" or "mean time to restore (MTTR)".
TheNewStack needs more articles like this one and fewer articles that talk about some new feature where by adding an extra five lines of YAML you get something or other in your Kubernetes.
I worked at IBM in earlier days, and a couple of the Research people in the data storage group at Almaden and Poughkeepsie gave some great lectures and made excellent tools for modeling cache disk storage array performance that were based heavily on queue theory. This brings back great memories, thanks!
These are not mathematical curiosities but actually describe real physical systems.
Additionally, the variance works as a "width" only for unimodal distributions. Any variance based metrics or analysis should only be used after checking for multimodality and, for multimodal data, with extreme caution thereafter.
I feel like you are understanding the title as I first understood it, meaning like, "the external facing portions of infrastructure". However, reading the article, it seems clear that he's referring to the edges of a distribution curve (i.e. Infrequent events that impact experience nonetheless).
From the article: "It’s tempting to focus on the peak of the curve. That’s where most of the results are. But the edges are where the action is. Events out on the tails may happen less frequently, but they still happen. In digital systems, where billions of events take place in a matter of seconds, one-in-a-million occurrences happen all the time. And they have an outsize impact on user experience."
EDIT: IMO, the title is still a little annoying in this respect. I think everyone would agree if a request to your site fails 5% of the time, that is unacceptable, even though it "usually works." The discussion of the distribution curve simply to make the point that spikes in usage cause backed up queues which impact performance isn't necessarily helpful as far as I can tell, and it seems done largely in service to the title. In my mind while reading this, I'm thinking, "Okay, cool, but how does the fact that this interesting issue exists at the edge of the curve help me identify it?" Answer: It doesn't. If you see errors occurring, you will investigate them once they are noticed. Being at the edge of the curve may mean it takes longer to notice, but like, what kind of alerting system are you using that discriminates against rare issues in favor of common ones?
Discussing queues, over provisioning, back pressure, etc. are all super interesting and helpful.
> As a rule of thumb, target utilization below 75%
This is one good reason to rely on serverless. Simply outsource the problem to a system that knows how to handle this better.
> Steer slower workloads to paths with lower utilization
This is fraught with all sorts of perils [0], so take caution going down this valid but tricky route.
> Limit variance as much as possible when utilization is high
This is key, and one of the most elegant solutions to this problem I know of comes from a Facebook talk on CoDel + Adaptive LIFO. [1]
> Implement backpressure in systems where it is not built-in
Making downstream dependencies behave is never an option. Selectively isolating noisy neighbours, if possible, from other well behaved clients, tends to work well for multi-tenant systems [2], in addition to monitoring long work (by dropping work that takes forever) [3], or better yet, doing constant amount of work [4] (aka eliminating modes) [5].
> Use throttling and load shedding to reduce pressure on downstream queues.
One way is to impl admission control (ala Token Bucket) to minimise the impact of thundering herds. Though, clients do not take kindly to being throttled. [6]
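A minimal token-bucket admission sketch (illustrative only, not any particular library's API; the injectable clock is just there to make it testable):

```python
import time

class TokenBucket:
    """Token-bucket admission control: admit a request only if a token
    is available; refill at a fixed rate up to a burst capacity."""
    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.now = capacity, now
        self.last = now()

    def admit(self):
        t = self.now()
        self.tokens = min(self.capacity,
                          self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # shed: tell the client to back off (e.g. HTTP 429)

# 10 requests/s sustained, bursts of up to 5: a thundering herd of 20
# near-simultaneous requests gets roughly the first 5 through.
bucket = TokenBucket(rate=10, capacity=5)
admitted = sum(bucket.admit() for _ in range(20))
```

The shedding path is where client behaviour matters: polite clients back off on a 429, impolite ones turn your rate limiter into a retry amplifier.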
A great article; many have been written at this point, and very many still have been burned by queues. Though, in my experience, without exception, some component somewhere was always building up that backlog! [7][8][9]
I would love to read more on queue theory + web applications - is there anything out there worth reading for a more in-depth understanding (applied to applications) but with key takeaways/rules of thumb presented like in this article?
I think this might be an error in the text: HTTP status code 503 can be used for backpressure, which I think is what is being referred to.
But yes, TCP uses windows for back-pressure, but that isn't really useful for application-level backpressure as the OS controls the queue sizes, so pretty much most systems have their own backpressure on top.
It's not that useless. A lot of applications these days are really using HTTP. And in HTTP/1, if you read a response body from the socket (or write it), you are already using the backpressure from TCP instead of having anything on top of it. With HTTP/2 and /3 that's different due to multiplexing of multiple streams on the same socket - in that case there exist additional flow control windows on top of TCP. These actually make those protocols rather hard to implement correctly.
It's useful still! If you have an application that talks to another application on the other side of the world, you get to control how fast it sends you stuff with your reading speed. If you stop reading from the socket, the sending application will stop sending before long. It seems like spooky action at a distance but it's just backpressure propagating through many layers.
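The same idea works one layer up: a bounded in-process buffer makes a fast producer wait for a slow consumer, which is backpressure in miniature (a minimal sketch using Python's standard library):

```python
import queue
import threading

buf = queue.Queue(maxsize=100)  # bounded buffer = built-in backpressure

def producer(items):
    for item in items:
        buf.put(item)  # blocks whenever the consumer has fallen behind

def consumer(n, out):
    for _ in range(n):
        out.append(buf.get())
        buf.task_done()

out = []
t = threading.Thread(target=consumer, args=(1000, out))
t.start()
producer(range(1000))  # never holds more than 100 items in memory
t.join()
```

Blocking on `put` is exactly the "stop reading from the socket" trick, just made explicit inside one process.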