Moving to an async model adds a new set of operational challenges as well as some interesting failure scenarios. (Edit: also, in practice you would need at least one more system to enqueue requests into the broker, as the latter would typically not be exposed to the outside world.)
Request/response on the other hand is much simpler to configure and operate.
The LB puts a request on the queue carrying a reply_to (ip:port) on which it waits (blocking) for a response from whoever picked up the request; it just does not know who that is until a reply comes back.
As an example of a failure scenario: how does your system distinguish between a request timeout, a response that didn't get sent back because of a network failure, and the consumer crashing and losing the message?
for {
    select {
    case reply := <-replyChannel:
        // Correlate by UUID; a reply meant for some other request is
        // ignored and we keep waiting.
        if reply.Uuid == r.message.Uuid {
            return reply
        }
    case <-timeout:
        return makeError(r.message, MessageType_ERROR_CONSUMER_TIMEOUT, "consumer timed out")
    }
}
Not much different from what you do with normal HTTP timeouts: you send a request, sometimes a response comes and sometimes it doesn't; it's up to the load balancer to decide whether it wants to retry or error out.
Also, a queue does not mean an async model, it means a queue. There are many queues involved in HTTP request/response handling (e.g. the listen(2) backlog queue itself) and that does not make it async :)
“Message queues implement an asynchronous communication pattern between two or more processes/threads whereby the sending and receiving party do not need to interact with the message queue at the same time.”
If we're not talking about an async model then the suggestion is much less drastic than it sounded at first. In that case the crux of your desire is simply allowing the hosts to signal readiness more directly.
You would almost never actually wait for host machines to dial in. You would keep a list of hosts marked ready or not ready, and they would almost always be ready for more. You want to assume readiness (as this lowers latency) and feed the fire hose.
But in this interpretation, in a world where an LB would be using an existing connection to host machines with HTTP/3, we're basically already there. I suppose it's trivial and standard to signal unreadiness to the LB from the host with a 429 Too Many Requests response code.
Off the top of my head I'm trying to think how a host could actively signal to an LB that it's ready for more requests... I suppose it's trivial and common to use a health check. Is it even a change to say that these need to be updated to achieve your goal of host-to-LB pulling?
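Both signals are easy to sketch on the host side. This is a hedged illustration, not a reference implementation: maxInFlight, the handler shapes, and the /healthz path are all assumptions, and real LBs differ in how they treat 429 vs a failing health check.

```go
package main

import (
	"net/http"
	"sync/atomic"
)

const maxInFlight = 100 // assumed capacity threshold

var inFlight atomic.Int64

// admit reserves a slot for one request; false means "at capacity".
func admit() bool {
	if inFlight.Add(1) > maxInFlight {
		inFlight.Add(-1)
		return false
	}
	return true
}

// done releases the slot when the request finishes.
func done() { inFlight.Add(-1) }

// ready reports whether this host has headroom for more requests.
func ready() bool { return inFlight.Load() < maxInFlight }

func main() {
	// Passive signal: refuse with 429 when full; the LB can retry elsewhere.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if !admit() {
			http.Error(w, "at capacity", http.StatusTooManyRequests)
			return
		}
		defer done()
		w.Write([]byte("ok"))
	})
	// Active signal: the LB's health checker polls this and pulls the
	// host out of rotation while it reports 503.
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		if !ready() {
			http.Error(w, "busy", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	http.ListenAndServe(":8080", nil)
}
```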
...that's just the push model. "Signalling" via "well, the load balancer has a max of 10 sessions per server" is enough.
The pull model just adds an unnecessary RTT.
> Off the top of my head I'm trying to think how a host could actively signal to an LB that it's ready for more requests... I suppose it's trivial and common to use a health check. Is it even a change to say that these need to be updated to achieve your goal of host-to-LB pulling?
Like this.
There is rarely a case where you decide not to serve the next request after serving the previous one, so push is optimal for short requests. And if a host doesn't want more, it can just signal that via the health check.
Pull makes more sense for latency-insensitive jobs like "take a task from the queue, do it, and put the results back", e.g. if you build a video-encoding service that dynamically scales itself in the background, where "just do one encode and exit" is commonplace.
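That worker loop is trivially small. A sketch, using channels as a stand-in for a real broker; Task, Result, and the encode placeholder are all made up for the example:

```go
package main

import "fmt"

// Task and Result are assumed shapes for a queue-backed job.
type Task struct {
	ID    int
	Input string
}

type Result struct {
	ID     int
	Output string
}

// encode is a placeholder for the actual work (e.g. video encoding).
func encode(in string) string { return "encoded(" + in + ")" }

// worker pulls at its own pace: it only takes the next task once the
// previous one is finished, so it can never be pushed into overload.
// When the queue closes it simply exits ("do one batch and exit").
func worker(tasks <-chan Task, results chan<- Result) {
	for t := range tasks {
		results <- Result{ID: t.ID, Output: encode(t.Input)}
	}
}

func main() {
	tasks := make(chan Task, 3)
	results := make(chan Result, 3)
	for i := 1; i <= 3; i++ {
		tasks <- Task{ID: i, Input: fmt.Sprintf("clip-%d", i)}
	}
	close(tasks)
	worker(tasks, results)
	close(results)
	for r := range results {
		fmt.Println(r.ID, r.Output)
	}
}
```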
it is not possible for a remote destination host to signal to a sending host, in a reliable way, that it is ready (or not ready) for more requests
readiness is not knowable by a receiver; it is a function of many variables, some of which are only knowable to a sender. one obvious example is a network fault between sender and receiver, and there are many more
even the concept of "load" reported by a receiving application isn't particularly relevant; what matters is the latency (and other) properties of requests sent to that application, as observed by the sender
health is fundamentally a property that is relative to each sender, not something that is objective for a given receiver
That's the push model when per-server maxconn is full tho.
The biggest benefit of the pull model is not having to update the backend server list every time you add/remove one, but outside of that it isn't really all that beneficial.
You also get added latency, unless each backend server is actively listening and connected; but if it is, you're just wasting an extra RTT to say "hey, there is a request in the queue, do you want it?"