Something all architecture astronauts deploying microservices on Kubernetes should try is benchmarking the latency of function calls.
E.g.: call a "ping" function that does no computation using different styles.
In-process function call.
In-process virtual ("abstract") function.
Cross-process RPC call in the same operating system.
Cross-VM call on the same box (2 VMs on the same host).
Remote call across a network switch.
Remote call across a firewall and a load balancer.
Remote call across the above, but with HTTPS and JSON encoding.
Same as above, but across Availability Zones.
In my tests these scenarios span a performance range of about a factor of one million from fastest to slowest. Languages like C++ and Rust will inline most local calls, but even when that's not possible the overhead is typically less than 10 CPU clocks, or about 3 nanoseconds. Remote calls in the typical case start at around 1.5 milliseconds, and HTTPS+JSON and intermediate hops like firewalls or layer-7 load balancers can blow this out to 3+ milliseconds surprisingly easily.
To put it another way, a synchronous/sequential stream of remote RPC calls in the typical case can only provide about 300-600 calls per second to a function that does nothing. Performance only goes downhill from here if the function does more work, or calls other remote functions.
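A minimal sketch of that measurement, for anyone who wants to reproduce it. The ping URL below is a hypothetical stand-in (not from my setup), and Python's own call overhead is far higher than C++/Rust, but the ratio between a local call and a synchronous HTTP round trip is the point:

    # Minimal sketch, assuming a hypothetical local "ping" service at
    # http://localhost:8080/ping that returns immediately. Absolute numbers
    # will vary by environment; the ratio is what matters.
    import time
    import urllib.request

    def ping_local() -> None:
        pass  # the no-op "function that does nothing"

    def ping_remote() -> None:
        # One synchronous HTTP round trip per call, like a sequential RPC stream.
        urllib.request.urlopen("http://localhost:8080/ping").read()

    def measure(fn, n: int) -> float:
        start = time.perf_counter()
        for _ in range(n):
            fn()
        return (time.perf_counter() - start) / n

    local_s = measure(ping_local, 1_000_000)
    remote_s = measure(ping_remote, 1_000)
    print(f"local call:  {local_s * 1e9:10.1f} ns/call")
    print(f"remote call: {remote_s * 1e3:10.3f} ms/call "
          f"(~{1 / remote_s:.0f} sequential calls/sec)")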
Yet every enterprise architecture you will ever see, without exception, has layers upon layers, hop upon hop, and everything is HTTPS and JSON as far as the eye can see.
I see K8s architectures growing side-cars, envoys, and proxies like mushrooms, and then having all of that go across external L7 proxies ("ingress"), multiple firewall hops, web application firewalls, etc...
I think folks often make trade-offs based on their working requirements.
If your web app returns a response to the user's browser in 50ms-100ms (before external latency), then the difference between 200 microseconds and 4 milliseconds is a lot less meaningful. If your app makes a couple of internal service calls (over HTTP inside the same Kubernetes cluster), it's not breaking the bank in terms of performance, even if you're using "slow" frameworks like Rails and get a few million requests a month.
I'm not defending microservices and using Kubernetes for everything but I could see how people don't end up choosing raw performance over everything. Personally my preference is to keep things as a monolith until you can't and in a lot of cases the time never comes to break it up for a large class of web apps. I also really like the idea of getting performance wins when I can (creating good indexes, caching as needed, going the extra mile to ensure a hot code path is efficient, generally avoiding slow things when I have a hunch it'll be slow, etc.) but I wouldn't choose a different language based only on execution speed for most of the web apps I build.
Microservices are great when your "app" is actually 500 different apps, and the user could be none the wiser that they are talking to 500 different one-man applications. You probably need a few helper services in this world for common data access, authorization, sending notifications, etc. in this environment, but these things might also just be standard libraries.
When microservices go awry, it's often because one "service" has been broken up to meet some arbitrary org structure that will change in 6 months. In these cases the extra overhead of the microservices becomes additive for the user, and hitting latency budgets becomes exceptionally difficult. Costs increase, and in 12 months the team decries the nonsensical service boundaries.
Traditional enterprise companies are agglomerations of IT systems from a sprawling network of acquisitions, subsidiaries, and partners. They collect fad languages, architectures, and proprietary ecosystems from across decades of computing history. And then try to somehow make them all play with each other.
At least in our world we have the source code to all the services. They have explicit and intentional APIs. They're constructed from a small set of frameworks and speak an even smaller number of protocols. Our enterprise brothers have none of that. Screen scraping, retrofitting TCP/IP stacks onto things that never had them, patching binaries whose source is long gone, etc.
In my case, microservices are often asynchronous messaging applications serving hundreds, thousands, or _maybe_ tens of thousands of transactions per day. Message processing time matters much less to me than reliability and separation of concerns, generally. Kubernetes is great for this.
It's a different world if I have to deal with synchronous user response time.
> in the typical case can only provide about 300-600 calls per second to a function that does nothing
This is a provocative framing but I'm not sure it makes sense. Functions aren't resources; they don't have throughput or utilization. It would be bad if a core could only call the function 300-600 times per second, but that is why we have async programming models, lightweight threads, etc. So that the core can do other stuff during the waiting-on-IO slices of the timeline. Which, as you mention, dominate.
It would also be bad if a user had to wait on 300-600 sequential RPCs to get back a single request, but like... don't do that. Remote endpoints are not for use in tight loops. There are cases where pathological architectures lead to ridiculous fanout/amplification, but even then we are usually talking about parallel tasks.
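To make that concrete, here's a toy sketch where each "RPC" is just 2 ms of simulated waiting (asyncio.sleep standing in for a real async HTTP client), showing how overlapping the waits changes the picture:

    # Toy illustration: each "RPC" is ~2 ms of pure waiting, simulated with
    # asyncio.sleep; a real async client would take its place. One core can
    # overlap hundreds of these waits instead of queuing behind them.
    import asyncio
    import time

    async def fake_rpc() -> None:
        await asyncio.sleep(0.002)  # ~2 ms of network wait, no CPU work

    async def sequential(n: int) -> float:
        start = time.perf_counter()
        for _ in range(n):
            await fake_rpc()
        return time.perf_counter() - start

    async def concurrent(n: int) -> float:
        start = time.perf_counter()
        await asyncio.gather(*(fake_rpc() for _ in range(n)))
        return time.perf_counter() - start

    async def main() -> None:
        n = 500
        print(f"sequential: {await sequential(n):.3f} s")  # ~1 s for 500 calls
        print(f"concurrent: {await concurrent(n):.3f} s")  # a few ms of wall time

    asyncio.run(main())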
There is overhead to doing things remotely vs. locally. But the waiting isn't the interesting part. It's serialization, deserialization, copying, tracking which tasks are waiting, etc. A lot of performance work goes on around these topics! Compact and efficient binary wire protocols, zero-copy network stacks, epoll, green threads, async function coloring schemes, etc. The upshot of this work is also, as is typical in web/enterprise backend world, not so much about the latency of individual requests (those are usually simple) but about the number of concurrent requests/users you can serve from a given hardware footprint. That is normally what we're optimizing for. It's a different set of constraints vs. few but individually expensive computations. So of course the solution space looks different too.
To be fair, for many of the things it's worth using a microservice for, there should already be some dominant cost in the call that more than justifies the added latency of going remote, be it a database read/write or some other heavy calculation.
Granted, this is exacerbated when architectures don't make a good division between control/compute/data planes.
The control plane, which is exposed to users, should almost certainly be limited to a single (or at most a handful of) microservice calls, preferably to the fastest storage mechanism that you have, so that whatever latency it does add is minimized.
Is it though? This goes back to my point of architects and developers having internalised thoroughly outdated rules of thumb that are now wrong by factors of tens of thousands or more.
This is not a simple problem to solve efficiently using traditional RDBMS query APIs because they're all rooted in 1980s thinking of: "The network is fast, and this is used by human staff doing manual data entry into a GUI form."
Let's say you're writing an "app" that's given a list of, say, 10K numbers to check. You have a database table in your RDBMS of choice with a column of "banned phone numbers". Let's say it is 100 million numbers, so too expensive to download in bulk.
How would you do this lookup?
Most programmers would say, it's an easy problem to solve: Make sure there is a unique index on that column in the database, and then for each row in the input run a lookup such as:
SELECT 1 FROM BadNumbers WHERE PhoneNumber = @numbertocheck
So simple. So fast!
Okay, that's 10K round trips on the network, almost certainly crossing a firewall or two in the process. Now it'll take a minimum of 1 millisecond per call, more like 2ms [1], so that's at least 20 seconds of wait time for the user to process mere kilobytes of data.
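Spelled out as client code, the anti-pattern looks roughly like this (a generic DB-API sketch; the conn object and "?" placeholder style are stand-ins for whatever driver you actually use):

    # The row-by-row anti-pattern: one network round trip per number.
    # "conn" is any DB-API connection; placeholder syntax varies by driver.
    def find_banned(conn, numbers_to_check: list[str]) -> list[str]:
        banned = []
        cur = conn.cursor()
        for number in numbers_to_check:  # 10K iterations...
            cur.execute(
                "SELECT 1 FROM BadNumbers WHERE PhoneNumber = ?",  # ...10K round trips
                (number,),
            )
            if cur.fetchone() is not None:
                banned.append(number)
        # At 1-2 ms per round trip, 10K numbers means 10-20+ seconds of waiting.
        return banned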
Isn't that just sad? A chunk of a minute per 100KB of data.
Like I'm saying, nobody has internalised just how thoroughly Wrong everything is top-to-bottom. The whole concept of "send a query row-by-row and sit there and wait" is outdated, but it's the default. It's the default in every programming language. In every database client. In every ORM. In every utility, and script, sample, and tutorial. It's woven throughout the collective consciousness of the IT world.
The "correct" solution would be for SQL to default to streaming in tables from the client, and every such lookup should be a streaming join. So then the 100KB would take about 5 milliseconds to send, join, and come back, with results coming back before the last row is even sent.
PS: You can approximate this using table-valued parameters in some RDBMS systems, but they generally won't start streaming back results until all of the input has arrived. Similarly, you can encode your table as JSON and decode it on the other end, but that's even slower and... disgusting. The Microsoft .NET Framework has a SqlBulkCopy class, but it has all sorts of limitations and is fiddly to use. And that's my point: what should be the default case is being treated as the special case, because decades ago it was.
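For reference, one way to approximate the set-based version with a plain DB-API driver is to ship the whole list up front and do a single join. This is only a sketch: the temp-table syntax is T-SQL flavoured, and how efficiently executemany batches the insert depends entirely on the driver:

    # Set-based approximation: a handful of round trips total instead of 10K.
    # Temp-table syntax is T-SQL flavoured; adjust for your RDBMS.
    def find_banned_setwise(conn, numbers_to_check: list[str]) -> list[str]:
        cur = conn.cursor()
        cur.execute("CREATE TABLE #ToCheck (PhoneNumber varchar(20) PRIMARY KEY)")
        # One batched insert; wire efficiency depends on the driver.
        cur.executemany(
            "INSERT INTO #ToCheck (PhoneNumber) VALUES (?)",
            [(n,) for n in numbers_to_check],
        )
        # A single indexed join against the 100M-row table.
        cur.execute(
            "SELECT t.PhoneNumber FROM #ToCheck AS t "
            "JOIN BadNumbers AS b ON b.PhoneNumber = t.PhoneNumber"
        )
        return [row[0] for row in cur.fetchall()]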
[1] If you're lucky. But luck is not a strategy. What happens to your "20 seconds is not too slow" app when the database fails over to the paired cloud region? 1-2 ms is now 15 ms, and so those 10K round trips will cost two and a half minutes.
While I agree that databases could absolutely be improved to make streaming query results as described better, that isn't a limiting factor here IMO.
I'd tackle that problem by batching my queries to the database into some logical batch size and send them as table valued parameters. If I had 10k phone numbers to check and minimum latency matters, why not batch into queries of 500-1000 values per-query? That cuts down time to first response, while reducing the network roundtrips.
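Something like this, roughly (chunked IN-lists shown as a lowest-common-denominator stand-in for table-valued parameters; chunk size and placeholder syntax are driver-dependent):

    # Chunked lookups: ~10-20 round trips for 10K numbers instead of 10K,
    # and the first results come back after the first batch.
    def find_banned_in_batches(conn, numbers_to_check: list[str], chunk: int = 1000):
        cur = conn.cursor()
        for i in range(0, len(numbers_to_check), chunk):
            batch = numbers_to_check[i:i + chunk]
            placeholders = ",".join("?" for _ in batch)
            cur.execute(
                "SELECT PhoneNumber FROM BadNumbers "
                f"WHERE PhoneNumber IN ({placeholders})",
                batch,
            )
            yield from (row[0] for row in cur.fetchall())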
The issue with taking this out of the database is that you lose consistency. I don't know about your industry, but I don't think mine would be terribly happy if I was using stale data to validate my Do Not Call/Email list. Now there are some situations where you can just update your list of numbers nightly/weekly/monthly, etc. If you don't need any concurrency or other guarantees, you might as well save the time/resources on your DB server.
If I were to rebuild that Python script application today to match 100,000 records against 10 million, but as a database-driven microservice solution, I'm not sure I could come up with something that returns results faster, even while using probably a million times more clock cycles.
Yes, that is a good explainer on the horrors of single value lookups to a database. It isn't the only way to do that though, as explained in my other post.
I absolutely agree that a DB (even an extremely efficient one) is going to use many more clock cycles to return results there than a local data structure + application. No questions asked.
But how is that list kept up to date? If a user wants to be placed on the list, do you do that in real time to your local data structure? Do you wait for a batch process to sync that data structure with your source of truth?
I'm just saying that a simple program like that will be faster because it lacks a lot of the features that people would consider necessary in today's world.
The database is an amazing general tool that can be used to tackle whole classes of problems that used to require specialized solutions.
I cringe every time a senior developer thinks he's being clever by using an @Annotation, a JoinPoint, and Spring's Aspect-Oriented Programming to solve issues. Not only does it appear in the stack trace as 10 method calls, but we've also forbidden GOTO, and yet senior developers keep implementing the @COMEFROM [1] instruction. Both slow and impossible to debug.
Okay, but the proportion of overhead is what matters here.
If you have a CPU or memory size bottleneck and a parallelizable workload, it makes plenty of sense to split the work across multiple machines and coordinate them over the network. If your job was going to take 20 minutes for 1 machine to do and you can fan it out to 100 machines and accomplish the same in 12 seconds per machine plus an extra fraction of a second in communication overhead, that’s a huge win in total latency. The overhead doesn’t matter.
If you have a trivial workload that can be handled quickly on one machine, but you unnecessarily add extra network hops and drop your potential throughput by 100x, then it’s a huge loss.
A ping server that makes a bunch of international RPC calls before replying is the worst case scenario for overhead.
The phrase "goes downhill" means things only get worse. The OP is suggesting that since the ping is already slow, a function that does something is going to be much worse.
But for most systems you don't want performance to get worse; you want to make it better.
The ping is just being used to measure the foundations of the system, and the OP is pointing out that you shouldn't expect a "high performing" system when the ping has identified that the foundations are broken.
If you fix the ping, then the system is automatically fixed.
E.g.: call a "ping" function that does no computation using different styles.
In-process function call.
In-process virtual ("abstract") function.
Cross-process RPC call in the same operating system.
Cross-VM call on the same box (2 VMs on the same host).
Remote call across a network switch.
Remote call across a firewall and a load balancer.
Remote call across the above, but with HTTPS and JSON encoding.
Same as above, but across Availability Zones.
In my tests these scenarios have a performance range of about 1 million from the fastest to slowest. Languages like C++ and Rust will inline most local calls, but even when that's not possible overhead is typically less than 10 CPU clocks, or about 3 nanoseconds. Remote calls in the typical case start at around 1.5 milliseconds and HTTPS+JSON and intermediate hops like firewalls or layer-7 load balancers can blow this out to 3+ milliseconds surprisingly easily.
To put it another way, a synchronous/sequential stream of remote RPC calls in the typical case can only provide about 300-600 calls per second to a function that does nothing. Performance only goes downhill from here if the function does more work, or calls other remote functions.
Yet, every enterprise architecture you will ever see, without exception has layers and layers, hop upon hop, and everything is HTTPS and JSON as far as the eye can see.
I see K8s architectures growing side-cars, envoys, and proxies like mushrooms, and then having all of that go across external L7 proxies ("ingress"), multiple firewall hops, web application firewalls, etc...