Something all architecture astronauts deploying microservices on Kubernetes should try is benchmarking the latency of function calls.
E.g.: call a "ping" function that does no computation using different styles.
In-process function call.
In-process virtual ("abstract") function.
Cross-process RPC call in the same operating system.
Cross-VM call on the same box (2 VMs on the same host).
Remote call across a network switch.
Remote call across a firewall and a load balancer.
Remote call across the above, but with HTTPS and JSON encoding.
Same as above, but across Availability Zones.
In my tests these scenarios span a performance range of about a factor of one million from fastest to slowest. Languages like C++ and Rust will inline most local calls, but even when that's not possible the overhead is typically less than 10 CPU clocks, or about 3 nanoseconds. Remote calls in the typical case start at around 1.5 milliseconds, and HTTPS+JSON and intermediate hops like firewalls or layer-7 load balancers can blow this out to 3+ milliseconds surprisingly easily.
To put it another way, a synchronous/sequential stream of remote RPC calls in the typical case can only provide about 300-600 calls per second to a function that does nothing. Performance only goes downhill from here if the function does more work, or calls other remote functions.
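A minimal sketch of that measurement, for anyone who wants to reproduce it. The ping URL below is a hypothetical stand-in (not from my setup), and Python's own call overhead is far higher than C++/Rust, but the ratio between a local call and a synchronous HTTP round trip is the point:

    # Minimal sketch, assuming a hypothetical local "ping" service at
    # http://localhost:8080/ping that returns immediately. Absolute numbers
    # will vary by environment; the ratio is what matters.
    import time
    import urllib.request

    def ping_local() -> None:
        pass  # the no-op "function that does nothing"

    def ping_remote() -> None:
        # One synchronous HTTP round trip per call, like a sequential RPC stream.
        urllib.request.urlopen("http://localhost:8080/ping").read()

    def measure(fn, n: int) -> float:
        start = time.perf_counter()
        for _ in range(n):
            fn()
        return (time.perf_counter() - start) / n

    local_s = measure(ping_local, 1_000_000)
    remote_s = measure(ping_remote, 1_000)
    print(f"local call:  {local_s * 1e9:10.1f} ns/call")
    print(f"remote call: {remote_s * 1e3:10.3f} ms/call "
          f"(~{1 / remote_s:.0f} sequential calls/sec)")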
Yet every enterprise architecture you will ever see, without exception, has layers upon layers, hop upon hop, and everything is HTTPS and JSON as far as the eye can see.
I see K8s architectures growing side-cars, envoys, and proxies like mushrooms, and then having all of that go across external L7 proxies ("ingress"), multiple firewall hops, web application firewalls, etc...
I think folks often make trade-offs based on their working requirements.
If your web app returns a response to the user's browser in 50ms-100ms (before external latency), then the difference between 200 microseconds and 4 milliseconds is a lot less meaningful. If your app makes a couple of internal service calls (over HTTP inside the same Kubernetes cluster), it's not breaking the bank in terms of performance, even if you're using "slow" frameworks like Rails and get a few million requests a month.
I'm not defending microservices and using Kubernetes for everything but I could see how people don't end up choosing raw performance over everything. Personally my preference is to keep things as a monolith until you can't and in a lot of cases the time never comes to break it up for a large class of web apps. I also really like the idea of getting performance wins when I can (creating good indexes, caching as needed, going the extra mile to ensure a hot code path is efficient, generally avoiding slow things when I have a hunch it'll be slow, etc.) but I wouldn't choose a different language based only on execution speed for most of the web apps I build.
Microservices are great when your "app" is actually 500 different apps, and the user could be none the wiser that they are talking to 500 different one-man applications. You probably need a few helper services in this world for common data access, authorization, sending notifications, etc. in this environment, but these things might also just be standard libraries.
When microservices go awry, it's often because one "service" has been broken up to meet some arbitrary org structure that will change in 6 months. In these cases the extra overhead of the microservices becomes additive for the user, and hitting latency budgets becomes exceptionally difficult. Costs increase, and in 12 months the team decries the nonsensical service boundaries.
Traditional enterprise companies are agglomerations of IT systems from a sprawling network of acquisitions, subsidiaries, and partners. They collect fad languages, architectures, and proprietary ecosystems from across decades of computing history. And then try to somehow make them all play with each other.
At least in our world we have the source code to all the services. They have explicit and intentional APIs. They're constructed from a small set of frameworks and speak an even smaller number of protocols. Our enterprise brothers have none of that. Screen scraping, retrofitting TCP/IP stacks onto things that never had them, patching binaries whose source is long gone, etc.
In my case, microservices are often asynchronous messaging applications serving hundreds, thousands, or _maybe_ tens of thousands of transactions per day. Message processing time matters much less to me than reliability and separation of concerns, generally. Kubernetes is great for this.
It's a different world if I have to deal with synchronous user response time.
> in the typical case can only provide about 300-600 calls per second to a function that does nothing
This is a provocative framing but I'm not sure it makes sense. Functions aren't resources; they don't have throughput or utilization. It would be bad if a core could only call the function 300-600 times per second, but that is why we have async programming models, lightweight threads, etc. So that the core can do other stuff during the waiting-on-IO slices of the timeline. Which, as you mention, dominate.
It would also be bad if a user had to wait on 300-600 sequential RPCs to get back a single request, but like... don't do that. Remote endpoints are not for use in tight loops. There are cases where pathological architectures lead to ridiculous fanout/amplification, but even then we are usually talking about parallel tasks.
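To make that concrete, here's a toy sketch where each "RPC" is just 2 ms of simulated waiting (asyncio.sleep standing in for a real async HTTP client), showing how overlapping the waits changes the picture:

    # Toy illustration: each "RPC" is ~2 ms of pure waiting, simulated with
    # asyncio.sleep; a real async client would take its place. One core can
    # overlap hundreds of these waits instead of queuing behind them.
    import asyncio
    import time

    async def fake_rpc() -> None:
        await asyncio.sleep(0.002)  # ~2 ms of network wait, no CPU work

    async def sequential(n: int) -> float:
        start = time.perf_counter()
        for _ in range(n):
            await fake_rpc()
        return time.perf_counter() - start

    async def concurrent(n: int) -> float:
        start = time.perf_counter()
        await asyncio.gather(*(fake_rpc() for _ in range(n)))
        return time.perf_counter() - start

    async def main() -> None:
        n = 500
        print(f"sequential: {await sequential(n):.3f} s")  # ~1 s for 500 calls
        print(f"concurrent: {await concurrent(n):.3f} s")  # a few ms of wall time

    asyncio.run(main())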
There is overhead to doing things remotely vs. locally. But the waiting isn't the interesting part. It's serialization, deserialization, copying, tracking which tasks are waiting, etc. A lot of performance work goes on around these topics! Compact and efficient binary wire protocols, zero-copy network stacks, epoll, green threads, async function coloring schemes, etc. The upshot of this work is also, as is typical in web/enterprise backend world, not so much about the latency of individual requests (those are usually simple) but about the number of concurrent requests/users you can serve from a given hardware footprint. That is normally what we're optimizing for. It's a different set of constraints vs. few but individually expensive computations. So of course the solution space looks different too.
To be fair, for many of the things it's worth using a microservice for, there should already be some dominant cost in the call that more than justifies the added latency of going remote, be it a database read/write or some other heavy calculation.
Granted, this is exacerbated when architectures don't make a good division between control/compute/data planes.
The control plane, which is exposed to users, should almost certainly be limited to a single (or at most a handful of) microservice calls, preferably to the fastest storage mechanism that you have, so that whatever latency it does add is minimized.
Is it though? This goes back to my point of architects and developers having internalised thoroughly outdated rules of thumb that are now wrong by factors of tens of thousands or more.
This is not a simple problem to solve efficiently using traditional RDBMS query APIs because they're all rooted in 1980s thinking of: "The network is fast, and this is used by human staff doing manual data entry into a GUI form."
Let's say you're writing an "app" that's given a list of, say, 10K numbers to check. You have a database table in your RDBMS of choice with a column of "banned phone numbers". Let's say it is 100 million numbers, so too expensive to download in bulk.
How would you do this lookup?
Most programmers would say, it's an easy problem to solve: Make sure there is a unique index on that column in the database, and then for each row in the input run a lookup such as:
SELECT 1 FROM BadNumbers WHERE PhoneNumber = @numbertocheck
So simple. So fast!
Okay, that's 10K round trips on the network, almost certainly crossing a firewall or two in the process. Now it'll take a minimum of 1 millisecond per call, more like 2ms [1], so that's at least 20 seconds of wait time for the user to process mere kilobytes of data.
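Spelled out as client code, the anti-pattern looks roughly like this (a generic DB-API sketch; the conn object and "?" placeholder style are stand-ins for whatever driver you actually use):

    # The row-by-row anti-pattern: one network round trip per number.
    # "conn" is any DB-API connection; placeholder syntax varies by driver.
    def find_banned(conn, numbers_to_check: list[str]) -> list[str]:
        banned = []
        cur = conn.cursor()
        for number in numbers_to_check:  # 10K iterations...
            cur.execute(
                "SELECT 1 FROM BadNumbers WHERE PhoneNumber = ?",  # ...10K round trips
                (number,),
            )
            if cur.fetchone() is not None:
                banned.append(number)
        # At 1-2 ms per round trip, 10K numbers means 10-20+ seconds of waiting.
        return banned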
Isn't that just sad? A chunk of a minute per 100KB of data.
Like I'm saying, nobody has internalised just how thoroughly Wrong everything is top-to-bottom. The whole concept of "send a query row-by-row and sit there and wait" is outdated, but it's the default. It's the default in every programming language. In every database client. In every ORM. In every utility, and script, sample, and tutorial. It's woven throughout the collective consciousness of the IT world.
The "correct" solution would be for SQL to default to streaming in tables from the client, and every such lookup should be a streaming join. So then the 100KB would take about 5 milliseconds to send, join, and come back, with results coming back before the last row is even sent.
PS: You can approximate this using table-valued parameters in some RDBMS systems, but they generally won't start streaming back results until all of the input has arrived. Similarly, you can encode your table as JSON and decode it on the other end, but that's even slower and... disgusting. The Microsoft .NET Framework has a SqlBulkCopy class, but it has all sorts of limitations and is fiddly to use. And that's my point: what should be the default case is being treated as the special case, because decades ago it was.
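For reference, one way to approximate the set-based version with a plain DB-API driver is to ship the whole list up front and do a single join. This is only a sketch: the temp-table syntax is T-SQL flavoured, and how efficiently executemany batches the insert depends entirely on the driver:

    # Set-based approximation: a handful of round trips total instead of 10K.
    # Temp-table syntax is T-SQL flavoured; adjust for your RDBMS.
    def find_banned_setwise(conn, numbers_to_check: list[str]) -> list[str]:
        cur = conn.cursor()
        cur.execute("CREATE TABLE #ToCheck (PhoneNumber varchar(20) PRIMARY KEY)")
        # One batched insert; wire efficiency depends on the driver.
        cur.executemany(
            "INSERT INTO #ToCheck (PhoneNumber) VALUES (?)",
            [(n,) for n in numbers_to_check],
        )
        # A single indexed join against the 100M-row table.
        cur.execute(
            "SELECT t.PhoneNumber FROM #ToCheck AS t "
            "JOIN BadNumbers AS b ON b.PhoneNumber = t.PhoneNumber"
        )
        return [row[0] for row in cur.fetchall()]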
[1] If you're lucky. But luck is not a strategy. What happens to your "20 seconds is not too slow" app when the database fails over to the paired cloud region? 1-2 ms is now 15 ms, and so those 10K round trips will cost two and a half minutes.
While I agree that databases could absolutely be improved to make streaming query results as described better, that isn't a limiting factor here IMO.
I'd tackle that problem by batching my queries to the database into some logical batch size and send them as table valued parameters. If I had 10k phone numbers to check and minimum latency matters, why not batch into queries of 500-1000 values per-query? That cuts down time to first response, while reducing the network roundtrips.
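Something like this, roughly (chunked IN-lists shown as a lowest-common-denominator stand-in for table-valued parameters; chunk size and placeholder syntax are driver-dependent):

    # Chunked lookups: ~10-20 round trips for 10K numbers instead of 10K,
    # and the first results come back after the first batch.
    def find_banned_in_batches(conn, numbers_to_check: list[str], chunk: int = 1000):
        cur = conn.cursor()
        for i in range(0, len(numbers_to_check), chunk):
            batch = numbers_to_check[i:i + chunk]
            placeholders = ",".join("?" for _ in batch)
            cur.execute(
                "SELECT PhoneNumber FROM BadNumbers "
                f"WHERE PhoneNumber IN ({placeholders})",
                batch,
            )
            yield from (row[0] for row in cur.fetchall())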
The issue with taking this out of the database is that you lose consistency. I don't know about your industry, but I don't think mine would be terribly happy if I was using stale data to validate my Do Not Call/Email list. Now there are some situations where you can just update your list of numbers nightly/weekly/monthly, etc. If you don't need any concurrency or other guarantees, you might as well save the time/resources on your DB server.
If I were to rebuild that Python script application today to match 100,000 records against 10 million, but as a database-driven microservice solution, I'm not sure I could come up with something that returns results faster, even while using probably a million times more clock cycles.
Yes, that is a good explainer on the horrors of single value lookups to a database. It isn't the only way to do that though, as explained in my other post.
I absolutely agree that a DB (even an extremely efficient one) is going to use many more clock cycles to return results there than a local data structure + application. No questions asked.
But how is that list kept up to date? If a user wants to be placed on the list, do you do that in real time to your local data structure? Do you wait for a batch process to sync that data structure with your source of truth?
I'm just saying that a simple program like that will be faster because it lacks a lot of the features that people would consider necessary in today's world.
The database is an amazing general tool that can be used to tackle whole classes of problems that used to require specialized solutions.
I cringe every time a senior developer thinks he's being clever by using an @Annotation, a JoinPoint, and Spring's Aspect-Oriented Programming to solve issues. Not only does it appear in the stack trace as 10 method calls, but we've also forbidden GOTO, and yet senior developers keep implementing the @COMEFROM [1] instruction. Both slow and impossible to debug.
Okay, but the proportion of overhead is what matters here.
If you have a CPU or memory size bottleneck and a parallelizable workload, it makes plenty of sense to split the work across multiple machines and coordinate them over the network. If your job was going to take 20 minutes for 1 machine to do and you can fan it out to 100 machines and accomplish the same in 12 seconds per machine plus an extra fraction of a second in communication overhead, that’s a huge win in total latency. The overhead doesn’t matter.
If you have a trivial workload that can be handled quickly on one machine, but you unnecessarily add extra network hops and drop your potential throughput by 100x, then it’s a huge loss.
A ping server that makes a bunch of international RPC calls before replying is the worst case scenario for overhead.
The phrase "goes downhill" means things only get worse. The OP is suggesting that since the ping is already slow, a function that does something is going to be much worse.
But for most systems you don't want performance to get worse; you want to make it better.
The ping is just being used to measure the foundations of the system, and the OP is pointing out that you shouldn't expect a "high performing" system when the ping has identified that the foundations are broken.
If you fix the ping, then the system is automatically fixed.
E.g.: call a "ping" function that does no computation using different styles.
In-process function call.
In-process virtual ("abstract") function.
Cross-process RPC call in the same operating system.
Cross-VM call on the same box (2 VMs on the same host).
Remote call across a network switch.
Remote call across a firewall and a load balancer.
Remote call across the above, but with HTTPS and JSON encoding.
Same as above, but across Availability Zones.
In my tests these scenarios have a performance range of about 1 million from the fastest to slowest. Languages like C++ and Rust will inline most local calls, but even when that's not possible overhead is typically less than 10 CPU clocks, or about 3 nanoseconds. Remote calls in the typical case start at around 1.5 milliseconds and HTTPS+JSON and intermediate hops like firewalls or layer-7 load balancers can blow this out to 3+ milliseconds surprisingly easily.
To put it another way, a synchronous/sequential stream of remote RPC calls in the typical case can only provide about 300-600 calls per second to a function that does nothing. Performance only goes downhill from here if the function does more work, or calls other remote functions.
Yet, every enterprise architecture you will ever see, without exception has layers and layers, hop upon hop, and everything is HTTPS and JSON as far as the eye can see.
I see K8s architectures growing side-cars, envoys, and proxies like mushrooms, and then having all of that go across external L7 proxies ("ingress"), multiple firewall hops, web application firewalls, etc...