Don't get me wrong, sometimes it's worth it (I particularly like Spark's facilities for distributed statistical modelling), but I really don't get (and have never gotten) why you would want to inflict that pain upon yourself if you don't have to.
I’ve been developing for more years than some of you have lived, and the best thing I’ve heard in years is that Google interviews now require developers to understand the overhead of requests.
In addition, they should require an understanding of the design complexity of asynchronous queues: needing (and suffering from) the management overhead of dead letter queues, scaling by sharding queues when that makes more sense versus decentralizing, and being forced into non-transactionality, which shouldn't be accepted unless it's absolutely needed.
But not just Google: everyone. Thanks, Mr. Fowler, for bringing this into the open.
Indeed! The "Latency Numbers Every Programmer Should Know" table (based on Peter Norvig's numbers) builds helpful intuition from a performance perspective, but of course there's a much larger cost in terms of complexity as well.
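To make the request-overhead point concrete, here's a rough, stdlib-only Python microbenchmark comparing an in-process function call with a loopback HTTP round trip. The absolute numbers are machine-dependent, and loopback skips real network hops entirely, so treat the gap as a lower bound:

    import threading
    import time
    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def local_call(x):
        return x + 1

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Length", "2")
            self.end_headers()
            self.wfile.write(b"ok")

        def log_message(self, *args):  # keep the benchmark output clean
            pass

    # Spin up a throwaway HTTP server on an ephemeral loopback port.
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"

    N = 1000
    t0 = time.perf_counter()
    for i in range(N):
        local_call(i)
    t1 = time.perf_counter()
    for _ in range(N):
        urllib.request.urlopen(url).read()
    t2 = time.perf_counter()

    print(f"in-process call: {(t1 - t0) / N * 1e9:9.0f} ns")
    print(f"loopback HTTP:   {(t2 - t1) / N * 1e9:9.0f} ns")
    server.shutdown()

Even over loopback, the HTTP call typically lands three to four orders of magnitude slower than the function call; a real cross-host hop adds far more on top.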
I mean, you can always deploy your microservices on the same host; it would just be a service mesh.
Adding a network is not a limitation. And frankly, I don't understand why you say things like "understanding the network". Reliability is taken care of, routing is taken care of, and the remaining problems of unboundedness and causal ordering are taken care of by various frameworks and protocols.
For DLQ management, you can simply use a persistent dead letter queue. I mean, it's a good thing to have a DLQ, because failures will always happen. As for which order to process the queue in, etc., these are trivial questions.
You say things as if you have been doing software development for ages, but you're missing out on some very simple things.
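For what it's worth, here's a minimal, self-contained sketch of the retry-then-DLQ pattern using in-memory queues; in production both queues would live in a durable broker (SQS, RabbitMQ, etc.), but the control flow is the same. The names and the failure condition are invented for illustration:

    import queue

    MAX_ATTEMPTS = 3
    main_q = queue.Queue()         # holds (attempts, payload) tuples
    dead_letter_q = queue.Queue()  # durable in real life; in-memory here

    def handle(payload):
        if "bad" in payload:       # stand-in for a real processing failure
            raise ValueError(f"cannot process {payload!r}")

    def drain():
        while not main_q.empty():
            attempts, payload = main_q.get()
            try:
                handle(payload)
            except Exception:
                attempts += 1
                if attempts >= MAX_ATTEMPTS:
                    dead_letter_q.put((attempts, payload))  # park for inspection
                else:
                    main_q.put((attempts, payload))         # retry later
            finally:
                main_q.task_done()

    for p in ["ok-1", "bad-2", "ok-3"]:
        main_q.put((0, p))
    drain()
    print("dead-lettered:", list(dead_letter_q.queue))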
Sounds like you're saying "Don't do distributed work if possible" (considering tradeoffs, of course; your contention, I guess, is that people don't even consider this option).
And secondly, if you do end up with a distributed system, remember how many independently failing components there are, because that directly translates to complexity.
On both these counts I agree. Microservices are no silver bullet. Network partitions and failures happen almost every day where I work. But most people are not dealing with problems at that level, partly because of cloud providers.
The same kinds of problems show up on a single machine too: you'd need some sort of write-ahead log, checkpointing, and maybe tuning of the kernel for faster boot-up, heap size, and GC rate.
All of these problems do happen, but most people don't need to think about them.
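To make the write-ahead-log point concrete, here's a minimal single-machine sketch: append and fsync the intent first, apply it to in-memory state second, and replay the log on startup. The file format and names are made up for the example:

    import json
    import os

    LOG = "wal.log"  # invented file name

    def apply(state, op):
        state[op["key"]] = op["value"]

    def write(state, key, value):
        record = json.dumps({"key": key, "value": value})
        with open(LOG, "a") as f:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())   # durable before we acknowledge the write
        apply(state, json.loads(record))

    def recover():
        state = {}
        if os.path.exists(LOG):
            with open(LOG) as f:
                for line in f:
                    apply(state, json.loads(line))  # replay after a crash
        return state

    state = recover()
    write(state, "user:1", "alice")
    print(state)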
I'm not reading this as "Don't do distributed work". It's "distributed systems have nontrivial hidden costs". Sure, monoliths are often synonymous with single points of failure. In theory, distributed systems are built to mitigate this. But unfortunately, in reality, distributed systems often introduce many additional single points of failure, because building resilient systems takes extra effort, effort that oftentimes is a secondary priority to "just ship it".
Indeed. So with a monolith we usually already have 3-4 (or more) somewhat reliable systems, and one non-reliable system, which is your monolithic app. Why add other non-reliable systems if you don't really need them?
Making a system reliable is really, really hard and takes many resources, which few companies pursue.
I realized this one day when I was drawing some nice sequence diagrams and presenting them to a senior, and he said, "But who's ensuring the sequence?" You'd never ask that question in a single-threaded system.
Having said that, these things are unavoidable. The expectations placed on a system are too great not to have distributed systems in the picture.
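A toy illustration of the "who's ensuring the sequence?" question: in a single-threaded program, step 2 simply can't start before step 1 returns, but once calls go over a network, completion order is up to the network unless you enforce it yourself. The service names and random sleeps below are stand-ins for real calls:

    import asyncio
    import random

    async def call_service(name):
        await asyncio.sleep(random.random())  # simulated network latency
        return name

    async def main():
        tasks = [asyncio.create_task(call_service(n)) for n in ("step-1", "step-2")]
        for done in asyncio.as_completed(tasks):
            print("finished:", await done)    # completion order is nondeterministic

    asyncio.run(main())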
Monoliths are so hard to deploy. It's even more problematic when you have code optimized for both synchronous CPU-intensive work and async IO in the same service. Figuring out the optimal fleet size is also harder.
I'd love to hear some ways to address this without ending up with microservice bloat.
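One common mitigation, sketched here under the assumption of an async Python stack (not necessarily what your service looks like): keep the event loop for IO and push CPU-bound work into a process pool, so the two workloads can be sized somewhat independently without splitting the service:

    import asyncio
    from concurrent.futures import ProcessPoolExecutor

    def crunch(n):
        return sum(i * i for i in range(n))  # stand-in for CPU-heavy work

    async def main():
        loop = asyncio.get_running_loop()
        with ProcessPoolExecutor(max_workers=4) as pool:
            # The event loop stays free for IO while the workers burn CPU.
            result = await loop.run_in_executor(pool, crunch, 10_000_000)
            print(result)

    if __name__ == "__main__":  # required for process pools on spawn platforms
        asyncio.run(main())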