> Truly efficient distributed systems are most naturally expressed through functions as triggers invoked from upsert operations on addressable relations
O_o
Just because you can express something as an 'upsert', that doesn't make it 'relational'. Transactions exist outside the concept of RDBMSes. The article doesn't mention relational algebra once.
Yes, a lot of the terminology and math from RDBMSes is useful in distributed computing, but you have the causality backwards.
I don't get all the author's hate for SQL. It's one of the most successful declarative languages ever.
Nothing prevents you from modelling a distributed system as a set of key-value stores (as we often do today); the idea of a message queue is independent of using a database as the mechanism. Using Postgres or another RDBMS doesn't mean your entire system is 'relational'.
Wouldn't it be better for D to lazily request information from B and C on your behalf in the 'maximally efficient' case? Given that D is where the computation runs, it could cache the results. From an auth perspective that seems simpler than cross-wiring connections between all nodes as proposed.
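To make the "lazy pull" alternative concrete, here's a minimal sketch, assuming hypothetical services B and C behind stand-in fetch functions (all names here are invented for illustration, not from the article): D requests its inputs on demand and caches them, so repeated computations never re-hit B or C.

```python
# Hedged sketch of the "lazy pull" model: D fetches from B and C itself
# and caches, instead of A cross-wiring B and C to push results to D.
# query_b/query_c are stand-ins for network calls to hypothetical services.
from functools import lru_cache

def query_b(key):
    return f"b:{key}"      # pretend this is an RPC to service B

def query_c(key):
    return f"c:{key}"      # pretend this is an RPC to service C

@lru_cache(maxsize=None)
def d_compute(key):
    # D pulls both inputs itself; lru_cache means a repeated call
    # for the same key is served locally without touching B or C.
    return (query_b(key), query_c(key))

print(d_compute("order-42"))
print(d_compute("order-42"))  # second call is served from D's cache
```

The auth simplification falls out naturally: only D needs credentials for B and C, rather than A having to authorize three pairwise connections.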
All this talk and nothing about n-phase commits, the Byzantine generals problem, or any tie-backs to the typical way of talking about distributed computing; they just dance around the subjects.
IDK. Sorry man. Didn't like the article, which feels bad because you seem passionate about the presentation of it.
Edit: I looked through the Substack's other posts. They do kinda talk about relational algebra in a subsequent post, but overall I'm curious whether the author has looked into dataflow programming before. It seems like the author is trying to describe that concept, but with a vocabulary consisting mostly of RDBMS terminology and history: https://en.m.wikipedia.org/wiki/Dataflow_programming
I feel like the author needs to read the Out of the Tar Pit paper if they haven't already. It feels like they are grasping at some of the concepts that paper presents far more clearly.
> I don't get all the author's hate for SQL. It's one of the most successful declarative languages ever.
To be fair, they criticize SQL as a language for writing triggers, which are imperative by nature, and PL/SQL as an imperative programming language does suck.
> In the second diagram I see ill-defined responsibilities
Why do you say this?
Yes, passing a message along to another network service is another step. But whether or not it is an additional "responsibility" is debatable I think. It depends on what your baseline is.
1. If the baseline is a synchronous, in-process RPC, then, sure, an additional network call is an extra step. (Even so, I don't necessarily view this as a problem.)
2. But if your baseline is an asynchronous RPC, remember that there is often some mechanism to report status back to the caller. (This is part of the cost of doing business in async.) There would be no net difference to pass a result forward compared to back to the caller.
Tell me if my logic is not sound. (I wonder if you have fallen for the fallacy of overlooking what is 'normal' and criticizing what is different? Just a guess. Maybe, maybe not.)
> and a looming callback hell.
There is nothing intrinsically wrong with using callbacks. Callback hell is a syntax problem with some languages. In particular, visually it looks unappealing to programmers. Visual appeal (or not) is not necessarily coupled to wise design decisions. Especially in distributed systems there are a different set of drivers than, say, in-browser JS constraints (event loop, rendering, etc).
Example 1: Clojure, for example, uses macros (core.async) so that programmers can write code that is indented one level deep and looks like there are no callbacks. But underneath, callbacks are still used.
Example 2: In Rust, callbacks aren't used under the hood for async/await. Instead, the syntax maps to a state machine. But the general point holds: a state machine could be hellish to write explicitly (call it "state machine hell" perhaps, haha) so Rust provides a better syntax for it.
Broader point: some people conflate how code looks in some particular language with overall quality. Beware of taking this to an extreme. I have been guilty of this from time to time, especially when doing a lot of Ruby long ago.
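The "callback hell is a syntax problem" point can be shown in a few lines. This is a minimal sketch in Python (names invented for illustration): the same two-step pipeline written with nested callbacks and with async/await, computing the same value either way.

```python
import asyncio

# Callback style: each dependent step nests one level deeper.
def fetch_cb(key, on_done):
    on_done(f"value:{key}")

def pipeline_cb(on_done):
    fetch_cb("a", lambda a:
        fetch_cb(a, lambda b:
            on_done(b)))

# async/await: the same dataflow, written flat. The event loop still
# resumes the coroutine at each suspension point; only the syntax differs.
async def fetch(key):
    await asyncio.sleep(0)   # yield to the loop, like a callback boundary
    return f"value:{key}"

async def pipeline():
    a = await fetch("a")
    return await fetch(a)

results = []
pipeline_cb(results.append)
results.append(asyncio.run(pipeline()))
print(results)  # both styles produce the same result
```

Which of the two is "hell" is purely a question of surface syntax; the underlying control flow is identical.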
> Yes, passing a message along to another network service is another step. But whether or not it is an additional "responsibility" is debatable I think. It depends on what your baseline is.
It's just my feeling based on my experience.
What if there are 15 services? Who decides where to route and why? I want to see how it would work in practice. That's why I asked for something more concrete than A, B, C, D. Maybe it would actually be easier and more straightforward, I don't know.
> There would be no net difference to pass a result forward compared to back to the caller.
For example, A knows it has called B, so if there is no response it knows that it has to handle an error, do a rollback and whatnot.
> There is nothing intrinsically wrong with using callbacks. Callback hell is a syntax problem with some languages.
Right, bad wording on my part. If it's A's responsibility to say that the results should be routed to D, is it also A's responsibility to say where the result of D goes? Would I have to specify the whole call chain? Do I pass the whole call chain to both B and C? That's what I meant.
What if the request didn't originate from A but from Ω? And suppose B and C are sometimes also called, with routing to D, by another service in Ω's chain. As stated in the article, D waits for the B and C results that originate from the same call, which is Ω. Now whoever maintains A is going to have a very good time trying to debug why D is sometimes called with the wrong values and then maybe even crashes or returns an error.
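One way the "whole call chain" question could play out in practice is a minimal sketch like this (all service names and handlers are hypothetical, not from the article): the originator encodes the remaining route in the message envelope, and each hop just pops its next destination.

```python
# Hedged sketch: A names the full chain once; every hop only needs to
# know how to pop the next destination off the envelope's route.
def make_envelope(payload, route):
    return {"payload": payload, "route": list(route)}

def handle(services, envelope):
    # Deliver along the route until it is exhausted.
    while envelope["route"]:
        next_hop = envelope["route"].pop(0)
        envelope["payload"] = services[next_hop](envelope["payload"])
    return envelope["payload"]

# Invented stand-in services for illustration.
services = {
    "B": lambda x: x + 1,
    "C": lambda x: x * 2,
    "D": lambda x: f"D saw {x}",
}

# A originates the call and specifies the whole chain up front.
msg = make_envelope(3, ["B", "C", "D"])
print(handle(services, msg))  # "D saw 8"
```

Note this sketch also makes the debugging worry tangible: if two different originators (A and Ω) both enqueue envelopes ending in D, nothing in the envelope itself says which root call a given D invocation belongs to unless you add a correlation ID.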
> a state machine could be hellish to write explicitly (call it "state machine hell" perhaps, haha) so Rust provides a better syntax for it
Every computer is a finite state machine and we are using mental models like async/await to hide this pesky fact and simplify our lives.
>> There would be no net difference to pass a result forward compared to back to the caller.
> For example, A knows it has called B, so if there is no response it knows that it has to handle an error, do a rollback and whatnot.
Good point. Many application designs wouldn't work as-is in the "pass forward" approach. I hesitate to respond with "well, then you're architecting it wrong", but perhaps there is a grain of truth in that. Maybe it is a hard tradeoff.
> A distributed computing system, then, is naturally expressed as a set of relational stores with triggers.
1. This statement is too big a leap; it doesn't follow from the setup, which only required an upsert operation and key/value storage. Nothing in the setup example requires relational algebra. Agree?
2. This seems like a hasty generalization. Is the author claiming that the toy example is representative of distributed systems? If so, what aspects?
IMO, the post started strong but fizzled; this is why I’m giving pointed feedback.
The request/response optimization discussed in the first half of this post has been explored quite a bit in the context of Object-Oriented Programming and Actors, where the desired feature is called "Promise Pipelining":
Outside of the E programming language and in the realm of language-agnostic tooling, you can find promise pipelining in some RPC frameworks, such as Cap'n Proto:
I was also thinking of promise pipelining, but note that the article proposes communicating A -> {B, C} -> D, never directly communicating from A to D. Cap'n Proto "level 3" nodes could send promises from B and C to D, but that still needs A to talk to D, i.e. A -> {B, C, D}; {B, C} -> D. Same latency and depth of dependency chain, but still more messages, right?
(In return, note that Cap'n Proto's A -> D message makes it more obvious how A figures out whether the operation succeeded; I'm not quite sure how that works in the proposed diagram. I suppose the proposed system actually puts all messages in a system-wide database, which does solve the problem.)
That should not be the case with promise pipelining. The "Mobile Almost-Code" section of the E page explains this. You mentioned "continuation passing style", which is effectively what promise pipelining does: For the constrained class of continuations that can be serialized as a dataflow graph, pass those along with the message.
Importantly, the system wide constraint is willing participation from each actor, not a shared database. Instead of each actor needing to know how to interact with the shared database, each actor needs to be willing and able to execute these passed continuations.
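A toy sketch of that constraint, assuming an invented opcode vocabulary (none of this is from the E page itself): the continuation is a small, serializable dataflow description shipped with the message, and any willing actor can execute it without access to a shared database.

```python
# Hedged sketch of "mobile almost-code": the continuation travels as data
# drawn from a constrained, serializable vocabulary of operations, so a
# willing actor can run it locally. The opcode set here is illustrative.
def run_continuation(steps, value):
    ops = {
        "add": lambda v, a: v + a,
        "mul": lambda v, a: v * a,
    }
    for op, arg in steps:
        value = ops[op](value, arg)
    return value

# A ships B a request plus the rest of the pipeline as plain data;
# B (or any later hop) executes it without consulting shared state.
message = {"input": 3, "continuation": [("add", 1), ("mul", 2)]}
print(run_continuation(message["continuation"], message["input"]))  # 8
```

The key property is that the continuation is data, not arbitrary code: the receiving actor only has to trust a fixed, auditable set of operations, which is what makes "willing participation" a weaker requirement than a shared database.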