> Truly efficient distributed systems are most naturally expressed through functions as triggers invoked from upsert operations on addressable relations
O_o
Just because you can express something as an 'upsert', that doesn't make it 'relational'. Transactions exist outside the concept of RDBMSes. The article doesn't mention relational algebra once.
Yes, a lot of the terminology and math from RDBMSes is useful in distributed computing, but you have the causality backwards.
I don't get all the author's hate for SQL. It's one of the most successful declarative languages ever.
Nothing prevents you from modelling a distributed system as a set of key-value stores (as we often do today); the idea of a message queue is independent of using a database as the mechanism. Using Postgres or another RDBMS doesn't mean your entire system is 'relational'.
Wouldn't it be better for D to lazily request information from B and C on your behalf in the 'maximally efficient' case? Given that D is where the computation runs, it could cache the results. From an auth perspective that seems simpler than cross-wiring connections between all nodes as proposed.
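To make the "lazy pull" alternative concrete, here's a minimal sketch, assuming hypothetical services B and C behind stand-in fetch functions (all names here are invented for illustration, not from the article): D requests its inputs on demand and caches them, so repeated computations never re-hit B or C.

```python
# Hedged sketch of the "lazy pull" model: D fetches from B and C itself
# and caches, instead of A cross-wiring B and C to push results to D.
# query_b/query_c are stand-ins for network calls to hypothetical services.
from functools import lru_cache

def query_b(key):
    return f"b:{key}"      # pretend this is an RPC to service B

def query_c(key):
    return f"c:{key}"      # pretend this is an RPC to service C

@lru_cache(maxsize=None)
def d_compute(key):
    # D pulls both inputs itself; lru_cache means a repeated call
    # for the same key is served locally without touching B or C.
    return (query_b(key), query_c(key))

print(d_compute("order-42"))
print(d_compute("order-42"))  # second call is served from D's cache
```

The auth simplification falls out naturally: only D needs credentials for B and C, rather than A having to authorize three pairwise connections.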
All this talk and nothing about n-phase commits, the Byzantine generals problem, or any tie-backs to the typical way of talking about distributed computing; they just dance around the subjects.
IDK. Sorry man. Didn't like the article, which feels bad because you seem passionate about the presentation of it.
Edit: I looked through the Substack's other posts. They do kinda talk about relational algebra in a subsequent post, but overall I'm curious whether the author has looked into dataflow programming before. It seems like the author is trying to describe that concept, but with a vocabulary consisting mostly of RDBMS terminology and history: https://en.m.wikipedia.org/wiki/Dataflow_programming
I feel like the author needs to read the Out of the Tar Pit paper if they haven't already. It feels like they are grasping at some of the concepts that paper presents far more clearly.
> I don't get all the author's hate for SQL. It's one of the most successful declarative languages ever.
To be fair, they criticize SQL as a language for writing triggers, which are imperative by nature, and PL/SQL as an imperative programming language does suck.
> In the second diagram I see ill-defined responsibilities
Why do you say this?
Yes, passing a message along to another network service is another step. But whether or not it is an additional "responsibility" is debatable I think. It depends on what your baseline is.
1. If the baseline is a synchronous, in-process RPC, then, sure, an additional network call is an extra step. (Even so, I don't necessarily view this as a problem.)
2. But if your baseline is an asynchronous RPC, remember that there is often some mechanism to report status back to the caller. (This is part of the cost of doing business in async.) There would be no net difference to pass a result forward compared to back to the caller.
Tell me if my logic is not sound. (I wonder if you have fallen for the fallacy of overlooking what is 'normal' and criticizing what is different? Just a guess. Maybe, maybe not.)
> and a looming callback hell.
There is nothing intrinsically wrong with using callbacks. Callback hell is a syntax problem with some languages. In particular, visually it looks unappealing to programmers. Visual appeal (or not) is not necessarily coupled to wise design decisions. Especially in distributed systems there are a different set of drivers than, say, in-browser JS constraints (event loop, rendering, etc).
Example 1: Clojure, for example, uses macros (core.async) so that programmers can write code that is indented one level deep and looks like there are no callbacks. But underneath, callbacks are still used.
Example 2: In Rust, callbacks aren't used under the hood for async/await. Instead, the syntax maps to a state machine. But the general point holds: a state machine could be hellish to write explicitly (call it "state machine hell" perhaps, haha) so Rust provides a better syntax for it.
Broader point: some people conflate how code looks in some particular language with overall quality. Beware of taking this to an extreme. I have been guilty of this from time to time, especially when doing a lot of Ruby long ago.
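The "callback hell is a syntax problem" point can be shown in a few lines. This is a minimal sketch in Python (names invented for illustration): the same two-step pipeline written with nested callbacks and with async/await, computing the same value either way.

```python
import asyncio

# Callback style: each dependent step nests one level deeper.
def fetch_cb(key, on_done):
    on_done(f"value:{key}")

def pipeline_cb(on_done):
    fetch_cb("a", lambda a:
        fetch_cb(a, lambda b:
            on_done(b)))

# async/await: the same dataflow, written flat. The event loop still
# resumes the coroutine at each suspension point; only the syntax differs.
async def fetch(key):
    await asyncio.sleep(0)   # yield to the loop, like a callback boundary
    return f"value:{key}"

async def pipeline():
    a = await fetch("a")
    return await fetch(a)

results = []
pipeline_cb(results.append)
results.append(asyncio.run(pipeline()))
print(results)  # both styles produce the same result
```

Which of the two is "hell" is purely a question of surface syntax; the underlying control flow is identical.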
> Yes, passing a message along to another network service is another step. But whether or not it is an additional "responsibility" is debatable I think. It depends on what your baseline is.
It's just my feeling based on my experience.
What if there are 15 services? Who decides where to route and why? I want to see how it would work in practice. That's why I asked for something more concrete than A, B, C, D. Maybe it would actually be easier and more straightforward, I don't know.
> There would be no net difference to pass a result forward compared to back to the caller.
For example, A knows it has called B, so if there is no response it knows that it has to handle an error, do a rollback and whatnot.
> There is nothing intrinsically wrong with using callbacks. Callback hell is a syntax problem with some languages.
Right, bad wording on my part. If it's A's responsibility to say that the results should be routed to D, is it also A's responsibility to say where the result of D goes? Would I have to specify the whole call chain? Do I pass the whole call chain to both B and C? That's what I meant.
What if the request didn't originate from A but from Ω? And suppose B and C are sometimes also called, with routing to D, by another service in Ω's chain. As stated in the article, D waits for the B and C results that originate from the same call, which is Ω. Now whoever maintains A is going to have a very good time trying to debug why D is sometimes called with the wrong values and then maybe even crashes or returns an error.
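One way the "whole call chain" question could play out in practice is a minimal sketch like this (all service names and handlers are hypothetical, not from the article): the originator encodes the remaining route in the message envelope, and each hop just pops its next destination.

```python
# Hedged sketch: A names the full chain once; every hop only needs to
# know how to pop the next destination off the envelope's route.
def make_envelope(payload, route):
    return {"payload": payload, "route": list(route)}

def handle(services, envelope):
    # Deliver along the route until it is exhausted.
    while envelope["route"]:
        next_hop = envelope["route"].pop(0)
        envelope["payload"] = services[next_hop](envelope["payload"])
    return envelope["payload"]

# Invented stand-in services for illustration.
services = {
    "B": lambda x: x + 1,
    "C": lambda x: x * 2,
    "D": lambda x: f"D saw {x}",
}

# A originates the call and specifies the whole chain up front.
msg = make_envelope(3, ["B", "C", "D"])
print(handle(services, msg))  # "D saw 8"
```

Note this sketch also makes the debugging worry tangible: if two different originators (A and Ω) both enqueue envelopes ending in D, nothing in the envelope itself says which root call a given D invocation belongs to unless you add a correlation ID.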
> a state machine could be hellish to write explicitly (call it "state machine hell" perhaps, haha) so Rust provides a better syntax for it
Every computer is a finite state machine and we are using mental models like async/await to hide this pesky fact and simplify our lives.
>> There would be no net difference to pass a result forward compared to back to the caller.
> For example, A knows it has called B, so if there is no response it knows that it has to handle an error, do a rollback and whatnot.
Good point. Many application designs wouldn't work as-is in the "pass forward" approach. I hesitate to respond with "well, then you're architecting it wrong", but perhaps there is a grain of truth in that. Maybe it is a hard tradeoff.
> A distributed computing system, then, is naturally expressed as a set of relational stores with triggers.
1. This statement is too big a leap; it doesn't follow from the setup, which only required an upsert operation and key/value storage. Nothing in the setup example requires relational algebra. Agree?
2. This seems like a hasty generalization. Is the author claiming that the toy example is representative of distributed systems? If so, what aspects?
IMO, the post started strong but fizzled; this is why I’m giving pointed feedback.
The request/response optimization discussed in the first half of this post has been explored quite a bit in the context of Object-Oriented Programming and Actors, where the desired feature is called "Promise Pipelining":
Outside of the E programming language and in the realm of language-agnostic tooling, you can find promise pipelining in some RPC frameworks, such as Cap'n Proto:
I was also thinking of promise pipelining, but note that the article proposes communicating A -> {B, C} -> D, never directly communicating from A to D. Cap'n Proto "level 3" nodes could send promises from B and C to D, but that still needs A to talk to D, i.e. A -> {B, C, D}; {B, C} -> D. Same latency and depth of dependency chain, but still more messages, right?
(In return, note that Cap'n Proto's A -> D message makes it more obvious how A figures out whether the operation succeeded; I'm not quite sure how that works in the proposed diagram. I suppose the proposed system actually puts all messages in a system-wide database, which does solve the problem.)
That should not be the case with promise pipelining. The "Mobile Almost-Code" section of the E page explains this. You mentioned "continuation passing style", which is effectively what promise pipelining does: For the constrained class of continuations that can be serialized as a dataflow graph, pass those along with the message.
Importantly, the system wide constraint is willing participation from each actor, not a shared database. Instead of each actor needing to know how to interact with the shared database, each actor needs to be willing and able to execute these passed continuations.
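A toy sketch of that constraint, assuming an invented opcode vocabulary (none of this is from the E page itself): the continuation is a small, serializable dataflow description shipped with the message, and any willing actor can execute it without access to a shared database.

```python
# Hedged sketch of "mobile almost-code": the continuation travels as data
# drawn from a constrained, serializable vocabulary of operations, so a
# willing actor can run it locally. The opcode set here is illustrative.
def run_continuation(steps, value):
    ops = {
        "add": lambda v, a: v + a,
        "mul": lambda v, a: v * a,
    }
    for op, arg in steps:
        value = ops[op](value, arg)
    return value

# A ships B a request plus the rest of the pipeline as plain data;
# B (or any later hop) executes it without consulting shared state.
message = {"input": 3, "continuation": [("add", 1), ("mul", 2)]}
print(run_continuation(message["continuation"], message["input"]))  # 8
```

The key property is that the continuation is data, not arbitrary code: the receiving actor only has to trust a fixed, auditable set of operations, which is what makes "willing participation" a weaker requirement than a shared database.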