Hacker News

From the post:

Individually, none of these concepts are new. I’m sure you’ve seen them all before. You may be tempted to dismiss Rama’s programming model as just a combination of event sourcing and materialized views. But what Rama does is integrate and generalize these concepts to such an extent that you can build entire backends end-to-end without any of the impedance mismatches or complexity that characterize and overwhelm existing systems.

You have the general model correct, but here are a few clarifications:

- PStates are partitioned, durable, replicated indexes that are represented as arbitrary combinations of data structures. A PState can be as simple as an integer per partition, or it can be as complex as a map of lists of maps of sets. PStates allow you to shape your indexes to perfectly match your application's use cases.
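To make the "map of lists of maps of sets" idea concrete, here's a minimal sketch in plain Java of what such an index shape looks like. This is not Rama's API, just an illustration of the nested data-structure shape; the names (`index`, `post`, `tags`) are invented for the example.

```java
import java.util.*;

public class PStateShapeSketch {
    public static void main(String[] args) {
        // Hypothetical index shape: user-id -> list of posts, where each
        // post is a map from field name -> set of tag strings.
        Map<String, List<Map<String, Set<String>>>> index = new HashMap<>();

        Map<String, Set<String>> post = new HashMap<>();
        post.put("tags", new HashSet<>(Set.of("rama", "databases")));
        index.computeIfAbsent("alice", k -> new ArrayList<>()).add(post);

        // A fine-grained read reaches directly into the nested structure,
        // analogous to reading one location of one PState partition.
        Set<String> tags = index.get("alice").get(0).get("tags");
        System.out.println(tags.contains("rama")); // true
    }
}
```

The point of shaping indexes like this is that a read touches exactly the nested location the use case needs, instead of joining flat tables back into the shape the application wanted all along.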

- I wouldn't call Rama queries an "engine", as it's considerably more straightforward in how it works than something like SQL. The base query API is called "paths", which are an imperative way to concisely reach into one partition of one PState to fetch or aggregate values. There's also "query topologies" which are predefined, on-demand distributed computations that can fetch and aggregate data from many partitions of many PStates.
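A rough intuition for "paths" is walking a sequence of keys down into a nested structure on one partition. The sketch below models that with a hypothetical `navigate` helper over plain Java maps; it is not Rama's paths API, just an analogy for how a path reaches into one location of one PState.

```java
import java.util.*;

public class PathSketch {
    // Hypothetical helper: follow a sequence of keys into nested maps,
    // loosely analogous to a path reaching into one PState partition.
    @SuppressWarnings("unchecked")
    static Object navigate(Object root, Object... keys) {
        Object current = root;
        for (Object key : keys) {
            if (current == null) return null;
            current = ((Map<Object, Object>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        // followers: user-id -> (follower-id -> follow timestamp)
        Map<Object, Object> followers = new HashMap<>();
        followers.put("alice", new HashMap<>(Map.of("bob", 1700000000L)));

        System.out.println(navigate(followers, "alice", "bob")); // prints 1700000000
    }
}
```

Query topologies then sit above this: a predefined distributed computation that runs path-style reads across many partitions and aggregates the results on demand.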




Thanks, I will read more soon! I'm curious: how do you resolve the "impedance mismatch" between the "canonical" models that business decisions are based upon, which need to be synchronous with the depots (and mutually consistent with other models sharing fragments of the same data), and the eventually consistent read models, which have a laxer constraint on how up to date they are?

How do you ensure consistency here? How do you organize it in the data flow?

Say I update a user because that user still seems to be there in the query results/indexes, but an event deleting that user actually happened some time ago?

This can also happen, I suppose, if the depots run queries themselves on a PState in order to determine whether a certain event is valid at all, and how exactly to carry it out.


The impedance mismatches you're used to from using databases are gone because:

- You can finely tune your indexes to be exactly the optimal shape (data structure) for your application. You can see this in our Mastodon implementation with the big variety of data structures we used for all the use cases.

- You're generally just using regular Java objects everywhere: appending to depots, during ETL processing, and stored in indexes.

How you coordinate data creation with view updates is a deeper topic, so I'll just summarize one of the basic mechanisms Rama provides for coordinating this. Depot appends can have an "ack level" that determines the conditions before Rama tells you that the depot append has completed. The default level is "full ack", which includes all streaming topologies colocated with that depot fully processing that record. With this level, when the depot append completes you know that all associated indexes (PStates) have been updated.

There's also "append ack", which only waits for the depot append to be replicated on the depot, and "no ack", which is fire and forget. These all have their uses depending on the specific needs of an application.
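The three ack levels described above can be sketched as futures that complete at different points in the write's lifecycle. This is a toy model, not Rama's actual API: the enum names, `replicate`, and `processStreamingTopologies` are all invented stand-ins for the real work.

```java
import java.util.concurrent.CompletableFuture;

public class AckLevelSketch {
    // Hypothetical model of the three ack levels.
    enum AckLevel { FULL_ACK, APPEND_ACK, NO_ACK }

    static CompletableFuture<Void> append(Object record, AckLevel level) {
        CompletableFuture<Void> replicated =
            CompletableFuture.runAsync(() -> replicate(record));
        switch (level) {
            case NO_ACK:
                // Fire and forget: caller's future completes immediately.
                return CompletableFuture.completedFuture(null);
            case APPEND_ACK:
                // Completes once the append is replicated on the depot.
                return replicated;
            default:
                // Full ack: additionally waits for colocated streaming
                // topologies to finish processing the record.
                return replicated.thenRun(() -> processStreamingTopologies(record));
        }
    }

    static void replicate(Object record) { /* stand-in: replicate to followers */ }
    static void processStreamingTopologies(Object record) { /* stand-in: update PStates */ }

    public static void main(String[] args) {
        append("user-created", AckLevel.FULL_ACK).join();
        System.out.println("full ack: all colocated PStates updated");
    }
}
```

With full ack, the completion of `append(...)` doubles as the "all associated indexes have been updated" signal; the weaker levels trade that guarantee for lower latency.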


Thanks! So we can see these acks as "wait and synchronize" signals, I suppose? However, how can we ensure "all or nothing" between all parties trying to ack a condition they're mutually dependent on, i.e. transactionality or atomicity?



