We built a distributed DB capable of running across 100s of global locations (macrometa.co)
127 points by ctesh on March 5, 2019 | 26 comments



Great work. Building a distributed database isn't easy at all and takes considerable effort. I'd like to see more on failure scenarios. Here are a few preliminary questions I have (admittedly, I haven't read the entire post, apologies if my queries are answered already):

What is the SLA for durability and availability of the db?

- how are scenarios like edge locations going down for multiple minutes handled? Are the writes lost?

- how many replicas of the data are kept at a single edge location?

- Are there limits on the table size (how does one enforce this in a multi-master mode)?

- what's the SLA on replication time to, say, at least 50% of edge locations, and then to 100%?

Are DDL operations allowed? If so, how are the conflicts handled?

If data is stored in LSMs underneath, how are geo location queries handled? Is there an index/materialised view? If so, how long does that take to generate?

What's the TPS/QPS supported for the KV interface?

- Is there a scenario where a set of operations at X TPS across edge locations might then take a long time to converge globally?

It'd be great if you could list the scenarios where the DB can lose writes to 'true conflicts' in an FAQ somewhere.

- For instance, what happens in a create-delete-create scenario where a table is created at one edge location, deleted at another, and created at a third edge location, all while writes and reads are happening globally?

Thanks.


Thanks for the truly thoughtful questions - I will put together a short FAQ and post a link and a TL;DR answer here.

We also have a more technical internals paper almost complete and will publish it next week.

Thanks again! This is such a helpful comment.


Hi Ignoramous - Thanks. Very helpful comment with good questions. I will update the post or cover these in the next one.

Regards


Shouldn't this be a "Show HN", since the submitter is the company CEO/President?


Hi CharlesW - Thanks. We will probably publish another post with details on how people can try out the global edge fabric. That would probably be a better fit for "Show HN"?

Regards


The TL;DR on how:

1. Causal consistency, with the ability to create collections (tables) with strict consistency. Uses vector clocks rather than wall-clock timestamps to order DB operations (see the sketch after this list)

2. Streams to propagate DB changes from one geo location (node) to another, with guaranteed ordering and reliable delivery

3. Generalized operational CRDTs to make all DB operations Associative, Commutative, Idempotent and Distributed (the new ACID)

4. One data model, multiple interfaces: store and query your data as a key/value DB, document DB, graph DB, stream DB, or geo-location DB (query by lat/long/altitude)

5. Automatic GraphQL and REST API generation for your schema

6. Multi-master: query and update any of our 25 global locations and get one consistent view of data globally

7. Access via CLI, REST, GraphQL (built-in server), and client drivers for Python and JavaScript today (Java, Go, Ruby in the works)
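
A minimal sketch of the vector-clock ordering in (1), in Python. This is illustrative only, not Macrometa's actual engine; the PoP names and structures are made up:

```python
# Each PoP keeps a counter per PoP; comparing two clocks tells any
# replica whether two DB operations are causally ordered or concurrent.

def happened_before(a, b):
    """True if clock a causally precedes clock b."""
    keys = set(a) | set(b)
    return a != b and all(a.get(k, 0) <= b.get(k, 0) for k in keys)

def concurrent(a, b):
    return not happened_before(a, b) and not happened_before(b, a)

w1 = {"us-east": 1}                # write seen only at us-east
w2 = {"us-east": 1, "eu-west": 1}  # a later write at eu-west that saw w1
w3 = {"ap-south": 1}               # an independent write at ap-south

assert happened_before(w1, w2)  # causally ordered: apply w1 first
assert concurrent(w2, w3)       # concurrent: converge via CRDT merge
```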


Translation:

There are two "modes" of operation of this product: "regular" and "SPOT collections".

In regular operation mode, the system will lose concurrent writes and cannot enforce ACID transactions (classical ACID, that is). Your data is, however, highly available, eventually consistent, and copied around the globe for faster access.

With SPOT collections you get ACID transactions but lose the distribution advantages. That is, the system's properties are akin to what you get with a postgres/mysql cluster.

It is an interesting product, but IMHO you should only go there if/when: your users are truly all over the world, and your system (or that part of it) requires few or no ACID transactions (think twice because this is a big one).

PS: This seems to be a "productified" version of the gun [1] database (not that this is a bad thing!). Am I right?

[1] https://github.com/amark/gun


Yes - you hit the nail on the head - that is indeed how SPOT and regular collections work.

We think we provide the flexibility of both: very strict ACID behavior using SPOT collections, along with the strong eventual consistency model for everything else. It's important to understand that as a developer you don't need to deal with any of it - it's handled for you once you have marked a collection as SPOT.

This is not related to gun in any way. We love what gun does and its approach, and they do some very clever things (including work on the end device). We sit as a backend database-as-a-service in 25 global PoPs and process DB operations (and code expressed as functions or containers) at the location closest (by latency or geophysical location) to the user or device whose app or API is running against us.

We wrote our own operational CRDT engine and streams to solve this.


This is buzzword salad, not a coherent technical summary.


OK, take 2:

1. We use causal consistency with vector clocks to establish the causal order of operations, instead of timestamps and the Network Time Protocol, which are unreliable over WAN distances

2. We propagate database state changes/updates from each Point of Presence (PoP) to the others in the cluster using asynchronous streams that use a pull model instead of a push model to maintain message ordering over the network. This has the added benefit of letting us know exactly how current each PoP is with changes (see the sketch after this list)

3. We don't use quorums between PoPs to establish consistency - there's a white paper on our site that shows you the how and why of coordination-free replication. The gist is that we have developed a generalized operational CRDT model that gives us associative, commutative, idempotent convergence of changes between PoPs in the cluster without needing a quorum

4. The DB is multi-master - you can access and change data at any PoP. It's also multi-model and lets you query your data across multiple interfaces such as key/value, document (JSON), graph, etc.

5. The DB automatically creates GraphQL and REST APIs for your schema, taking away the complexity and effort of a lot of boilerplate development on the backend

6. The DB is available as a managed service in 25 PoPs today - you can request an account and we will give you one. We will be generally available with a free tier in April, and you can sign up online and self-administer your cluster

7. You can access the DB via a CLI or GUI, or write code that accesses it via REST, GraphQL, or native language drivers in JavaScript and Python today (we are working on other languages with a view to releasing them over the next few months)

Hope this helps...
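
A toy sketch of the pull model in (2). The log format and cursor handling here are illustrative assumptions, not our actual stream API:

```python
peer_log = ["put k1=v1", "put k2=v2", "del k1"]  # ordered change log at a peer PoP

def pull(log, since):
    """Return (offset, change) pairs after offset `since`, in log order."""
    return list(enumerate(log))[since + 1:]

applied_upto = -1  # this PoP hasn't consumed anything yet
for offset, change in pull(peer_log, applied_upto):
    # ... apply `change` to the local store here ...
    applied_upto = offset

# Because the consumer advances its own cursor, both sides can tell
# exactly how current this PoP is with changes:
lag = len(peer_log) - 1 - applied_upto
assert applied_upto == 2 and lag == 0
```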


The REST interface is good, but could we add business validation before mutating data? The bulk of the work a backend does is this business logic. How could we do this?


Hello and greetings the_arun.

The short answer is yes. Macrometa integrates a functions-as-a-service (FaaS) layer which can be hooked into the database and triggered by events on a stream or a data collection.

So you can, for example, do the following: expose a RESTful or GraphQL API (including deeply nested queries in GraphQL) for one or more collections; when mutating, attach a validation function to the collection as a trigger that is called before the mutation is applied to the DB. You can also have a trigger that calls a function after the mutation is complete.
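
For illustration, here is the rough shape a pre-write validation trigger could take. The event shape and registration call below are hypothetical, not Macrometa's documented FaaS API:

```python
# Hypothetical pre-write trigger: runs before the mutation is applied.

def validate_order(event):
    doc = event["new"]  # the document the client is trying to write
    if doc.get("quantity", 0) <= 0:
        # raising rejects the mutation before it reaches the DB
        raise ValueError("quantity must be positive")
    return doc  # returning the document lets the write proceed

# Hypothetical registration against a collection:
# client.collection("orders").on("pre-write", validate_order)
```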

One can also do this on streams, with functions being triggered by messages on a specific topic.

Lastly - there is full support for running containers as well and you can use the endpoints exposed by the container as a trigger.

Oh, and one more thing - the DB is real-time. It will notify clients of updates to collections automatically (like Firebase).

Hope this helps..


Thanks for the details. This sounds very similar to Amazon's DynamoDB. Are there features that make Macrometa better than DynamoDB?


We share some feature overlap with DynamoDB (key/value and document DB interfaces). Where we differentiate: global replication across all 25 of our global PoPs (50 by end of 2019); an integrated GraphQL generator (REST as well); real-time updates (the DB notifies clients of changes to data, i.e. no need to poll); tight integration with streams and pub/sub; functions and containers as triggers or stored procedures to the DB; geo queries (by lat/long/height); and integrated Elasticsearch (July 2019). There's more - we will announce in April.


I would say it’s fairly easy to read. Easier to read than the article. Don’t shoot the messenger.


Hi sagichmal, why do you think it is buzzword salad? Are the problems mentioned in the post with regard to current databases and CRDTs not real?


That's quite a harsh comment without anything substantial. It's really unfair to the parent commenter, who attempted to help the HN reader crowd with a summary.


I, for one, appreciated this summary.


This is actually exciting. Most sales pitches for secret sauce services are much flimsier and don't ooze the actual technical possibilities and promises that this one does.

So, does it solve the double-spend problem or not? The prose is a bit ambiguous on that.

I don't want to work that out for myself. I want to be told one way or another, and perhaps given some simple examples like "users x and y both want to do z at roughly the same time; depending on several factors like the exact timings and network latencies, there are several outcomes. They are..." etc.

And of course we want to know what kind of realistic throughput and latency it gets, how easy it is to add and remove POPs (and what data gets lost when you do), etc.

Finally, it really needs to commission and publish a Jepsen analysis. Hasn't every database vendor learned how critical that is to winning the programming public's heart?

This seems to be a startup that is going to sell it as a service, and I wish them all the best. The DB world is full of better tech that didn't make the dent it technically deserved (HyperDex, TokuDB, perhaps RethinkDB, etc.), so a new startup has to be both vocal and embrace Jepsen from the beginning. I think HN is ripe for a new darling, and whichever company becomes that darling might capture a lot of the ill-informed web-scale-before-we-have-a-customer market ;)


Thanks for pointing out the ambiguous bits - I’ll clean it up in the next few days.

Yes, it does solve the double spend - it lets you mark (via a checkbox) a collection with a SPOT property (single point of truth), which restricts the number of edge regions that collection is replicated across. Additionally, the SPOT collection is replicated across two separate data centers (and separate availability zones in those data centers) to provide high availability. The developer doesn't need to deal with the complexity of where and how this is done - they continue to access the collection like a regular one, and the DB will redirect the queries (and do things like joins) between the regular and SPOT collections. (A toy illustration of the double-spend difference follows.)
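
To make the double-spend point concrete, here is a toy simulation in plain Python (no Macrometa API, just the two consistency modes described above):

```python
def debit(balance, amount):
    """Reject the debit if it would overdraw the account."""
    return balance - amount if balance >= amount else balance

# SPOT collection: both debits serialize at the single point of truth.
bal = 100
bal = debit(bal, 100)  # first debit succeeds -> 0
bal = debit(bal, 100)  # second debit sees 0 and is rejected
assert bal == 0

# Regular collection: two PoPs each debit a stale balance of 100,
# and a naive merge of the two deltas converges to an overdraft:
pop_a = debit(100, 100)  # 0
pop_b = debit(100, 100)  # 0
merged = 100 + (pop_a - 100) + (pop_b - 100)
assert merged == -100  # the double spend a SPOT collection prevents
```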

I'm writing an FAQ and will include your questions there.

Jepsen is in the plans - we hope to get a report out in the summer.


Hi willvarfar - I am one of the authors of the post. Very good feedback, much appreciated. I will update the post or write a separate post on the `double-spend` problem with a clear example.


This looks pretty neat!

Considering that there's a FaaS layer too, which could be used along with the DB layer offering, what degree of vendor lock-in are users getting into?

Also, this product seems targeted only at those direly needing edge computing, thus missing out on many who might be using Firebase and serverless (Lambda/Functions, etc.) for this kind of work. Is that intentional?


You bring up a great point. One absolutely can use us as a Firebase alternative. We have a few customers using us with Lambda and container services directly on their favorite cloud provider in just 2 locations for high availability of their apps.

The edge computing position is intentional in that we think the edge is a way to build globally distributed apps and APIs. Our goal is to make it as simple to build and run an app in 25 regions as it is to build an app against, say, Firebase or DynamoDB.


So, does it actually have CRDT operations? Doesn't seem like any of the interfaces expose them. Can you use a convergent counter, for example?

Also, the product page [1] has some weird claims. There is no such thing as "Strong Session Consistency"; that is just another name for weak consistency guarantees. And Strong Eventual Consistency, which is a thing, requires using mergeable conflict-free operations that you don't seem to expose? The "smart operation ordering using intent prioritization" is also neither strong eventual consistency nor strong consistency.

[1] https://www.macrometa.co/product


Hi zzzcpan - Yes. The global edge fabric internally uses CRDT operations. It is a conscious choice not to expose the CRDT data types. The idea is that developers work with the APIs and query layer as they do with other databases, and underneath, the system takes care of translating to the necessary CRDT operations. A convergent counter is in the works and will be available shortly (a sketch of one is below).
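
As a sketch of what a convergent counter looks like, here is a standard PN-counter in Python. This is illustrative, not necessarily our engine's implementation:

```python
def merge(a, b):
    """Element-wise max: commutative, associative, and idempotent."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}

def value(incs, decs):
    return sum(incs.values()) - sum(decs.values())

# us-east applies +3 and -1; eu-west applies +2. Each PoP only ever
# bumps its own slot, so merging in any order converges:
us = ({"us-east": 3}, {"us-east": 1})
eu = ({"eu-west": 2}, {})
merged = (merge(us[0], eu[0]), merge(us[1], eu[1]))
assert value(*merged) == 4
```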

The system has strong consistency within a region (aka data center) and strong eventual consistency across data centers. To answer your question: yes, we do converge (merge) the changes across regions using conflict-free operations.

Intent prioritization rules come into the picture only when it is impossible for the system to determine the developer's intent because two conflicting changes occur across regions.

I hope the above helps.

Regards


Where do I create an account?



