Citus Cloud 2, Postgres, and Scaling Without Compromise (citusdata.com)
110 points by bgentry on Nov 16, 2017 | 40 comments



These guys need to add a cheap/freemium tier so small-time developers can scale up their services from nothing. $99/month for a trying-it-out/developer plan is a bit of an overreach.


Using Citus on that small of a plan generally isn't ideal. If you're just starting out, single-node Postgres works great, and there are a lot of options for it, such as Heroku, RDS, and now Google and Azure. Citus really starts to benefit you later on when you grow, hence our higher entry point.

As for trying it out, we bill based on usage just like any other large cloud provider, so you only pay for what you use. In that case, trying out a dev cluster for a few hours would be just a few cents.


I appreciate your response, but my point is this: people are going to go with the service that they can scale up from $5/month to $28,000/month without any interruptions or switching of providers. It would be great to see you guys try to tackle this problem with that in mind, and I think you could do it really well. I just think you underestimate how many people want their infrastructure to cost nothing when they are in pre-launch mode, and will let their pre-launch infrastructure dictate their post-launch.

To reiterate: I get that your service doesn't start to scale well until the $99/month price point, but I don't care about performance until I'm at that price point anyway. I'd rather have the ability to start from nothing with OK/crappy performance than have to switch services right when my product starts getting traction. It's about continuity and scaling up from nothing.


> I just think you underestimate how many people want their infrastructure to cost nothing when they are in pre-launch mode, and will let their pre-launch infrastructure dictate their post-launch.

Somehow, given Craig's history at Heroku, I imagine he understands this better than any of us ;)


This is not bad advice overall, but I think Citus still has a huge market to conquer before they go down this path.

A single PG node can easily handle 80k writes and reads per second and safely store up to 3TB of data. The vast majority of companies will never have to think about a cluster, as this is more than sufficient for them.

No matter if it's Citus or any other DB, horizontal scaling comes at a cost, and most companies shouldn't pay it. Citus helps the ones that really do have performance issues beyond those levels (like us), and I think the Citus team is smartly targeting those kinds of customers, at least for the time being.


> Single PG node can handle easily

I think that's exactly what people are asking for. A single PG node (or even a shared node) managed and monitored by competent people for a "hobbyist" price. You start your project on that, and when you outgrow it, click a button to upgrade to the real Citus Cloud.

Now it's entirely possible that the customer support and separate infrastructure maintenance don't make sense for Citus at hobbyist price points - but it would be a way for them to capture potential customers before they standardise on RDS, Redshift, and other database scaling options.


3TB? Even on a crap option like RDS, up to 6TB is OK. You can get a generic box with 200+ cores (actual cores, not vCPUs) and 12TB of RAM from SuperMicro.


You can go as high as the box allows (GCE allows, I believe, up to 10TB per server), but performance will suffer.


I would add the caveat that it depends heavily on your data.

I supported a 20TB single-node DB, and it worked just fine. We had a lot of cold data though (about 10TB of it was solely for historical records, and 9.5TB of the rest was only accessed 4x yearly for massive reports).

The hardest part was never PostgreSQL itself; it was mainly NUMA-related stuff.


How many products actually start getting traction? How many of those then need to scale beyond a single node? How many of those running on a $5/month plan will it take to make up for the dev and support costs?

I'm sure they can easily scale their plan down to the $50/month range with smaller instances but they're a business too and having a cost cut-off makes sense. The companies that need the scale will find them regardless, and they have the Citus Warp product as stated in the article to make migrations easy.


Basically copy the low-end lineup of https://www.elephantsql.com/ and then, once you hit $99/month, switch it over to sharding/whatever.


If the masses use your product, the whales will follow


I will throw money at you if you set up a basic/crappy $5/month (or free!) node that can gradually get upgraded in-place to higher and higher plans to the point where it is using your sharding technology etc.


Why not just use Postgres on a single node?

I don't think a small dev or a cheap/freemium user has as much potential for Citus Data as you think it does.

I for one appreciate the slightly higher cost and hope Citus continues on.


Now that Cockroach supports the "PostgreSQL Wire Protocol", how does it compare to CitusDB for scaling a relational data model in a distributed environment?

https://www.cockroachlabs.com/docs/stable/architecture/sql-l...


(Ozgun from Citus)

Citus extends PostgreSQL to make it a distributed database. In comparison, Cockroach builds the entire database from scratch and provides over-the-wire compatibility. For our users, this means they get the following benefits.

1. PostgreSQL Features

I can think of the following features that our customers started using as they landed in core Postgres (there are probably many more):

* Rich data types. For example, jsonb, hstore, XML, and HyperLogLog

* Declarative partitioning. You can shard your tables and then partition them (a short sketch follows this list)

* Performance improvements. Postgres 10 improves analytical query performance by 30-40%, and PG 11 will add another 3x. Citus customers will get these performance benefits at scale.
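
A minimal sketch of the shard-then-partition pattern (table and column names here are hypothetical; create_distributed_table is the actual Citus API):

    -- Distribute a time-partitioned table, then manage partitions
    -- as in vanilla PG 10.
    CREATE TABLE events (
        tenant_id  bigint NOT NULL,
        created_at timestamptz NOT NULL,
        payload    jsonb
    ) PARTITION BY RANGE (created_at);

    SELECT create_distributed_table('events', 'tenant_id');

    CREATE TABLE events_2017_11 PARTITION OF events
        FOR VALUES FROM ('2017-11-01') TO ('2017-12-01');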

2. Application Integration

If you already have an app built on Postgres, migrating to Citus should be significantly easier than moving to a new database. Some benefits I can think of are:

* You can pg_dump your Postgres schema and data. You can then load this data into Citus.

* If you need live migrations, you can use Citus Warp to stream changes from your Postgres database to Citus

* Native integration with hundreds of tools already in the Postgres world - data visualization, monitoring, backups, upgrades, and app frameworks such as Rails and Django.

3. Organizational Skills

Since Citus extends (and practically is) Postgres, you benefit from the organizational know-how built up worldwide by running millions of Postgres applications.

All know-how around how to configure your Postgres database, when to create indexes and database constraints, and how to tune your queries directly applies to Citus.

I hope this helps. If this doesn't help clarify things, could you let me know?


> * Performance improvements. Postgres 10 improves analytical query performance by 30-40%, and PG 11 will add another 3x. Citus customers will get these performance benefits at scale.

When is that to be expected or even usable in beta?


Postgres 10 is out now, that's what's running when you start a new Citus Cloud cluster. Postgres 11 is in progress, you can always follow the official releases on the main site: https://www.postgresql.org


Please note that in both cases it obviously depends on where your bottleneck is. E.g. if you're IO bound, you're not going to see a lot of improvement. Similarly, if most of your time is spent doing index scans, there won't be that much efficiency gain. Although in both cases parallelism can help.

If instead you're doing a bunch of aggregates over a table, and the bottleneck is the computation of the transition values, computation of hashes, tuple deforming, etc., you might see quite some benefit (the first two partly in v10 already, all three hopefully in v11).


That is a very useful answer, thanks!


Ozgun has done a good job describing some of Citus' benefits; allow me to write the CockroachDB-biased answer. Btw, congrats to all the Citus people on your launch.

The highest-level architectural difference is, in my opinion, the fact that CockroachDB aims to not trade any features of single-node SQL databases for the distributed nature of our clusters. In contrast to Citus, you get ACID (serializable) transactions regardless of how the data ends up being distributed - so touching more than one shard/node in a transaction does not weaken any consistency guarantees. Similarly, DML statements can modify distributed data with no fuss. The sharding is transparent in crdb; one doesn't need to explicitly declare any partitioning on a table or choose a shard count. We do give you, however, control over the data locality should you need it.

The other big architectural difference has to do with our availability/durability guarantees. In crdb, all data is replicated, and the replication is done through a consensus protocol. This means that data appears to be written at once on multiple replicas. Losing any one replica at any point in time is not a big deal; all the data that had previously been written is still accessible. You'll never lose any writes because of a machine failure. This is in contrast to most master-slave type architectures, where data is generally not "atomically" written across the master and the slave. Whenever a slave is promoted to be a master, you will probably incur some data loss. It is my understanding that Citus uses a flavor of such a master-slave architecture.

Now, this all doesn't speak about the current blog post in particular; crdb does not currently have a managed service offering.


It's always about trade-offs. Citus has ACID transactions within nodes, and ACD transactions across nodes. It does not have distributed snapshot isolation, but that means it also avoids the latency and concurrency penalties that come with distributed snapshot isolation.

Because clients talk to Citus using the synchronous postgres protocol, latency is one of the most important factors in performance. A single connection can never do more than 1/latency transactions per second. While you could use more connections, that is often complex, not always possible (e.g. long transaction blocks), and also comes with a significant performance penalty (fewer prepared statements, high SSL overhead, ...). It's important to optimise for latency.
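
For example, at 1ms of round-trip latency a single connection tops out at roughly 1,000 transactions per second; at 10ms, roughly 100.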

We find the Citus transaction model is sufficient for most purposes. If you think of the typical balance transfer example, then there is no way to double spend or overspend. Distributed transactions are atomic and locks in postgres will serialise distributed transactions that update the same rows and ensure constraints are preserved. Importantly, Citus has distributed deadlock detection, which allows generous concurrency at almost no cost to the application (apart from maybe having to reorder transaction blocks).
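
To make the transfer example concrete (hypothetical accounts table, distributed by account id, so the two updates may land on different nodes):

    BEGIN;
    -- Row locks serialise concurrent transfers touching the same rows.
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    -- The distributed commit is atomic; cross-node deadlocks are
    -- resolved by Citus's distributed deadlock detection.
    COMMIT;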

Postgres supports different modes of replication, including synchronous replication to avoid losing any data on node failure. Many users actually prefer asynchronous replication, to avoid the extra round-trip and the broader performance implications that would have. It's also the reason we use streaming replication over other replication mechanisms that might give higher availability. Streaming replication supports higher concurrency and thus gives better throughput.
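
For reference, a sketch of what quorum synchronous replication looks like in vanilla PG 10 (standby names are hypothetical):

    -- Wait for any one of two named standbys to confirm each commit.
    ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (standby_a, standby_b)';
    SELECT pg_reload_conf();

    -- Per-session durability/latency trade-off:
    SET synchronous_commit = on;       -- wait for the standby
    -- SET synchronous_commit = local; -- skip the extra round-trip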

At the end of the day, people use Citus because a single node postgres (or MySQL) database is not performant enough to keep up with their workload. Performance is key. Citus is not a silver bullet for every postgres performance problem, but it has two particular strengths:

Minimal pain scaling out for multi-tenant applications that can shard by tenant.

    - Any SQL query is supported when filtering by the partition column, including CTEs, window functions, complex joins, etc.
    - N times more CPU, memory and I/O bandwidth to process queries, and 1/N as much data to read
    - High throughput, low-latency transactions, including UPDATE/DELETE with subqueries
    - Citus Warp can do live migration from postgres (e.g. RDS) using logical decoding
    - Major ORMs are supported after small data model changes
Superpowers for real-time analytics apps (e.g. dashboards).

    - Bulk loading, up to millions of rows/sec
    - Massively parallel SQL with subqueries containing outer joins, lateral joins, etc. across all shards when joining by partition column
    - Massively parallel roll-ups (again with joins) using INSERT...SELECT (sketched below)
    - Massively parallel DELETE and UPDATE
    - Integration with other Postgres extensions, such as HLL, PostGIS, pg_partman
    - Advanced index types, including GIN, GiST, partial indexes
    - Table partitioning (for bulk delete, faster scans)
Citus has a laser-sharp focus on addressing these use cases and makes certain trade-offs because of that.
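
As an illustration of the roll-up item in the list above (table and column names are hypothetical), a parallel INSERT...SELECT between distributed tables:

    -- Roll raw events up into per-minute counts; with both tables
    -- distributed by tenant_id, each shard rolls up in parallel.
    INSERT INTO events_per_minute (tenant_id, minute, event_count)
    SELECT tenant_id, date_trunc('minute', created_at), count(*)
    FROM events
    GROUP BY 1, 2;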


> Postgres supports different modes of replication, including synchronous replication to avoid losing any data on node failure.

I would like to understand this better, for my edification. The best source on the exact semantics of pg replication that I've found were in this talk by Andres Freund: https://www.youtube.com/watch?v=VkzNL-uzBvA

What I understand is that "synchronous replication" means that the master commits locally, then waits for a slave (or k of them with quorum replication) to receive (and possibly also to apply) a commit before acknowledging the commit to the client. So what does that mean for "avoiding losing any data on node failure"? Commits acked to their client are indeed durable. But what about commits that are not acked? For example, is it possible for the master to apply a commit which is then seen by another query (from another client), then crash before the commit is sent to the slave? And so, if we then failover to the slave, the commit is gone (even though it had previously been observed)? This would qualify as "losing data".

Similarly, in that talk, Andres mentions that there is a window during which "split brain" is possible between an old master and a new master: when doing a failover, there's nothing preventing the old master from still accepting writes; so you need to tell all the clients to failover at the same time - which may be a tall order for the network partition scenarios when these things become interesting. If the old master is still accepting writes, then these writes diverge from the new master (so, again, lost data). With sync replication, I guess the window for these writes is limited since none of them will be acked to its client (but still, they're visible to other clients) - the window would be one write per connection.

I'm also curious to understand better how people in the Postgres world generally think about failovers, both in vanilla pg and in Citus. I can generally find little information on best practices and the correctness of failovers, even though they're really half the story when talking about replication. For example, when replicating to multiple slaves (even with quorum replication), how does one choose which slave to fail over to when a failover is needed? What kind of tool or system decides which one of the slaves has the full log? My intuition says that this is a pretty fundamental difference compared to a consensus-driven architecture like CockroachDB: in Cockroach the "failovers" are properly serialized with all the other writes, so this kind of question about whether one particular node has all the writes at failover time is moot.


> But what about commits that are not acked? For example, is it possible for the master to apply a commit which is then seen by another query (from another client), then crash before the commit is sent to the slave?

A commit doesn't become visible until it is synchronously replicated, regardless of whether its ack fails or succeeds. So in the case you're describing the commit is never acked and never observed.

> there's nothing preventing the old master from still accepting writes; so you need to tell all the clients to failover at the same time

In Citus Cloud we detach the ENI to make sure no more writes are going to the old primary, and then attach it to the new primary.

Without such infrastructure, an alternative is to have a shutdown timer for primaries that are on the losing side of a network partition. The system can recover after the timeout.

If you're using a single coordinator, then this only applies to the coordinator. The workers can just fail over by updating the coordinator metadata.

> how does one choose what slave to failover to when a failover is needed?

You can pick the one with the highest LSN among a quorum, since it's guaranteed to have all acknowledged writes.
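
In PG 10 terms, a sketch of that check (run on each candidate standby; the LSN literals are just examples):

    -- The standby that has received the furthest WAL position wins.
    SELECT pg_last_wal_receive_lsn();

    -- LSNs compare directly; pg_wal_lsn_diff() gives the gap in bytes.
    SELECT pg_wal_lsn_diff('0/3000060'::pg_lsn, '0/3000000'::pg_lsn);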


No offense, but the pricing looks high: to get a solution that will outperform vanilla PG on an i3.16xlarge, one is looking at $29K/month? I'm afraid to imagine what a solution that can outperform vanilla PG on a maxed-out SuperMicro 7089P-TR4T would cost.


We actually have customers that have migrated over far earlier than that, going from, say, an r3.xlarge to our smallest production cluster and seeing 2-3x performance gains. Comparing Citus to single-node Postgres isn't really an apples-to-apples comparison. Part of our performance gains come from sharding tables under the covers, so even when doing painful sequential scans you're scanning less data. Another part is that we parallelize queries.

Yes, for the exact same hardware you have a different price, but the price-to-performance customers have seen has been much better.


A pair of i3.16xlarge is around $8K a month, so it's not exactly cheap either.


$5K with 1-year reserved pricing (you generally aren't throwing away your DB each month).


It's expensive, but you also get a really smart team to help out with operational or pg questions.


Looks awesome! But it seems that this is AWS only. For companies on Google Cloud, will you be offering a GCP version?


We're always evaluating other cloud providers to run our fully managed database-as-a-service on (so stay tuned...). That said, because Citus the extension is open source (and licensed for the enterprise version), you can take it and run it yourself today on GCP, Azure, or any other cloud of your choosing. And in fact we do have a number of customers running and managing Citus themselves on top of GCP.


Thanks. I know it's open source, but Citus Cloud isn't, so presumably you don't get the extra functionality -- backups, replicas ("followers"), snapshots etc.

We are going to use CloudSQL once it exits beta, but it's a fairly conservative Postgres implementation (single node with read slaves), and you can't install third-party extensions (Citus is not available).


Yeah, fully understand. With any luck we have a better option for you soon :)


Yes, GCP would be extremely desirable.


Please stop making your product more awesome. It would solve our scaling issues; it's just above our price range right now :( I want an unlimited DevOps budget, then I could also pay AWS for more servers.


Ha, thanks for the kind words. If there are ever questions around sizing/pricing just let us know; we're often happy to at least try to do what we can there.


If the rebalancer leverages an update from PostgreSQL 10, does that mean new Citus formations are now PG10-based?


Yep, newly created formations on Citus Cloud will be PG10 and the latest version of Citus. For existing customers on PG 9.6 and older versions of Citus, we'll be upgrading those that opt in. Our upgrade process leverages our underlying disaster recovery and high availability infrastructure[1], so we can generally upgrade an entire cluster, regardless of data size, in under a couple of minutes.

[1] https://www.citusdata.com/blog/2017/03/23/a-look-into-disast...


Can I generally use off-the-shelf postgresql bindings in most languages with CitusDB, or are there things about CitusDB that would require custom bindings? If the latter is true, for what languages are bindings available (ruby/node.js/crystal...)?


The off-the-shelf Postgres drivers work. We're a pure Postgres extension, so you don't need anything exotic there, just a standard Postgres driver. When it comes to the application side of things, it depends a little more on how you structure your queries. To help out there we have a few libraries (for Django[1] and Rails[2]) and more coming.

[1] https://www.citusdata.com/blog/2017/11/14/scale-out-your-dja...

[2] https://github.com/citusdata/activerecord-multi-tenant



