PolarDB, yet another open source database system based on PostgreSQL

wanghq · on May 30, 2021

Oh, what's going on here?

> OceanBase, the database of Alibaba's fintech company Ant Group, will be open-source soon, possibly as early as June 1, according to Sina Tech.

https://cntechpost.com/2021/05/27/ant-groups-in-house-databa...

Also the team published many papers about PolarDB.

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C48&q=%22...

wejick · on May 30, 2021

Looks like they're different technologies. I found that OceanBase has been offered on AliCloud under the ApsaraDB umbrella: https://www.alibabacloud.com/product/oceanbase

wanghq · on May 30, 2021

Sorry I didn't make myself clear. I was surprised because two databases from Alibaba are open sourced or to be open sourced.

foobarbazetc · on May 30, 2021

Alibaba is huge and different teams have different goals.

That one is for fintech.

yftsui · on June 1, 2021

These are just KPI projects, as their performance is judged by meaningless metrics. OceanBase WAS open sourced; after the news reports and publicity, Alibaba deleted all the code and replaced it with a Chinese announcement saying it will no longer be an open source project.

ioltas · on May 30, 2021

Looking at their code tree, this is based on 11.2 (https://www.postgresql.org/docs/11/release-11-2.html): https://github.com/alibaba/PolarDB-for-PostgreSQL/blob/maste...

So this is code from two years ago, missing stability fixes from upstream up to 11.12, at first glance.

mb4nck · on May 30, 2021

Looks like it, yeah.

Also, Postgres 12 introduced pluggable storage, which might help to implement a shared-nothing architecture without huge changes to vanilla Postgres (I haven't looked at how large their delta is)

jeff-davis · on May 30, 2021

Citus Data enables scale out, while also being a pure extension. That means you can upgrade Postgres like normal to the latest point release using whatever normal upgrade process you want (e.g. OS packages).

It has worked for a long time without the need for Postgres 12. However, the new APIs introduced in v12 did enable us to offer columnar compression as an option, which complements a lot of scale-out use cases.

See: https://www.citusdata.com/blog/2021/03/06/citus-10-columnar-...

I believe using the extension facilities of Postgres is far superior to a fork in the medium to long term.

(Disclaimer: I work for Citus, and on columnar compression.)

pella · on May 30, 2021

> I work for Citus, and on columnar compression.

:-)

I am waiting for the basic index support for columnar tables !

https://github.com/citusdata/citus/pull/4950

:-)

the-dude · on May 30, 2021

Are there any stability fixes you have in mind? In my recollection, there is not much instability in PG releases.

pella · on May 30, 2021

IMHO: with the latest fix -> more stable

search for "crash":

https://www.postgresql.org/docs/release/11.3/ ( 11x )

https://www.postgresql.org/docs/release/11.4/ ( 4x )

https://www.postgresql.org/docs/release/11.5/ ( 1x )

https://www.postgresql.org/docs/release/11.6/ ( 3x )

https://www.postgresql.org/docs/release/11.7/ ( 14x )

https://www.postgresql.org/docs/release/11.8/ ( 5x )

https://www.postgresql.org/docs/release/11.9/ ( 6x )

https://www.postgresql.org/docs/release/11.10/ ( 5x )

https://www.postgresql.org/docs/release/11.11/ ( 3x )

https://www.postgresql.org/docs/release/11.12/ ( 5x )
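To put a single number on the list above, here is a quick tally. It is only a sketch: the counts are copied from the links above rather than re-checked against the release notes, and a mention of "crash" is a rough proxy, not a count of distinct bugs.

```python
# Mentions of "crash" per 11.x minor release, as listed above.
# The numbers are copied from the comment, not re-verified.
crash_mentions = {
    "11.3": 11, "11.4": 4, "11.5": 1, "11.6": 3, "11.7": 14,
    "11.8": 5, "11.9": 6, "11.10": 5, "11.11": 3, "11.12": 5,
}

total = sum(crash_mentions.values())
worst = max(crash_mentions, key=crash_mentions.get)
print(f"{total} mentions across {len(crash_mentions)} releases; most in {worst}")
```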

mb4nck · on May 30, 2021

OTOH the PolarDB specific changes seem to be contained enough that if you decide to run it in production, you can probably just apply most of the changes from the v11 branch yourself.

But I agree it's not a very good look to code-drop something on a .2 release when there have been 2.5 years of fixes.

jeff-davis · on May 30, 2021

The diffs are non-trivial (EDIT: only included changes in directories where conflicts are most likely to occur):

    $ git log --oneline REL_11_2..REL_11_12 -- src/backend src/include ':!src/backend/po' | wc -l
    536
    $ git diff --stat REL_11_2..REL_11_12 -- src/backend src/include ':!src/backend/po' | tail -1
     352 files changed, 14651 insertions(+), 7078 deletions(-)

Even if the conflicts are minor, it's going to be annoying to try to work it out. If you are hitting a specific crash, there's a good chance you can backport the fix cleanly, but I doubt you can just pull in all of the fixes proactively without some knowledge of the details of the fork.

I haven't really looked at the details... perhaps PolarDB already has many (or all) of the fixes since 11.2. Also I haven't actually tried a merge, I'm just assuming the difficulty based on the number of diffs (and my experience doing minor version merges in the past).

(Disclaimer: I work for Citus Data. Citus takes the approach of a pure extension, which means it works on unmodified Postgres, and minor upgrades typically don't interfere at all.)

joelp · on May 30, 2021

What are people/companies using currently to make their postgres database distributed these days?

Currently running my app + db on Heroku in the EU and would like to scale out since latency is abysmal for users in Australia and Asia.

wirelesspotat · on May 30, 2021

fly.io[1] supports Postgres clusters with read replicas in different cities[2]

It looks like AWS's Aurora Postgres doesn't support cross-region read replicas, but apparently their Aurora "Global Database" offering does[3]

- [1] https://fly.io

- [2] https://fly.io/docs/reference/postgres/#scaling-horizontally...

- [3] https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...

craigkerstiens · on May 30, 2021

For fully managed (similar to Heroku Postgres), Crunchy Bridge [1] supports replicas across regions, including, say, EU to AU, which would actually couple really well with Fly.io for the app side.

- [1] https://www.crunchydata.com/products/crunchy-bridge/

manigandham · on May 30, 2021

The simple solution is to run Postgres replicas in those regions and do local reads for your app while directing the writes to the master region. This works well for read-heavy apps, and you can also put the master in a geographically central location to help with write latency.
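As a rough sketch of that routing rule (writes go to the primary, reads go to the nearest replica, with a fallback to the primary): the `Topology` class, the `pick_dsn` helper, the region codes, and the hostnames are all made up for illustration and do not come from any particular driver or provider.

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    """Hypothetical description of one primary plus regional read replicas."""
    primary_dsn: str
    replica_dsns: dict = field(default_factory=dict)  # region code -> DSN


def pick_dsn(topology, client_region, is_write):
    """Return the DSN a request should use.

    Writes always go to the primary. Reads use the replica in the
    client's region if one exists, otherwise they fall back to the primary.
    """
    if is_write:
        return topology.primary_dsn
    return topology.replica_dsns.get(client_region, topology.primary_dsn)


# Example hostnames are placeholders, not real endpoints.
topology = Topology(
    primary_dsn="postgresql://db-eu.example.internal/app",
    replica_dsns={
        "au": "postgresql://db-au.example.internal/app",
        "asia": "postgresql://db-sg.example.internal/app",
    },
)

print(pick_dsn(topology, "au", is_write=False))
print(pick_dsn(topology, "au", is_write=True))
```

Keep in mind that replicas lag the primary, so a read issued right after a write may not see it unless that read is also sent to the primary.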

If you need cross-region multi-master then I recommend something like CockroachDB or Yugabyte that have regional distribution as a core feature.

arcturus17 · on May 30, 2021

I’m glancing over the Heroku Postgres docs and they seem to offer easy to set up read replicas called “followers”. Wouldn’t this work for you?

iampims · on May 30, 2021

Heroku is only available in EU and US east. Latency from Australia is somewhat unavoidable. You can maybe reduce it by using Cloudflare or similar to terminate TLS as close to your users as possible.

sometimes666la · on May 31, 2021

Cloudflare gives different Anycast IPs for each plan, and they don't always hit the local ingestion point. Some people shot themselves in the foot with the free/pro plan in certain markets (India, Australia, etc.). I don't believe that, as of May 2021, local ingestion (TLS termination) is happening on anything less than the Business plan in MEL or SYD.

arthurcolle · on May 30, 2021

I think Citus is one option. Or possibly TimescaleDB

manigandham · on May 30, 2021

Those are meant for a single location, not cross-region distribution.

arthurcolle · on June 1, 2021

I don't think that's right.

Citus has a post from 2017 talking about allowing for horizontal scaling of a postgres instance. [0]

Same thing with Timescale. [1]

--------------------------------------------

[0] https://citusdata.com/blog/2017/02/16/citus61-released/

[1] https://blog.timescale.com/blog/timescaledb-2-0-a-multi-node...

mahkeiro · on May 30, 2021

It seems that Alibaba is going more open source than Amazon for its cloud services. That’s interesting, and they may use this approach as a key differentiator.

unityByFreedom · on May 30, 2021

Based on what metric? Amazon took over Elastic's open source server and another major one IIRC. That was huge. PostgreSQL is not about to disappear.

greatpatton · on May 30, 2021

They did it only after ES decided to abandon the Apache license. I doubt that Amazon would have done it if they had not been forced. Redshift is also Postgres based, and Amazon never released anything. So based on that metric Alibaba is more open.

unityByFreedom · on May 30, 2021

There would be no open source Elastic had Amazon not taken it on. It was a smart move and benefits their balance sheet to do so. The point though is that move was greater than replicating existing open source work.

funny_falcon · on May 31, 2021

? The team at my previous workplace patched ElasticSearch to add functionality for years. Could they have done it if there were no open source Elastic?

unityByFreedom · on June 2, 2021

I agree you can't patch software without an actively supported open source package. You may have meant to reply to someone else.

JoelJacobson · on May 30, 2021

No commit history? It would have been interesting to follow the development work to see what changes they've made compared to PostgreSQL.

antoncohen · on May 30, 2021

This is effectively required when open sourcing something that was previously internal. I've done it at a company. There could be company internal things that leak in old revisions of code and commit messages. Even if the latest commit on master is clean, it is really hard to know that every revision throughout the whole history is clean.

JoelJacobson · on May 30, 2021

Good point. In such a case, I think it would still be nice to at least try to split the total change into a few separate commits that build upon each other. That way the commit log could be a good starting point for someone who wants to understand the code base.

JoelJacobson · on May 30, 2021

With the first commit being the specific commit from the PostgreSQL repo which they base their work on, i.e. that they forked from.

llaolleh · on May 30, 2021

They are following the Gaussian approach. Show only the final result!

he0001 · on May 30, 2021

What other DBs are based on Postgres?

hashhar · on May 30, 2021

Greenplum

Hadapt

Netezza

PipelineDb

Postgres-XL

Redshift

AgensGraph

TimescaleDB

Fujitsu Enterprise Postgres

PolarDB

CrunchyBridge

craigkerstiens · on May 30, 2021

A quick note on Crunchy Bridge, it is pure Postgres, not "based" on Postgres. We haven't forked or modified code at all. And some of the above (Timescale) are extensions which do not change Postgres, but hook into the extension APIs.

michelpp · on May 30, 2021

Yes, this is an important distinction: an extension for Postgres vs. a fork of it.

The short-sightedness of forking boggles my mind every time. Oh, you don't want a team of hundreds of literally the best Postgres programmers in the world working for you full time all the time for free? Ok dude, have fun with your fork.

webmobdev · on June 1, 2021

Forking also happens because of the license.

rad_gruchalski · on May 30, 2021

Yugabyte

zinclozenge · on May 30, 2021

It has an API compatible with the postgresql wire protocol, but I don't think it fits in with the list since it doesn't share any code.

rad_gruchalski · on May 30, 2021

Yugabyte lifts the complete top query layer from Postgres and swaps the back end for RAFT per table. It definitely qualifies.

ggordan · on May 30, 2021

I think "based on" is more like implemented on top of, similar to timescaledb

zzzeek · on May 30, 2021

EdgeDB

your_challenger · on June 6, 2021

Looks like EdgeDB hasn't done enough marketing. No one has mentioned EdgeDB except you. Well we're (I love EdgeDB) still in beta.

jeltz · on May 30, 2021

Greenplum, Postgres-XL and PipelineDB.

ziftface · on May 30, 2021

I'm still really upset that Confluent bought and ruined PipelineDB. It solved my problem perfectly, and I haven't yet seen another one that fits what I needed as well.

mst · on May 30, 2021

Looks to me like an acquihire of a team that would otherwise (probably) have ended up winding up the company, sadly.

ddorian43 · on May 30, 2021

What about http://materialize.com/ ?

mkl95 · on May 30, 2021

TimescaleDB.

helltone · on May 30, 2021

Yellowbrick

darrenbkl · on May 30, 2021

Cockroachdb

welder · on May 30, 2021

No, this one's built from scratch in Go and based off LevelDB/RocksDB. They support Postgres wire protocol, but not based off any Postgres code.

qaq · on May 30, 2021

This field is getting crowded; I wonder if there will be some type of consolidation.

sail0rm00n · on May 30, 2021

There’s no need for consolidation when Postgres already exists.

speedgoose · on May 30, 2021

Sometimes I dream about a highly available multi primary postgresql.

bob1029 · on May 30, 2021

Do you also dream of loss of certain ACID semantics or crippling performance issues?

In the real world, it is extremely hard to provide all the same guarantees you get out of a single instance of <database vendor> if you turn around and spread it across the internet.

If you have super deep control over the physical & temporal environment around your system, you can cheat the rules a little bit (i.e. Google).

speedgoose · on May 30, 2021

Yes, I know that the perfect database cannot exist. I think some trade-offs are possible though. Today I use PostgreSQL with a primary and replicas, together with CouchDB in the same application. They complement each other, but I think something good in between is possible.

rad_gruchalski · on May 30, 2021

I would recommend having a look at Yugabyte.

zinclozenge · on May 30, 2021

Why yugabyte over cockroachdb?

rad_gruchalski · on May 30, 2021

Licensing. CockroachDB does not allow as-a-service.

zinclozenge · on May 30, 2021

Yugabyte does allow it though? I could take it and create my own company based around providing yugabyte as a service?

rad_gruchalski · on May 30, 2021

Yes. The core database is fully Apache 2 licensed with no strings attached: https://docs.yugabyte.com/latest/legal/. Maybe it will change if the company behind feels they're ripped off. Who knows, but for now, yes, you could.

CockroachDB has an explicit clause in the licensing: Yes, employees and contractors can use your internal CockroachDB instance as a service, but no people outside of your organization will be able to use it without purchasing a license: https://www.cockroachlabs.com/docs/v21.1/licensing-faqs.html....

speedgoose · on May 30, 2021

Thanks. I should try it out.

DaiPlusPlus · on May 30, 2021

CockroachDB?

speedgoose · on May 30, 2021

Perhaps. It's missing the 30 years of experience, the PostgreSQL reputation, an open source license, and a better name.

DaiPlusPlus · on May 31, 2021

> a better name

NGL, I've had to explain to enough people by now that the "funny name" is because it was made by ex-Googlers with a view towards resiliency, as per the idiom that only cockroaches will survive global nuclear war.

qaq · on May 30, 2021

YugabyteDB?

cultofmetatron · on May 30, 2021

yea, some sort of vitessdb equivalent to postgresql would be amazing!

supergirl · on May 30, 2021

There used to be Citus, but now it's part of Microsoft Azure.

mslot · on May 30, 2021

True, it is developed by Microsoft and available as a service on Azure. It is also open source, actively maintained and improved, and it's a PG13-compatible Postgres extension that adds both distributed database capabilities and columnar storage. :)

https://github.com/citusdata/citus

(Citus engineer)

jhgb · on May 30, 2021

> PG13-compatible

So not (IBM-System-)R-compatible, then?

qaq · on May 30, 2021

Well, some use cases need something that is distributed, like PolarDB, YugabyteDB, CockroachDB, etc.

rad_gruchalski · on May 30, 2021

I'd love to try this out. What I'm missing though is at least some Docker based deployment not involving building stuff from sources and reinventing distributed architecture myself.

devit · on May 30, 2021

Can anybody comment on whether this actually works properly and does what it claims to do?

raarts · on May 30, 2021

What are the differences between this and YugabyteDB?

wejick · on May 30, 2021

Yugabyte is a different DB technology than Postgres but with a PG compatibility layer, which means you can use existing Postgres queries/applications; the rest is a different beast. This one is more comparable to Citus, a PG extension, meaning it runs on top of Postgres. However, from their GitHub page, they also offer a patched version of PG, which I suppose offers some tighter integration with the extension.

tzumby · on May 30, 2021

“high availability through Paxos based replication”. I thought Paxos is supposed to be a CP system.

chousuke · on May 30, 2021

HA isn't really about 100% availability. Any single system that promises such is extremely likely to be misleading you somehow. Your in-flight query is going to get interrupted no matter how fancy your clustering is, and I struggle to even come up with hypothetical use cases where this is something you can't afford to have happen, ever.

All you need is that in the event of a failure the clustered system can still recover quickly enough (to a well-defined state!) that the application layer can deal with the transient failure without significant impact on users, maintaining the illusion of availability.
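On the application side, "dealing with the transient failure" often comes down to retrying with backoff. A minimal sketch, assuming the operation is safe to repeat (idempotent, or wrapped in a transaction that rolled back); `TransientDBError` and `run_with_retry` are hypothetical names, not part of any real driver.

```python
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for a driver's 'connection lost' style error."""


def run_with_retry(operation, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on TransientDBError with exponential backoff.

    Re-raises the error if the final attempt also fails.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientDBError:
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)


# Demo: an operation that fails twice (e.g. during a failover), then succeeds.
state = {"calls": 0}


def flaky_query():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TransientDBError("connection lost during failover")
    return "ok"


print(run_with_retry(flaky_query, base_delay=0.01))
```

The `sleep` parameter is injectable only so the backoff can be exercised in tests without actually waiting.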

chii · on May 30, 2021

> I struggle to even come up with hypothetical use cases where this is something you can't afford to have happen, ever.

A rocket is using this query to adjust its thrusters ;)

chousuke · on May 30, 2021

Such a rocket would likely have two or more independent systems that would each have to agree on the adjustment, so one of them temporarily failing would not pose a problem. Though I doubt there are any rockets using database queries as part of their control system.

In those kinds of systems I suspect the approach is to enumerate every possible scenario and prove that the system behaves correctly in all of them, and if you can't do that, the system may be too complex and you need to redesign it to be simpler so that you can guarantee that it does not fail.

jlokier · on May 30, 2021

You have N >= 3 nodes, or N >= 2 and a non-compute arbitrator. One of them goes down, stops responding. You still have quorum, data processes continue just fine. That's high-availability in a CP system.
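The quorum arithmetic can be sketched like this. It assumes a plain majority rule over all voters, with an arbitrator counted as a voter that holds no data; `has_quorum` is an illustrative helper, not anything from PolarDB.

```python
def has_quorum(total_voters, reachable_voters):
    """True if a strict majority of all voters is still reachable."""
    return reachable_voters > total_voters // 2


# Three nodes, one stops responding: 2 of 3 is still a majority.
print(has_quorum(total_voters=3, reachable_voters=2))

# Two data nodes plus an arbitrator (3 voters), one data node down.
print(has_quorum(total_voters=3, reachable_voters=2))

# Two nodes and no arbitrator, one down: 1 of 2 is not a majority.
print(has_quorum(total_voters=2, reachable_voters=1))
```

The last case is why a two-node setup needs the arbitrator: without a third vote, neither surviving node can tell a crashed peer from a network partition.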

spicybright · on May 30, 2021

[flagged]

edmundsauto · on May 30, 2021

How do you think the next dominant Database platform will start? Is your expectation that innovation requires polish before hitting the (free open source) “marketplace”?

DaiPlusPlus · on May 30, 2021

My expectation is that the next dominant platform will offer something unique which creates enough value that I'm willing to overlook the unfinished parts that deduct value.

When Neo4j and other graph databases went big in the past decade, the value they created was: "finally! I don't have to faff around with RDBMS tables and Codd purists to store and query my non-relational object-graph!" - despite the fact that Neo4j then (and still?) didn't support running several databases concurrently or schema enforcement or even transactional atomicity (I think they fixed that recently?)

So in this context, what business-value does PolarDB add or create that makes it worthwhile to deal with its expected short-life?

sa46 · on May 30, 2021

It’s distributed Postgres. The same reasons you would use Cockroach or Citus instead of stock Postgres also justify using polar.

Given that there’s multiple companies that exist to scale out Postgres, someone sees business value.

zwily · on May 30, 2021

> It extends PostgreSQL to become a share-nothing distributed database, which supports global data consistency and ACID across database nodes, distributed SQL processing, and data redundancy and high availability through Paxos based replication. PolarDB is designed to add values and new features to PostgreSQL in dimensions of high performance, scalability, high availability, and elasticity. At the same time, PolarDB remains SQL compatibility to single-node PostgreSQL with best effort.

That’s pretty compelling, if it delivers.

tluyben2 · on May 30, 2021

It is Alibaba: its life might not be so short? They are huge and have a cloud provider arm.

oger · on May 30, 2021

This! Very often the US-centric community here on HN underestimates the sheer size and expertise of Alibaba's cloud ecosystem.

JohnHaugeland · on May 30, 2021

> How do you think the next dominant Database platform will start?

Not this way.

colesantiago · on May 30, 2021

At least it's open source and NOT made by Google?

Also, how do you know it's a 'buggier version of postgres'? have you tried PolarDB yourself?

diminish · on May 30, 2021

You're missing open source's main power to proliferate.

killingtime74 · on May 30, 2021

Why wouldn’t I use cockroachDB?

mb4nck · on May 30, 2021

I don't know, maybe you should.

CockroachDB seems to be a distributed database system written in Go which has implemented a Postgres query/wire protocol compatibility layer.

PolarDB is a Postgres fork actually using the Postgres codebase and extending it to a distributed database system. Maybe one day they can unfork because it's possible to implement PolarDB on top of Postgres as an extension and/or they contribute/get all their changes into Postgres core.

rad_gruchalski · on May 30, 2021

Maybe because you need the as-a-service aspect and the licensing does not allow it.